Literature Collection and Sharing Platform

81.

张德祥 (2023-03-12 09:48):

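#paper https://doi.org/10.48550/arXiv.1806.08053 Semantic information, autonomous agency, and nonequilibrium statistical physics. The paper attempts to define semantic information through counterfactuals, grounding it in the physical, thermodynamic exchange of information between an agent and its environment. There has been little follow-up work so far, and the approach is somewhat close to the free energy framework.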

arXiv, 2018. DOI: 10.48550/arXiv.1806.08053

Semantic information, autonomous agency, and nonequilibrium statistical physics


Artemy Kolchinsky, David H. Wolpert

Abstract:

Shannon information theory provides various measures of so-called "syntactic information", which reflect the amount of statistical correlation between systems. In contrast, the concept of "semantic information" refers to those correlations …

82.

尹志 (2023-02-28 21:51):

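#paper https://doi.org/10.48550/arXiv.2203.17003 ICML, 2022, Equivariant Diffusion for Molecule Generation in 3D. Diffusion models are advancing extremely quickly across many fields. Beyond graphics and images, their reach now extends to biopharmaceuticals and materials science. This paper applies a diffusion model to 3D molecule generation. The authors propose an equivariant diffusion model whose equivariant network jointly handles continuous variables such as atom coordinates and discrete variables such as atom types. The work achieves SOTA performance on two standard datasets, QM9 and GEOM, and is one of the first works to bring equivariance into diffusion models.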

arXiv, 2022. DOI: 10.48550/arXiv.2203.17003

Equivariant Diffusion for Molecule Generation in 3D


Emiel Hoogeboom, Victor Garcia Satorras, Clément Vignac, Max Welling

Abstract:

This work introduces a diffusion model for molecule generation in 3D that is equivariant to Euclidean transformations. Our E(3) Equivariant Diffusion Model (EDM) learns to denoise a diffusion process with …

83.

张德祥 (2023-02-10 20:03):

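#paper https://doi.org/10.48550/arXiv.2210.15889 Towards Data-and Knowledge-Driven Artificial Intelligence: A Survey on Neuro-Symbolic Computing. Neural-symbolic computing (NeSy), which pursues the integration of the symbolic and statistical paradigms of cognition, has been an active research area of artificial intelligence (AI) for many years. Because NeSy promises to reconcile the reasoning and interpretability advantages of symbolic representation with the robust learning of neural networks, it may become a catalyst for the next generation of AI. This paper gives a systematic overview of important and recent developments in NeSy AI research. It first introduces the history of the field, covering early work and foundations. It then discusses background concepts and identifies the key drivers behind the development of NeSy. Next, it categorizes recent landmark approaches along several main characteristics that underline this research paradigm, including neural-symbolic integration, knowledge representation, knowledge embedding, and functionality. It then briefly discusses successful applications of modern NeSy approaches in several domains. Finally, it identifies open problems and potential future research directions. The survey is expected to help new researchers enter this rapidly developing field and to accelerate progress towards data- and knowledge-driven AI.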

arXiv, 2022. DOI: 10.48550/arXiv.2210.15889

Towards Data-and Knowledge-Driven Artificial Intelligence: A Survey on Neuro-Symbolic Computing


Wenguan Wang, Yi Yang, Fei Wu

Abstract:

Neural-symbolic computing (NeSy), which pursues the integration of the symbolic and statistical paradigms of cognition, has been an active research area of Artificial Intelligence (AI) for many years. As NeSy …

84.

王昊 (2023-01-31 23:53):

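#paper Learned Image Compression with Discretized Gaussian Mixture Likelihoods and Attention Modules http://arxiv.org/abs/2001.01568 Zhengxue Cheng, Heming Sun, Masaru Takeuchi, and Jiro Katto. 2020. Learned Image Compression with Discretized Gaussian Mixture Likelihoods and Attention Modules. Retrieved January 31, 2023. This is a baseline image coding method for VCM (the cheng2020 network), used in the feature-extraction stage of machine-vision coding; it belongs to the family of image compression algorithms. The authors propose parameterizing the distribution of the latent representation with a discretized Gaussian mixture likelihood, which yields a more accurate and flexible probability model. They also use attention modules to improve the network's focus on complex regions of the image. Specifically, first, the discretized Gaussian mixture model is used for entropy estimation of the latent representation: it can offer several most-likely means for y, while the variance of each mixture component can be smaller, giving a more accurate probability model and saving bits when encoding y. Second, the authors add simplified attention modules, which strengthen the network's focus on non-zero responses (that is, complex regions) without adding much training complexity.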
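The entropy-estimation step described above can be illustrated with a minimal sketch. It assumes the usual construction in which each mixture component's density is integrated over the unit-width bin around an integer latent value; the function names and the example weights, means, and scales are illustrative and not taken from the paper.

```python
import math

def gaussian_cdf(x, mean, scale):
    """CDF of a Gaussian with the given mean and standard deviation."""
    return 0.5 * (1.0 + math.erf((x - mean) / (scale * math.sqrt(2.0))))

def discretized_gmm_likelihood(y, weights, means, scales):
    """Probability mass a K-component Gaussian mixture assigns to integer y.

    Each component is integrated over the bin [y - 0.5, y + 0.5].
    """
    return sum(
        w * (gaussian_cdf(y + 0.5, m, s) - gaussian_cdf(y - 0.5, m, s))
        for w, m, s in zip(weights, means, scales)
    )

def bits_for_symbol(y, weights, means, scales):
    """Ideal code length (in bits) an entropy coder would spend on y."""
    return -math.log2(discretized_gmm_likelihood(y, weights, means, scales))

# Hypothetical mixture parameters for one latent element (illustrative values).
weights = [0.6, 0.3, 0.1]
means = [-2.0, 1.0, 4.0]
scales = [0.5, 0.8, 1.5]

likely_bits = bits_for_symbol(-2, weights, means, scales)
unlikely_bits = bits_for_symbol(7, weights, means, scales)
```

Symbols near one of the mixture means receive high probability and therefore a short code, which is the mechanism by which a better-fitting probability model saves bits.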

arXiv, 2020.

Learned Image Compression with Discretized Gaussian Mixture Likelihoods and Attention Modules


Zhengxue Cheng, Heming Sun, Masaru Takeuchi, Jiro Katto

Abstract:

Image compression is a fundamental research field and many well-known compression standards have been developed for many decades. Recently, learned compression methods exhibit a fast development trend with promising results. …

85.

前进 (2023-01-31 23:30):

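#paper Rethinking 1x1 Convolutions: Can we train CNNs with Frozen Random Filters? arXiv:2301.11360 This paper introduces a new convolution block that computes learnable linear combinations (LC) of frozen random filters, and builds LCResNets from it. The paper shows that even in the extreme case where spatial filters are only randomly initialized and never updated, some CNN architectures can be trained to exceed the accuracy of standard training. By reinterpreting pointwise (1x1) convolutions as operators that learn linear combinations (LC) of frozen (random) spatial filters, the approach not only reaches high test accuracy on CIFAR and ImageNet but also shows favorable properties in model robustness, generalization, sparsity, and the total number of weights required. The paper additionally proposes a new weight-sharing mechanism that lets all spatial convolution layers share a single weight tensor, greatly reducing the number of weights.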
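The reinterpretation of 1x1 convolutions rests on the linearity of convolution, which a small pure-Python sketch can make concrete. It checks that mixing the feature maps of frozen random filters gives the same result as convolving once with the linearly combined filter. The filter count, coefficient values, and helper names are illustrative assumptions, not the paper's LCResNet implementation.

```python
import random

def conv2d_valid(image, kernel):
    """Plain 2D cross-correlation without padding (pure Python)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j] for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

def combine_filters(filters, coeffs):
    """Linear combination of spatial filters with the given coefficients."""
    kh, kw = len(filters[0]), len(filters[0][0])
    return [
        [sum(a * f[i][j] for a, f in zip(coeffs, filters)) for j in range(kw)]
        for i in range(kh)
    ]

rng = random.Random(0)
# Frozen random 3x3 filters: initialized once and never updated.
frozen_filters = [[[rng.gauss(0, 1) for _ in range(3)] for _ in range(3)] for _ in range(4)]
# The only trainable parameters in this sketch: one coefficient per frozen filter.
coeffs = [0.5, -1.0, 0.25, 2.0]
image = [[rng.random() for _ in range(6)] for _ in range(6)]

# Route 1: convolve with each frozen filter, then mix the feature maps (the 1x1 step).
maps = [conv2d_valid(image, f) for f in frozen_filters]
mixed = [
    [sum(a * m[r][c] for a, m in zip(coeffs, maps)) for c in range(4)]
    for r in range(4)
]
# Route 2: mix the filters first, then convolve once with the combined filter.
direct = conv2d_valid(image, combine_filters(frozen_filters, coeffs))
```

Because both routes agree, learning only the coefficients is equivalent to learning a spatial filter restricted to the span of the frozen random filters.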

arXiv, 2023.

Rethinking 1x1 Convolutions: Can we train CNNs with Frozen Random Filters?


Paul Gavrikov, Janis Keuper

Abstract:

Modern CNNs are learning the weights of vast numbers of convolutional operators. In this paper, we raise the fundamental question if this is actually necessary. We show that even in …

86.

尹志 (2023-01-31 20:59):

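#paper Diffusion Models: A Comprehensive Survey of Methods and Applications, https://doi.org/10.48550/arXiv.2209.00796. This survey gives a detailed introduction to, and organization of, the currently very popular diffusion models. It groups today's diffusion models into three main classes: DDPMs, SGMs, and score SDEs. Each class generalizes the previous one and can handle a broader range of problems. Besides explaining and comparing the three mainstream classes in detail and organizing the related improvements, the survey also discusses how diffusion models relate to and differ from other mainstream generative models. It closes by listing current applications of diffusion models across fields. Since diffusion models are inspired by physical concepts, I am very optimistic about further extensions and applications that combine them with mathematical physics. For example, Prof. 顾险峰 (Gu Xianfeng) recently pointed out in an article possible improvements based on optimal transport, which is a very interesting idea and topic.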
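For the first of the three classes, the DDPM forward (noising) process has a well-known closed form, x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps. The sketch below shows it in pure Python; the linear schedule endpoints and the number of steps are common illustrative choices, not values taken from the survey.

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (values are illustrative)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + step * t for t in range(num_steps)]

def cumulative_alphas(betas):
    """alpha_bar_t = prod_{s<=t} (1 - beta_s)."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def q_sample(x0, t, alpha_bars, rng):
    """Draw x_t ~ q(x_t | x_0) = N(sqrt(alpha_bar_t) x_0, (1 - alpha_bar_t) I)."""
    signal = math.sqrt(alpha_bars[t])
    noise = math.sqrt(1.0 - alpha_bars[t])
    return [signal * v + noise * rng.gauss(0.0, 1.0) for v in x0]

betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alphas(betas)
x_t = q_sample([1.0, -0.5, 0.25], t=999, alpha_bars=alpha_bars, rng=random.Random(0))
```

At the final step alpha_bar is close to zero, so x_t is almost pure Gaussian noise; the learned reverse process starts from there.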

arXiv, 2022. DOI: 10.48550/arXiv.2209.00796

Diffusion Models: A Comprehensive Survey of Methods and Applications


Ling Yang, Zhilong Zhang, Yang Song, Shenda Hong, Runsheng Xu, Yue Zhao, Yingxia Shao, Wentao Zhang, Bin Cui, Ming-Hsuan Yang

Abstract:

Diffusion models have emerged as a powerful new family of deep generative models with record-breaking performance in many applications, including image synthesis, video generation, and molecule design. In this survey, …

87.

张浩彬 (2023-01-30 13:34):

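#paper https://doi.org/10.48550/arXiv.2202.01575 CoST: Contrastive Learning of Disentangled Seasonal-Trend Representations for Time Series Forecasting
1. The paper treats a time series as the sum of three parts: a trend term, a seasonal term, and an error term. What needs to be learned are the trend and seasonal terms.
2. In terms of overall structure, an encoder (TCN) maps the original series into a latent space, after which two separate branches extract the trend and seasonal terms, each trained with contrastive learning.
   a. For the trend term, the latent representation is fed into a mixture of autoregressive experts to extract the trend, which is trained with a contrastive loss in the time domain. The time-domain contrastive loss follows MoCo.
   b. For the seasonal term, a discrete Fourier transform maps the latent representation into the frequency domain, where the loss is defined on amplitude and phase.
3. The final total loss is the sum of the time-domain and frequency-domain losses.
4. The method is compared on five datasets against multiple baselines, including TS2Vec, TNC, MoCo, Informer, LogTrans, and TCN, and achieves SOTA results in most cases.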
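The frequency-domain step in 2b can be sketched with a plain discrete Fourier transform that exposes the per-frequency amplitude and phase on which such a loss would operate. This is only an illustration on a raw toy signal, under the assumption that amplitude and phase are read off the complex spectrum; CoST applies the transform to learned latent representations, and the helper names here are hypothetical.

```python
import cmath
import math

def dft(series):
    """Discrete Fourier transform of a real-valued sequence (O(n^2), pure Python)."""
    n = len(series)
    return [
        sum(series[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def amplitude_and_phase(series):
    """Per-frequency amplitude and phase of the spectrum."""
    spectrum = dft(series)
    return [abs(z) for z in spectrum], [cmath.phase(z) for z in spectrum]

n = 48
# Toy seasonal signal with 4 full cycles over the window.
seasonal = [math.sin(2 * math.pi * 4 * t / n) for t in range(n)]

amplitude, phase = amplitude_and_phase(seasonal)
# Strongest frequency bin among the positive frequencies (skipping the DC term).
dominant = max(range(1, n // 2), key=lambda k: amplitude[k])
```

The dominant bin recovers the number of cycles in the window, which is the kind of periodic structure a seasonal representation is meant to capture.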

arXiv, 2022. DOI: 10.48550/arXiv.2202.01575

CoST: Contrastive Learning of Disentangled Seasonal-Trend Representations for Time Series Forecasting


Gerald Woo, Chenghao Liu, Doyen Sahoo, Akshat Kumar, Steven Hoi

Abstract:

Deep learning has been actively studied for time series forecasting, and the mainstream paradigm is based on the end-to-end training of neural network architectures, ranging from classical LSTM/RNNs to more …

88.

林海onrush (2023-01-27 01:30):

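#paper, Twist: Sound Reasoning for Purity and Entanglement in Quantum Programs, DOI: 10.48550/arXiv.2205.02287. The authors introduce the notion of purity expressions in order to reason about entangled states in quantum programs. Much like pointers into classical memory, qubits are acted on by executing operations known as gates. Entanglement, a special form of state, causes the measurement outcomes of qubits to be correlated, and it can determine both the correctness of an algorithm and the applicability of programming patterns. Formalizing purity expressions provides a core tool for automatically reasoning about entanglement in quantum programs; a purity expression is one whose evaluation is unaffected by the measurement outcomes of qubits. The main contribution is Twist, the first language with a type system for sound reasoning about purity, which lets developers identify purity expressions with type annotations. Finally, the paper shows that Twist can express quantum algorithms, catch programming errors in them, and support some programs that other languages disallow, while incurring runtime verification overhead of less than 3.5%. Overall, this is a foundational and meaningful piece of work.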

arXiv, 2022. DOI: 10.48550/arXiv.2205.02287

Twist: Sound Reasoning for Purity and Entanglement in Quantum Programs


Charles Yuan, Christopher McNally, Michael Carbin

Abstract:

Quantum programming languages enable developers to implement algorithms for quantum computers that promise computational breakthroughs in classically intractable tasks. Programming quantum computers requires awareness of entanglement, the phenomenon in which …

89.

张德祥 (2023-01-06 18:42):

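#paper https://doi.org/10.48550/arXiv.2212.12393 A-NeSI: A Scalable Approximate Method for Probabilistic Neurosymbolic Inference. Inspired by GFlowNet, this paper is the first to scale training on MNIST ADD up to 15-digit addition; an artificial arithmetic prodigy may be just around the corner. It combines neural networks with symbolic computation.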

arXiv, 2022. DOI: 10.48550/arXiv.2212.12393

A-NeSI: A Scalable Approximate Method for Probabilistic Neurosymbolic Inference


Emile van Krieken, Thiviyan Thanapalasingam, Jakub M. Tomczak, Frank van Harmelen, Annette ten Teije

Abstract:

We study the problem of combining neural networks with symbolic reasoning. Recently introduced frameworks for Probabilistic Neurosymbolic Learning (PNL), such as DeepProbLog, perform exponential-time exact inference, limiting the scalability of …

90.

王昊 (2022-12-31 23:57):

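#paper https://arxiv.org/abs/2111.08687v2 Jing Shao, Siyu Chen, Yangguang Li, et al. 2021. INTERN: A New Learning Paradigm Towards General Vision. A paper on vision foundation models. INTERN ("书生") aims to systematically address a series of bottlenecks in today's AI vision field, such as task generality, scene generalization, and data efficiency. INTERN consists of seven modules: three infrastructure modules (a general vision data system, general vision network architectures, and a general vision benchmark) plus four training-stage modules that separate upstream from downstream. Strong generalization ability is learned across the multiple stages. The model handles four types of CV tasks on 26 datasets, and when fine-tuned with only 10% of the training data it outperforms the corresponding models trained on the full data.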

arXiv, 2021. DOI: 10.48550/arXiv.2111.08687

INTERN: A New Learning Paradigm Towards General Vision


Jing Shao, Siyu Chen, Yangguang Li, Kun Wang, Zhenfei Yin, Yinan He, Jianing Teng, Qinghong Sun, Mengya Gao, Jihao Liu, Gengshi Huang, Guanglu Song, Yichao Wu, Yuming Huang, Fenggang Liu, Huan Peng, Shuo Qin, Chengyu Wang, Yujie Wang, Conghui He, Ding Liang, Yu Liu, Fengwei Yu, Junjie Yan, Dahua Lin, Xiaogang Wang, Yu Qiao

Abstract:

Enormous waves of technological innovations over the past several years, marked by the advances in AI technologies, are profoundly reshaping the industry and the society. However, down the road, a …

91.

尹志 (2022-12-31 14:48):

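#paper doi: https://doi.org/10.48550/arXiv.2210.11250, Structure-based drug design with geometric deep learning. This is a fairly recent, short review on drug design and deep learning. It mainly discusses how geometric deep learning contributes to several important subtasks in structure-based drug design. Since structure-based drug design mainly uses the three-dimensional geometric information of macromolecules (such as proteins and nucleic acids) to identify suitable ligands, geometric deep learning, a technique that brings geometric symmetry into deep learning, is a very promising tool. The review covers four subtasks: 1) molecular property prediction (binding affinity, protein function, pose scoring); 2) binding site and binding interface prediction (small-molecule binding sites and protein-protein interfaces); 3) binding pose generation and molecular docking (ligand-protein and protein-protein docking); 4) structure-based de novo design of small-molecule ligands. It starts from common molecular representations, then discusses the symmetry issues that arise in structure-based drug design, and then devotes four subsections to the state of geometric deep learning research on the four subtasks. It is a very good guideline for AI-based structure-based drug design.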

arXiv, 2022. DOI: 10.48550/arXiv.2210.11250

Structure-based drug design with geometric deep learning


Clemens Isert, Kenneth Atz, Gisbert Schneider

Abstract:

Structure-based drug design uses three-dimensional geometric information of macromolecules, such as proteins or nucleic acids, to identify suitable ligands. Geometric deep learning, an emerging concept of neural-network-based machine learning, has …

92.

前进 (2022-12-31 11:39):

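#paper Liu Y, Chen J, Wei S, et al. On Finite Difference Jacobian Computation in Deformable Image Registration[J]. arXiv preprint arXiv:2212.06060, 2022. Producing diffeomorphic spatial transformations has long been a central problem in deformable image registration. A diffeomorphic transformation should have a positive Jacobian determinant |J| everywhere. The number of voxels with |J|<0 has been used to test for diffeomorphism and also to measure the irregularity of a transformation. For digital transformations, |J| is usually approximated with central differences, but this strategy can yield positive |J| for transformations that are clearly not diffeomorphic, even at the voxel-resolution level. To show this, the paper first studies the geometric meaning of different finite difference approximations of |J|. To determine whether a digital image transformation is diffeomorphic, any single finite difference approximation of |J| is insufficient. The paper shows that for 2D transformations, four unique finite difference approximations of |J| must be positive to ensure that the entire domain is invertible and free of folding at the pixel level. In 3D, ten unique finite difference approximations of |J| must be positive. The proposed digital diffeomorphism criterion resolves several errors inherent in the central difference approximation of |J| and accurately detects non-diffeomorphic digital transformations.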
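The failure of central differences can be reproduced on a toy 2D grid. The sketch below builds a transformation whose x-displacement alternates in sign between neighbouring grid columns, so adjacent points swap order (a fold). It assumes that the one-sided approximations are the forward/backward choices per axis; whether these four combinations match the paper's four approximations exactly is an assumption, and all names are illustrative.

```python
def make_transform(n, amplitude=0.6):
    """phi[i][j] = (x, y) location that grid point (i, j) maps to.

    The x-displacement alternates in sign with the index i, so neighbouring
    points swap order along x (a fold) at pixel resolution.
    """
    return [[(i + amplitude * (-1) ** i, float(j)) for j in range(n)] for i in range(n)]

def partial(phi, i, j, axis, comp, scheme):
    """Finite-difference derivative of component `comp` along `axis` at (i, j)."""
    di, dj = (1, 0) if axis == 0 else (0, 1)
    if scheme == "forward":
        return phi[i + di][j + dj][comp] - phi[i][j][comp]
    if scheme == "backward":
        return phi[i][j][comp] - phi[i - di][j - dj][comp]
    # Central difference: skips the centre point entirely.
    return (phi[i + di][j + dj][comp] - phi[i - di][j - dj][comp]) / 2.0

def jacobian_det(phi, i, j, scheme_x, scheme_y):
    """2x2 Jacobian determinant using the chosen difference scheme per axis."""
    dxx = partial(phi, i, j, 0, 0, scheme_x)
    dxy = partial(phi, i, j, 1, 0, scheme_y)
    dyx = partial(phi, i, j, 0, 1, scheme_x)
    dyy = partial(phi, i, j, 1, 1, scheme_y)
    return dxx * dyy - dxy * dyx

phi = make_transform(5)
central = jacobian_det(phi, 2, 2, "central", "central")
one_sided = {
    (sx, sy): jacobian_det(phi, 2, 2, sx, sy)
    for sx in ("forward", "backward")
    for sy in ("forward", "backward")
}
```

Here the central-difference determinant is positive because the alternating displacement cancels across the two neighbours, while the forward-difference variants are negative and expose the fold.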

arXiv, 2022. DOI: 10.48550/arXiv.2212.06060

On Finite Difference Jacobian Computation in Deformable Image Registration


Yihao Liu, Junyu Chen, Shuwen Wei, Aaron Carass, Jerry Prince

Abstract:

Producing spatial transformations that are diffeomorphic has been a central problem in deformable image registration. As a diffeomorphic transformation should have positive Jacobian determinant |J| everywhere, the number of voxels …

93.

林海onrush (2022-11-30 21:51):

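#paper, https://doi.org/10.48550/arXiv.2211.16197, FJMP: Factorized Joint Multi-Agent Motion Prediction over Learned Directed Acyclic Interaction Graphs. Targeting trajectory prediction for autonomous driving, this work proposes FJMP, a factorized multi-agent joint motion prediction framework that learns directed acyclic interaction graphs. Future scene interaction dynamics are modeled as a sparse directed interaction graph whose edges represent explicit interactions between agents. The graph is pruned into a directed acyclic graph (DAG), and the joint prediction task is factorized according to the partial ordering of the DAG, with the joint future trajectories produced by a directed acyclic graph neural network (DAGNN). On the INTERACTION and Argoverse 2 datasets, FJMP is shown to give more accurate and more scene-consistent joint trajectory predictions than non-factorized approaches. FJMP achieves SOTA on the multi-agent INTERACTION benchmark.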
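The factorization by partial ordering can be sketched as follows: agents are processed in a topological order of the interaction DAG, so each agent's prediction can be conditioned on the already predicted futures of the agents that influence it. The graph, the agent names, and the `predict_future` stand-in are hypothetical; the real system uses a learned DAGNN decoder.

```python
from graphlib import TopologicalSorter

# Hypothetical interaction DAG: each agent maps to the set of agents that influence it.
influencers = {
    "car_A": set(),
    "car_B": {"car_A"},
    "pedestrian": {"car_A"},
    "car_C": {"car_B", "pedestrian"},
}

def predict_future(agent, parent_futures):
    """Stand-in for a learned decoder: returns a label built from the parents' names."""
    if not parent_futures:
        return f"{agent}: marginal"
    return f"{agent}: conditioned on {sorted(parent_futures)}"

def factorized_joint_prediction(graph):
    """Predict agents in topological order so every parent's future is available first."""
    order = list(TopologicalSorter(graph).static_order())
    futures = {}
    for agent in order:
        parents = {p: futures[p] for p in graph[agent]}
        futures[agent] = predict_future(agent, parents)
    return order, futures

order, futures = factorized_joint_prediction(influencers)
```

Agents without influencers get marginal predictions, and every other agent is predicted only after all of its influencers.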

arXiv, 2022. DOI: 10.48550/arXiv.2211.16197

FJMP: Factorized Joint Multi-Agent Motion Prediction over Learned Directed Acyclic Interaction Graphs


Luke Rowe, Martin Ethier, Eli-Henry Dykhne, Krzysztof Czarnecki

Abstract:

Predicting the future motion of road agents is a critical task in an autonomous driving pipeline. In this work, we address the problem of generating a set of scene-level, or …

94.

张德祥 (2022-11-16 09:16):

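#paper https://doi.org/10.48550/arXiv.2206.02063 Active Bayesian Causal Inference: We sequentially design experiments that are maximally informative about our target causal query, collect the corresponding interventional data, and update our beliefs to choose the next experiment. The present work considers a more general setting in which we are interested in causal reasoning but have no access to a reference causal model a priori. In this setting, causal discovery can be seen as a means to an end rather than the main objective. Focusing on actively learning the full causal model in order to enable subsequent causal reasoning can be disadvantageous for two reasons. First, if we are only interested in specific aspects of the causal model, wasting samples on learning the full causal graph is suboptimal. Second, causal discovery from small amounts of data entails significant epistemic uncertainty. The authors propose Active Bayesian Causal Inference (ABCI), a fully Bayesian framework for integrating causal discovery and reasoning with experimental design. The basic approach is to put a Bayesian prior over the chosen class of causal models and to cast the learning problem as Bayesian inference over the model posterior. Given the unobserved causal model, causal reasoning is formalized by introducing a target causal query. Following the Bayesian optimal experimental design approach [10,42], the method then selects, according to current beliefs, the admissible intervention that is most informative about the target query under the true causal model. Given the observed data, beliefs are then updated by computing the posterior over causal models and queries, and these are used to design the next experiment.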
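The experiment-selection step can be illustrated with a toy discrete version of Bayesian optimal experimental design: choose the intervention with the largest expected information gain about the causal model. This is a sketch under strong simplifying assumptions (two candidate graphs, binary outcomes, and the graph itself as the target query); the outcome probabilities are made up and do not come from the paper.

```python
import math

def entropy(dist):
    """Shannon entropy (in nats) of a discrete distribution given as a dict."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)

def posterior(prior, likelihood, experiment, outcome):
    """Bayes update of the belief over causal models after seeing `outcome`."""
    joint = {m: prior[m] * likelihood[m][experiment][outcome] for m in prior}
    evidence = sum(joint.values())
    return {m: v / evidence for m, v in joint.items()}, evidence

def expected_information_gain(prior, likelihood, experiment, outcomes):
    """Expected reduction in entropy over models from running `experiment`."""
    gain = 0.0
    for outcome in outcomes:
        post, evidence = posterior(prior, likelihood, experiment, outcome)
        gain += evidence * (entropy(prior) - entropy(post))
    return gain

# Hypothetical belief over two candidate graphs, and made-up probabilities that the
# non-intervened variable equals 1 under each model and intervention.
prior = {"X->Y": 0.5, "Y->X": 0.5}
likelihood = {
    "X->Y": {"do(X=1)": {1: 0.9, 0: 0.1}, "do(Y=1)": {1: 0.5, 0: 0.5}},
    "Y->X": {"do(X=1)": {1: 0.5, 0: 0.5}, "do(Y=1)": {1: 0.6, 0: 0.4}},
}
experiments = ["do(X=1)", "do(Y=1)"]
gains = {e: expected_information_gain(prior, likelihood, e, [0, 1]) for e in experiments}
best_experiment = max(gains, key=gains.get)

# After running the chosen experiment and observing outcome 1, update the belief.
updated_belief, _ = posterior(prior, likelihood, best_experiment, 1)
```

The loop described in the post repeats these two steps: pick the most informative admissible intervention, then replace the prior with the posterior before designing the next experiment.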

arXiv, 2022. DOI: 10.48550/arXiv.2206.02063

Active Bayesian Causal Inference


Christian Toth, Lars Lorch, Christian Knoll, Andreas Krause, Franz Pernkopf, Robert Peharz, Julius von Kügelgen

Abstract:

Causal discovery and causal reasoning are classically treated as separate and consecutive tasks: one first infers the causal graph, and then uses it to estimate causal effects of interventions. However, …

95.

张德祥 (2022-11-16 08:17):

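#paper https://doi.org/10.48550/arXiv.2204.14170 Tractable Uncertainty for Structure Learning. Unfortunately, the super-exponential space of DAGs makes both representing and learning such posteriors extremely challenging. A major breakthrough was the introduction of order-based representations (Friedman & Koller, 2003), in which the state space is reduced to the space of topological orders; even so, computation remains difficult. Sample-based representations cover the posterior only to a very limited extent, which restricts the information they can provide. For example, consider the problem of finding the most probable graph extension given an arbitrary set of required edges. Given the super-exponential space, even a large sample may not contain a single order consistent with the given edge set, which makes such a query impossible to answer. A compact representation is therefore needed. The paper exploits the exact hierarchical conditional independencies present in order-modular distributions. This allows OrderSPNs to express distributions over sets of orders that are potentially exponentially larger than their own size. It also supports linear-time computation of Bayesian causal effects.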

arXiv, 2022. DOI: 10.48550/arXiv.2204.14170

Tractable Uncertainty for Structure Learning


Benjie Wang, Matthew Wicker, Marta Kwiatkowska

Abstract:

Bayesian structure learning allows one to capture uncertainty over the causal directed acyclic graph (DAG) responsible for generating given data. In this work, we present Tractable Uncertainty for STructure learning …

96.

张德祥 (2022-11-14 14:39):

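#paper https://doi.org/10.48550/arXiv.2210.12761 Path integrals, particular kinds, and strange things. The FEP is a first-principles account or method that can be applied to any "thing", in a way that dissolves the boundaries between physics, biology, and psychology. This application endorses many normative accounts of sentient behavior and self-organization. These range from cybernetics to synergetics (Ao, 2004; Ashby, 1979; Haken, 1983; Kelso, 2021); from reinforcement learning to artificial curiosity (Barto et al., 2013; Schmidhuber, 1991; Sutton and Barto, 1981; Tsividis et al., 2021); from predictive processing to universal computation (Clark, 2013b; Hohwy, 2016; Hutter, 2006); from model predictive control to empowerment (Hafner et al., 2020; Klyubin et al., 2005), and so on. The paper uses standard results from statistical physics and information theory to unpack the argument outlined above.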

arXiv, 2022. DOI: 10.48550/arXiv.2210.12761

Path integrals, particular kinds, and strange things


Karl Friston, Lancelot Da Costa, Dalton A.R. Sakthivadivel, Conor Heins, Grigorios A. Pavliotis, Maxwell Ramstead, Thomas Parr

Abstract:

This paper describes a path integral formulation of the free energy principle. The ensuing account expresses the paths or trajectories that a particle takes as it evolves over time. The …

97.

林李泽强 (2022-10-31 23:29):

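#paper doi: arxiv.org/abs/2210.09217 Statistical learning methods for neuroimaging data analysis with applications. This is a preprint that has not yet been formally published, written by researchers with a statistics background. In it, the authors give a comprehensive review, from a statistical perspective, of the statistical issues that run from neuroimaging techniques through large-scale neuroimaging studies to statistical learning methods. The paper has three main parts: (1) a review of image processing methods from a statistical viewpoint; (2) an introduction to several of the most cutting-edge current neuroimaging datasets; (3) an introduction, from a statistical viewpoint, to nine categories of statistical methods for imaging data. Because it presents the problems of neuroimaging from a statistical angle, the paper works well as an entry point into the field for readers with a mathematical background, and it is also suitable for researchers from other backgrounds who want to look at neuroimaging data analysis from a statistical angle.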

arXiv, 2022.

Statistical learning methods for neuroimaging data analysis with applications


Hongtu Zhu, Tengfei Li, Bingxin Zhao

Abstract:

The aim of this paper is to provide a comprehensive review of statistical challenges in neuroimaging data analysis from neuroimaging techniques to large-scale neuroimaging studies to statistical learning methods. We …

98.

song (2022-10-31 12:02):

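#paper Conditional Diffusion Probabilistic Model for Speech Enhancement, https://arxiv.org/abs/2202.05256# Ordinary diffusion models do not perform well on speech-related tasks, because they assume that all noise is Gaussian, whereas in speech tasks only a small share of the noise is Gaussian (white noise) and most of it is stationary and non-stationary noise of various kinds. The paper addresses this by conditioning the reverse and diffusion processes not only on the output of the previous step but also on a noisy speech signal, y: each step goes from being multiplied by Gaussian noise to being multiplied by the product of Gaussian noise and the difference between the noisy speech and the current-step speech. In this process the model learns the characteristics of the noisy speech (non-Gaussian noise). The method solves the problem of applying diffusion models to non-Gaussian data. Speech enhancement is a special case, however: its datasets already come with paired clean and noisy speech, which makes the task well suited to this method, while other speech tasks do not necessarily have clean speech as input. Voice conversion, for example, lacks large amounts of target speech to serve as clean input, so further research could build on this work.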

arXiv, 2022.

Conditional Diffusion Probabilistic Model for Speech Enhancement


Yen-Ju Lu, Zhong-Qiu Wang, Shinji Watanabe, Alexander Richard, Cheng Yu, Yu Tsao

Abstract:

Speech enhancement is a critical component of many user-oriented audio applications, yet current systems still suffer from distorted and unnatural outputs. While generative models have shown strong potential in speech …

99.

林海onrush (2022-10-29 13:58):

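#paper, Model Evaluation, Model Selection, and Algorithm Selection in Machine Learning, url: https://arxiv.org/abs/1811.12808#. This paper reviews different techniques for the three tasks of model evaluation, model selection, and algorithm selection, and discusses the main advantages and disadvantages of each technique with reference to theoretical and empirical studies. It then gives recommendations to encourage best practices in machine learning research and applications. See the PDF below for a detailed walkthrough of the paper.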

arXiv, 2018. DOI: 10.48550/arXiv.1811.12808

Model Evaluation, Model Selection, and Algorithm Selection in Machine Learning


Sebastian Raschka

Abstract:

The correct use of model evaluation, model selection, and algorithm selection techniques is vital in academic machine learning research as well as in many industrial settings. This article reviews different …

100.

林海onrush (2022-10-29 13:51):

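#paper, Formal Algorithms for Transformers, url: https://arxiv.org/pdf/2207.09238.pdf. Over the past five-plus years, Transformers have shown remarkable results in many fields. However, descriptions of Transformer algorithms have mostly relied on diagrams, prose, or explanations of specific optimizations, and no paper had given reasonably complete algorithm pseudocode. Here DeepMind provides formal pseudocode for the algorithms. See the PDF below for a detailed walkthrough of the paper.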
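To give a flavour of what a precise algorithmic description looks like, here is a minimal pure-Python sketch of single-head scaled dot-product attention with an optional causal mask. It is not the paper's pseudocode: the learned query, key, and value projection matrices are omitted, and the function names and toy inputs are illustrative assumptions.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values, causal=False):
    """Single-head scaled dot-product attention on lists of vectors.

    queries and keys are lists of d_k-dimensional vectors; values is a list of
    d_v-dimensional vectors with the same length as keys.
    With causal=True, position t may only attend to positions <= t.
    """
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for t, q in enumerate(queries):
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in keys]
        if causal:
            # Masked positions get -inf so their softmax weight is exactly zero.
            scores = [s if i <= t else float("-inf") for i, s in enumerate(scores)]
        weights = softmax(scores)
        all_weights.append(weights)
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        )
    return outputs, all_weights

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, weights = attention(tokens, tokens, tokens, causal=True)
```

With the causal mask, the first position can only attend to itself, so its output equals its own value vector.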

arXiv, 2022. DOI: 10.48550/arXiv.2207.09238

Formal Algorithms for Transformers


Mary Phuong, Marcus Hutter

Abstract:

This document aims to be a self-contained, mathematically precise overview of transformer architectures and algorithms (*not* results). It covers what transformers are, how they are trained, what they are used …