Papers from arXiv.
110 paper shares found in total; this page shows items 1 to 20.
1.
前进 (2024-10-31 15:09):
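#paper arXiv:2408.05839v2 Deep Learning in Medical Image Registration: Magic or Mirage? 38th Conference on Neural Information Processing Systems (NeurIPS 2024). This paper compares deep-learning-based image registration (DLIR) with classical optimization methods for deformable image registration (DIR) in medical imaging. Classical methods have the edge in cross-modality generalization and robustness. Learning-based methods can reach higher performance by exploiting weak supervision. The experiments show that in the unsupervised setting, learning-based methods do not significantly outperform classical methods on label matching. The authors observe that the performance of classical methods correlates strongly with the mutual information between per-pixel intensity distributions and labels. They hypothesize that architectural design in learning-based methods is unlikely to change this correlation, and is therefore unlikely to substantially improve the performance of learning-based methods. The paper also shows that with weak supervision, learning-based methods achieve higher registration accuracy, which classical methods cannot match. However, learning-based methods are sensitive to shifts in the data distribution and are not robust to them. The paper concludes that without a large labeled dataset, classical optimization methods remain the better choice.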
arXiv, 2024-08-11T18:20:08Z. DOI: 10.48550/arXiv.2408.05839
Abstract:
Classical optimization and learning-based methods are the two reigning paradigms in deformable image registration. While optimization-based methods boast generalizability across modalities and robust performance, learning-based methods promise peak performance, incorporating weak supervision and amortized optimization. However, the exact conditions for either paradigm to perform well over the other are shrouded and not explicitly outlined in the existing literature. In this paper, we make an explicit correspondence between the mutual information of the distribution of per-pixel intensity and labels, and the performance of classical registration methods. This strong correlation hints to the fact that architectural designs in learning-based methods is unlikely to affect this correlation, and therefore, the performance of learning-based methods. This hypothesis is thoroughly validated with state-of-the-art classical and learning-based methods. However, learning-based methods with weak supervision can perform high-fidelity intensity and label registration, which is not possible with classical methods. Next, we show that this high-fidelity feature learning does not translate to invariance to domain shift, and learning-based methods are sensitive to such changes in the data distribution. Finally, we propose a general recipe to choose the best paradigm for a given registration problem, based on these observations.
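The paper ties classical-registration performance to the mutual information between per-pixel intensity and labels. The sketch below illustrates that quantity only; it is not the authors' code. It estimates the mutual information from paired per-pixel samples of binned intensity and segmentation label. The bin count and the plain histogram estimator are assumptions.

```python
import math
from collections import Counter

def intensity_label_mi(intensities, labels, n_bins=8):
    """Histogram estimate of I(intensity; label) in nats.

    intensities: per-pixel values in [0, 1]; labels: per-pixel integer labels.
    Fixed-width binning is an illustrative assumption.
    """
    if len(intensities) != len(labels):
        raise ValueError("need one label per pixel")
    n = len(labels)
    # Map each intensity to a bin 0..n_bins-1 (1.0 falls into the last bin).
    bins = [min(int(v * n_bins), n_bins - 1) for v in intensities]
    joint = Counter(zip(bins, labels))
    p_bin, p_lab = Counter(bins), Counter(labels)
    mi = 0.0
    for (b, l), c in joint.items():
        p_xy = c / n
        mi += p_xy * math.log(p_xy / ((p_bin[b] / n) * (p_lab[l] / n)))
    return mi
```

When intensity fully determines the label, the estimate equals the label entropy (ln 2 for two balanced labels). When intensity is constant, the estimate is 0.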
2.
张浩彬 (2024-10-30 10:19):
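#paper AdapterFusion: Non-Destructive Task Composition for Transfer Learning https://doi.org/10.48550/arXiv.2005.00247 An improved version of adapters: AdapterFusion. In short, a separate adapter is trained for each task, and the adapters are then combined to fuse knowledge more effectively. Summary of the abstract: sequential fine-tuning and multi-task learning aim to incorporate knowledge from multiple tasks, but they suffer from catastrophic forgetting and difficulties in balancing datasets. To address this, the authors propose AdapterFusion, a new two-stage learning algorithm that leverages knowledge from multiple tasks. First, in the knowledge extraction stage, they learn task-specific parameters called adapters, which encapsulate task-specific information. Then, in a separate knowledge composition step, they combine the adapters. Separating the two stages lets the classifier exploit representations learned from multiple tasks in a non-destructive way. The authors evaluate AdapterFusion on 16 diverse NLU tasks. They find that it effectively combines various types of knowledge at different layers of the model and outperforms traditional strategies such as full fine-tuning and multi-task learning. The code and adapters are available at AdapterHub.ml.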
arXiv, 2020-05-01T07:03:42Z. DOI: 10.48550/arXiv.2005.00247
Abstract:
Sequential fine-tuning and multi-task learning are methods aiming to incorporate knowledge from multiple tasks; however, they suffer from catastrophic forgetting and difficulties in dataset balancing. To address these shortcomings, we propose AdapterFusion, a new two stage learning algorithm that leverages knowledge from multiple tasks. First, in the knowledge extraction stage we learn task specific parameters called adapters, that encapsulate the task-specific information. We then combine the adapters in a separate knowledge composition step. We show that by separating the two stages, i.e., knowledge extraction and knowledge composition, the classifier can effectively exploit the representations learned from multiple tasks in a non-destructive manner. We empirically evaluate AdapterFusion on 16 diverse NLU tasks, and find that it effectively combines various types of knowledge at different layers of the model. We show that our approach outperforms traditional strategies such as full fine-tuning as well as multi-task learning. Our code and adapters are available at AdapterHub.ml.
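The composition step can be pictured as attention over the per-task adapter outputs. The layer's hidden state acts as the query, and each adapter's output serves as both key and value. The sketch below is a simplified, hypothetical version that replaces the learned query/key/value projections with identities and works on plain lists. It only shows how the softmax mixing weights are formed.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def fuse_adapters(hidden, adapter_outputs):
    """Mix adapter outputs using attention weights computed from `hidden`.

    hidden: vector from the transformer layer (the query).
    adapter_outputs: one vector per task adapter (keys and values).
    Learned Q/K/V projections are omitted for brevity (an assumption).
    """
    scores = [sum(h * a for h, a in zip(hidden, out)) for out in adapter_outputs]
    weights = softmax(scores)
    fused = [sum(w * out[i] for w, out in zip(weights, adapter_outputs))
             for i in range(len(hidden))]
    return fused, weights
```

In AdapterFusion's second stage the base model and the adapters stay frozen and only the fusion parameters are trained. Earlier task knowledge is therefore not overwritten, which is what "non-destructive" refers to.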
3.
刘昊辰 (2024-10-12 10:09):
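#paper arXiv:2409.12272v1 [cs.LG] 18 Sep 2024, Mastering Chess with a Transformer Model. This paper studies Transformer models applied to chess. It shows that how well a Transformer plays chess depends heavily on the choice of position encoding in the attention mechanism. Building on this observation, the authors adopt the general position encoding scheme of Shaw et al. They train models with this technique and other enhancements at scale, and call the resulting architecture ChessFormer. ChessFormer substantially outperforms prior work in playing strength and puzzle solving at a fraction of the computational cost. Download: https://arxiv.org/pdf/2409.12272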
arXiv, 2024-09-18T19:05:21Z. DOI: 10.48550/arXiv.2409.12272
Abstract:
Transformer models have demonstrated impressive capabilities when trained at scale, excelling at difficult cognitive tasks requiring complex reasoning and rational decision-making. In this paper, we explore the application of transformer models to chess, focusing on the critical role of the position encoding within the attention mechanism. We show that in chess, transformers endowed with a sufficiently versatile position encoding can match existing chess-playing models at a fraction of the computational cost. Our architecture significantly outperforms AlphaZero at 8x fewer FLOPS and matches prior grandmaster-level transformer-based agents at 30x fewer FLOPS.
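The review credits much of ChessFormer's gain to the Shaw et al. position encoding. The sketch below illustrates that family of encodings; it is not the paper's implementation. It computes single-head attention in which each query-key pair adds a relative-position vector to the key (and to the value, as in Shaw et al. 2018). Positions are board squares 0..63, and the relative offset is (row delta, column delta). Whether ChessFormer keeps both terms is not stated here. The tables `rel_k`/`rel_v` are hypothetical; a real model would learn them.

```python
import math

def square_offset(i, j):
    """Relative offset from square i to square j (0..63) as (d_row, d_col)."""
    return (j // 8 - i // 8, j % 8 - i % 8)

def relative_attention(q, k, v, rel_k, rel_v):
    """Single-head self-attention with Shaw-style relative position terms.

    q, k, v: lists of vectors, one per square.
    rel_k, rel_v: dicts from offset (d_row, d_col) to a vector; missing
    offsets fall back to zeros (an illustrative assumption).
    """
    n, d = len(q), len(q[0])
    zero = [0.0] * d
    out = []
    for i in range(n):
        # Logit e_ij = q_i . (k_j + a_ij^K) / sqrt(d)
        logits = []
        for j in range(n):
            a_k = rel_k.get(square_offset(i, j), zero)
            logits.append(sum(q[i][t] * (k[j][t] + a_k[t]) for t in range(d)) / math.sqrt(d))
        m = max(logits)
        w = [math.exp(x - m) for x in logits]
        s = sum(w)
        w = [x / s for x in w]
        # Output z_i = sum_j w_ij (v_j + a_ij^V)
        z = [0.0] * d
        for j in range(n):
            a_v = rel_v.get(square_offset(i, j), zero)
            for t in range(d):
                z[t] += w[j] * (v[j][t] + a_v[t])
        out.append(z)
    return out
```

Because the bias depends on the offset rather than on absolute squares, a pattern such as "the square one file to the right" gets the same treatment everywhere on the board.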
4.
尹志 (2024-09-30 23:02):
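#paper https://doi.org/10.48550/arXiv.2405.20328 mRNA secondary structure prediction using utility-scale quantum computers. This year's joint work from IBM and Moderna. The authors predict mRNA secondary structure with a CVaR-based VQE algorithm. Because RNA is single-stranded and highly flexible, its structure is very hard to predict. For the same reason, the problem fits naturally into combinatorial optimization. Designing dedicated quantum algorithms to speed up the search and find the optimal structure is therefore a natural step. The experiments run on IBM's Eagle and Heron quantum processors, and the results agree with the classical solver CPLEX. That said, this is a NISQ-era approach, and the paper does not explain in detail how device calibration and error suppression were handled. It seems to assume Eagle and Heron already take care of this. The work is also a good demonstration of variational quantum circuit (VQC) algorithms, including VQE and QAOA, for combinatorial optimization, and it shows how flexible variational algorithms are.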
arXiv, 2024-05-30T17:58:17Z. DOI: 10.48550/arXiv.2405.20328
Abstract:
Recent advancements in quantum computing have opened new avenues for tackling long-standing complex combinatorial optimization problems that are intractable for classical computers. Predicting secondary structure of mRNA is one such notoriously difficult problem that can benefit from the ever-increasing maturity of quantum computing technology. Accurate prediction of mRNA secondary structure is critical in designing RNA-based therapeutics as it dictates various steps of an mRNA life cycle, including transcription, translation, and decay. The current generation of quantum computers have reached utility-scale, allowing us to explore relatively large problem sizes. In this paper, we examine the feasibility of solving mRNA secondary structures on a quantum computer with sequence length up to 60 nucleotides representing problems in the qubit range of 10 to 80. We use Conditional Value at Risk (CVaR)-based VQE algorithm to solve the optimization problems, originating from the mRNA structure prediction problem, on the IBM Eagle and Heron quantum processors. To our encouragement, even with "minimal" error mitigation and fixed-depth circuits, our hardware runs yield accurate predictions of minimum free energy (MFE) structures that match the results of the classical solver CPLEX. Our results provide sufficient evidence for the viability of solving mRNA structure prediction problems on a quantum computer and motivate continued research in this direction.
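CVaR-based VQE replaces the usual expectation value with the average over only the best (lowest-energy) fraction alpha of measured samples. This rewards circuits that put probability mass on good solutions. The sketch below computes that objective for a list of sampled energies. The classical optimizer loop and the paper's mRNA Hamiltonian are not shown, and rounding up to a whole number of samples is an assumption.

```python
import math

def cvar(energies, alpha):
    """CVaR_alpha objective: mean of the lowest ceil(alpha * n) sampled energies.

    energies: energies (costs) of measured bitstrings; lower is better.
    alpha in (0, 1]; alpha = 1 recovers the plain sample mean used by standard VQE.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    ordered = sorted(energies)
    k = max(1, math.ceil(alpha * len(ordered)))
    return sum(ordered[:k]) / k
```

For example, `cvar([3.0, -1.0, 2.0, -4.0], 0.5)` averages the two lowest energies, -4.0 and -1.0.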
5.
张浩彬 (2024-09-30 17:03):
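#paper DOI 10.48550/arXiv.1902.00751 Parameter-Efficient Transfer Learning for NLP. ICML 2019. In this paper Google proposed the adapter; it is essentially the founding paper of PEFT (parameter-efficient fine-tuning) methods. I have recently been collecting classic PEFT papers for large models to teach a class, and this one is the most fitting starting point. Fine-tuning large pre-trained models is an effective transfer mechanism in NLP. However, with many downstream tasks, fine-tuning is parameter-inefficient: every task needs an entirely new model. As an alternative, the authors propose transfer with adapter modules. Adapters give a compact, extensible model: they add only a few trainable parameters per task, and new tasks can be added without revisiting earlier ones. The parameters of the original network stay fixed, giving a high degree of parameter sharing. To demonstrate adapters' effectiveness, the authors transfer the BERT Transformer model to 26 diverse text classification tasks, including the GLUE benchmark. Adapters reach near state-of-the-art performance while adding only a few parameters per task. On GLUE they come within 0.4% of full fine-tuning performance while adding only 3.6% parameters per task; by contrast, fine-tuning trains 100% of the parameters for each task. The paper also reviews earlier transfer approaches, all of which train a separate model for each task. They fall into two categories: feature-based transfer and fine-tuning. Feature-based transfer uses a pre-trained embedding model to produce features, which are then fed into a task-specific downstream model.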
arXiv, 2019-02-02T16:29:47Z. DOI: 10.48550/arXiv.1902.00751
Abstract:
Fine-tuning large pre-trained models is an effective transfer mechanism in NLP. However, in the presence of many downstream tasks, fine-tuning is parameter inefficient: an entire new model is required for every task. As an alternative, we propose transfer with adapter modules. Adapter modules yield a compact and extensible model; they add only a few trainable parameters per task, and new tasks can be added without revisiting previous ones. The parameters of the original network remain fixed, yielding a high degree of parameter sharing. To demonstrate adapter's effectiveness, we transfer the recently proposed BERT Transformer model to 26 diverse text classification tasks, including the GLUE benchmark. Adapters attain near state-of-the-art performance, whilst adding only a few parameters per task. On GLUE, we attain within 0.4% of the performance of full fine-tuning, adding only 3.6% parameters per task. By contrast, fine-tuning trains 100% of the parameters per task.
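An adapter is a small bottleneck block with a skip connection, inserted into each Transformer layer; only its parameters are trained. The dependency-free sketch below shows one adapter forward pass and its parameter count. The ReLU nonlinearity is an assumption. Hidden size d = 768 matches BERT-base, and the bottleneck size m = 64 is an illustrative choice, not a figure from the review.

```python
def relu(x):
    return [max(0.0, t) for t in x]

def matvec(m, x):
    return [sum(r * t for r, t in zip(row, x)) for row in m]

def adapter(h, w_down, b_down, w_up, b_up):
    """Bottleneck adapter: h + (W_up relu(W_down h + b_down) + b_up).

    w_down: m x d (project down), w_up: d x m (project up).
    The skip connection makes the block an identity map when w_up and b_up
    are zero, so training can start close to the frozen pre-trained network.
    """
    z = relu([a + b for a, b in zip(matvec(w_down, h), b_down)])
    delta = [a + b for a, b in zip(matvec(w_up, z), b_up)]
    return [a + b for a, b in zip(h, delta)]

def adapter_param_count(d, m):
    """Trainable parameters of one adapter: two weight matrices plus biases."""
    return 2 * d * m + d + m
```

For d = 768 and m = 64, `adapter_param_count(768, 64)` is 2·768·64 + 768 + 64 = 99,136 parameters. This is small compared with the full BERT layer it sits in.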
6.
刘昊辰 (2024-09-06 09:51):
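#paper arXiv:2012.11045v1 [cs.AI] 20 Dec 2020, Monte-Carlo Graph Search for AlphaZero. This paper improves the AlphaZero algorithm. AlphaZero has achieved remarkable results in board games, but standard MCTS does not share information between subtrees, which limits its efficiency. The paper generalizes AlphaZero's search tree from a directed tree to a directed acyclic graph (DAG). This lets information flow across subtrees and greatly reduces memory consumption. It also proposes several extensions on top of Monte-Carlo Graph Search (MCGS): ϵ-greedy exploration, a revised terminal solver, and domain knowledge integrated as constraints. Evaluations with the CrazyAra engine on chess and crazyhouse show that these changes bring significant improvements to AlphaZero. Download: https://arxiv.org/pdf/2012.11045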
arXiv, 2020-12-20T22:51:38Z. DOI: 10.48550/arXiv.2012.11045
Abstract:
The AlphaZero algorithm has been successfully applied in a range of discrete domains, most notably board games. It utilizes a neural network, that learns a value and policy function to guide the exploration in a Monte-Carlo Tree Search. Although many search improvements have been proposed for Monte-Carlo Tree Search in the past, most of them refer to an older variant of the Upper Confidence bounds for Trees algorithm that does not use a policy for planning. We introduce a new, improved search algorithm for AlphaZero which generalizes the search tree to a directed acyclic graph. This enables information flow across different subtrees and greatly reduces memory consumption. Along with Monte-Carlo Graph Search, we propose a number of further extensions, such as the inclusion of Epsilon-greedy exploration, a revised terminal solver and the integration of domain knowledge as constraints. In our evaluations, we use the CrazyAra engine on chess and crazyhouse as examples to show that these changes bring significant improvements to AlphaZero.
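The key change from tree to DAG is that a node is looked up by the position it represents rather than created fresh for each path. Transpositions, where different move orders reach the same position, then share statistics and memory. The sketch below shows that lookup. The state key (a sorted tuple of moves in a toy game) is a hypothetical stand-in for a real position hash, and the paper's value-backup rules for graphs are not reproduced.

```python
class Node:
    """One position in the search graph, with visit statistics shared by all paths."""

    def __init__(self, key):
        self.key = key
        self.children = {}  # move -> Node
        self.visits = 0
        self.value_sum = 0.0


class SearchGraph:
    """Search DAG: nodes are looked up by state key, not created per path."""

    def __init__(self):
        self.table = {}  # state key -> Node (a transposition table)

    def get_or_create(self, key):
        if key not in self.table:
            self.table[key] = Node(key)
        return self.table[key]

    def add_edge(self, parent, move, child_key):
        child = self.get_or_create(child_key)
        parent.children[move] = child
        return child


# Toy game where a state is just the set of moves played (hypothetical encoding).
g = SearchGraph()
root = g.get_or_create(())
a = g.add_edge(root, "a", ("a",))
b = g.add_edge(root, "b", ("b",))
ab = g.add_edge(a, "b", ("a", "b"))
ba = g.add_edge(b, "a", ("a", "b"))  # transposition: reuses the node created for `ab`
```

A plain tree would hold five nodes here (the shared position twice); the graph holds four.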
7.
刘昊辰 (2024-08-20 15:24):
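#paper arXiv:2406.00741v1 [cs.AI] 2 Jun 2024, Learning to Play 7 Wonders Duel Without Human Supervision. This paper presents ZeusAI, an AI program that plays the board game 7 Wonders Duel. Inspired by the AlphaZero reinforcement learning algorithm, ZeusAI combines MCTS with a Transformer and learns the game without human supervision. Against human players it reached a very high competitive level, winning 26 of 38 games. The authors also use ZeusAI to study the game's balance. The community widely believes the first player has a significant advantage, and ZeusAI's self-play games confirm this. The paper proposes rule variants to reduce this imbalance, such as changing the initial number of coins or changing the wonder selection phase. Download: https://arxiv.org/pdf/2406.00741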
arXiv, 2024-06-02T13:28:57Z. DOI: 10.48550/arXiv.2406.00741
Abstract:
This paper introduces ZeusAI, an artificial intelligence system developed to play the board game 7 Wonders Duel. Inspired by the AlphaZero reinforcement learning algorithm, ZeusAI relies on a combination of Monte Carlo Tree Search and a Transformer Neural Network to learn the game without human supervision. ZeusAI competes at the level of top human players, develops both known and novel strategies, and allows us to test rule variants to improve the game's balance. This work demonstrates how AI can help in understanding and enhancing board games.
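ZeusAI follows the AlphaZero recipe, in which MCTS uses the network's policy prior and value estimates to decide which move to explore next. The usual selection rule is the PUCT formula sketched below. The constant `c_puct = 1.5` and the dictionary layout are assumptions, and the review does not say which exact variant ZeusAI uses.

```python
import math

def puct_score(q, prior, parent_visits, child_visits, c_puct=1.5):
    """AlphaZero-style PUCT: value estimate Q plus prior-weighted exploration bonus."""
    return q + c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)

def select_child(children, c_puct=1.5):
    """children: dict move -> (q, prior, visits). Returns the move with the highest score."""
    parent_visits = sum(v for _, _, v in children.values())
    return max(
        children,
        key=lambda m: puct_score(children[m][0], children[m][1],
                                 parent_visits, children[m][2], c_puct),
    )
```

An unvisited move with a strong prior gets a large exploration bonus. As visits accumulate, the bonus shrinks and the value estimate Q dominates.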
8.
符毓 Yu (2024-07-31 21:51):
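#paper doi.org/10.48550/arXiv.2312.06512, 2024, Stoch BiRo: Design and Control of a low cost bipedal robot. The proposed bipedal platform emphasizes capable walking, low computational requirements, and lightweight hardware. The reinforcement learning reward is built around motion imitation (motion-imitation rewards that track a mirrored reference animation) rather than keeping the robot's IMU level. This reduces the amount of torque-simulation data needed.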
arXiv, 2023-12-11T16:39:11Z. DOI: 10.48550/arXiv.2312.06512
Abstract:
This paper introduces the Stoch BiRo, a cost-effective bipedal robot designed with a modular mechanical structure having point feet to navigate uneven and unfamiliar terrains. The robot employs proprioceptive actuation in abduction, hips, and knees, leveraging a Raspberry Pi4 for control. Overcoming computational limitations, a Learning-based Linear Policy controller manages balance and locomotion with only 3 degrees of freedom (DoF) per leg, distinct from the typical 5DoF in bipedal systems. Integrated within a modular control architecture, these controllers enable autonomous handling of unforeseen terrain disturbances without external sensors or prior environment knowledge. The robot's policies are trained and simulated using MuJoCo, transferring learned behaviors to the Stoch BiRo hardware for initial walking validations. This work highlights the Stoch BiRo's adaptability and cost-effectiveness in mechanical design, control strategies, and autonomous navigation, promising diverse applications in real-world robotics scenarios.
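The abstract describes a learned linear policy driving 3 actuated DoF per leg (abduction, hip, knee), and the review says the reward is built around motion imitation. The sketch below shows both pieces in their simplest form: a linear map from state to joint targets, and an exponential tracking reward against a reference pose. The state layout, weights, and reward scale are hypothetical; the paper's actual reward terms are not given here.

```python
import math

def linear_policy(weights, state):
    """Linear policy a = W s: one joint target per row of `weights`."""
    return [sum(w * s for w, s in zip(row, state)) for row in weights]

def imitation_reward(joint_angles, reference_angles, scale=2.0):
    """Tracking reward in (0, 1]; equals 1 when the joints match the reference pose.

    The exp(-scale * squared error) form and `scale` are illustrative assumptions.
    """
    err = sum((q - r) ** 2 for q, r in zip(joint_angles, reference_angles))
    return math.exp(-scale * err)
```

A linear policy needs only a matrix-vector product per control step, which helps explain why a Raspberry Pi 4 can run the controller.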
9.
前进 (2024-07-31 11:35):
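#paper DOI: https://doi.org/10.48550/arXiv.2006.16236 Katharopoulos A, Vyas A, Pappas N, et al. Transformers are RNNs: Fast autoregressive transformers with linear attention[C]//International Conference on Machine Learning. PMLR, 2020: 5156-5165. This paper proposes a linear Transformer. It expresses self-attention as a linear dot product of kernel feature maps and exploits the associativity of matrix products. This reduces the cost of a standard Transformer on long sequences from O(N^2) to O(N). The authors show that the new model performs comparably to the standard Transformer and is up to 4000x faster at autoregressive prediction of very long sequences. The paper also examines the relationship between Transformers and recurrent neural networks (RNNs). It shows that, reformulated appropriately, a Transformer can run autoregressive prediction as efficiently as an RNN.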
arXiv, 2020-06-29T17:55:38Z. DOI: 10.48550/arXiv.2006.16236
Abstract:
Transformers achieve remarkable performance in several tasks but due to their quadratic complexity, with respect to the input's length, they are prohibitively slow for very long sequences. To address this limitation, we express the self-attention as a linear dot-product of kernel feature maps and make use of the associativity property of matrix products to reduce the complexity from $\mathcal{O}\left(N^2\right)$ to $\mathcal{O}\left(N\right)$, where $N$ is the sequence length. We show that this formulation permits an iterative implementation that dramatically accelerates autoregressive transformers and reveals their relationship to recurrent neural networks. Our linear transformers achieve similar performance to vanilla transformers and they are up to 4000x faster on autoregressive prediction of very long sequences.
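The associativity trick is easiest to see in the causal (autoregressive) case. Instead of recomputing similarities against all previous keys, the model keeps running sums of phi(k_j) v_j^T and of phi(k_j), so each step has constant cost, like an RNN. The sketch below uses the elu(x) + 1 feature map and includes the equivalent O(N^2) computation for comparison. It covers a single head with no batching, using plain lists.

```python
import math

def elu_plus_one(x):
    """Feature map phi(x) = elu(x) + 1, elementwise; always positive."""
    return [t + 1.0 if t > 0 else math.exp(t) for t in x]

def causal_linear_attention(queries, keys, values):
    """RNN-style causal linear attention.

    State S accumulates phi(k_j) v_j^T and z accumulates phi(k_j), so each step
    costs O(d * d_v), independent of how many tokens came before.
    """
    d, dv = len(keys[0]), len(values[0])
    S = [[0.0] * dv for _ in range(d)]
    z = [0.0] * d
    outputs = []
    for q, k, v in zip(queries, keys, values):
        fk = elu_plus_one(k)
        for a in range(d):
            z[a] += fk[a]
            for b in range(dv):
                S[a][b] += fk[a] * v[b]
        fq = elu_plus_one(q)
        denom = sum(fq[a] * z[a] for a in range(d))
        outputs.append([sum(fq[a] * S[a][b] for a in range(d)) / denom for b in range(dv)])
    return outputs

def quadratic_reference(queries, keys, values):
    """Same result computed the O(N^2) way: explicit similarities to all earlier keys."""
    out = []
    for i, q in enumerate(queries):
        fq = elu_plus_one(q)
        sims = [sum(a * b for a, b in zip(fq, elu_plus_one(keys[j]))) for j in range(i + 1)]
        total = sum(sims)
        out.append([sum(s * values[j][b] for j, s in enumerate(sims)) / total
                    for b in range(len(values[0]))])
    return out
```

Both functions return the same outputs up to floating-point error. Only the recurrent one keeps constant memory per generated token.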
10.
符毓 Yu (2024-06-30 23:02):
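#paper doi.org/10.48550/arXiv.2404.17569, 2024, MaPa: Text-driven Photorealistic Material Painting for 3D Shapes. This paper presents an algorithm that paints high-quality material surfaces onto 3D models from text. The algorithm has four steps. First, the mesh is decomposed into segments, which are projected onto 2D images using segment-controlled image generation (specifically, ControlNet). Second, the segments are grouped by similar material properties and appearance. Third, for each material group, a suitable material graph is selected and optimized to represent its texture and properties accurately. Finally, the process iterates. The material graphs are rendered and optimized across multiple views, filling any gaps in the visual data. The grouping and optimization stages repeat until a matching material graph accurately represents every segment of the mesh. The result is detailed, realistic material textures tailored to the features of each mesh segment.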
Abstract:
This paper aims to generate materials for 3D meshes from text descriptions. Unlike existing methods that synthesize texture maps, we propose to generate segment-wise procedural material graphs as the appearance representation, which supports high-quality rendering and provides substantial flexibility in editing. Instead of relying on extensive paired data, i.e., 3D meshes with material graphs and corresponding text descriptions, to train a material graph generative model, we propose to leverage the pre-trained 2D diffusion model as a bridge to connect the text and material graphs. Specifically, our approach decomposes a shape into a set of segments and designs a segment-controlled diffusion model to synthesize 2D images that are aligned with mesh parts. Based on generated images, we initialize parameters of material graphs and fine-tune them through the differentiable rendering module to produce materials in accordance with the textual description. Extensive experiments demonstrate the superior performance of our framework in photorealism, resolution, and editability over existing methods. Project page: https://zju3dv.github.io/MaPa
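The final stage fits material-graph parameters so that renders match the generated images, using gradients through a differentiable renderer. The sketch below shows that fitting loop on a heavily simplified, hypothetical stand-in: a single albedo parameter, a Lambertian shading model, and a hand-derived gradient replace MaPa's material graphs and renderer.

```python
def render(albedo, normals_dot_light):
    """Toy Lambertian 'renderer': pixel = albedo * max(0, n . l)."""
    return [albedo * max(0.0, c) for c in normals_dot_light]

def fit_albedo(target, normals_dot_light, albedo=0.5, lr=0.5, steps=200):
    """Gradient descent on the mean squared error between render and target image."""
    shading = [max(0.0, c) for c in normals_dot_light]
    n = len(target)
    for _ in range(steps):
        pred = render(albedo, normals_dot_light)
        # d/d(albedo) of mean((pred - target)^2) = mean(2 * (pred - target) * shading)
        grad = sum(2.0 * (p - t) * s for p, t, s in zip(pred, target, shading)) / n
        albedo -= lr * grad
    return albedo
```

A real material graph has many coupled parameters and a far richer renderer. Automatic differentiation would then replace the hand-written gradient, but the loop has the same shape.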
11.
前进 (2024-06-30 22:29):
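#paper Liu R, Li Z, Fan X, et al. Learning Deformable Image Registration from Optimization: Perspective, Modules, Bilevel Training and Beyond[J]. 2020. DOI: 10.48550/arXiv.2004.14557. The paper proposes a new deep-learning framework that optimizes a diffeomorphic model via multi-scale propagation. The aim is to combine the strengths of conventional deformable registration and deep-learning methods while avoiding their limitations. Specifically, the authors formulate a generic optimization model for diffeomorphic registration. They then develop a series of learnable architectures that learn image features coarse-to-fine to perform the registration. The paper also proposes a novel bilevel self-tuned training strategy that efficiently searches for task-specific hyperparameters. This adds flexibility across data types while reducing computational and human effort. The authors run registration experiments on several datasets, including image-to-atlas registration on brain MRI and image-to-image registration on liver CT. The method achieves state-of-the-art performance while preserving diffeomorphism. The authors also apply the framework to multi-modal registration and study how it supports downstream medical image analysis, including multi-modal fusion and image segmentation.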
Abstract:
Conventional deformable registration methods aim at solving an optimization model carefully designed on image pairs and their computational costs are exceptionally high. In contrast, recent deep learning based approaches can provide fast deformation estimation. These heuristic network architectures are fully data-driven and thus lack explicit geometric constraints, e.g., topology-preserving, which are indispensable to generate plausible deformations. We design a new deep learning based framework to optimize a diffeomorphic model via multi-scale propagation in order to integrate advantages and avoid limitations of these two categories of approaches. Specifically, we introduce a generic optimization model to formulate diffeomorphic registration and develop a series of learnable architectures to obtain propagative updating in the coarse-to-fine feature space. Moreover, we propose a novel bilevel self-tuned training strategy, allowing efficient search of task-specific hyper-parameters. This training strategy increases the flexibility to various types of data while reduces computational and human burdens. We conduct two groups of image registration experiments on 3D volume datasets including image-to-atlas registration on brain MRI data and image-to-image registration on liver CT data. Extensive results demonstrate the state-of-the-art performance of the proposed method with diffeomorphic guarantee and extreme efficiency. We also apply our framework to challenging multi-modal image registration, and investigate how our registration to support the down-streaming tasks for medical image analysis including multi-modal fusion and image segmentation.
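A common way to obtain a diffeomorphic deformation in learning-based registration is to integrate a stationary velocity field by scaling and squaring. The sketch below illustrates only what a "diffeomorphic guarantee" means in practice; neither the review nor the abstract says this paper uses the technique. It is a 1D version on a regular grid with linear interpolation.

```python
def interp(field, x):
    """Linearly interpolate a 1D grid-sampled field at x (clamped to the grid; len >= 2)."""
    n = len(field)
    x = min(max(x, 0.0), n - 1.0)
    i = min(int(x), n - 2)
    t = x - i
    return (1.0 - t) * field[i] + t * field[i + 1]

def exp_velocity(velocity, steps=6):
    """Scaling and squaring: displacement u of phi = exp(v), where phi(x) = x + u(x).

    Start from the small map u = v / 2**steps, then compose the map with itself
    `steps` times: u_new(x) = u(x) + u(x + u(x)).
    """
    u = [v / 2 ** steps for v in velocity]
    for _ in range(steps):
        u = [u[i] + interp(u, i + u[i]) for i in range(len(u))]
    return u
```

Each step composes an invertible map with itself, so the final map stays invertible. In 1D that shows up as a strictly increasing phi, with no folding, even when the velocity field varies sharply between grid points.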
12.
张浩彬 (2024-06-30 10:34):
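#paper https://doi.org/10.48550/arXiv.2403.10131 RAFT: Adapting Language Model to Domain Specific RAG. A paper I found quite inspiring. Pretraining large language models (LLMs) on large text corpora is now a standard paradigm. For downstream applications, new knowledge (for example, time-sensitive news or private domain knowledge) is usually added to the pretrained model through RAG (Retrieval-Augmented Generation) prompting or fine-tuning. However, the best way for the model to acquire such new knowledge is still an open question. This paper proposes Retrieval Augmented Fine Tuning (RAFT). Put simply, you fine-tune the model on the material you will use for RAG, and use chain-of-thought to familiarize it with the task. Of course, RAG and fine-tuning are two distinct approaches, and combining them seems somewhat backwards. I do not think the paper discusses this clearly. Still, these unclear points also leave room for new research.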
Abstract:
Pretraining Large Language Models (LLMs) on large corpora of textual data is now a standard paradigm. When using these LLMs for many downstream applications, it is common to additionally bake in new knowledge (e.g., time-critical news, or private domain knowledge) into the pretrained model either through RAG-based-prompting, or fine-tuning. However, the optimal methodology for the model to gain such new knowledge remains an open question. In this paper, we present Retrieval Augmented FineTuning (RAFT), a training recipe that improves the model's ability to answer questions in a "open-book" in-domain settings. In RAFT, given a question, and a set of retrieved documents, we train the model to ignore those documents that don't help in answering the question, which we call, distractor documents. RAFT accomplishes this by citing verbatim the right sequence from the relevant document that would help answer the question. This coupled with RAFT's chain-of-thought-style response helps improve the model's ability to reason. In domain-specific RAG, RAFT consistently improves the model's performance across PubMed, HotpotQA, and Gorilla datasets, presenting a post-training recipe to improve pre-trained LLMs to in-domain RAG. RAFT's code and demo are open-sourced at github.com/ShishirPatil/gorilla.
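The core of RAFT is how the training context is assembled. For a fraction P of the questions, the oracle document is kept among distractors. For the remaining questions, only distractors are shown, so the model learns not to lean blindly on the context. The sketch below follows that reading of the paper. The field names, `p_oracle = 0.8`, and `k = 3` are illustrative assumptions.

```python
import random

def build_raft_example(question, oracle_doc, distractor_pool, answer,
                       p_oracle=0.8, k=3, rng=None):
    """Assemble one RAFT-style training example with k context documents.

    With probability p_oracle the context is the oracle document plus k-1
    distractors; otherwise it is k distractors only. `answer` is expected to be
    a chain-of-thought response that quotes the oracle document verbatim.
    """
    rng = rng or random.Random()
    if rng.random() < p_oracle:
        docs = [oracle_doc] + rng.sample(distractor_pool, k - 1)
    else:
        docs = rng.sample(distractor_pool, k)
    rng.shuffle(docs)  # so the oracle's position carries no signal
    return {"question": question, "context": docs, "answer": answer}
```

Passing a seeded `random.Random` makes dataset construction reproducible.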
13.
张浩彬 (2024-05-31 07:31):
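#paper doi: https://doi.org/10.48550/arXiv.2403.10131 RAFT: Adapting Language Model to Domain Specific RAG. A simple but effective idea. To adapt a general large model to a domain, we can fine-tune it or use RAG, but Microsoft argues that we should fine-tune on top of RAG. RAFT is a general recipe for fine-tuning a pretrained large language model to a domain-specific RAG setting. In domain-specific RAG, the model answers questions from a set of documents in a specific domain, such as an enterprise's private files. This differs from general RAG, where the model does not know which domain it will be tested on. Put simply, fine-tuning is a closed-book exam: you answer from memory. RAG is an open-book exam: you have memorized nothing, but you can consult the book during the exam. RAFT is like skimming the textbook before an open-book exam. You have not read all of it, but you know roughly what the questions look like, and that is fine because you can still consult the book during the exam.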
arXiv, 2024.
Abstract: No abstract available.
14.
尹志 (2024-05-30 15:52):
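#paper Protein Conformation Generation via Force-Guided SE(3) Diffusion Models https://doi.org/10.48550/arXiv.2403.14088 New work from ByteDance, again on protein conformation generation with an SE(3) diffusion model. Unlike the usual generation of static conformations, this work generates dynamic ones. That is far more meaningful, since real-world protein conformations are dynamic and form a distribution. The paper uses physical information as guidance, which is an interesting idea. It brings in the physical system's prior while avoiding the performance cost of purely physics-based computation such as MD (molecular dynamics). In effect, the MD computation is abstracted into a prior that guides a generative model.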
Abstract:
The conformational landscape of proteins is crucial to understanding their functionality in complex biological processes. Traditional physics-based computational methods, such as molecular dynamics (MD) simulations, suffer from rare event sampling and long equilibration time problems, hindering their applications in general protein systems. Recently, deep generative modeling techniques, especially diffusion models, have been employed to generate novel protein conformations. However, existing score-based diffusion methods cannot properly incorporate important physical prior knowledge to guide the generation process, causing large deviations in the sampled protein conformations from the equilibrium distribution. In this paper, to overcome these limitations, we propose a force-guided SE(3) diffusion model, ConfDiff, for protein conformation generation. By incorporating a force-guided network with a mixture of data-based score models, ConfDiff can generate protein conformations with rich diversity while preserving high fidelity. Experiments on a variety of protein conformation prediction tasks, including 12 fast-folding proteins and the Bovine Pancreatic Trypsin Inhibitor (BPTI), demonstrate that our method surpasses the state-of-the-art method.
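The central idea is to steer a learned score with physics: the Boltzmann distribution p ∝ exp(-U/kT) has score -∇U/kT = F/kT, where F is the force. The 1D sketch below is only a toy illustration of mixing such a term into a data-driven score and sampling with Langevin dynamics. It is not SE(3), not a full diffusion model, and not ConfDiff's network. The Gaussian data score, harmonic potential, and linear mixing weight are hypothetical.

```python
import math
import random

def data_score(x, mu=1.0, sigma=1.0):
    """Score (d/dx log density) of a toy Gaussian 'learned' data model."""
    return -(x - mu) / sigma ** 2

def force(x, x0=0.0, k=4.0):
    """Physical force -dU/dx for a toy harmonic potential U = k/2 * (x - x0)^2."""
    return -k * (x - x0)

def guided_score(x, weight=0.5, kT=1.0):
    """Blend the data score with the Boltzmann score F/kT (hypothetical mixing rule)."""
    return (1.0 - weight) * data_score(x) + weight * force(x) / kT

def langevin_sample(n_steps=2000, step=0.01, weight=0.5, seed=0):
    """Unadjusted Langevin dynamics driven by the guided score."""
    rng = random.Random(seed)
    x = 0.0
    for _ in range(n_steps):
        x += step * guided_score(x, weight) + math.sqrt(2 * step) * rng.gauss(0.0, 1.0)
    return x
```

With these toy numbers, the guided drift vanishes at x = 0.2, between the data mode (1.0) and the potential minimum (0.0). The force term pulls samples toward physically favourable regions.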
15.
尹志 (2024-04-30 22:48):
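#paper doi: https://doi.org/10.48550/arXiv.2211.07697, NeurIPS 2022 Workshop on Symmetry and Geometry in Neural Representations, 2022. Do Neural Networks Trained with Topological Features Learn Different Internal Representations? The authors compare the internal representations of networks trained on topological features with those of networks trained directly on raw data. The conclusions are interesting. As one might guess, the two do differ, especially under the metrics the authors chose, which speaks to the value of topological machine learning. However, in some cases a simple (affine) transformation reconciles the representations of models trained on topological features with those of models trained on raw data. For a given data setting, how to extract topological features that differ substantially from what raw data already provides remains an open question.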
Abstract:
There is a growing body of work that leverages features extracted via topological data analysis to train machine learning models. While this field, sometimes known as topological machine learning (TML), has seen some notable successes, an understanding of how the process of learning from topological features differs from the process of learning from raw data is still limited. In this work, we begin to address one component of this larger issue by asking whether a model trained with topological features learns internal representations of data that are fundamentally different than those learned by a model trained with the original raw data. To quantify "different", we exploit two popular metrics that can be used to measure the similarity of the hidden representations of data within neural networks, neural stitching and centered kernel alignment. From these we draw a range of conclusions about how training with topological features does and does not change the representations that a model learns. Perhaps unsurprisingly, we find that structurally, the hidden representations of models trained and evaluated on topological features differ substantially compared to those trained and evaluated on the corresponding raw data. On the other hand, our experiments show that in some cases, these representations can be reconciled (at least to the degree required to solve the corresponding task) using a simple affine transformation. We conjecture that this means that neural networks trained on raw data may extract some limited topological features in the process of making predictions.
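Centered kernel alignment (CKA), one of the two similarity metrics in the paper, compares two sets of hidden activations computed on the same inputs. The sketch below implements the linear variant; the paper may use a different kernel or estimator. Rows are examples and columns are units.

```python
def center_columns(X):
    """Subtract each column's mean (center features over the examples)."""
    n = len(X)
    means = [sum(row[j] for row in X) / n for j in range(len(X[0]))]
    return [[row[j] - means[j] for j in range(len(row))] for row in X]

def cross_frobenius_sq(A, B):
    """||A^T B||_F^2 for row-aligned matrices A (n x p) and B (n x q)."""
    total = 0.0
    for i in range(len(A[0])):
        for j in range(len(B[0])):
            s = sum(a[i] * b[j] for a, b in zip(A, B))
            total += s * s
    return total

def linear_cka(X, Y):
    """Linear CKA = ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F) on centered data."""
    Xc, Yc = center_columns(X), center_columns(Y)
    return cross_frobenius_sq(Xc, Yc) / (
        cross_frobenius_sq(Xc, Xc) * cross_frobenius_sq(Yc, Yc)) ** 0.5
```

Linear CKA lies in [0, 1]. It is unchanged by isotropic scaling or orthogonal transformations of either representation, but not by arbitrary invertible linear maps.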
16.
前进 (2024-04-30 11:44):
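#paper Han D, Pan X, Han Y, et al. Flatten transformer: Vision transformer using focused linear attention[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. 2023: 5961-5971. The main obstacle to self-attention in computer vision is its quadratic computational complexity, which makes vision tasks very expensive. Linear attention is an alternative to Softmax attention. It approximates the Softmax operation with a carefully designed mapping function, reducing the complexity from quadratic to linear. Although linear attention is more efficient in theory, existing methods either lose significant accuracy or add computational overhead, which limits their practical use. To overcome this, the paper proposes the Focused Linear Attention (FLA) module, which improves efficiency and expressiveness in two ways. (1) Focus ability: a simple mapping function sharpens self-attention's focus on the most informative features. (2) Feature diversity: an efficient rank-restoration module uses depthwise convolution (DWC) to restore the rank of the attention matrix, increasing feature diversity. In extensive experiments on several advanced vision Transformer models, the FLA module gives consistent gains across multiple benchmarks.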
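The "focus ability" comes from the feature map applied before linear attention. In our reading of the paper, it applies ReLU, raises each entry to a power p, and rescales so the norm is unchanged. The sketch below implements that mapping for a single vector. The exponent p = 3 is an assumption, and the DWC rank-restoration branch is not shown.

```python
import math

def focused_map(x, p=3.0):
    """Focused feature map: ReLU, elementwise power p, then rescale to the ReLU'd norm.

    Raising to p > 1 pushes the vector toward its largest components
    (sharper focus) without changing its length.
    """
    r = [max(0.0, t) for t in x]
    norm_r = math.sqrt(sum(t * t for t in r))
    powered = [t ** p for t in r]
    norm_p = math.sqrt(sum(t * t for t in powered))
    if norm_p == 0.0:
        return [0.0] * len(x)
    return [norm_r / norm_p * t for t in powered]
```

For `[1.0, 2.0, -1.0]`, the ratio between the two positive entries grows from 2 to 8 while the length stays √5. Similar queries and keys therefore produce more peaked attention weights.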
17.
张浩彬 (2024-04-29 20:35):
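#paper doi: https://doi.org/10.48550/arXiv.2211.14730 A Time Series is Worth 64 Words: Long-term Forecasting with Transformers. An ICLR 2023 paper proposing PatchTST. Inspired by the Vision Transformer, it brings patching to time series. It also answers an earlier paper arguing that Transformers are no better than simple linear models for time series (Are Transformers Effective for Time Series Forecasting? (2022)), and regains SOTA. However, newer methods appeared at the end of 2023 arguing that the key ingredient is patching rather than the Transformer itself.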
Abstract:
We propose an efficient design of Transformer-based models for multivariate time series forecasting and self-supervised representation learning. It is based on two key components: (i) segmentation of time series into subseries-level patches which are served as input tokens to Transformer; (ii) channel-independence where each channel contains a single univariate time series that shares the same embedding and Transformer weights across all the series. Patching design naturally has three-fold benefit: local semantic information is retained in the embedding; computation and memory usage of the attention maps are quadratically reduced given the same look-back window; and the model can attend longer history. Our channel-independent patch time series Transformer (PatchTST) can improve the long-term forecasting accuracy significantly when compared with that of SOTA Transformer-based models. We also apply our model to self-supervised pre-training tasks and attain excellent fine-tuning performance, which outperforms supervised training on large datasets. Transferring of masked pre-trained representation on one dataset to others also produces SOTA forecasting accuracy. Code is available at: https://github.com/yuqinie98/PatchTST.
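Patching turns a long look-back window into a short sequence of tokens. Channel independence means each variable is patched and encoded separately with shared weights. The sketch below follows our reading of the paper: the end of the series is padded with `stride` copies of its last value, giving floor((L - P) / S) + 2 patches. With L = 512, P = 16, and S = 8 this yields the 64 "words" of the title.

```python
def make_patches(series, patch_len=16, stride=8):
    """Split a univariate series into overlapping patches, PatchTST-style.

    Pads the end with `stride` copies of the last value before slicing.
    """
    padded = list(series) + [series[-1]] * stride
    return [padded[s:s + patch_len]
            for s in range(0, len(padded) - patch_len + 1, stride)]

def patch_channels(multivariate, patch_len=16, stride=8):
    """Channel independence: patch each channel on its own (the model weights are shared)."""
    return [make_patches(channel, patch_len, stride) for channel in multivariate]
```

Attention now runs over 64 tokens instead of 512 time steps, so the attention map is 64 times smaller. This is the "quadratic reduction" mentioned in the abstract.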
18.
林海onrush (2024-04-02 00:39):
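#paper Curriculum Learning and Imitation Learning for Model-free Control on Financial Time-series, doi: https://doi.org/10.48550/arXiv.2311.13326. This paper takes a new approach to model-free control on financial time series. Conventional reinforcement learning methods here struggle with limited, noisy training data. The paper therefore brings curriculum learning and imitation learning, two paradigms already successful in robotics, to financial problems. Extensive experiments on two representative datasets show that curriculum learning significantly improves reinforcement learning on complex financial time-series decisions, outperforming all baselines. Curriculum learning uses data augmentation to raise training difficulty step by step, an "easy to hard" strategy. The experiments show that moderate data smoothing reduces noise, letting the RL algorithm capture the true market signal better. Applying imitation learning directly, by contrast, works poorly. Further analysis suggests imitation learning removes noise but also loses some key market signal. Statistically, imitation learning separates noise from signal, but excessive denoising hurts policy learning. The paper's theoretical contribution is a statistical signal-noise decomposition framework explaining why curriculum learning and imitation learning behave differently on financial time series. The framework also suggests ways to improve the algorithms. The paper further discusses directions for future work. These include the non-stationarity of the signal-noise decomposition, other forms of data smoothing, and extending curriculum learning to other high-noise time-series tasks.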
Abstract:
Curriculum learning and imitation learning have been leveraged extensively in the robotics domain. However, minimal research has been done on leveraging these ideas on control tasks over highly stochastic time-series data. Here, we theoretically and empirically explore these approaches in a representative control task over complex time-series data. We implement the fundamental ideas of curriculum learning via data augmentation, while imitation learning is implemented via policy distillation from an oracle. Our findings reveal that curriculum learning should be considered a novel direction in improving control-task performance over complex time-series. Our ample random-seed out-sample empirics and ablation studies are highly encouraging for curriculum learning for time-series control. These findings are especially encouraging as we tune all overlapping hyperparameters on the baseline, giving an advantage to the baseline. On the other hand, we find that imitation learning should be used with caution.
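The review describes the curriculum as data augmentation that smooths the series, with difficulty rising step by step. The sketch below shows a hypothetical schedule of that kind: training starts on a heavily smoothed series and moves through less smoothed versions to the raw data. The trailing moving-average smoother and the window sizes are assumptions, not the paper's augmentation.

```python
def moving_average(series, window):
    """Trailing moving average; early points average over the shorter history available."""
    out = []
    for i in range(len(series)):
        start = max(0, i - window + 1)
        chunk = series[start:i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

def smoothing_curriculum(series, windows=(8, 4, 2, 1)):
    """Yield (window, training series) pairs from most smoothed (easy) to raw (hard)."""
    for w in windows:
        yield w, moving_average(series, w)
```

A trailing window uses only past values at each time step, so the smoothed series introduces no look-ahead. That matters for control on financial data.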
19.
符毓 Yu (2024-03-31 23:50):
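#paper doi.org/10.48550/arXiv.2403.16527, 2024, Hallucination Detection in Foundation Models for Decision-Making: A Flexible Definition and Review of the State of the Art. Pretraining lets intelligent control systems be deployed widely, but they perform poorly in scenarios outside their training. Large models promise the reasoning ability that current training approaches lack, but they can "hallucinate", producing decisions that sound reasonable but are actually poor. This paper proposes a definition of "hallucination". It then classifies methods for detecting and mitigating hallucinations in planning and surveys evaluation metrics and datasets.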
Abstract:
Autonomous systems are soon to be ubiquitous, from manufacturing autonomy to agricultural field robots, and from health care assistants to the entertainment industry. The majority of these systems are developed with modular sub-components for decision-making, planning, and control that may be hand-engineered or learning-based. While these existing approaches have been shown to perform well under the situations they were specifically designed for, they can perform especially poorly in rare, out-of-distribution scenarios that will undoubtedly arise at test-time. The rise of foundation models trained on multiple tasks with impressively large datasets from a variety of fields has led researchers to believe that these models may provide common sense reasoning that existing planners are missing. Researchers posit that this common sense reasoning will bridge the gap between algorithm development and deployment to out-of-distribution tasks, like how humans adapt to unexpected scenarios. Large language models have already penetrated the robotics and autonomous systems domains as researchers are scrambling to showcase their potential use cases in deployment. While this application direction is very promising empirically, foundation models are known to hallucinate and generate decisions that may sound reasonable, but are in fact poor. We argue there is a need to step back and simultaneously design systems that can quantify the certainty of a model's decision, and detect when it may be hallucinating. In this work, we discuss the current use cases of foundation models for decision-making tasks, provide a general definition for hallucinations with examples, discuss existing approaches to hallucination detection and mitigation with a focus on decision problems, and explore areas for further research in this exciting field.
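A simple, widely used kind of detector checks consistency. It samples the model's decision several times for the same situation and treats strong disagreement as a warning sign. The sketch below scores disagreement with the entropy of the sampled decisions. The threshold of 0.5 nats is an illustrative assumption, and this is not a specific method from the paper.

```python
import math
from collections import Counter

def decision_entropy(sampled_actions):
    """Entropy (nats) of the empirical distribution over sampled decisions."""
    counts = Counter(sampled_actions)
    n = len(sampled_actions)
    return -sum(c / n * math.log(c / n) for c in counts.values())

def flag_possible_hallucination(sampled_actions, max_entropy=0.5):
    """Flag a decision when repeated samples from the model disagree too much."""
    return decision_entropy(sampled_actions) > max_entropy
```

A flagged decision could be deferred to a fallback planner or a human. Consistency alone cannot catch a model that is confidently wrong, which is one reason for the survey's broader taxonomy.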
20.
符毓 Yu (2024-02-29 22:43):
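#paper doi.org/10.48550/arXiv.2304.09349 2023, LLM as A Robotic Brain: Unifying Egocentric Memory and Control. LLM agents gain knowledge and reasoning ability from pretraining and use them to solve robotics and planning tasks. However, most effort has gone into teaching robots "what to do". This paper focuses instead on conveying what robots must not do and on meeting safe-operation standards. For LLM agents deployed in collaborative environments, it proposes a way to impose constraints. The method addresses two weaknesses: LLMs are inherently probabilistic, and they cannot handle complex conditions. Experiments in the VirtualHome environment and on a real robot show that the safety constraints can be met without lowering the goal completion rate.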
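The review describes constraining an LLM planner so that it respects what the robot must not do. The sketch below is a minimal, hypothetical checker for two simple kinds of constraints: forbidden actions and ordering requirements. A plan proposed by the LLM that fails the check could be rejected or sent back for replanning. The action names and constraint format are invented for illustration and are not the paper's constraint language.

```python
def violates(plan, forbidden_actions, must_precede):
    """Return the first violated constraint in a plan, or None if the plan is safe.

    plan: list of action strings proposed by the planner.
    forbidden_actions: actions the robot must never take.
    must_precede: (before, after) pairs; `after` may only happen once `before` has.
    """
    done = set()
    for step, action in enumerate(plan):
        if action in forbidden_actions:
            return (step, "forbidden", action)
        for before, after in must_precede:
            if action == after and before not in done:
                return (step, "order", (before, after))
        done.add(action)
    return None
```

Because the check is deterministic, it adds a hard safety guarantee on top of a probabilistic planner. Returning the violated constraint lets the system explain the rejection to the LLM when it asks for a new plan.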