Papers from the journal arXiv.
166 shared papers found in total; this page shows items 61 - 80.
61.
张浩彬 (2024-09-30 17:03):
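#paper DOI 10.48550/arXiv.1902.00751 Parameter-Efficient Transfer Learning for NLP. ICML 2019. Google proposed the Adapter, which can be considered the founding paper of PEFT methods. I have been collecting the classic PEFT papers for large models to prepare a course for students, and this one is the best opener. Fine-tuning large pre-trained models is an effective transfer mechanism in NLP. However, when there are many downstream tasks, fine-tuning is parameter inefficient: every task needs an entirely new model. As an alternative, the authors propose transfer with adapter modules. Adapter modules yield a compact and extensible model; they add only a few trainable parameters per task, and new tasks can be added without revisiting previous ones. The parameters of the original network stay fixed, which gives a high degree of parameter sharing. To demonstrate the effectiveness of adapters, the authors transfer the then recently proposed BERT Transformer model to 26 diverse text classification tasks, including the GLUE benchmark. Adapters attain near state-of-the-art performance while adding only a few parameters per task. On GLUE, they come within 0.4% of the performance of full fine-tuning while adding only 3.6% parameters per task. By contrast, fine-tuning trains 100% of the parameters per task. The paper notes that earlier domain-adaptation approaches require training a separate model for each task, generally in one of two ways: feature-based transfer and fine-tuning. Feature-based transfer takes the output of a pre-trained embedding model as features and feeds them into a task-specific downstream model.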
arXiv, 2019-02-02T16:29:47Z. DOI: 10.48550/arXiv.1902.00751
Neil Houlsby, Andrei Giurgiu, Stanislaw Jastrzebski, Bruna Morrone, Quentin de Laroussilhe, Andrea Gesmundo, Mona Attariyan, Sylvain Gelly
Abstract:
Fine-tuning large pre-trained models is an effective transfer mechanism in NLP. However, in the presence of many downstream tasks, fine-tuning is parameter inefficient: an entire new model is required for every task. As an alternative, we propose transfer with adapter modules. Adapter modules yield a compact and extensible model; they add only a few trainable parameters per task, and new tasks can be added without revisiting previous ones. The parameters of the original network remain fixed, yielding a high degree of parameter sharing. To demonstrate adapter's effectiveness, we transfer the recently proposed BERT Transformer model to 26 diverse text classification tasks, including the GLUE benchmark. Adapters attain near state-of-the-art performance, whilst adding only a few parameters per task. On GLUE, we attain within 0.4% of the performance of full fine-tuning, adding only 3.6% parameters per task. By contrast, fine-tuning trains 100% of the parameters per task.
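A minimal sketch of the adapter idea in plain Python, not the authors' code. It assumes the bottleneck form (project the hidden vector from width d down to a small width m, apply a nonlinearity, project back up, and add a skip connection) with near-zero initialization, so that a freshly added adapter leaves the frozen network's output almost unchanged. The class name `Adapter`, the choice of ReLU, and the helper `total_storage` are illustrative assumptions; the storage helper only applies the 3.6% figure quoted in the abstract.

```python
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


class Adapter:
    """Bottleneck adapter: h + W_up * relu(W_down * h + b_down) + b_up."""

    def __init__(self, d, m, scale=1e-3, seed=0):
        rng = random.Random(seed)
        # Near-zero weights make the module start close to the identity map.
        self.W_down = [[rng.gauss(0, scale) for _ in range(d)] for _ in range(m)]
        self.b_down = [0.0] * m
        self.W_up = [[rng.gauss(0, scale) for _ in range(m)] for _ in range(d)]
        self.b_up = [0.0] * d

    def __call__(self, h):
        z = [max(0.0, v + b) for v, b in zip(matvec(self.W_down, h), self.b_down)]
        up = [v + b for v, b in zip(matvec(self.W_up, z), self.b_up)]
        return [hi + ui for hi, ui in zip(h, up)]  # skip connection

    def num_params(self):
        d, m = len(self.W_up), len(self.W_down)
        return 2 * d * m + d + m  # two projections plus two bias vectors


def total_storage(num_tasks, added_fraction):
    """Total parameters, in units of one base model, when the base is shared."""
    return 1.0 + num_tasks * added_fraction
```

With these assumptions, serving 9 tasks costs `total_storage(9, 0.036)`, about 1.3 base models, whereas full fine-tuning would store 9 separate copies.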
62.
刘昊辰 (2024-09-06 09:51):
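#paper arXiv:2012.11045v1 [cs.AI] 20 Dec 2020, Monte-Carlo Graph Search for AlphaZero. This research paper is about improving the AlphaZero algorithm. AlphaZero has achieved remarkable results in board games, but conventional MCTS does not share information between different subtrees, which limits its efficiency. The paper generalizes AlphaZero's search tree from a directed tree to a directed acyclic graph, which lets information flow between subtrees and significantly reduces memory consumption. Along with Monte-Carlo Graph Search (MCGS), it proposes a series of further improvements, including ϵ-greedy exploration, a revised terminal solver, and the integration of domain knowledge as constraints. Evaluations with the CrazyAra engine on chess and crazyhouse show that these changes bring significant improvements to AlphaZero. Download: https://arxiv.org/pdf/2012.11045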
arXiv, 2020-12-20T22:51:38Z. DOI: 10.48550/arXiv.2012.11045
Johannes Czech, Patrick Korus, Kristian Kersting
Abstract:
The AlphaZero algorithm has been successfully applied in a range of discrete domains, most notably board games. It utilizes a neural network, that learns a value and policy function to guide the exploration in a Monte-Carlo Tree Search. Although many search improvements have been proposed for Monte-Carlo Tree Search in the past, most of them refer to an older variant of the Upper Confidence bounds for Trees algorithm that does not use a policy for planning. We introduce a new, improved search algorithm for AlphaZero which generalizes the search tree to a directed acyclic graph. This enables information flow across different subtrees and greatly reduces memory consumption. Along with Monte-Carlo Graph Search, we propose a number of further extensions, such as the inclusion of Epsilon-greedy exploration, a revised terminal solver and the integration of domain knowledge as constraints. In our evaluations, we use the CrazyAra engine on chess and crazyhouse as examples to show that these changes bring significant improvements to AlphaZero.
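A toy sketch of why moving from a tree to a directed acyclic graph saves memory. It is not the CrazyAra implementation: it assumes a made-up game in which a state is simply the set of moves played so far, so different move orders reach the same state (a transposition). The function names and the dictionary-based transposition table are illustrative assumptions.

```python
def count_tree_nodes(moves_left):
    """Nodes in a plain search tree: every move order gets its own path."""
    return 1 + sum(count_tree_nodes(moves_left - {m}) for m in moves_left)


def build_dag(all_moves):
    """Build a search graph in which transposed states share a single node."""
    table = {}  # transposition table: state key -> node

    def expand(state):
        if state in table:
            return table[state]  # reuse the node, and with it its statistics
        node = {"visits": 0, "children": {}}
        table[state] = node
        for m in all_moves - state:
            node["children"][m] = expand(state | {m})
        return node

    root = expand(frozenset())
    return root, table
```

For three moves, `count_tree_nodes(frozenset("abc"))` gives 16 tree nodes, while `build_dag(frozenset("abc"))` stores only 8 graph nodes; the node reached by playing a then b is the same object as the node reached by playing b then a, so statistics gathered in one subtree are visible from the other.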
63.
刘昊辰 (2024-08-20 15:24):
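#paper arXiv:2406.00741v1 [cs.AI] 2 Jun 2024, Learning to Play 7 Wonders Duel Without Human Supervision. This paper introduces ZeusAI, an AI program that plays the board game 7 Wonders Duel. Inspired by the AlphaZero reinforcement learning algorithm, ZeusAI combines MCTS with a Transformer and learns the game without human supervision. Its games against human players show a very high competitive level: it won 26 of 38 games. The paper uses ZeusAI as a tool to study the game's balance. The community widely believes the first player has a significant advantage, and ZeusAI's self-play games confirm this. The paper proposes some rule variants to reduce the imbalance, such as changing the initial number of coins or changing the wonder selection phase. Download: https://arxiv.org/pdf/2406.00741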
arXiv, 2024-06-02T13:28:57Z. DOI: 10.48550/arXiv.2406.00741
Giovanni Paolini, Lorenzo Moreschini, Francesco Veneziano, Alessandro Iraci
Abstract:
This paper introduces ZeusAI, an artificial intelligence system developed to play the board game 7 Wonders Duel. Inspired by the AlphaZero reinforcement learning algorithm, ZeusAI relies on a combination of Monte Carlo Tree Search and a Transformer Neural Network to learn the game without human supervision. ZeusAI competes at the level of top human players, develops both known and novel strategies, and allows us to test rule variants to improve the game's balance. This work demonstrates how AI can help in understanding and enhancing board games.
64.
符毓 (2024-07-31 21:51):
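#paper doi.org/10.48550/arXiv.2312.06512, 2024, Stoch BiRo: Design and Control of a low cost bipedal robot. The bipedal platform proposed in this paper stands out for proficient walking, low computational requirements, and lightweight hardware design. The reinforcement-learning reward function is designed around motion imitation (motion-imitation rewards) rather than prioritizing keeping the robot's IMU level, which greatly reduces the amount of torque simulation data.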
arXiv, 2023-12-11T16:39:11Z. DOI: 10.48550/arXiv.2312.06512
GVS Mothish, Karthik Rajgopal, Ravi Kola, Manan Tayal, Shishir Kolathaya
Abstract:
This paper introduces the Stoch BiRo, a cost-effective bipedal robot designed with a modular mechanical structure having point feet to navigate uneven and unfamiliar terrains. The robot employs proprioceptive actuation in abduction, hips, and knees, leveraging a Raspberry Pi4 for control. Overcoming computational limitations, a Learning-based Linear Policy controller manages balance and locomotion with only 3 degrees of freedom (DoF) per leg, distinct from the typical 5DoF in bipedal systems. Integrated within a modular control architecture, these controllers enable autonomous handling of unforeseen terrain disturbances without external sensors or prior environment knowledge. The robot's policies are trained and simulated using MuJoCo, transferring learned behaviors to the Stoch BiRo hardware for initial walking validations. This work highlights the Stoch BiRo's adaptability and cost-effectiveness in mechanical design, control strategies, and autonomous navigation, promising diverse applications in real-world robotics scenarios.
65.
前进 (2024-07-31 11:35):
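#paper DOI:https://doi.org/10.48550/arXiv.2006.16236 Katharopoulos A, Vyas A, Pappas N, et al. Transformers are rnns: Fast autoregressive transformers with linear attention[C]//International conference on machine learning. PMLR, 2020: 5156-5165. This paper proposes a linear Transformer model. By expressing self-attention as a linear dot-product of kernel feature maps and exploiting the associativity of matrix products, it reduces the computational complexity of a conventional Transformer on long sequences from O(N^2) to O(N). The authors show that the new model achieves performance similar to a standard Transformer while being up to 4000x faster on autoregressive prediction of long sequences. The paper also examines the relationship between Transformers and recurrent neural networks (RNNs), showing that with a suitable reformulation a Transformer can perform autoregressive prediction as efficiently as an RNN.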
arXiv, 2020-06-29T17:55:38Z. DOI: 10.48550/arXiv.2006.16236
Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, François Fleuret
Abstract:
Transformers achieve remarkable performance in several tasks but due to their quadratic complexity, with respect to the input's length, they are prohibitively slow for very long sequences. To address this limitation, we express the self-attention as a linear dot-product of kernel feature maps and make use of the associativity property of matrix products to reduce the complexity from $\mathcal{O}\left(N^2\right)$ to $\mathcal{O}\left(N\right)$, where $N$ is the sequence length. We show that this formulation permits an iterative implementation that dramatically accelerates autoregressive transformers and reveals their relationship to recurrent neural networks. Our linear transformers achieve similar performance to vanilla transformers and they are up to 4000x faster on autoregressive prediction of very long sequences.
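A small sketch of the associativity trick in plain Python, written for clarity rather than speed and not taken from the authors' code. It assumes the feature map phi(x) = elu(x) + 1, single-head attention, and lists of floats instead of tensors; the function names are illustrative. The recurrent version keeps a running state S (sum of phi(k) v^T) and a normalizer z (sum of phi(k)), so each new position costs the same regardless of how long the sequence already is, which is the RNN view the abstract refers to. The quadratic version recomputes all pairwise weights and serves only as a reference.

```python
import math


def phi(x):
    """Feature map elu(x) + 1, which is positive for every input."""
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]


def causal_linear_attention(Q, K, V):
    """O(N) causal attention using a running state, like an RNN."""
    d, dv = len(Q[0]), len(V[0])
    S = [[0.0] * dv for _ in range(d)]  # running sum of phi(k) v^T
    z = [0.0] * d                       # running sum of phi(k)
    out = []
    for q, k, v in zip(Q, K, V):
        fq, fk = phi(q), phi(k)
        for a in range(d):
            z[a] += fk[a]
            for b in range(dv):
                S[a][b] += fk[a] * v[b]
        denom = sum(fq[a] * z[a] for a in range(d))
        out.append([sum(fq[a] * S[a][b] for a in range(d)) / denom
                    for b in range(dv)])
    return out


def causal_quadratic_attention(Q, K, V):
    """O(N^2) reference: explicit kernel weights over all earlier positions."""
    out = []
    for i, q in enumerate(Q):
        fq = phi(q)
        w = [sum(a * b for a, b in zip(fq, phi(K[j]))) for j in range(i + 1)]
        total = sum(w)
        out.append([sum(w[j] * V[j][b] for j in range(i + 1)) / total
                    for b in range(len(V[0]))])
    return out
```

Both functions return the same values up to floating-point error; only the order of the multiplications differs, which is exactly what associativity allows.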
66.
符毓 (2024-06-30 23:02):
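#paper doi.org/10.48550/arXiv.2404.17569, 2024, MaPa: Text-driven Photorealistic Material Painting for 3D Shapes. This paper presents an algorithm that uses text to give 3D models high-quality material surfaces. The algorithm has four steps. First, the mesh is decomposed into segments, which are projected onto 2D images using segment-controlled image generation (specifically ControlNet). Second, the segments are grouped by similar material properties and appearance. Third, each material group goes through a selection process in which suitable material graphs are identified and optimized to accurately represent its texture and properties. Finally, the process iterates: the material graphs are repeatedly rendered and optimized across multiple views, gaps in the visual data are filled, and the grouping and optimization stages are repeated until every segment of the mesh is accurately represented by a corresponding material graph. This integrated approach yields detailed, realistic material textures tailored to the characteristics of each segment of the 3D mesh.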
Shangzhan Zhang, Sida Peng, Tao Xu, Yuanbo Yang, Tianrun Chen, Nan Xue, Yujun Shen, Hujun Bao, Ruizhen Hu, Xiaowei Zhou
Abstract:
This paper aims to generate materials for 3D meshes from text descriptions. Unlike existing methods that synthesize texture maps, we propose to generate segment-wise procedural material graphs as the appearance representation, which supports high-quality rendering and provides substantial flexibility in editing. Instead of relying on extensive paired data, i.e., 3D meshes with material graphs and corresponding text descriptions, to train a material graph generative model, we propose to leverage the pre-trained 2D diffusion model as a bridge to connect the text and material graphs. Specifically, our approach decomposes a shape into a set of segments and designs a segment-controlled diffusion model to synthesize 2D images that are aligned with mesh parts. Based on generated images, we initialize parameters of material graphs and fine-tune them through the differentiable rendering module to produce materials in accordance with the textual description. Extensive experiments demonstrate the superior performance of our framework in photorealism, resolution, and editability over existing methods. Project page: https://zju3dv.github.io/MaPa
67.
前进 (2024-06-30 22:29):
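#paper Liu R, Li Z, Fan X, et al. Learning Deformable Image Registration from Optimization: Perspective, Modules, Bilevel Training and Beyond[J]. 2020. DOI:10.48550/arXiv.2004.14557. The paper proposes a new deep-learning-based framework that optimizes a diffeomorphic model via multi-scale propagation, in order to combine the advantages of conventional deformable registration methods and deep-learning-based methods while avoiding their limitations. Specifically, the authors introduce a generic optimization model for diffeomorphic registration and develop a series of learnable architectures that perform registration on image features learned from coarse to fine. The paper also proposes a novel bilevel self-tuned training strategy that allows efficient search of task-specific hyperparameters, which increases flexibility across data types while reducing computational and human burden. The authors ran registration experiments on several datasets, including image-to-atlas registration on brain MRI data and image-to-image registration on liver CT data. The results show that the proposed method achieves state-of-the-art performance while preserving diffeomorphism. The authors also apply the framework to multi-modal image registration and study how its registration supports downstream medical image analysis tasks, including multi-modal fusion and image segmentation.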
Risheng Liu, Zi Li, Xin Fan, Chenying Zhao, Hao Huang, Zhongxuan Luo
Abstract:
Conventional deformable registration methods aim at solving an optimization model carefully designed on image pairs and their computational costs are exceptionally high. In contrast, recent deep learning based approaches can provide fast deformation estimation. These heuristic network architectures are fully data-driven and thus lack explicit geometric constraints, e.g., topology-preserving, which are indispensable to generate plausible deformations. We design a new deep learning based framework to optimize a diffeomorphic model via multi-scale propagation in order to integrate advantages and avoid limitations of these two categories of approaches. Specifically, we introduce a generic optimization model to formulate diffeomorphic registration and develop a series of learnable architectures to obtain propagative updating in the coarse-to-fine feature space. Moreover, we propose a novel bilevel self-tuned training strategy, allowing efficient search of task-specific hyper-parameters. This training strategy increases the flexibility to various types of data while reduces computational and human burdens. We conduct two groups of image registration experiments on 3D volume datasets including image-to-atlas registration on brain MRI data and image-to-image registration on liver CT data. Extensive results demonstrate the state-of-the-art performance of the proposed method with diffeomorphic guarantee and extreme efficiency. We also apply our framework to challenging multi-modal image registration, and investigate how our registration to support the down-streaming tasks for medical image analysis including multi-modal fusion and image segmentation.
68.
张浩彬 (2024-06-30 10:34):
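#paper https://doi.org/10.48550/arXiv.2403.10131 RAFT: Adapting Language Model to Domain Specific RAG. A paper I found very inspiring. Pretraining large language models (LLMs) on large text corpora has become a standard paradigm. When these LLMs are used in downstream applications, new knowledge (for example, time-critical news or private domain knowledge) is usually added to the pretrained model through RAG (Retrieval-Augmented Generation) based prompting or through fine-tuning. However, how a model can best acquire such new knowledge remains an open question. This paper proposes Retrieval Augmented Fine Tuning (RAFT). Put simply, you fine-tune on the material you will use for RAG, and use chain-of-thought so the model gets familiar with the task. Of course, RAG and fine-tuning are two separate approaches, and combining them seems a bit like putting the cart before the horse; I think the paper does not discuss this point clearly. Still, these unclear points are room for new research.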
Tianjun Zhang, Shishir G. Patil, Naman Jain, Sheng Shen, Matei Zaharia, Ion Stoica, Joseph E. Gonzalez
Abstract:
Pretraining Large Language Models (LLMs) on large corpora of textual data is now a standard paradigm. When using these LLMs for many downstream applications, it is common to additionally bake in new knowledge (e.g., time-critical news, or private domain knowledge) into the pretrained model either through RAG-based-prompting, or fine-tuning. However, the optimal methodology for the model to gain such new knowledge remains an open question. In this paper, we present Retrieval Augmented FineTuning (RAFT), a training recipe that improves the model's ability to answer questions in a "open-book" in-domain settings. In RAFT, given a question, and a set of retrieved documents, we train the model to ignore those documents that don't help in answering the question, which we call, distractor documents. RAFT accomplishes this by citing verbatim the right sequence from the relevant document that would help answer the question. This coupled with RAFT's chain-of-thought-style response helps improve the model's ability to reason. In domain-specific RAG, RAFT consistently improves the model's performance across PubMed, HotpotQA, and Gorilla datasets, presenting a post-training recipe to improve pre-trained LLMs to in-domain RAG. RAFT's code and demo are open-sourced at github.com/ShishirPatil/gorilla.
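A hypothetical sketch of how one training example in this style might be assembled: the relevant document is mixed with distractor documents, and the target response quotes the relevant span verbatim before giving the answer. The prompt template, the field names `prompt` and `target`, the default of two distractors, and the function name are all assumptions made for illustration; the actual data format lives in the authors' repository and may differ.

```python
import random


def build_raft_example(question, oracle_doc, distractor_docs, quote, answer,
                       num_distractors=2, seed=0):
    """Assemble one (prompt, target) pair with distractors mixed in."""
    # The cited span has to appear word for word in the relevant document.
    assert quote in oracle_doc, "quote must be a verbatim span of the oracle document"
    rng = random.Random(seed)
    docs = rng.sample(distractor_docs, num_distractors) + [oracle_doc]
    rng.shuffle(docs)  # the model should not learn a fixed position
    context = "\n".join(f"[doc {i}] {d}" for i, d in enumerate(docs))
    prompt = f"{context}\nQuestion: {question}"
    # Chain-of-thought-style target: cite first, then answer.
    target = f'Reasoning: the relevant document states "{quote}". Answer: {answer}'
    return {"prompt": prompt, "target": target}
```

Fine-tuning on pairs like these is meant to teach the model to pick out and cite the useful document while ignoring the rest; the documents passed in here would be whatever the retriever returns for the domain in question.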
69.
张浩彬 (2024-05-31 07:31):
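#paper doi:https://doi.org/10.48550/arXiv.2403.10131 RAFT: Adapting Language Model to Domain Specific RAG. A simple but effective idea. To turn a general-purpose large model into a domain application, we can fine-tune or we can use RAG, but Microsoft says we should fine-tune on top of RAG. RAFT is a general recipe for fine-tuning a pretrained large language model for a domain-specific RAG setting. In domain-specific RAG, the model has to answer questions based on a set of documents from a particular domain, such as private files in an enterprise. This differs from general RAG, where the model does not know which domain it will be tested on. Put simply, fine-tuning is a closed-book exam answered from memory. RAG is an open-book exam: I have memorized nothing, but I can flip through the book during the exam. RAFT is reading the textbook before an open-book exam: I have not read all of it, but I roughly know what the questions look like, and that is fine because I can still consult the book during the exam.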
70.
尹志 (2024-05-30 15:52):
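#paper Protein Conformation Generation via Force-Guided SE(3) Diffusion Models https://doi.org/10.48550/arXiv.2403.14088 A new work from ByteDance. It is again protein conformation generation with an SE(3) diffusion model, but unlike the usual generation of static conformations, this work proposes generating dynamic conformations. That is of course far more meaningful, since real-world protein conformations are dynamic and form a conformational distribution. The paper introduces physical information as guidance, which is an interesting idea: it takes the prior of the physical system into account while avoiding the performance cost of purely model-based computation such as MD. In effect, the MD computation is abstracted into a prior that serves as guidance, and a generative model then does the generation.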
Yan Wang, Lihao Wang, Yuning Shen, Yiqun Wang, Huizhuo Yuan, Yue Wu, Quanquan Gu
Abstract:
The conformational landscape of proteins is crucial to understanding their functionality in complex biological processes. Traditional physics-based computational methods, such as molecular dynamics (MD) simulations, suffer from rare event sampling and long equilibration time problems, hindering their applications in general protein systems. Recently, deep generative modeling techniques, especially diffusion models, have been employed to generate novel protein conformations. However, existing score-based diffusion methods cannot properly incorporate important physical prior knowledge to guide the generation process, causing large deviations in the sampled protein conformations from the equilibrium distribution. In this paper, to overcome these limitations, we propose a force-guided SE(3) diffusion model, ConfDiff, for protein conformation generation. By incorporating a force-guided network with a mixture of data-based score models, ConfDiff can generate protein conformations with rich diversity while preserving high fidelity. Experiments on a variety of protein conformation prediction tasks, including 12 fast-folding proteins and the Bovine Pancreatic Trypsin Inhibitor (BPTI), demonstrate that our method surpasses the state-of-the-art method.
71.
尹志 (2024-04-30 22:48):
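#paper doi:https://doi.org/10.48550/arXiv.2211.07697, NeurIPS 2022 Workshop on Symmetry and Geometry in Neural Representations, 2022. Do Neural Networks Trained with Topological Features Learn Different Internal Representations? The authors examine how the representations learned by a neural network trained on topological features differ from those learned by training directly on the raw data. The conclusions are interesting. As one might guess, the two do differ, especially under the metrics the authors chose, which also speaks to the value of topological machine learning. But the authors also find that in some cases a simple representation can stand in for a model trained on topological features. Of course, how to extract suitable topological features in a specific data setting that differ markedly from what can be extracted from raw data remains an open topic.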
Sarah McGuire, Shane Jackson, Tegan Emerson, Henry Kvinge
Abstract:
There is a growing body of work that leverages features extracted via topological data analysis to train machine learning models. While this field, sometimes known as topological machine learning (TML), has seen some notable successes, an understanding of how the process of learning from topological features differs from the process of learning from raw data is still limited. In this work, we begin to address one component of this larger issue by asking whether a model trained with topological features learns internal representations of data that are fundamentally different than those learned by a model trained with the original raw data. To quantify "different", we exploit two popular metrics that can be used to measure the similarity of the hidden representations of data within neural networks, neural stitching and centered kernel alignment. From these we draw a range of conclusions about how training with topological features does and does not change the representations that a model learns. Perhaps unsurprisingly, we find that structurally, the hidden representations of models trained and evaluated on topological features differ substantially compared to those trained and evaluated on the corresponding raw data. On the other hand, our experiments show that in some cases, these representations can be reconciled (at least to the degree required to solve the corresponding task) using a simple affine transformation. We conjecture that this means that neural networks trained on raw data may extract some limited topological features in the process of making predictions.
72.
前进 (2024-04-30 11:44):
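#paper Han D, Pan X, Han Y, et al. Flatten transformer: Vision transformer using focused linear attention[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. 2023: 5961-5971. The main challenge of applying self-attention to computer vision tasks is its quadratic computational complexity, which makes vision tasks very expensive to process. As an alternative to Softmax attention, linear attention approximates the Softmax operation with carefully designed mapping functions, reducing complexity from quadratic to linear. Although linear attention is more efficient in theory, existing linear attention methods either suffer a significant performance drop or need extra computational overhead, which limits their practical use. To overcome these limits, the paper proposes the FLA module, which improves efficiency and expressiveness through two main changes. Focus ability: a simple mapping function strengthens self-attention's ability to focus on the most informative features. Feature diversity: an efficient rank-restoration module uses depthwise convolution (DWC) to restore the rank of the attention matrix and increase feature diversity. In extensive experiments on several advanced vision Transformer models, the FLA module shows consistent performance gains across multiple benchmarks.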
73.
张浩彬 (2024-04-29 20:35):
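#paper doi: https://doi.org/10.48550/arXiv.2211.14730 A Time Series is Worth 64 Words: Long-term Forecasting with Transformers. An ICLR 2023 paper that proposes PatchTST. Inspired by the Vision Transformer, it brings the patch technique to time series problems. It also responds to an earlier paper arguing that Transformers are in fact no better than simple linear models for time series (Are transformers effective for time series forecasting? (2022)), and regains SOTA. However, at the end of 2023 new methods appeared arguing that the key is actually the patch technique rather than the Transformer.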
Yuqi Nie, Nam H. Nguyen, Phanwadee Sinthong, Jayant Kalagnanam
Abstract:
We propose an efficient design of Transformer-based models for multivariate time series forecasting and self-supervised representation learning. It is based on two key components: (i) segmentation of time series into subseries-level patches which are served as input tokens to Transformer; (ii) channel-independence where each channel contains a single univariate time series that shares the same embedding and Transformer weights across all the series. Patching design naturally has three-fold benefit: local semantic information is retained in the embedding; computation and memory usage of the attention maps are quadratically reduced given the same look-back window; and the model can attend longer history. Our channel-independent patch time series Transformer (PatchTST) can improve the long-term forecasting accuracy significantly when compared with that of SOTA Transformer-based models. We also apply our model to self-supervised pre-training tasks and attain excellent fine-tuning performance, which outperforms supervised training on large datasets. Transferring of masked pre-trained representation on one dataset to others also produces SOTA forecasting accuracy. Code is available at: https://github.com/yuqinie98/PatchTST.
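A short sketch of the patching step for a single univariate channel, in plain Python and not taken from the PatchTST repository. It assumes the series is padded at the end by repeating its last value `stride` times before slicing, and the values 512, 16, and 8 for look-back length, patch length, and stride are example settings chosen here for illustration; the function name and the `pad_end` flag are likewise assumptions.

```python
def make_patches(series, patch_len, stride, pad_end=True):
    """Split one univariate series into overlapping subseries-level patches."""
    series = list(series)
    if pad_end:
        # Repeat the last value so the tail of the series is fully covered.
        series = series + [series[-1]] * stride
    n = (len(series) - patch_len) // stride + 1
    return [series[i * stride: i * stride + patch_len] for i in range(n)]
```

With these example settings, `make_patches(list(range(512)), 16, 8)` turns 512 time steps into 64 patches, so the Transformer sees 64 input tokens instead of 512. Because the attention map grows with the square of the token count, cutting the number of tokens by a factor of about 8 shrinks the map by a factor of about 64, which is the quadratic reduction the abstract mentions.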
74.
林海onrush (2024-04-02 00:39):
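#paper, Curriculum Learning and Imitation Learning for Model-free Control on Financial Time-series, doi:https://doi.org/10.48550/arXiv.2311.13326. This paper proposes a new approach to model-free control on financial time series. Conventional reinforcement learning faces limited and noisy training data in this domain. The paper therefore explores bringing curriculum learning and imitation learning, two paradigms already applied successfully in robotics, to financial problems. Through extensive empirical experiments on two representative datasets, the paper finds that curriculum learning significantly improves reinforcement learning algorithms on complex financial time-series decision-making and outperforms all baseline methods. Curriculum learning raises the difficulty of the training task step by step through data augmentation, an "easy to hard" learning strategy. The experiments show that this moderate data smoothing effectively reduces noise in the data, so the reinforcement learning algorithm captures real market signals better. By contrast, applying imitation learning directly does not work well. Further analysis suggests this may be because imitation learning removes noise but also loses some key market signals. From a statistical viewpoint, imitation learning achieves a decomposition into noise and signal, but excessive denoising hurts policy learning. The paper's theoretical contribution is a statistical signal-noise decomposition framework that explains why curriculum learning and imitation learning perform differently on financial time series. The framework also suggests new directions for improving the algorithms. The paper further discusses directions for future work, including examining the non-stationarity of the signal-noise decomposition, exploring other forms of data smoothing, and extending curriculum learning to other high-noise time-series learning tasks.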
Woosung Koh, Insu Choi, Yuntae Jang, Gimin Kang, Woo Chang Kim
Abstract:
Curriculum learning and imitation learning have been leveraged extensively in the robotics domain. However, minimal research has been done on leveraging these ideas on control tasks over highly stochastic time-series data. Here, we theoretically and empirically explore these approaches in a representative control task over complex time-series data. We implement the fundamental ideas of curriculum learning via data augmentation, while imitation learning is implemented via policy distillation from an oracle. Our findings reveal that curriculum learning should be considered a novel direction in improving control-task performance over complex time-series. Our ample random-seed out-sample empirics and ablation studies are highly encouraging for curriculum learning for time-series control. These findings are especially encouraging as we tune all overlapping hyperparameters on the baseline -- giving an advantage to the baseline. On the other hand, we find that imitation learning should be used with caution.
75.
符毓 (2024-03-31 23:50):
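#paper doi.org/10.48550/arXiv.2403.16527, 2024, Hallucination Detection in Foundation Models for Decision-Making: A Flexible Definition and Review of the State of the Art. Intelligent control systems can be applied widely across scenarios through pre-training, but they perform poorly in scenarios outside their training. The emergence of large models promises the reasoning ability that existing training approaches lack, but large models produce "hallucinations" (decisions that sound reasonable but are poor). This paper attempts to define "hallucination" and provides a taxonomy of methods for detecting and mitigating hallucinations in planning, along with evaluation metrics, datasets, and more.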
Neeloy Chakraborty, Melkior Ornik, Katherine Driggs-Campbell
Abstract:
Autonomous systems are soon to be ubiquitous, from manufacturing autonomy to agricultural field robots, and from health care assistants to the entertainment industry. The majority of these systems are developed with modular sub-components for decision-making, planning, and control that may be hand-engineered or learning-based. While these existing approaches have been shown to perform well under the situations they were specifically designed for, they can perform especially poorly in rare, out-of-distribution scenarios that will undoubtedly arise at test-time. The rise of foundation models trained on multiple tasks with impressively large datasets from a variety of fields has led researchers to believe that these models may provide common sense reasoning that existing planners are missing. Researchers posit that this common sense reasoning will bridge the gap between algorithm development and deployment to out-of-distribution tasks, like how humans adapt to unexpected scenarios. Large language models have already penetrated the robotics and autonomous systems domains as researchers are scrambling to showcase their potential use cases in deployment. While this application direction is very promising empirically, foundation models are known to hallucinate and generate decisions that may sound reasonable, but are in fact poor. We argue there is a need to step back and simultaneously design systems that can quantify the certainty of a model's decision, and detect when it may be hallucinating. In this work, we discuss the current use cases of foundation models for decision-making tasks, provide a general definition for hallucinations with examples, discuss existing approaches to hallucination detection and mitigation with a focus on decision problems, and explore areas for further research in this exciting field.
76.
符毓 (2024-02-29 22:43):
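#paper doi.org/10.48550/arXiv.2304.09349 2023, LLM as A Robotic Brain: Unifying Egocentric Memory and Control. LLM agents acquire knowledge and reasoning ability through pre-training to solve robotics and planning tasks. However, most effort has gone into teaching robots "what to do". This article focuses on conveying what the robot must not do and on meeting safe-operation standards. For deploying LLM agents in collaborative environments, it proposes a way of handling constraints that addresses the inherently probabilistic nature of LLMs and their inability to cope with complex conditions. Experiments in the VirtualHome environment and on real robots show that safety constraints can be satisfied without hurting the goal completion rate.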
77.
小W (2024-02-29 20:28):
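#paper doi:arXiv:2203.13906 Biolink Model: A Universal Schema for Knowledge Graphs in Clinical, Biomedical, and Translational Science. This paper introduces the Biolink Model, described here as the European Molecular Biology Laboratory's view of life processes. It uses LinkML (Linked data Modeling Language), a YAML variant, to define a set of hierarchical, interconnected classes and the relationships between them, in order to represent entities in translational science and the links among those entities. The work includes three model libraries (a standard biological schema, samples, and TranslatorMinimal) along with a method for using the model to link data across different ontologies. Building on this model, other teams developed the NIH Biomedical Data Translator project, as well as BioCypher, published in Nat. Biotechnol. in 2023.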
Deepak R. Unni, Sierra A. T. Moxon, Michael Bada, Matthew Brush, Richard Bruskiewich, Paul Clemons, Vlado Dancik, Michel Dumontier, Karamarie Fecho, Gustavo Glusman, Jennifer J. Hadlock, Nomi L. Harris, Arpita Joshi, Tim Putman, Guangrong Qin, Stephen A. Ramsey, Kent A. Shefchek, Harold Solbrig, Karthik Soman, Anne T. Thessen, Melissa A. Haendel, Chris Bizon, Christopher J. Mungall, the Biomedical Data Translator Consortium
Abstract:
Within clinical, biomedical, and translational science, an increasing number of projects are adopting graphs for knowledge representation. Graph-based data models elucidate the interconnectedness between core biomedical concepts, enable data structures to be easily updated, and support intuitive queries, visualizations, and inference algorithms. However, knowledge discovery across these "knowledge graphs" (KGs) has remained difficult. Data set heterogeneity and complexity; the proliferation of ad hoc data formats; poor compliance with guidelines on findability, accessibility, interoperability, and reusability; and, in particular, the lack of a universally-accepted, open-access model for standardization across biomedical KGs has left the task of reconciling data sources to downstream consumers. Biolink Model is an open source data model that can be used to formalize the relationships between data structures in translational science. It incorporates object-oriented classification and graph-oriented features. The core of the model is a set of hierarchical, interconnected classes (or categories) and relationships between them (or predicates), representing biomedical entities such as gene, disease, chemical, anatomical structure, and phenotype. The model provides class and edge attributes and associations that guide how entities should relate to one another. Here, we highlight the need for a standardized data model for KGs, describe Biolink Model, and compare it with other models. We demonstrate the utility of Biolink Model in various initiatives, including the Biomedical Data Translator Consortium and the Monarch Initiative, and show how it has supported easier integration and interoperability of biomedical KGs, bringing together knowledge from multiple sources and helping to realize the goals of translational science.
78.
🐼太真实 (2024-02-29 10:04):
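#paper ProPainter: Improving Propagation and Transformer for Video Inpainting. This paper presents ProPainter, a new video inpainting technique that achieves efficient and accurate inpainting through dual-domain propagation and a mask-guided sparse video Transformer. It describes ProPainter's three key components in detail (recurrent flow completion, dual-domain propagation, and the mask-guided sparse video Transformer) and provides the corresponding technical details and experimental results.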
Shangchen Zhou, Chongyi Li, Kelvin C. K. Chan, Chen Change Loy
Abstract:
Flow-based propagation and spatiotemporal Transformer are two mainstream mechanisms in video inpainting (VI). Despite the effectiveness of these components, they still suffer from some limitations that affect their performance. Previous propagation-based approaches are performed separately either in the image or feature domain. Global image propagation isolated from learning may cause spatial misalignment due to inaccurate optical flow. Moreover, memory or computational constraints limit the temporal range of feature propagation and video Transformer, preventing exploration of correspondence information from distant frames. To address these issues, we propose an improved framework, called ProPainter, which involves enhanced ProPagation and an efficient Transformer. Specifically, we introduce dual-domain propagation that combines the advantages of image and feature warping, exploiting global correspondences reliably. We also propose a mask-guided sparse video Transformer, which achieves high efficiency by discarding unnecessary and redundant tokens. With these components, ProPainter outperforms prior arts by a large margin of 1.46 dB in PSNR while maintaining appealing efficiency.
79.
尹志 (2024-02-28 22:09):
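#paper An introduction to Topological Data Analysis: fundamental and practical aspects for data scientists doi: https://doi.org/10.48550/arXiv.1710.04019 Generative AI is riding high and Sora is all anyone talks about. I can't produce results like that myself (yes, I'm jealous), but I don't think this is the final answer, especially for the physical world and biological systems. The emphasis on, even faith in, scaling laws in The Bitter Lesson has its value in domains like language and video, but the life sciences and physical systems have billions of years of history (for physical systems, presumably since the very beginning). The evolution of life, the origins of physical systems, and the principles humans have worked out about them over centuries should make for better priors. Oh, back to the topic of this paper. Topological data analysis is a tool that brings a system's topological and geometric properties into the analysis and modeling process to gain a deeper understanding of the system. This survey explains the tool carefully and covers its application areas with analysis and a tutorial. It also gives a brief but careful introduction to the mathematical prerequisites, mainly algebraic topology and computational geometry. The reason for the rambling above is that, based on some recent hands-on work, I have come to feel that combining abstract mathematical tools such as topology and geometry with generative AI may describe biological systems and the physical world more efficiently than today's brute-force compute, and get closer to the physical essence of the system. If you also believe that physical systems and the living world are simple and efficient, beautiful and concise, I suggest trying these newer techniques. By the way, the revision information for this survey is [Submitted on 11 Oct 2017 (v1), last revised 25 Feb 2021 (this version, v2)]. Doesn't that say something?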
Frédéric Chazal, Bertrand Michel
Abstract:
Topological Data Analysis is a recent and fast growing field providing a set of new topological and geometric tools to infer relevant features for possibly complex data. This paper is a brief introduction, through a few selected topics, to basic fundamental and practical aspects of TDA for non experts.
80.
前进 (2024-01-31 22:50):
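#paper arxiv.org//pdf/2311.026 2023 Exploring Grounding Potential of VQA-oriented GPT-4V for Zero-shot Anomaly Detection. The Large Multimodal Model (LMM) GPT-4V(ision) endows GPT-4 with visual grounding capabilities, making it possible to handle certain tasks through the Visual Question Answering (VQA) paradigm. This paper explores the potential of VQA-oriented GPT-4V in the recently popular task of visual anomaly detection (AD) and is the first to carry out qualitative and quantitative evaluations on the popular MVTec AD and VisA datasets. Because the task requires image-/pixel-level evaluation, the proposed GPT-4V-AD framework has three components: 1) granular region division, 2) prompt design, and 3) Text2Segmentation for easy quantitative evaluation; the authors also try several variants for comparative analysis. The results show that GPT-4V can achieve certain results on zero-shot AD through the VQA paradigm, for example image-level 77.1/88.0 and pixel-level 68.0/76.6 AU-ROC on the MVTec AD and VisA datasets, respectively. However, its performance still lags state-of-the-art zero-shot methods (e.g., WinCLIP and CLIP-AD), and further research is needed. This study provides a baseline reference for research on VQA-oriented LMMs in zero-shot AD.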
Jiangning Zhang, Xuhai Chen, Zhucun Xue, Yabiao Wang, Chengjie Wang, Yong Liu
Abstract:
Large Multimodal Model (LMM) GPT-4V(ision) endows GPT-4 with visual grounding capabilities, making it possible to handle certain tasks through the Visual Question Answering (VQA) paradigm. This paper explores the potential of VQA-oriented GPT-4V in the recently popular visual Anomaly Detection (AD) and is the first to conduct qualitative and quantitative evaluations on the popular MVTec AD and VisA datasets. Considering that this task requires both image-/pixel-level evaluations, the proposed GPT-4V-AD framework contains three components: 1) Granular Region Division, 2) Prompt Designing, 3) Text2Segmentation for easy quantitative evaluation, and have made some different attempts for comparative analysis. The results show that GPT-4V can achieve certain results in the zero-shot AD task through a VQA paradigm, such as achieving image-level 77.1/88.0 and pixel-level 68.0/76.6 AU-ROCs on MVTec AD and VisA datasets, respectively. However, its performance still has a certain gap compared to the state-of-the-art zero-shot method, e.g., WinCLIP and CLIP-AD, and further research is needed. This study provides a baseline reference for the research of VQA-oriented LMM in the zero-shot AD task, and we also post several possible future works. Code is available at https://github.com/zhangzjn/GPT-4V-AD.