Papers from the journal arXiv.
127 paper shares found in total; this page shows items 1 - 20.
1.
Vincent
(2025-03-31 16:09):
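#paper doi: https://doi.org/10.48550/arXiv.2503.00096 BixBench: a Comprehensive Benchmark for LLM-based Agents in Computational Biology. Large language models show significant potential for accelerating scientific discovery, but LLM agents in bioinformatics have so far lacked systematic evaluation. This paper assembles more than 50 real-world scenarios and nearly 300 open-answer questions to measure how well LLM-based agents solve complex bioinformatics problems. The authors test two frontier LLMs (GPT-4o and Claude 3.5 Sonnet) and find that accuracy on open-answer questions is low (17% according to the abstract) and that performance on multiple-choice questions is no better than random guessing. The paper's contribution is a set of test cases and an evaluation framework that lay the groundwork for building better-performing agents.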
arXiv,
2025-02-28T18:47:57Z.
DOI: 10.48550/arXiv.2503.00096
Abstract:
Large Language Models (LLMs) and LLM-based agents show great promise in accelerating scientific research. Existing benchmarks for measuring this potential and guiding future development continue to evolve from pure recall and rote knowledge tasks, towards more practical work such as literature review and experimental planning. Bioinformatics is a domain where fully autonomous AI-driven discovery may be near, but no extensive benchmarks for measuring progress have been introduced to date. We therefore present the Bioinformatics Benchmark (BixBench), a dataset comprising over 50 real-world scenarios of practical biological data analysis with nearly 300 associated open-answer questions designed to measure the ability of LLM-based agents to explore biological datasets, perform long, multi-step analytical trajectories, and interpret the nuanced results of those analyses. We evaluate the performance of two frontier LLMs (GPT-4o and Claude 3.5 Sonnet) using a custom agent framework we open source. We find that even the latest frontier models only achieve 17% accuracy in the open-answer regime, and no better than random in a multiple-choice setting. By exposing the current limitations of frontier models, we hope BixBench can spur the development of agents capable of conducting rigorous bioinformatic analysis and accelerate scientific discovery.
2.
尹志
(2025-03-31 15:06):
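#paper doi: doi.org/10.48550/arXiv.2502.11974, Image Inversion: A Survey from GANs to Diffusion and Beyond (2025).
A very recent survey of the common algorithms and models for image inversion. It mainly covers GAN and diffusion models, and also touches on the DiT and Rectified Flow frameworks. The core problem in image inversion concerns the latent space, which matters for many other generative AI problems as well.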
arXiv,
2025-02-17T16:20:48Z.
DOI: 10.48550/arXiv.2502.11974
Abstract:
Image inversion is a fundamental task in generative models, aiming to map images back to their latent representations to enable downstream applications such as editing, restoration, and style transfer. This paper provides a comprehensive review of the latest advancements in image inversion techniques, focusing on two main paradigms: Generative Adversarial Network (GAN) inversion and diffusion model inversion. We categorize these techniques based on their optimization methods. For GAN inversion, we systematically classify existing methods into encoder-based approaches, latent optimization approaches, and hybrid approaches, analyzing their theoretical foundations, technical innovations, and practical trade-offs. For diffusion model inversion, we explore training-free strategies, fine-tuning methods, and the design of additional trainable modules, highlighting their unique advantages and limitations. Additionally, we discuss several popular downstream applications and emerging applications beyond image tasks, identifying current challenges and future research directions. By synthesizing the latest developments, this paper aims to provide researchers and practitioners with a valuable reference resource, promoting further advancements in the field of image inversion. We keep track of the latest works at https://github.com/RyanChenYN/ImageInversion
3.
Kunji
(2025-02-28 23:59):
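#paper, https://arxiv.org/pdf/2410.05273, HiRT: Enhancing Robotic Control with Hierarchical Robot Transformers. VLA models rely on VLMs with billions of parameters. They generalize well, but their high computational cost and slow inference limit their use in dynamic tasks. To address these limitations, the paper proposes HiRT (Hierarchical Robot Transformer framework), which borrows from the dual-process theory of human cognition and uses a dual-system architecture with asynchronous operation to balance control frequency against performance. Experiments in simulation and in the real world show significant improvements: in static tasks the control frequency doubles while the success rate stays comparable, and in real-world dynamic manipulation tasks that previous VLA models struggled with, HiRT raises the success rate from 48% to 75%.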
arXiv,
2024-09-12T09:18:09Z.
DOI: 10.48550/arXiv.2410.05273
Abstract:
Large Vision-Language-Action (VLA) models, leveraging powerful pre-trained Vision-Language Models (VLMs) backends, have shown promise in robotic control due to their impressive generalization ability. However, the success comes at a cost. Their reliance on VLM backends with billions of parameters leads to high computational costs and inference latency, limiting the testing scenarios to mainly quasi-static tasks and hindering performance in dynamic tasks requiring rapid interactions. To address these limitations, this paper proposes HiRT, a Hierarchical Robot Transformer framework that enables flexible frequency and performance trade-off. HiRT keeps VLMs running at low frequencies to capture temporarily invariant features while enabling real-time interaction through a high-frequency vision-based policy guided by the slowly updated features. Experiment results in both simulation and real-world settings demonstrate significant improvements over baseline methods. Empirically, in static tasks, we double the control frequency and achieve comparable success rates. Additionally, on novel real-world dynamic manipulation tasks which are challenging for previous VLA models, HiRT improves the success rate from 48% to 75%.
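The dual-rate idea in the HiRT entry above (a slow VLM whose features are refreshed rarely, and a fast policy that acts at every step on the latest cached features) can be illustrated with a toy loop. This is only a sketch under stated assumptions: `slow_vlm_features`, `fast_policy`, and `run_control_loop` are hypothetical stand-ins, not HiRT's code, and a single-threaded loop is used here in place of truly asynchronous execution.

```python
def slow_vlm_features(observation):
    """Hypothetical stand-in for the expensive VLM: returns one coarse feature."""
    return sum(observation) / len(observation)


def fast_policy(observation, cached_feature):
    """Hypothetical stand-in for the lightweight policy.

    It sees the fresh observation plus the (possibly stale) cached feature.
    """
    return observation[-1] - cached_feature


def run_control_loop(observations, slow_every=4):
    """Refresh the slow feature every `slow_every` steps; act at every step."""
    cached = None
    actions = []
    slow_calls = 0
    for step, obs in enumerate(observations):
        if step % slow_every == 0:
            cached = slow_vlm_features(obs)  # low-frequency update
            slow_calls += 1
        actions.append(fast_policy(obs, cached))  # high-frequency control
    return actions, slow_calls
```

With eight observations and `slow_every=4`, the expensive function runs twice while the policy still emits eight actions, which is the frequency/performance trade-off the abstract describes.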
4.
符毓 Yu
(2025-02-28 23:00):
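#paper doi.org/10.48550/arXiv.2411.13677, 2024, Bimanual Dexterity for Complex Tasks. Teleoperation is an important way for robots to acquire data. The paper introduces a portable, low-cost, and extremely precise teleoperation method for a bimanual humanoid robot arm system (total cost about 12k USD: 5k for the hands and 7k for the teleoperation system; it can additionally be paired with two robot arms for 16k). It demonstrates the system in tabletop and mobile settings and shows that it is more efficient at bimanual dexterous tasks than alternatives such as SteamVR and Vision Pro. However, because there is no haptic feedback, the operator has to rely on visual feedback alone and cannot feel what the robot arms sense.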
arXiv,
2024-11-20T19:53:35Z.
DOI: 10.48550/arXiv.2411.13677
Abstract:
To train generalist robot policies, machine learning methods often require a substantial amount of expert human teleoperation data. An ideal robot for humans collecting data is one that closely mimics them: bimanual arms and dexterous hands. However, creating such a bimanual teleoperation system with over 50 DoF is a significant challenge. To address this, we introduce Bidex, an extremely dexterous, low-cost, low-latency and portable bimanual dexterous teleoperation system which relies on motion capture gloves and teacher arms. We compare Bidex to a Vision Pro teleoperation system and a SteamVR system and find Bidex to produce better quality data for more complex tasks at a faster rate. Additionally, we show Bidex operating a mobile bimanual robot for in the wild tasks. The robot hands (5k USD) and teleoperation system (7k USD) is readily reproducible and can be used on many robot arms including two xArms (16k USD). Website at https://bidex-teleop.github.io/
5.
尹志
(2025-02-28 15:55):
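#paper doi:10.48550/arXiv.2205.15463 Few-Shot Diffusion Models. The paper proposes a technique for few-shot generation that combines a diffusion model with a set-based ViT. Experiments show that the model can generate samples from a new class given as few as 5 examples.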
arXiv,
2022-05-30T23:20:33Z.
DOI: 10.48550/arXiv.2205.15463
Abstract:
Denoising diffusion probabilistic models (DDPM) are powerful hierarchical latent variable models with remarkable sample generation quality and training stability. These properties can be attributed to parameter sharing in the generative hierarchy, as well as a parameter-free diffusion-based inference procedure. In this paper, we present Few-Shot Diffusion Models (FSDM), a framework for few-shot generation leveraging conditional DDPMs. FSDMs are trained to adapt the generative process conditioned on a small set of images from a given class by aggregating image patch information using a set-based Vision Transformer (ViT). At test time, the model is able to generate samples from previously unseen classes conditioned on as few as 5 samples from that class. We empirically show that FSDM can perform few-shot generation and transfer to new datasets. We benchmark variants of our method on complex vision datasets for few-shot learning and compare to unconditional and conditional DDPM baselines. Additionally, we show how conditioning the model on patch-based input set information improves training convergence.
6.
刘昊辰
(2025-02-25 22:38):
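#paper Playing Hex and Counter Wargames using Reinforcement Learning and Recurrent Neural Networks. A research paper on using reinforcement learning and recurrent neural networks (RNNs) to play Hex and Counter wargames. It proposes a new system that combines the AlphaZero reinforcement learning algorithm with recurrent neural networks to cope with the strategic complexity of these games. The system generalizes across different terrain and tactical situations, and the authors also explore how it scales to larger maps. With limited training resources and compute, it achieves good results in complex Hex and Counter wargames. Download: https://arxiv.org/abs/2502.13918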
arXiv,
2025-02-19T17:52:45Z.
DOI: 10.48550/arXiv.2502.13918
Abstract:
Hex and Counter Wargames are adversarial two-player simulations of real military conflicts requiring complex strategic decision-making. Unlike classical board games, these games feature intricate terrain/unit interactions, unit stacking, large maps of varying sizes, and simultaneous move and combat decisions involving hundreds of units. This paper introduces a novel system designed to address the strategic complexity of Hex and Counter Wargames by integrating cutting-edge advancements in Recurrent Neural Networks with AlphaZero, a reliable modern Reinforcement Learning algorithm. The system utilizes a new Neural Network architecture developed from existing research, incorporating innovative state and action representations tailored to these specific game environments. With minimal training, our solution has shown promising results in typical scenarios, demonstrating the ability to generalize across different terrain and tactical situations. Additionally, we explore the system's potential to scale to larger map sizes. The developed system is openly accessible, facilitating continued research and exploration within this challenging domain.
7.
惊鸿
(2025-02-15 00:02):
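#paper DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
Pub Date: 2024-05-07
DOI: arxiv-2405.04434
We present DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. It has 236B parameters in total, of which 21B are activated for each token, and it supports a context length of 128K tokens. DeepSeek-V2 adopts innovative architectures including Multi-head Latent Attention (MLA) and DeepSeekMoE. MLA ensures efficient inference by significantly compressing the Key-Value (KV) cache into a latent vector, while DeepSeekMoE makes it possible to train strong models at an economical cost through sparse computation. Compared with DeepSeek 67B, DeepSeek-V2 performs significantly better while saving 42.5% of training costs, reducing the KV cache by 93.3%, and raising the maximum generation throughput to 5.76 times. DeepSeek-V2 is pretrained on a high-quality, multi-source corpus of 8.1T tokens, followed by Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) to fully unlock its potential. Evaluation results show that, even with only 21B activated parameters, DeepSeek-V2 and its chat versions still achieve top-tier performance among open-source models. The model checkpoints are available at https://github.com/deepseek-ai/DeepSeek-V2.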
arXiv,
2024-05-07T15:56:43Z.
DOI: 10.48550/arXiv.2405.04434
Abstract:
We present DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. It comprises 236B total parameters, of which 21B are activated for each token, and supports a context length of 128K tokens. DeepSeek-V2 adopts innovative architectures including Multi-head Latent Attention (MLA) and DeepSeekMoE. MLA guarantees efficient inference through significantly compressing the Key-Value (KV) cache into a latent vector, while DeepSeekMoE enables training strong models at an economical cost through sparse computation. Compared with DeepSeek 67B, DeepSeek-V2 achieves significantly stronger performance, and meanwhile saves 42.5% of training costs, reduces the KV cache by 93.3%, and boosts the maximum generation throughput to 5.76 times. We pretrain DeepSeek-V2 on a high-quality and multi-source corpus consisting of 8.1T tokens, and further perform Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) to fully unlock its potential. Evaluation results show that, even with only 21B activated parameters, DeepSeek-V2 and its chat versions still achieve top-tier performance among open-source models.
8.
林海onrush
(2025-01-31 23:53):
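#paper, https://doi.org/10.48550/arXiv.2312.01156, Efficient Light Source Placement using Quantum Computing. A fun little problem: how to use quantum computing to solve the torch placement problem in the game Minecraft. The problem is cast as a quadratic unconstrained binary optimization (QUBO) problem, and the constraints are handled by iteratively learning Lagrange multipliers. Experiments show that the method finds valid torch placements within a reasonable number of iterations, although current quantum hardware is limited and classical methods still perform somewhat better on larger maps. The torch placement problem is closely related to the set cover problem, which illustrates the value of quantum computing for resource optimization problems.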
arXiv,
2023-12-02T15:28:59Z.
DOI: 10.48550/arXiv.2312.01156
Abstract:
NP-hard problems regularly come up in video games, with interesting connections to real-world problems. In the game Minecraft, players place torches on the ground to light up dark areas. Placing them in a way that minimizes the total number of torches to save resources is far from trivial. In this paper, we use Quantum Computing to approach this problem. To this end, we derive a QUBO formulation of the torch placement problem, which we uncover to be very similar to another NP-hard problem. We employ a solution strategy that involves learning Lagrangian weights in an iterative process, adding to the ever growing toolbox of QUBO formulations. Finally, we perform experiments on real quantum hardware using real game data to demonstrate that our approach yields good torch placements.
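To make the "QUBO plus iteratively learned weights" idea in the torch placement entry above concrete, here is a toy sketch. Everything in it is an assumption made for illustration: the map is a one-dimensional corridor, the penalty term is a simple quadratic form chosen by this note (not the paper's formulation), the weight update is a plain additive step, and the tiny instance is solved by brute force instead of on quantum hardware.

```python
from itertools import product


def neighbours(cell, n_cells, radius):
    """Torch positions that would light `cell` (hypothetical 1-D lighting rule)."""
    return [i for i in range(n_cells) if abs(i - cell) <= radius]


def energy(x, weights, radius):
    """Quadratic objective: torch count plus a weighted penalty per cell.

    The penalty (1 - lit_by)^2 is zero only when exactly one torch lights the
    cell; expanding it keeps the objective quadratic in the binary variables.
    """
    n = len(x)
    cost = sum(x)
    for cell in range(n):
        lit_by = sum(x[i] for i in neighbours(cell, n, radius))
        cost += weights[cell] * (1 - lit_by) ** 2
    return cost


def dark_cells(x, radius):
    """Cells that no torch reaches."""
    n = len(x)
    return [c for c in range(n) if not any(x[i] for i in neighbours(c, n, radius))]


def solve(n_cells=6, radius=1, start=0.2, step=0.2, max_rounds=20):
    """Raise the weight of every dark cell until the minimiser lights them all."""
    weights = [start] * n_cells
    best = None
    for _ in range(max_rounds):
        # Brute force stands in for the QUBO solver on this tiny instance.
        best = min(product((0, 1), repeat=n_cells),
                   key=lambda x: energy(x, weights, radius))
        dark = dark_cells(best, radius)
        if not dark:
            break
        for c in dark:
            weights[c] += step
    return best, weights
```

Calling `solve()` starts with weights too small to justify any torch, so the first minimiser leaves the corridor dark; after the weights are raised, the minimiser lights every cell. The same loop structure applies when the inner `min` is replaced by any QUBO solver.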
9.
前进
(2025-01-31 22:31):
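#paper 10.48550/arxiv.2408.10234 The Unbearable Slowness of Being: Why do we live at 10 bits/s? arXiv:2408.10234v2 [q-bio.NC] Jieyu Zheng, Markus Meister
The paper examines the paradoxical slowness of information processing in human behavior. Although our sensory systems gather information at roughly 10⁹ bits/s, overall human information throughput is only about 10 bits/s. This huge gap remains unexplained and touches on many fundamental aspects of brain function. Through a range of experiments and case studies, the paper shows that the information rate of human behavior is about 10 bits/s and argues that this limit may be tied to the serial nature of processing in the brain. While the peripheral nervous system (for example cone photoreceptors and the optic nerve) handles information at very high rates, the central brain appears to work serially, attending to one task at a time. This serial mode may have arisen in evolution because the main function of early nervous systems was to control movement, and movement decisions are usually local and singular. The paper further proposes that the brain operates in two modes, an "outer brain" and an "inner brain": the outer brain handles high-dimensional sensory input and motor output at very high rates, while the inner brain handles the low-dimensional information stream used for decisions and behavioral control at a very low rate (about 10 bits/s). This division of labor may be an important reason why information throughput is limited. The paper suggests that future research should further explore the differences between inner and outer information processing and how processing efficiency might be improved.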
arXiv,
2024-08-03T22:56:45Z.
DOI: 10.48550/arXiv.2408.10234
Abstract:
This article is about the neural conundrum behind the slowness of human behavior. The information throughput of a human being is about 10 bits/s. In comparison, our sensory systems gather data at ~10^9 bits/s. The stark contrast between these numbers remains unexplained and touches on fundamental aspects of brain function: What neural substrate sets this speed limit on the pace of our existence? Why does the brain need billions of neurons to process 10 bits/s? Why can we only think about one thing at a time? The brain seems to operate in two distinct modes: the "outer" brain handles fast high-dimensional sensory and motor signals, whereas the "inner" brain processes the reduced few bits needed to control behavior. Plausible explanations exist for the large neuron numbers in the outer brain, but not for the inner brain, and we propose new research directions to remedy this.
10.
尹志
(2025-01-31 17:05):
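#paper https://doi.org/10.48550/arXiv.2403.07183 Monitoring AI-Modified Content at Scale: A Case Study on the Impact of ChatGPT on AI Conference Peer Reviews
A paper on how large language models are actually being used, with a concrete case study of peer review at top AI conferences (ICLR 2024, NeurIPS 2023, CoRL 2023, and EMNLP 2023). The study finds that between 6.5% and 16.9% of the review text at these venues may have been substantially modified by LLMs. These reviews also share some interesting traits: lower reported confidence, submission close to the deadline, and reviewers who are less likely to respond to author rebuttals. See the paper for more such observations. The authors also list the adjectives AI most likes to use, such as "commendable", "meticulous", and "intricate", which really do sound like AI, haha. Looks like reviewers will need to be more responsible toward authors from now on.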
arXiv,
2024-03-11T21:51:39Z.
DOI: 10.48550/arXiv.2403.07183
Abstract:
We present an approach for estimating the fraction of text in a large corpus which is likely to be substantially modified or produced by a large language model (LLM). Our maximum likelihood model leverages expert-written and AI-generated reference texts to accurately and efficiently examine real-world LLM-use at the corpus level. We apply this approach to a case study of scientific peer review in AI conferences that took place after the release of ChatGPT: ICLR 2024, NeurIPS 2023, CoRL 2023 and EMNLP 2023. Our results suggest that between 6.5% and 16.9% of text submitted as peer reviews to these conferences could have been substantially modified by LLMs, i.e. beyond spell-checking or minor writing updates. The circumstances in which generated text occurs offer insight into user behavior: the estimated fraction of LLM-generated text is higher in reviews which report lower confidence, were submitted close to the deadline, and from reviewers who are less likely to respond to author rebuttals. We also observe corpus-level trends in generated text which may be too subtle to detect at the individual level, and discuss the implications of such trends on peer review. We call for future interdisciplinary work to examine how LLM use is changing our information and knowledge practices.
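The corpus-level maximum likelihood estimate mentioned in the entry above can be illustrated with a two-component mixture. This is a minimal sketch under assumptions, not the paper's estimator: each document is assumed to come with a likelihood under a human-written reference model (`p_human`) and under an AI-generated reference model (`p_ai`), both hypothetical inputs here, and the mixing fraction is found by a simple grid search.

```python
import math


def log_likelihood(alpha, p_human, p_ai):
    """Log-likelihood of the corpus if a fraction `alpha` of it is AI-generated."""
    return sum(math.log((1 - alpha) * p + alpha * q)
               for p, q in zip(p_human, p_ai))


def estimate_fraction(p_human, p_ai, grid=1000):
    """Grid-search the mixing fraction in [0, 1] that maximises the likelihood."""
    candidates = [k / grid for k in range(grid + 1)]
    return max(candidates, key=lambda a: log_likelihood(a, p_human, p_ai))


# Toy corpus: 8 documents that look human-written and 2 that look AI-written.
p_human = [0.9] * 8 + [0.1] * 2
p_ai = [0.1] * 8 + [0.9] * 2
estimated = estimate_fraction(p_human, p_ai)
```

Note that the estimate is a property of the whole corpus: no individual document is labelled as AI-written, which matches the abstract's point that corpus-level trends can be measurable even when individual cases are too subtle to detect.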
11.
Vincent
(2025-01-31 14:05):
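#paper https://doi.org/10.48550/arXiv.2111.06377 arxiv. 2021. Masked Autoencoders Are Scalable Vision Learners. A classic computer vision paper that proposes a simple, fast, and effective model, the masked autoencoder (MAE). The core idea is to mask random regions of an image and have the model reconstruct them. MAE consists of an asymmetric encoder and decoder: the encoder maps the visible regions of the image into a latent space, and the decoder uses the latent representation together with mask tokens to reconstruct the original image. Notably, even when 75% of the image is masked, the reconstruction still closely resembles the original, which suggests that the information in images is highly redundant. Because the encoder only processes part of the original image, MAE trains much faster, and thanks to self-supervised learning and better representations it also performs better on downstream tasks. It is worth noting that this "predict the masked region" technique has long been used in language models; this paper applies it to CV and shows that CV can also advance by borrowing research ideas from NLP.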
arXiv,
2021-11-11T18:46:40Z.
DOI: 10.48550/arXiv.2111.06377
Abstract:
This paper shows that masked autoencoders (MAE) are scalable self-supervised learners for computer vision. Our MAE approach is simple: we mask random patches of the input image and reconstruct the missing pixels. It is based on two core designs. First, we develop an asymmetric encoder-decoder architecture, with an encoder that operates only on the visible subset of patches (without mask tokens), along with a lightweight decoder that reconstructs the original image from the latent representation and mask tokens. Second, we find that masking a high proportion of the input image, e.g., 75%, yields a nontrivial and meaningful self-supervisory task. Coupling these two designs enables us to train large models efficiently and effectively: we accelerate training (by 3x or more) and improve accuracy. Our scalable approach allows for learning high-capacity models that generalize well: e.g., a vanilla ViT-Huge model achieves the best accuracy (87.8%) among methods that use only ImageNet-1K data. Transfer performance in downstream tasks outperforms supervised pre-training and shows promising scaling behavior.
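The random masking step described in the MAE entry above is easy to sketch. The following is an illustrative stand-in, not the authors' implementation: `random_masking` is a hypothetical helper that only picks patch indices, and the example assumes a 224x224 image cut into 16x16 patches, which gives 14 x 14 = 196 patches.

```python
import random


def random_masking(num_patches, mask_ratio=0.75, seed=0):
    """Split patch indices into a visible set (for the encoder) and a masked set."""
    rng = random.Random(seed)
    order = list(range(num_patches))
    rng.shuffle(order)  # random permutation of patch indices
    num_keep = int(num_patches * (1 - mask_ratio))
    visible = sorted(order[:num_keep])  # only these patches reach the encoder
    masked = sorted(order[num_keep:])   # the decoder must reconstruct these
    return visible, masked


visible, masked = random_masking(num_patches=196, mask_ratio=0.75)
```

With a 75% mask ratio the encoder processes 49 of the 196 patches, which is where most of the training speed-up mentioned in the abstract comes from.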
12.
符毓 Yu
(2025-01-31 11:25):
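#paper doi.org/10.48550/arXiv.2405.18730, 2024, Development of a Novel Impedance-Controlled Quasi-Direct-Drive Robotic Hand. Beyond being low-cost and easy to control, quasi-direct-drive actuators are shown in this paper to have distinct advantages in dexterous-hand scenarios, such as picking up coins and other small objects from the edge of a table, or quickly and dynamically grasping small objects in unstructured environments.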
arXiv,
2024-05-29T03:20:46Z.
DOI: 10.48550/arXiv.2405.18730
Abstract:
Most robotic hands and grippers rely on actuators with large gearboxes and force sensors for controlling gripping force. However, this might not be ideal for tasks that require the robot to interact with an unstructured and unknown environment. In this paper, we introduce a novel quasi-direct-drive two-fingered robotic hand with variable impedance control in the joint space and Cartesian space. The hand has a total of four degrees of freedom, backdrivable differential gear trains, and four brushless direct current (BLDC) motors. Motor torque is controlled through Field-Oriented Control (FOC) with current sensing. Variable impedance control enables the robotic hand to execute dexterous manipulation tasks safely during environment-robot and human-robot interactions. The quasi-direct-drive actuators eliminate the need for complex tactile/force sensors or precise motion planning when handling environmental contact. A majority-3D-printed assembly makes this a low-cost research platform built with affordable, readily available off-the-shelf components. Experimental validation demonstrates the robotic hand's capability for stable force-closure and form-closure grasps in the presence of disturbances, reliable in-hand manipulation, and safe dynamic manipulations despite contact with the environment.
13.
刘昊辰
(2025-01-24 14:04):
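#paper Proof Number Based Monte-Carlo Tree Search. The paper proposes the PN-MCTS algorithm, which combines Monte-Carlo Tree Search (MCTS) with Proof-Number Search (PNS). Experiments across several game domains show its advantage over vanilla MCTS in some of the games, pointing to a new direction for improving game-search algorithms. Download: https://arxiv.org/pdf/2303.09449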
arXiv,
2023-03-16T16:27:07Z.
DOI: 10.48550/arXiv.2303.09449
Abstract:
This paper proposes a new game-search algorithm, PN-MCTS, which combines Monte-Carlo Tree Search (MCTS) and Proof-Number Search (PNS). These two algorithms have been successfully applied for decision making in a range of domains. We define three areas where the additional knowledge provided by the proof and disproof numbers gathered in MCTS trees might be used: final move selection, solving subtrees, and the UCB1 selection mechanism. We test all possible combinations on different time settings, playing against vanilla UCT on several games: Lines of Action ($7 \times 7$ and $8 \times 8$ board sizes), MiniShogi, Knightthrough, and Awari. Furthermore, we extend this new algorithm to properly address games with draws, like Awari, by adding an additional layer of PNS on top of the MCTS tree. The experiments show that PN-MCTS is able to outperform MCTS in all tested game domains, achieving win rates up to 96.2% for Lines of Action.
14.
林海onrush
(2025-01-01 00:27):
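#paper, doi: https://doi.org/10.48550/arXiv.2305.19229, FedDisco: Federated Learning with Discrepancy-Aware Collaboration. A federated learning paper from ICML, a top AI conference. It proposes a new federated learning (FL) method, FedDisco, to address data heterogeneity, in particular differences in category distribution. Conventional federated learning usually assigns model aggregation weights according to each client's dataset size, but this does not reflect differences in the clients' category distributions and leaves the global model under-optimized. FedDisco introduces a "discrepancy-aware" way of computing aggregation weights that combines each client's dataset size with the discrepancy between its local category distribution and the global one, and adjusts the aggregation weights accordingly to improve the global model. The method preserves privacy and remains efficient in communication and computation, and a theoretical analysis shows that it tightens the upper bound on the optimization error, which improves global model performance.
Experiments show that FedDisco clearly outperforms existing federated learning methods across a variety of heterogeneity settings and datasets, and that its modular design lets it be incorporated into existing methods for further gains. It also works well when only a subset of clients participates and on text classification tasks. FedDisco's key strength is its new aggregation weighting strategy, which improves the robustness and generalization of federated learning algorithms at low computation and communication overhead.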
arXiv,
2023-05-30T17:20:51Z.
DOI: 10.48550/arXiv.2305.19229
Abstract:
This work considers the category distribution heterogeneity in federated learning. This issue is due to biased labeling preferences at multiple clients and is a typical setting of data heterogeneity. To alleviate this issue, most previous works consider either regularizing local models or fine-tuning the global model, while they ignore the adjustment of aggregation weights and simply assign weights based on the dataset size. However, based on our empirical observations and theoretical analysis, we find that the dataset size is not optimal and the discrepancy between local and global category distributions could be a beneficial and complementary indicator for determining aggregation weights. We thus propose a novel aggregation method, Federated Learning with Discrepancy-aware Collaboration (FedDisco), whose aggregation weights not only involve both the dataset size and the discrepancy value, but also contribute to a tighter theoretical upper bound of the optimization error. FedDisco also promotes privacy-preservation, communication and computation efficiency, as well as modularity. Extensive experiments show that our FedDisco outperforms several state-of-the-art methods and can be easily incorporated with many existing methods to further enhance the performance. Our code will be available at https://github.com/MediaBrain-SJTU/FedDisco.
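A discrepancy-aware aggregation weight of the kind described in the FedDisco entry above might look like the sketch below. The specific form used here (relative dataset size minus a scaled discrepancy, clipped at zero and renormalised) and the hyperparameters `a` and `b` are assumptions of this note, as is the choice of Euclidean distance for the discrepancy; consult the paper and its repository for the actual definition.

```python
import math


def discrepancy(local_dist, global_dist):
    """Euclidean distance between two category distributions (assumed metric)."""
    return math.sqrt(sum((l - g) ** 2 for l, g in zip(local_dist, global_dist)))


def aggregation_weights(sizes, local_dists, global_dist, a=0.5, b=0.1):
    """Combine relative dataset size with a discrepancy penalty.

    `a` scales the penalty and `b` is an offset; both are hypothetical
    hyperparameters for this illustration.
    """
    total = sum(sizes)
    raw = []
    for n, dist in zip(sizes, local_dists):
        score = n / total - a * discrepancy(dist, global_dist) + b
        raw.append(max(score, 0.0))  # clip negative scores to zero
    norm = sum(raw)
    if norm == 0:
        # Fall back to plain size-based weights if every score was clipped.
        return [n / total for n in sizes]
    return [r / norm for r in raw]


# Two equally sized clients: one matches the global distribution, one is skewed.
weights = aggregation_weights(
    sizes=[100, 100],
    local_dists=[[0.5, 0.5], [0.9, 0.1]],
    global_dist=[0.5, 0.5],
)
```

Under size-only weighting both clients would receive 0.5; here the client whose label distribution matches the global one receives the larger share, which is the behaviour the comment describes.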
15.
前进
(2024-12-31 20:09):
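#paper DOI 10.48550/arXiv.2111.06377 He, K., Chen, X., Xie, S., Li, Y., Dollár, P., & Girshick, R. B. (2021). Masked Autoencoders Are Scalable Vision Learners. In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). The paper proposes a novel self-supervised learning framework, the masked autoencoder (MAE). Its core idea is a random masking strategy: only the 25% of the image that is left unmasked is used to reconstruct the whole image, which forces the model to learn more effective visual features. MAE also uses an asymmetric encoder-decoder architecture: an encoder that processes only the unmasked parts of the image, and a lightweight decoder that reconstructs the original image from the encoder output and the positions of the masked parts. This greatly reduces computational cost and improves training efficiency. Experiments show that MAE generalizes very well as a self-supervised pre-training method, applies to a variety of downstream tasks, and scales well.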
arXiv,
2021-11-11T18:46:40Z.
DOI: 10.48550/arXiv.2111.06377
Abstract:
This paper shows that masked autoencoders (MAE) are scalable self-supervised learners for computer vision. Our MAE approach is simple: we mask random patches of the input image and reconstruct the missing pixels. It is based on two core designs. First, we develop an asymmetric encoder-decoder architecture, with an encoder that operates only on the visible subset of patches (without mask tokens), along with a lightweight decoder that reconstructs the original image from the latent representation and mask tokens. Second, we find that masking a high proportion of the input image, e.g., 75%, yields a nontrivial and meaningful self-supervisory task. Coupling these two designs enables us to train large models efficiently and effectively: we accelerate training (by 3x or more) and improve accuracy. Our scalable approach allows for learning high-capacity models that generalize well: e.g., a vanilla ViT-Huge model achieves the best accuracy (87.8%) among methods that use only ImageNet-1K data. Transfer performance in downstream tasks outperforms supervised pre-training and shows promising scaling behavior.
16.
尹志
(2024-11-30 22:05):
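#paper https://doi.org/10.48550/arXiv.1701.08223 2017, The Python-based Simulations of Chemistry Framework (PySCF). An introduction to PySCF, a very important quantum chemistry tool. The project started in 2014 with only a handful of functions and now offers solid support for a wide range of quantum chemistry calculations; its ease of use and extensibility have been recognized by the community. That design goal was in fact set when the software was released in 2015. Accordingly, nearly all features are implemented in Python, and only particularly time-critical parts are written in C. As a result, a large number of quantum chemistry libraries now depend on PySCF, and it has become a strong open-source competitor to Gaussian.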
arXiv,
2017-01-27T23:57:43Z.
DOI: 10.48550/arXiv.1701.08223
Abstract:
PySCF is a general-purpose electronic structure platform designed from the ground up to emphasize code simplicity, both to aid new method development, as well as for flexibility in computational workflow. The package provides a wide range of tools to support simulations of finite size systems, extended systems with periodic boundary conditions, low dimensional periodic systems, and custom Hamiltonians, using mean-field and post-mean-field methods with standard Gaussian basis functions. To ensure easy of extensibility, PySCF uses the Python language to implement almost all its features, while computationally critical paths are implemented with heavily optimized C routines. Using this combined Python/C implementation, the package is as efficient as the best existing C or Fortran based quantum chemistry programs. In this paper we document the capabilities and design philosophy of the current version of the PySCF package.
17.
符毓 Yu
(2024-11-30 20:46):
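#paper doi.org/10.48550/arXiv.2411.18454, 2024, Optimizing Coverage in Convex Quadrilateral Regions with a Single UAV. The paper studies the optimal hovering altitude of a single UAV that provides coverage over any convex quadrilateral region on the ground. The UAV uses a directional antenna with a tiltable beam, which produces an elliptical coverage pattern. Two scenarios are considered: (1) inscribing the largest ellipse within the quadrilateral to cover its interior, and (2) circumscribing the smallest ellipse about the quadrilateral to ensure full coverage. For both scenarios the authors derive the optimal UAV altitude and antenna tilt conditions under a simplified but widely accepted path loss model, and present numerical results on coverage efficiency. The work contributes to the development of energy-efficient UAV communication systems.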
arXiv,
2024-11-27T15:45:31Z.
DOI: 10.48550/arXiv.2411.18454
Abstract:
This letter investigates the optimal hovering altitude of a single UAV to provide coverage over any convex quadrilateral region on the ground. The UAV employs a directional antenna with a tiltable beam, producing an elliptical coverage pattern. Two scenarios are considered: (1) inscribing the largest ellipse within the quadrilateral to cover its interior, and (2) circumscribing the smallest ellipse about the quadrilateral to ensure full coverage. We derive the optimal UAV altitude and antenna tilt conditions in both scenarios for a simplified yet widely accepted path loss model and present numerical results for coverage efficiency. The work contributes to the development of energy-efficient UAV-based communication systems.
18.
前进
(2024-10-31 15:09):
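#paper arXiv:2408.05839v2 Deep Learning in Medical Image Registration: Magic or Mirage? 38th Conference on Neural Information Processing Systems (NeurIPS 2024)
The paper takes a close look at how deep learning-based image registration (DLIR) compares with classical optimization methods in medical image registration. It compares classical optimization methods and learning-based methods on deformable image registration (DIR), noting that classical methods have the edge in cross-modality generalization and robustness, while learning-based methods reach higher performance through weak supervision. A series of experiments confirms that, in the unsupervised setting, learning-based methods do not significantly outperform classical methods on label-matching performance. The authors hypothesize that architectural design in learning-based methods is unlikely to affect the mutual information between the per-pixel intensity distribution and the labels, and is therefore unlikely to significantly improve the performance of learning-based methods. The paper also shows that, with weak supervision, learning-based methods achieve higher registration accuracy, which classical methods cannot match. However, learning-based methods are sensitive to changes in the data distribution and do not show robustness to domain shift. The paper concludes that without a large labeled dataset, classical optimization methods remain the better choice.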
arXiv,
2024-08-11T18:20:08Z.
DOI: 10.48550/arXiv.2408.05839
Abstract:
Classical optimization and learning-based methods are the two reigningparadigms in deformable image registration. While optimization-based methodsboast generalizability across modalities and robust performance, learning-basedmethods promise peak performance, incorporating weak supervision and amortizedoptimization. However, the exact conditions for either paradigm to perform wellover the other are shrouded and not explicitly outlined in the existingliterature. In this paper, we make an explicit correspondence between themutual information of the distribution of per-pixel intensity and labels, andthe performance of classical registration methods. This strong correlationhints to the fact that architectural designs in learning-based methods isunlikely to affect this correlation, and therefore, the performance oflearning-based methods. This hypothesis is thoroughly validated withstate-of-the-art classical and learning-based methods. However, learning-basedmethods with weak supervision can perform high-fidelity intensity and labelregistration, which is not possible with classical methods. Next, we show thatthis high-fidelity feature learning does not translate to invariance to domainshift, and learning-based methods are sensitive to such changes in the datadistribution. Finally, we propose a general recipe to choose the best paradigmfor a given registration problem, based on these observations.
19.
张浩彬
(2024-10-30 10:19):
#paper
AdapterFusion: Non-Destructive Task Composition for Transfer Learning
https://doi.org/10.48550/arXiv.2005.00247
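AdapterFusion is an improved version of adapters. Put simply, a separate adapter is built for each task, and the adapters are then combined to fuse their knowledge more effectively.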
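Abstract summary: Sequential fine-tuning and multi-task learning are methods that aim to incorporate knowledge from multiple tasks; however, they suffer from catastrophic forgetting and difficulties in dataset balancing. To address these shortcomings, we propose AdapterFusion, a new two-stage learning algorithm that leverages knowledge from multiple tasks. First, in the knowledge extraction stage, we learn task-specific parameters called adapters, which encapsulate the task-specific information. We then combine the adapters in a separate knowledge composition step. We show that by separating the two stages, i.e., knowledge extraction and knowledge composition, the classifier can effectively exploit the representations learned from multiple tasks in a non-destructive manner. We empirically evaluate AdapterFusion on 16 diverse NLU tasks and find that it effectively combines various types of knowledge at different layers of the model. We show that our approach outperforms traditional strategies such as full fine-tuning as well as multi-task learning. Our code and adapters are available at AdapterHub.ml.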
arXiv,
2020-05-01T07:03:42Z.
DOI: 10.48550/arXiv.2005.00247
Abstract:
Sequential fine-tuning and multi-task learning are methods aiming to incorporate knowledge from multiple tasks; however, they suffer from catastrophic forgetting and difficulties in dataset balancing. To address these shortcomings, we propose AdapterFusion, a new two stage learning algorithm that leverages knowledge from multiple tasks. First, in the knowledge extraction stage we learn task specific parameters called adapters, that encapsulate the task-specific information. We then combine the adapters in a separate knowledge composition step. We show that by separating the two stages, i.e., knowledge extraction and knowledge composition, the classifier can effectively exploit the representations learned from multiple tasks in a non-destructive manner. We empirically evaluate AdapterFusion on 16 diverse NLU tasks, and find that it effectively combines various types of knowledge at different layers of the model. We show that our approach outperforms traditional strategies such as full fine-tuning as well as multi-task learning. Our code and adapters are available at AdapterHub.ml.
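The knowledge composition step can be pictured as attention over the outputs of the task adapters: the layer's hidden state acts as the query, and each adapter's output supplies a key and a value. The sketch below is a minimal, dependency-free illustration of that idea for a single token. The function names, the identity projection matrices, and the toy vectors are assumptions for this example; it is not the AdapterHub implementation, and in practice the projections are learned in the second stage.

```python
import math


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_adapters(hidden, adapter_outputs, w_query, w_key, w_value):
    """Mix adapter outputs with attention weights computed from the hidden state."""
    query = matvec(w_query, hidden)
    keys = [matvec(w_key, z) for z in adapter_outputs]
    values = [matvec(w_value, z) for z in adapter_outputs]
    # One score per adapter: dot product of the query with that adapter's key.
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    # Weighted sum of the value vectors, one output dimension at a time.
    fused = [
        sum(w * value[i] for w, value in zip(weights, values))
        for i in range(len(values[0]))
    ]
    return fused, weights


# Hypothetical toy setup: two adapters, 2-dimensional states, identity projections.
identity = [[1.0, 0.0], [0.0, 1.0]]
hidden = [1.0, 0.0]
adapter_outputs = [[2.0, 0.0], [0.0, 2.0]]
fused, weights = fuse_adapters(hidden, adapter_outputs, identity, identity, identity)
```

Here the first adapter's output aligns with the hidden state, so it receives the larger weight. Because the adapters themselves are left unchanged and only the mixing is computed, the composition is non-destructive in the sense the abstract describes.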
20.
刘昊辰
(2024-10-12 10:09):
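#paper arXiv:2409.12272v1 [cs.LG] 18 Sep 2024, Mastering Chess with a Transformer Model. This research paper studies the application of Transformer models to chess. It shows that the effectiveness of Transformers in chess depends largely on the choice of position encoding in the attention mechanism. Based on this observation, the paper adopts the general position encoding scheme of Shaw et al. and trains models with this technique and other enhancements at scale, calling the resulting architecture ChessFormer. The architecture significantly outperforms prior work in both playing strength and puzzle-solving ability, at a fraction of the computational cost. Download: https://arxiv.org/pdf/2409.12272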
arXiv,
2024-09-18T19:05:21Z.
DOI: 10.48550/arXiv.2409.12272
Abstract:
Transformer models have demonstrated impressive capabilities when trained at scale, excelling at difficult cognitive tasks requiring complex reasoning and rational decision-making. In this paper, we explore the application of transformer models to chess, focusing on the critical role of the position encoding within the attention mechanism. We show that in chess, transformers endowed with a sufficiently versatile position encoding can match existing chess-playing models at a fraction of the computational cost. Our architecture significantly outperforms AlphaZero at 8x fewer FLOPS and matches prior grandmaster-level transformer-based agents at 30x fewer FLOPS.
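The comment on this entry notes that the model uses the position encoding scheme of Shaw et al. As a rough sketch of that scheme's core idea, the code below adds a learned embedding for the clipped relative offset between two positions to the key before the scaled dot product. It covers only the key-side term over a 1D sequence; the function name, the toy embeddings, and the clipping distance are assumptions for this example, and how the paper maps this onto chess-board squares is not described here.

```python
import math


def relative_attention_logits(queries, keys, rel_key_embeddings, max_distance):
    """Attention logits with relative-position key embeddings.

    rel_key_embeddings holds 2 * max_distance + 1 vectors, indexed by the
    clipped offset (j - i) shifted by max_distance.
    """
    depth = len(queries[0])
    logits = []
    for i, query in enumerate(queries):
        row = []
        for j, key in enumerate(keys):
            # Clip the relative offset to [-max_distance, max_distance].
            offset = max(-max_distance, min(max_distance, j - i))
            rel = rel_key_embeddings[offset + max_distance]
            # q_i . (k_j + a_ij) / sqrt(depth)
            score = sum(q * (k + a) for q, k, a in zip(query, key, rel))
            row.append(score / math.sqrt(depth))
        logits.append(row)
    return logits


# Hypothetical toy inputs: three positions, content keys zeroed out so that
# only the relative-position term contributes to the logits.
queries = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
keys = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
rel_key_embeddings = [[-1.0, 0.0], [0.0, 0.0], [1.0, 0.0]]  # offsets -1, 0, +1
logits = relative_attention_logits(queries, keys, rel_key_embeddings, max_distance=1)
```

With the content keys zeroed, each logit depends only on the clipped offset, so positions one and two steps ahead receive the same score once the offset exceeds the clipping distance.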