Papers from the journal arXiv.
125 shared papers found in total; this page shows items 1 - 20.
1.
Kunji (2025-02-28 23:59):
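#paper, https://arxiv.org/pdf/2410.05273, HiRT: Enhancing Robotic Control with Hierarchical Robot Transformers. VLA models rely on VLMs with billions of parameters; they generalize well, but their high computational cost and slow inference limit their use in dynamic tasks. To address these limitations, the paper proposes HiRT (Hierarchical Robot Transformer framework), which draws on the dual-process theory of human cognition and uses a dual-system architecture with asynchronous operation to balance control frequency against performance. Experiments in simulation and in real-world settings show significant improvements. In static tasks, the control frequency doubles while the success rate stays comparable. In real-world dynamic manipulation tasks that earlier VLA models struggle with, HiRT raises the success rate from 48% to 75%.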
arXiv, 2024-09-12T09:18:09Z. DOI: 10.48550/arXiv.2410.05273
Abstract:
Large Vision-Language-Action (VLA) models, leveraging powerful pre-trained Vision-Language Models (VLMs) backends, have shown promise in robotic control due to their impressive generalization ability. However, the success comes at a cost. Their reliance on VLM backends with billions of parameters leads to high computational costs and inference latency, limiting the testing scenarios to mainly quasi-static tasks and hindering performance in dynamic tasks requiring rapid interactions. To address these limitations, this paper proposes HiRT, a Hierarchical Robot Transformer framework that enables flexible frequency and performance trade-off. HiRT keeps VLMs running at low frequencies to capture temporarily invariant features while enabling real-time interaction through a high-frequency vision-based policy guided by the slowly updated features. Experiment results in both simulation and real-world settings demonstrate significant improvements over baseline methods. Empirically, in static tasks, we double the control frequency and achieve comparable success rates. Additionally, on novel real-world dynamic manipulation tasks which are challenging for previous VLA models, HiRT improves the success rate from 48% to 75%.
2.
符毓 Yu (2025-02-28 23:00):
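#paper doi.org/10.48550/arXiv.2411.13677, 2024, Bimanual Dexterity for Complex Tasks. Teleoperation is an important way for robots to acquire data. The paper presents a portable, low-cost (about 12k USD in total: 5k for the hands and 7k for the teleoperation system; an optional pair of robot arms adds 16k) and extremely precise teleoperation method for a bimanual humanoid arm-and-hand system. It demonstrates the system in both tabletop and mobile settings and shows that it is more efficient on bimanual dexterous tasks than alternatives such as SteamVR and Vision Pro. However, because there is no haptic feedback, the operator has to rely on visual feedback alone and cannot feel what the robot arm feels.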
arXiv, 2024-11-20T19:53:35Z. DOI: 10.48550/arXiv.2411.13677
Abstract:
To train generalist robot policies, machine learning methods often require a substantial amount of expert human teleoperation data. An ideal robot for humans collecting data is one that closely mimics them: bimanual arms and dexterous hands. However, creating such a bimanual teleoperation system with over 50 DoF is a significant challenge. To address this, we introduce Bidex, an extremely dexterous, low-cost, low-latency and portable bimanual dexterous teleoperation system which relies on motion capture gloves and teacher arms. We compare Bidex to a Vision Pro teleoperation system and a SteamVR system and find Bidex to produce better quality data for more complex tasks at a faster rate. Additionally, we show Bidex operating a mobile bimanual robot for in the wild tasks. The robot hands (5k USD) and teleoperation system (7k USD) is readily reproducible and can be used on many robot arms including two xArms (16k USD). Website at https://bidex-teleop.github.io/
3.
尹志 (2025-02-28 15:55):
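#paper doi:10.48550/arXiv.2205.15463 Few-Shot Diffusion Models. The paper proposes a few-shot generation technique that combines a diffusion model with a set-based ViT. Experiments show that the model needs only 5 samples to generate images from a new class.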
arXiv, 2022-05-30T23:20:33Z. DOI: 10.48550/arXiv.2205.15463
Abstract:
Denoising diffusion probabilistic models (DDPM) are powerful hierarchical latent variable models with remarkable sample generation quality and training stability. These properties can be attributed to parameter sharing in the generative hierarchy, as well as a parameter-free diffusion-based inference procedure. In this paper, we present Few-Shot Diffusion Models (FSDM), a framework for few-shot generation leveraging conditional DDPMs. FSDMs are trained to adapt the generative process conditioned on a small set of images from a given class by aggregating image patch information using a set-based Vision Transformer (ViT). At test time, the model is able to generate samples from previously unseen classes conditioned on as few as 5 samples from that class. We empirically show that FSDM can perform few-shot generation and transfer to new datasets. We benchmark variants of our method on complex vision datasets for few-shot learning and compare to unconditional and conditional DDPM baselines. Additionally, we show how conditioning the model on patch-based input set information improves training convergence.
4.
刘昊辰 (2025-02-25 22:38):
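#paper Playing Hex and Counter Wargames using Reinforcement Learning and Recurrent Neural Networks. This paper studies how to play Hex and Counter Wargames using reinforcement learning and recurrent neural networks (RNNs). It proposes a new system that combines the AlphaZero reinforcement learning algorithm with recurrent neural networks to handle the strategic complexity of these games. The system generalizes across different terrain and tactical situations, and the authors explore how it scales to larger maps. With limited training resources and compute, it performs well in complex Hex and Counter Wargames. Download: https://arxiv.org/abs/2502.13918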
arXiv, 2025-02-19T17:52:45Z. DOI: 10.48550/arXiv.2502.13918
Abstract:
Hex and Counter Wargames are adversarial two-player simulations of real military conflicts requiring complex strategic decision-making. Unlike classical board games, these games feature intricate terrain/unit interactions, unit stacking, large maps of varying sizes, and simultaneous move and combat decisions involving hundreds of units. This paper introduces a novel system designed to address the strategic complexity of Hex and Counter Wargames by integrating cutting-edge advancements in Recurrent Neural Networks with AlphaZero, a reliable modern Reinforcement Learning algorithm. The system utilizes a new Neural Network architecture developed from existing research, incorporating innovative state and action representations tailored to these specific game environments. With minimal training, our solution has shown promising results in typical scenarios, demonstrating the ability to generalize across different terrain and tactical situations. Additionally, we explore the system's potential to scale to larger map sizes. The developed system is openly accessible, facilitating continued research and exploration within this challenging domain.
5.
惊鸿 (2025-02-15 00:02):
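#paper DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model. Pub Date: 2024-05-07. DOI: arxiv-2405.04434. We present DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. It has 236B parameters in total, of which 21B are activated for each token, and it supports a context length of 128K tokens. DeepSeek-V2 adopts innovative architectures including Multi-head Latent Attention (MLA) and DeepSeekMoE. MLA ensures efficient inference by compressing the key-value (KV) cache significantly into a latent vector, while DeepSeekMoE makes it possible to train strong models at an economical cost through sparse computation. Compared with DeepSeek 67B, DeepSeek-V2 performs significantly better while saving 42.5% of training costs, reducing the KV cache by 93.3%, and raising the maximum generation throughput to 5.76 times. DeepSeek-V2 is pretrained on a high-quality, multi-source corpus of 8.1T tokens, followed by supervised fine-tuning (SFT) and reinforcement learning (RL) to fully unlock its potential. Evaluation results show that, even with only 21B activated parameters, DeepSeek-V2 and its chat versions still reach top-tier performance among open-source models. The model checkpoints are available at https://github.com/deepseek-ai/DeepSeek-V2.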
arXiv, 2024-05-07T15:56:43Z. DOI: 10.48550/arXiv.2405.04434
DeepSeek-AI, Aixin Liu, Bei Feng, Bin Wang, Bingxuan Wang, Bo Liu, Chenggang Zhao, Chengqi Dengr, Chong Ruan, Damai Dai, Daya Guo, Dejian Yang, Deli Chen, Dongjie Ji, Erhang Li, Fangyun Lin, Fuli Luo, Guangbo Hao, Guanting Chen, Guowei Li, H. Zhang, Hanwei Xu, Hao Yang, Haowei Zhang, Honghui Ding, Huajian Xin, Huazuo Gao, Hui Li, Hui Qu, J. L. Cai, Jian Liang, Jianzhong Guo, Jiaqi Ni, Jiashi Li, Jin Chen, Jingyang Yuan, Junjie Qiu, Junxiao Song, Kai Dong, Kaige Gao, Kang Guan, Lean Wang, Lecong Zhang, Lei Xu, Leyi Xia, Liang Zhao, Liyue Zhang, Meng Li, Miaojun Wang, Mingchuan Zhang, Minghua Zhang, Minghui Tang, Mingming Li, Ning Tian, Panpan Huang, Peiyi Wang, Peng Zhang, Qihao Zhu, Qinyu Chen, Qiushi Du, R. J. Chen, R. L. Jin, Ruiqi Ge, Ruizhe Pan, Runxin Xu, Ruyi Chen, S. S. Li, Shanghao Lu, Shangyan Zhou, Shanhuang Chen, Shaoqing Wu, Shengfeng Ye, Shirong Ma, Shiyu Wang, Shuang Zhou, Shuiping Yu, Shunfeng Zhou, Size Zheng, T. Wang, Tian Pei, Tian Yuan, Tianyu Sun, W. L. Xiao, Wangding Zeng, Wei An, Wen Liu, Wenfeng Liang, Wenjun Gao, Wentao Zhang, X. Q. Li, Xiangyue Jin, Xianzu Wang, Xiao Bi, Xiaodong Liu, Xiaohan Wang, Xiaojin Shen, Xiaokang Chen, Xiaosha Chen, Xiaotao Nie, Xiaowen Sun, Xiaoxiang Wang, Xin Liu, Xin Xie, Xingkai Yu, Xinnan Song, Xinyi Zhou, Xinyu Yang, Xuan Lu, Xuecheng Su, Y. Wu, Y. K. Li, Y. X. Wei, Y. X. Zhu, Yanhong Xu, Yanping Huang, Yao Li, Yao Zhao, Yaofeng Sun, Yaohui Li, Yaohui Wang, Yi Zheng, Yichao Zhang, Yiliang Xiong, Yilong Zhao, Ying He, Ying Tang, Yishi Piao, Yixin Dong, Yixuan Tan, Yiyuan Liu, Yongji Wang, Yongqiang Guo, Yuchen Zhu, Yuduan Wang, Yuheng Zou, Yukun Zha, Yunxian Ma, Yuting Yan, Yuxiang You, Yuxuan Liu, Z. Z. Ren, Zehui Ren, Zhangli Sha, Zhe Fu, Zhen Huang, Zhen Zhang, Zhenda Xie, Zhewen Hao, Zhihong Shao, Zhiniu Wen, Zhipeng Xu, Zhongyu Zhang, Zhuoshu Li, Zihan Wang, Zihui Gu, Zilin Li, Ziwei Xie
Abstract:
We present DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. It comprises 236B total parameters, of which 21B are activated for each token, and supports a context length of 128K tokens. DeepSeek-V2 adopts innovative architectures including Multi-head Latent Attention (MLA) and DeepSeekMoE. MLA guarantees efficient inference through significantly compressing the Key-Value (KV) cache into a latent vector, while DeepSeekMoE enables training strong models at an economical cost through sparse computation. Compared with DeepSeek 67B, DeepSeek-V2 achieves significantly stronger performance, and meanwhile saves 42.5% of training costs, reduces the KV cache by 93.3%, and boosts the maximum generation throughput to 5.76 times. We pretrain DeepSeek-V2 on a high-quality and multi-source corpus consisting of 8.1T tokens, and further perform Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) to fully unlock its potential. Evaluation results show that, even with only 21B activated parameters, DeepSeek-V2 and its chat versions still achieve top-tier performance among open-source models.
6.
林海onrush (2025-01-31 23:53):
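#paper, https://doi.org/10.48550/arXiv.2312.01156, Efficient Light Source Placement using Quantum Computing. A fun little problem: how to use quantum computing to solve torch placement in Minecraft. The problem is cast as a quadratic unconstrained binary optimization (QUBO) problem, and the constraints are handled by iteratively learning Lagrange multipliers. Experiments indicate that the method finds valid torch placements within a reasonable number of iterations, although, given the limits of current quantum hardware, classical methods still do somewhat better on larger maps. Torch placement is related to the set cover problem, which illustrates the value of quantum computing for resource optimization problems.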
arXiv, 2023-12-02T15:28:59Z. DOI: 10.48550/arXiv.2312.01156
Abstract:
NP-hard problems regularly come up in video games, with interesting connections to real-world problems. In the game Minecraft, players place torches on the ground to light up dark areas. Placing them in a way that minimizes the total number of torches to save resources is far from trivial. In this paper, we use Quantum Computing to approach this problem. To this end, we derive a QUBO formulation of the torch placement problem, which we uncover to be very similar to another NP-hard problem. We employ a solution strategy that involves learning Lagrangian weights in an iterative process, adding to the ever growing toolbox of QUBO formulations. Finally, we perform experiments on real quantum hardware using real game data to demonstrate that our approach yields good torch placements.
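To make the QUBO idea in this entry concrete, here is a minimal sketch of a set-cover-style torch placement objective. It is not the paper's formulation: the 1D strip of cells, the light radius, the two slack bits per cell and the fixed penalty weight are all assumptions chosen for illustration, and the paper learns Lagrangian weights iteratively and runs on quantum hardware, whereas this toy simply enumerates every bit string.

```python
from itertools import product


def torch_energy(x, slack, radius=1, penalty=2.0):
    """Quadratic energy over binary variables (the shape a QUBO takes).

    x[i] = 1 places a torch on cell i. slack holds two bits per cell, which
    turn the inequality "at least one torch lights cell i" into the equality
    lit_i - 1 - (s_i1 + s_i2) = 0 so that it can be squared and penalized.
    """
    n = len(x)
    energy = sum(x)  # objective: number of torches placed
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        lit = sum(x[j] for j in range(lo, hi))  # torches that reach cell i
        s = slack[2 * i] + slack[2 * i + 1]
        energy += penalty * (lit - 1 - s) ** 2
    return energy


def solve_brute_force(n_cells, radius=1, penalty=2.0):
    """Enumerate all assignments; only feasible for very small strips."""
    best = None
    for bits in product((0, 1), repeat=3 * n_cells):
        x, slack = bits[:n_cells], bits[n_cells:]
        e = torch_energy(x, slack, radius, penalty)
        if best is None or e < best[0]:
            best = (e, x)
    return best


energy, placement = solve_brute_force(5)
print(energy, placement)
```

With a penalty larger than the cost of one torch, any assignment that leaves a cell dark costs more than a valid cover, so the minimum-energy bit string is a smallest valid placement for this toy strip.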
7.
前进 (2025-01-31 22:31):
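#paper 10.48550/arxiv.2408.10234 The Unbearable Slowness of Being: Why do we live at 10 bits/s? arXiv:2408.10234v2 [q-bio.NC] Jieyu Zheng, Markus Meister. The paper examines the paradoxical slowness of information processing in human behavior. Although human sensory systems gather information at roughly 10⁹ bits/s, overall human information throughput is only about 10 bits/s. This huge gap has not been adequately explained and touches on many fundamental aspects of brain function. Through a range of experiments and case studies, the paper shows that behavioral throughput is about 10 bits/s and argues that this limit may stem from the serial nature of the brain's processing. The peripheral nervous system (for example cone photoreceptors and the optic nerve) handles information at very high rates, but the central brain appears to work serially, attending to one task at a time. This serial mode may be an evolutionary legacy: the main job of early nervous systems was controlling movement, and movement decisions are usually local and singular. The paper also proposes that the brain operates in two modes, an "outer brain" and an "inner brain": the outer brain handles high-dimensional sensory input and motor output at very high rates, while the inner brain handles the low-dimensional information stream used for decisions and behavioral control at a very low rate (about 10 bits/s). This division of labor may be a key reason throughput is limited. The paper calls for future research into the differences between inner-brain and outer-brain processing and into how information processing efficiency might be optimized.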
arXiv, 2024-08-03T22:56:45Z. DOI: 10.48550/arXiv.2408.10234
Abstract:
This article is about the neural conundrum behind the slowness of human behavior. The information throughput of a human being is about 10 bits/s. In comparison, our sensory systems gather data at ~10^9 bits/s. The stark contrast between these numbers remains unexplained and touches on fundamental aspects of brain function: What neural substrate sets this speed limit on the pace of our existence? Why does the brain need billions of neurons to process 10 bits/s? Why can we only think about one thing at a time? The brain seems to operate in two distinct modes: the "outer" brain handles fast high-dimensional sensory and motor signals, whereas the "inner" brain processes the reduced few bits needed to control behavior. Plausible explanations exist for the large neuron numbers in the outer brain, but not for the inner brain, and we propose new research directions to remedy this.
8.
尹志 (2025-01-31 17:05):
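#paper https://doi.org/10.48550/arXiv.2403.07183 Monitoring AI-Modified Content at Scale: A Case Study on the Impact of ChatGPT on AI Conference Peer Reviews. A paper on how large language models are actually being used, with peer review at top AI conferences (ICLR 2024, NeurIPS 2023, CoRL 2023 and EMNLP 2023) as the case study. The study finds that between 6.5% and 16.9% of the review text may have been substantially modified by an LLM, and these reviews share some interesting traits: lower reported confidence, submission close to the deadline, and reviewers who are less likely to respond to author rebuttals. See the paper for more. It also lists the adjectives AI uses most often, such as "commendable", "meticulous" and "intricate", which really do sound AI-written, haha. It looks like reviewers will need to take more responsibility toward authors from now on.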
arXiv, 2024-03-11T21:51:39Z. DOI: 10.48550/arXiv.2403.07183
Abstract:
We present an approach for estimating the fraction of text in a large corpus which is likely to be substantially modified or produced by a large language model (LLM). Our maximum likelihood model leverages expert-written and AI-generated reference texts to accurately and efficiently examine real-world LLM-use at the corpus level. We apply this approach to a case study of scientific peer review in AI conferences that took place after the release of ChatGPT: ICLR 2024, NeurIPS 2023, CoRL 2023 and EMNLP 2023. Our results suggest that between 6.5% and 16.9% of text submitted as peer reviews to these conferences could have been substantially modified by LLMs, i.e. beyond spell-checking or minor writing updates. The circumstances in which generated text occurs offer insight into user behavior: the estimated fraction of LLM-generated text is higher in reviews which report lower confidence, were submitted close to the deadline, and from reviewers who are less likely to respond to author rebuttals. We also observe corpus-level trends in generated text which may be too subtle to detect at the individual level, and discuss the implications of such trends on peer review. We call for future interdisciplinary work to examine how LLM use is changing our information and knowledge practices.
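The corpus-level estimate described in this entry can be illustrated with a toy mixture model. The sketch below assumes each observed word is drawn from a blend of a "human" reference distribution and an "AI" reference distribution and picks the mixing fraction that maximizes the log-likelihood by grid search. The two-word vocabulary, the probabilities and the counts are made up for illustration, and the function name is hypothetical; the paper's actual model and features may differ.

```python
import math


def estimate_llm_fraction(counts, p_human, p_ai, steps=1000):
    """Grid-search the mixing fraction alpha that maximizes the likelihood.

    Each word is assumed to be drawn from (1 - alpha) * p_human + alpha * p_ai.
    """
    best_alpha, best_ll = 0.0, -math.inf
    for k in range(steps + 1):
        alpha = k / steps
        ll = 0.0
        for word, count in counts.items():
            mix = (1 - alpha) * p_human[word] + alpha * p_ai[word]
            ll += count * math.log(mix)
        if ll > best_ll:
            best_alpha, best_ll = alpha, ll
    return best_alpha


# Toy reference distributions (assumed): the AI text overuses "commendable".
p_human = {"commendable": 0.1, "other": 0.9}
p_ai = {"commendable": 0.6, "other": 0.4}
# Toy observed corpus: 20 of 100 word occurrences are "commendable".
counts = {"commendable": 20, "other": 80}

alpha = estimate_llm_fraction(counts, p_human, p_ai)
print(alpha)
```

In this toy setup the mixture predicts "commendable" with probability 0.1 + 0.5 * alpha, so an observed rate of 0.2 corresponds to an estimated fraction of about 0.2. The point is that the estimate is made for the corpus as a whole, without labeling any single review.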
9.
Vincent (2025-01-31 14:05):
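#paper https://doi.org/10.48550/arXiv.2111.06377 arxiv. 2021. Masked Autoencoders Are Scalable Vision Learners. A classic computer vision paper that proposes a simple, fast and effective model, the masked autoencoder (MAE). The core idea is to randomly mask regions of an image and then have the model reconstruct them. MAE consists of an asymmetric encoder and decoder: the encoder maps the visible regions into a latent space, and the decoder uses the latent representation together with mask tokens to reconstruct the original image. Notably, even when 75% of the image is masked, the reconstruction still closely resembles the original, which suggests that the information in images is very sparse. Because the encoder only processes part of the original image, MAE trains much faster, and thanks to self-supervised learning and better representations it also does better on downstream tasks. It is worth noting that this "predict the masked part" technique has long been used in language models; this paper applies it to CV and shows that CV can also advance by borrowing research ideas from NLP.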
arXiv, 2021-11-11T18:46:40Z. DOI: 10.48550/arXiv.2111.06377
Abstract:
This paper shows that masked autoencoders (MAE) are scalable self-supervised learners for computer vision. Our MAE approach is simple: we mask random patches of the input image and reconstruct the missing pixels. It is based on two core designs. First, we develop an asymmetric encoder-decoder architecture, with an encoder that operates only on the visible subset of patches (without mask tokens), along with a lightweight decoder that reconstructs the original image from the latent representation and mask tokens. Second, we find that masking a high proportion of the input image, e.g., 75%, yields a nontrivial and meaningful self-supervisory task. Coupling these two designs enables us to train large models efficiently and effectively: we accelerate training (by 3x or more) and improve accuracy. Our scalable approach allows for learning high-capacity models that generalize well: e.g., a vanilla ViT-Huge model achieves the best accuracy (87.8%) among methods that use only ImageNet-1K data. Transfer performance in downstream tasks outperforms supervised pre-training and shows promising scaling behavior.
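The random masking step described in this entry can be sketched in a few lines. This is only an illustration of choosing which patches the encoder sees, not the authors' implementation: the function name, the seed handling and the choice of 196 patches (a 14 x 14 grid, as for a 224-pixel image cut into 16-pixel patches) are assumptions.

```python
import random


def random_masking(num_patches, mask_ratio=0.75, seed=0):
    """Split patch indices into a visible set and a masked set.

    Only the visible indices would be passed to the encoder; the decoder
    would receive the encoded visible patches plus mask tokens placed at
    the masked indices.
    """
    rng = random.Random(seed)
    order = list(range(num_patches))
    rng.shuffle(order)  # uniform random permutation of patch indices
    num_keep = int(num_patches * (1 - mask_ratio))
    visible = sorted(order[:num_keep])
    masked = sorted(order[num_keep:])
    return visible, masked


visible, masked = random_masking(196)
print(len(visible), len(masked))
```

With a 75% mask ratio the encoder processes only a quarter of the patches, which is where the training speed-up mentioned in the abstract comes from.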
10.
符毓 Yu (2025-01-31 11:25):
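#paper doi.org/10.48550/arXiv.2405.18730, 2024, Development of a Novel Impedance-Controlled Quasi-Direct-Drive Robotic Hand. Beyond the low cost and ease of control of quasi-direct-drive actuators, the paper argues that they offer distinct advantages in dexterous-hand scenarios, such as picking up coins and other small objects from the edge of a table, or quickly / dynamically grasping small objects in unstructured environments.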
arXiv, 2024-05-29T03:20:46Z. DOI: 10.48550/arXiv.2405.18730
Abstract:
Most robotic hands and grippers rely on actuators with large gearboxes and force sensors for controlling gripping force. However, this might not be ideal for tasks that require the robot to interact with an unstructured and unknown environment. In this paper, we introduce a novel quasi-direct-drive two-fingered robotic hand with variable impedance control in the joint space and Cartesian space. The hand has a total of four degrees of freedom, backdrivable differential gear trains, and four brushless direct current (BLDC) motors. Motor torque is controlled through Field-Oriented Control (FOC) with current sensing. Variable impedance control enables the robotic hand to execute dexterous manipulation tasks safely during environment-robot and human-robot interactions. The quasi-direct-drive actuators eliminate the need for complex tactile/force sensors or precise motion planning when handling environmental contact. A majority-3D-printed assembly makes this a low-cost research platform built with affordable, readily available off-the-shelf components. Experimental validation demonstrates the robotic hand's capability for stable force-closure and form-closure grasps in the presence of disturbances, reliable in-hand manipulation, and safe dynamic manipulations despite contact with the environment.
11.
刘昊辰 (2025-01-24 14:04):
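#paper Proof Number Based Monte-Carlo Tree Search. The paper proposes the PN-MCTS algorithm, which combines Monte-Carlo Tree Search (MCTS) with Proof-Number Search (PNS). Experiments across several games show its advantage over vanilla MCTS in some of them, pointing to a new direction for improving game-search algorithms. Download: https://arxiv.org/pdf/2303.09449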
arXiv, 2023-03-16T16:27:07Z. DOI: 10.48550/arXiv.2303.09449
Abstract:
This paper proposes a new game-search algorithm, PN-MCTS, which combines Monte-Carlo Tree Search (MCTS) and Proof-Number Search (PNS). These two algorithms have been successfully applied for decision making in a range of domains. We define three areas where the additional knowledge provided by the proof and disproof numbers gathered in MCTS trees might be used: final move selection, solving subtrees, and the UCB1 selection mechanism. We test all possible combinations on different time settings, playing against vanilla UCT on several games: Lines of Action ($7 \times 7$ and $8 \times 8$ board sizes), MiniShogi, Knightthrough, and Awari. Furthermore, we extend this new algorithm to properly address games with draws, like Awari, by adding an additional layer of PNS on top of the MCTS tree. The experiments show that PN-MCTS is able to outperform MCTS in all tested game domains, achieving win rates up to 96.2% for Lines of Action.
12.
林海onrush (2025-01-01 00:27):
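#paper, doi: https://doi.org/10.48550/arXiv.2305.19229, FedDisco: Federated Learning with Discrepancy-Aware Collaboration. A federated learning paper from ICML, a top AI conference. It proposes a new federated learning (FL) method, FedDisco, to address data heterogeneity, specifically differences in category distribution. Conventional FL usually assigns aggregation weights according to each client's dataset size, but that does not reflect how much the clients' category distributions differ, so the global model is optimized poorly. FedDisco introduces a "discrepancy-aware" way to compute aggregation weights that combines each client's dataset size with the discrepancy between its local category distribution and the global one. The method preserves privacy, keeps communication and computation efficient, and comes with a theoretical analysis showing that it tightens the upper bound on the optimization error, which improves the global model. Experiments show that FedDisco clearly outperforms existing FL methods across a range of heterogeneity settings and datasets, and that its modular design makes it easy to plug into existing methods for further gains. It also works well when only some clients participate and on text classification tasks. FedDisco's key strength is its aggregation-weight strategy, which improves the robustness and generalization of FL algorithms at low computation and communication cost.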
arXiv, 2023-05-30T17:20:51Z. DOI: 10.48550/arXiv.2305.19229
Abstract:
This work considers the category distribution heterogeneity in federated learning. This issue is due to biased labeling preferences at multiple clients and is a typical setting of data heterogeneity. To alleviate this issue, most previous works consider either regularizing local models or fine-tuning the global model, while they ignore the adjustment of aggregation weights and simply assign weights based on the dataset size. However, based on our empirical observations and theoretical analysis, we find that the dataset size is not optimal and the discrepancy between local and global category distributions could be a beneficial and complementary indicator for determining aggregation weights. We thus propose a novel aggregation method, Federated Learning with Discrepancy-aware Collaboration (FedDisco), whose aggregation weights not only involve both the dataset size and the discrepancy value, but also contribute to a tighter theoretical upper bound of the optimization error. FedDisco also promotes privacy-preservation, communication and computation efficiency, as well as modularity. Extensive experiments show that our FedDisco outperforms several state-of-the-art methods and can be easily incorporated with many existing methods to further enhance the performance. Our code will be available at https://github.com/MediaBrain-SJTU/FedDisco.
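As a rough illustration of the discrepancy-aware weighting described in this entry, the sketch below lowers a client's aggregation weight when its label distribution is far from the global one. The specific form used here (relative dataset size minus a scaled L2 discrepancy plus an offset, clipped at zero and renormalized), the hyperparameters `a` and `b`, and the pooling of label counts to obtain the global distribution are all assumptions for illustration; the exact rule should be taken from the paper or its repository.

```python
def category_distribution(label_counts):
    """Normalize per-class counts into a probability distribution."""
    total = sum(label_counts)
    return [c / total for c in label_counts]


def l2_discrepancy(local, global_dist):
    """Euclidean distance between two category distributions."""
    return sum((p - q) ** 2 for p, q in zip(local, global_dist)) ** 0.5


def discrepancy_aware_weights(client_label_counts, a=0.5, b=0.1):
    """Weights that grow with dataset size and shrink with discrepancy."""
    sizes = [sum(counts) for counts in client_label_counts]
    total = sum(sizes)
    num_classes = len(client_label_counts[0])
    pooled = [sum(c[k] for c in client_label_counts) for k in range(num_classes)]
    global_dist = category_distribution(pooled)

    raw = []
    for counts, n in zip(client_label_counts, sizes):
        d = l2_discrepancy(category_distribution(counts), global_dist)
        raw.append(max(0.0, n / total - a * d + b))  # clip negatives to zero
    norm = sum(raw)  # assumed positive; all-zero weights need special handling
    return [r / norm for r in raw]


# Three clients with 100 samples each: one balanced, two fully skewed.
clients = [[50, 50], [100, 0], [0, 100]]
weights = discrepancy_aware_weights(clients)
print(weights)
```

Size-only weighting would give each of these clients one third. Under the assumed rule the balanced client receives the largest weight, which is the qualitative behavior the abstract describes.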
13.
前进 (2024-12-31 20:09):
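#paper DOI 10.48550/arXiv.2111.06377 He, K., Chen, X., Xie, S., Li, Y., Dollár, P., & Girshick, R. B. (2021). Masked Autoencoders Are Scalable Vision Learners. In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). This paper proposes a self-supervised learning framework, the masked autoencoder (MAE). Its core idea is a random masking strategy: the model reconstructs the whole image from only the 25% of the image that is left unmasked, which forces it to learn more useful visual features. MAE also uses an asymmetric encoder-decoder architecture: an encoder that processes only the unmasked part of the image, and a lightweight decoder that reconstructs the original image from the encoder output and the positions of the masked parts. This greatly lowers the computational cost and makes training more efficient. Experiments show that MAE generalizes well as a self-supervised pre-training method, transfers to a variety of downstream tasks, and scales well.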
arXiv, 2021-11-11T18:46:40Z. DOI: 10.48550/arXiv.2111.06377
Abstract:
This paper shows that masked autoencoders (MAE) are scalable self-supervised learners for computer vision. Our MAE approach is simple: we mask random patches of the input image and reconstruct the missing pixels. It is based on two core designs. First, we develop an asymmetric encoder-decoder architecture, with an encoder that operates only on the visible subset of patches (without mask tokens), along with a lightweight decoder that reconstructs the original image from the latent representation and mask tokens. Second, we find that masking a high proportion of the input image, e.g., 75%, yields a nontrivial and meaningful self-supervisory task. Coupling these two designs enables us to train large models efficiently and effectively: we accelerate training (by 3x or more) and improve accuracy. Our scalable approach allows for learning high-capacity models that generalize well: e.g., a vanilla ViT-Huge model achieves the best accuracy (87.8%) among methods that use only ImageNet-1K data. Transfer performance in downstream tasks outperforms supervised pre-training and shows promising scaling behavior.
14.
尹志 (2024-11-30 22:05):
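#paper https://doi.org/10.48550/arXiv.1701.08223 2017, The Python-based Simulations of Chemistry Framework (PySCF). An introduction to PySCF, a very important quantum chemistry tool. The project started in 2014 with only a handful of functions and now offers solid support for a wide range of quantum chemistry calculations; its ease of use and extensibility have won recognition from the community. That emphasis was in fact set when the software was released in 2015. As a result, almost all features are implemented in Python, and only especially time-critical parts are written in C. This also means that many quantum chemistry libraries now depend on PySCF, which has become a strong open-source competitor to Gaussian.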
arXiv, 2017-01-27T23:57:43Z. DOI: 10.48550/arXiv.1701.08223
Abstract:
PySCF is a general-purpose electronic structure platform designed from the ground up to emphasize code simplicity, both to aid new method development, as well as for flexibility in computational workflow. The package provides a wide range of tools to support simulations of finite size systems, extended systems with periodic boundary conditions, low dimensional periodic systems, and custom Hamiltonians, using mean-field and post-mean-field methods with standard Gaussian basis functions. To ensure ease of extensibility, PySCF uses the Python language to implement almost all its features, while computationally critical paths are implemented with heavily optimized C routines. Using this combined Python/C implementation, the package is as efficient as the best existing C or Fortran based quantum chemistry programs. In this paper we document the capabilities and design philosophy of the current version of the PySCF package.
15.
符毓 Yu (2024-11-30 20:46):
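#paper doi.org/10.48550/arXiv.2411.18454, 2024, Optimizing Coverage in Convex Quadrilateral Regions with a Single UAV. The paper studies the optimal hovering altitude of a single UAV for covering any convex quadrilateral region on the ground. The UAV uses a directional antenna with a tiltable beam, which produces an elliptical coverage pattern. Two scenarios are considered: (1) inscribing the largest ellipse within the quadrilateral to cover its interior, and (2) circumscribing the smallest ellipse about the quadrilateral to ensure full coverage. The authors derive the optimal UAV altitude and antenna tilt for both scenarios under a simplified yet widely accepted path loss model, and give numerical results for coverage efficiency. The work contributes to the development of energy-efficient UAV communication systems.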
arXiv, 2024-11-27T15:45:31Z. DOI: 10.48550/arXiv.2411.18454
Abstract:
This letter investigates the optimal hovering altitude of a single UAV to provide coverage over any convex quadrilateral region on the ground. The UAV employs a directional antenna with a tiltable beam, producing an elliptical coverage pattern. Two scenarios are considered: (1) inscribing the largest ellipse within the quadrilateral to cover its interior, and (2) circumscribing the smallest ellipse about the quadrilateral to ensure full coverage. We derive the optimal UAV altitude and antenna tilt conditions in both scenarios for a simplified yet widely accepted path loss model and present numerical results for coverage efficiency. The work contributes to the development of energy-efficient UAV-based communication systems.
16.
前进 (2024-10-31 15:09):
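#paper arXiv:2408.05839v2 Deep Learning in Medical Image Registration: Magic or Mirage? 38th Conference on Neural Information Processing Systems (NeurIPS 2024). The paper examines how deep-learning-based image registration (DLIR) compares with classical optimization methods in medical image registration. It compares the two families on deformable image registration (DIR) and notes that classical methods have the edge in cross-modality generalization and robustness, whereas learning-based methods reach better performance through weak supervision. A series of experiments shows that, in the unsupervised setting, learning-based methods do not significantly beat classical methods on label-matching performance. The paper hypothesizes that architectural design in learning-based methods is unlikely to affect the mutual information between the per-pixel intensity distribution and the labels, and is therefore unlikely to improve performance much. It also shows that with weak supervision, learning-based methods achieve a registration accuracy that classical methods cannot reach. However, learning-based methods are sensitive to changes in the data distribution and show no robustness to domain shift. The paper concludes that without a large labeled dataset, classical optimization remains the better choice.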
arXiv, 2024-08-11T18:20:08Z. DOI: 10.48550/arXiv.2408.05839
Abstract:
Classical optimization and learning-based methods are the two reigning paradigms in deformable image registration. While optimization-based methods boast generalizability across modalities and robust performance, learning-based methods promise peak performance, incorporating weak supervision and amortized optimization. However, the exact conditions for either paradigm to perform well over the other are shrouded and not explicitly outlined in the existing literature. In this paper, we make an explicit correspondence between the mutual information of the distribution of per-pixel intensity and labels, and the performance of classical registration methods. This strong correlation hints to the fact that architectural designs in learning-based methods is unlikely to affect this correlation, and therefore, the performance of learning-based methods. This hypothesis is thoroughly validated with state-of-the-art classical and learning-based methods. However, learning-based methods with weak supervision can perform high-fidelity intensity and label registration, which is not possible with classical methods. Next, we show that this high-fidelity feature learning does not translate to invariance to domain shift, and learning-based methods are sensitive to such changes in the data distribution. Finally, we propose a general recipe to choose the best paradigm for a given registration problem, based on these observations.
17.
张浩彬 (2024-10-30 10:19):
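#paper AdapterFusion: Non-Destructive Task Composition for Transfer Learning https://doi.org/10.48550/arXiv.2005.00247 An improved version of adapters, AdapterFusion. In short, a separate adapter is built for each task, and the adapters are then combined to fuse knowledge more effectively. Abstract in brief: sequential fine-tuning and multi-task learning both aim to incorporate knowledge from multiple tasks, but they suffer from catastrophic forgetting and from difficulties in dataset balancing. To address these shortcomings, the authors propose AdapterFusion, a new two-stage learning algorithm that leverages knowledge from multiple tasks. First, in the knowledge extraction stage, task-specific parameters called adapters are learned, which encapsulate the task-specific information. The adapters are then combined in a separate knowledge composition step. Separating the two stages, knowledge extraction and knowledge composition, lets the classifier exploit the representations learned from multiple tasks in a non-destructive manner. AdapterFusion is evaluated empirically on 16 diverse NLU tasks and is found to combine various types of knowledge effectively at different layers of the model. The approach outperforms traditional strategies such as full fine-tuning and multi-task learning. The code and adapters are available at AdapterHub.ml.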
arXiv, 2020-05-01T07:03:42Z. DOI: 10.48550/arXiv.2005.00247
Abstract:
Sequential fine-tuning and multi-task learning are methods aiming toincorporate knowledge from multiple tasks; however, they suffer fromcatastrophic forgetting and difficulties in dataset balancing. To address theseshortcomings, we propose AdapterFusion, a … >>>
Sequential fine-tuning and multi-task learning are methods aiming toincorporate knowledge from multiple tasks; however, they suffer fromcatastrophic forgetting and difficulties in dataset balancing. To address theseshortcomings, we propose AdapterFusion, a new two stage learning algorithm thatleverages knowledge from multiple tasks. First, in the knowledge extractionstage we learn task specific parameters called adapters, that encapsulate thetask-specific information. We then combine the adapters in a separate knowledgecomposition step. We show that by separating the two stages, i.e., knowledgeextraction and knowledge composition, the classifier can effectively exploitthe representations learned from multiple tasks in a non-destructive manner. Weempirically evaluate AdapterFusion on 16 diverse NLU tasks, and find that iteffectively combines various types of knowledge at different layers of themodel. We show that our approach outperforms traditional strategies such asfull fine-tuning as well as multi-task learning. Our code and adapters areavailable at AdapterHub.ml. <<<
18.
刘昊辰 (2024-10-12 10:09):
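#paper arXiv:2409.12272v1 [cs.LG] 18 Sep 2024, Mastering Chess with a Transformer Model. This research paper applies Transformer models to chess. It shows that a Transformer's effectiveness at chess depends heavily on the choice of position encoding in the attention mechanism. Building on this observation, the paper adopts the general-purpose position encoding scheme of Shaw et al. and trains models with this technique and other enhancements at scale, calling the resulting architecture ChessFormer. The architecture significantly outperforms prior work in both playing strength and puzzle-solving ability, at a fraction of the computational cost. Download: https://arxiv.org/pdf/2409.12272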
arXiv, 2024-09-18T19:05:21Z. DOI: 10.48550/arXiv.2409.12272
Abstract:
Transformer models have demonstrated impressive capabilities when trained at scale, excelling at difficult cognitive tasks requiring complex reasoning and rational decision-making. In this paper, we explore the application of transformer models to chess, focusing on the critical role of the position encoding within the attention mechanism. We show that in chess, transformers endowed with a sufficiently versatile position encoding can match existing chess-playing models at a fraction of the computational cost. Our architecture significantly outperforms AlphaZero at 8x fewer FLOPS and matches prior grandmaster-level transformer-based agents at 30x fewer FLOPS.
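As a rough illustration of what a relative position encoding inside the attention mechanism looks like, the Python sketch below computes attention scores in the style of Shaw et al. for a plain 1D sequence: a learned vector that depends on the clipped offset between two positions is added to each key before the dot product. This is an assumption-laden sketch, not the chess model's implementation. The function names, the clipping distance, and the toy vectors are all made up, and how the paper indexes board squares is not described in the text above.

```python
import math


def clip_offset(offset, max_dist):
    """Clamp a relative offset to the range [-max_dist, max_dist]."""
    return max(-max_dist, min(max_dist, offset))


def relative_attention_scores(queries, keys, rel_key_emb, max_dist):
    """Scaled dot-product scores with a relative-position term added to keys.

    rel_key_emb holds 2 * max_dist + 1 vectors, one per clipped offset,
    stored so that index 0 corresponds to offset -max_dist.
    """
    dim = len(queries[0])
    scores = []
    for i, query in enumerate(queries):
        row = []
        for j, key in enumerate(keys):
            rel = rel_key_emb[clip_offset(j - i, max_dist) + max_dist]
            dot = sum(q * (k + r) for q, k, r in zip(query, key, rel))
            row.append(dot / math.sqrt(dim))
        scores.append(row)
    return scores


# Toy example: two positions, 2-dimensional vectors, offsets clipped to +/-1.
queries = [[1.0, 0.0], [0.0, 1.0]]
keys = [[1.0, 0.0], [0.0, 1.0]]
# Only the "+1" offset carries a non-zero embedding here.
rel_key_emb = [[0.0, 0.0], [0.0, 0.0], [1.0, 0.0]]
scores = relative_attention_scores(queries, keys, rel_key_emb, max_dist=1)
print(scores)
```

With all relative embeddings set to zero this reduces to ordinary scaled dot-product attention, which makes the extra term easy to isolate when experimenting.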
19.
尹志 (2024-09-30 23:02):
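#paper https://doi.org/10.48550/arXiv.2405.20328 mRNA secondary structure prediction using utility-scale quantum computers. This is a collaboration between IBM and Moderna from this year. The authors use a CVaR-based VQE algorithm to predict the secondary structure of mRNA. Because RNA is single-stranded and highly variable, its structure is very hard to predict. For the same reason, the problem falls naturally into the category of combinatorial optimization, so designing dedicated algorithms on a quantum computer to speed up the search and return the optimal structure is a logical step. The paper uses IBM's Eagle and Heron quantum processors, and the results agree with those of the classical solver CPLEX. That said, since this is a NISQ approach, the paper does not explain in much detail how device calibration and error suppression were ensured; presumably it assumes Eagle and Heron already handle this. The work is also a good demonstration of VQC algorithms (including VQE and QAOA) applied to combinatorial optimization, and it clearly shows the flexibility of variational algorithms.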
arXiv, 2024-05-30T17:58:17Z. DOI: 10.48550/arXiv.2405.20328
Abstract:
Recent advancements in quantum computing have opened new avenues for tackling long-standing complex combinatorial optimization problems that are intractable for classical computers. Predicting secondary structure of mRNA is one such notoriously difficult problem that can benefit from the ever-increasing maturity of quantum computing technology. Accurate prediction of mRNA secondary structure is critical in designing RNA-based therapeutics as it dictates various steps of an mRNA life cycle, including transcription, translation, and decay. The current generation of quantum computers have reached utility-scale, allowing us to explore relatively large problem sizes. In this paper, we examine the feasibility of solving mRNA secondary structures on a quantum computer with sequence length up to 60 nucleotides representing problems in the qubit range of 10 to 80. We use Conditional Value at Risk (CVaR)-based VQE algorithm to solve the optimization problems, originating from the mRNA structure prediction problem, on the IBM Eagle and Heron quantum processors. To our encouragement, even with "minimal" error mitigation and fixed-depth circuits, our hardware runs yield accurate predictions of minimum free energy (MFE) structures that match the results of the classical solver CPLEX. Our results provide sufficient evidence for the viability of solving mRNA structure prediction problems on a quantum computer and motivate continued research in this direction.
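The CVaR objective mentioned in the abstract can be sketched in a few lines of Python. Instead of averaging the energy of every measured bitstring, a CVaR-based VQE averages only the best (lowest-energy) fraction `alpha` of the shots, which rewards circuits that sample good solutions at least some of the time. The function name `cvar`, the value of `alpha`, and the sample energies below are illustrative assumptions; the abstract does not state which settings the authors used.

```python
import math


def cvar(energies, alpha):
    """Average of the lowest ceil(alpha * n) sampled energies.

    alpha = 1.0 recovers the ordinary mean; smaller values focus the
    objective on the best measurement outcomes.
    """
    if not energies:
        raise ValueError("need at least one sampled energy")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    ordered = sorted(energies)
    keep = math.ceil(alpha * len(ordered))
    tail = ordered[:keep]
    return sum(tail) / len(tail)


# Hypothetical energies, one per measured bitstring (shot).
sampled_energies = [3.0, 1.0, 2.0, 4.0]
full_mean = cvar(sampled_energies, 1.0)
best_half = cvar(sampled_energies, 0.5)
print(full_mean, best_half)
```

A classical optimizer would then adjust the circuit parameters to minimize this value rather than the plain expectation.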
20.
张浩彬 (2024-09-30 17:03):
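#paper DOI 10.48550/arXiv.1902.00751 Parameter-Efficient Transfer Learning for NLP. ICML 2019. Google proposed the Adapter here, which can be considered the founding paper of PEFT methods. I have recently been collecting the classic PEFT papers for large models to prepare a class for my students, and this one is the most fitting place to start. Fine-tuning large pre-trained models is an effective transfer mechanism in NLP. However, in the presence of many downstream tasks, fine-tuning is parameter inefficient: every task requires an entirely new model. As an alternative, we propose transfer with adapter modules. Adapter modules yield a compact and extensible model; they add only a few trainable parameters per task, and new tasks can be added without revisiting previous ones. The parameters of the original network remain fixed, yielding a high degree of parameter sharing. To demonstrate the adapters' effectiveness, we transfer the recently proposed BERT Transformer model to 26 diverse text classification tasks, including the GLUE benchmark. Adapters attain near state-of-the-art performance while adding only a few parameters per task. On GLUE, we attain within 0.4% of the performance of full fine-tuning, adding only 3.6% parameters per task. By contrast, fine-tuning trains 100% of the parameters per task. The paper notes that earlier domain adaptation approaches require training a separate model each time, and that they generally fall into two kinds: feature-based transfer and fine-tuning. Feature-based transfer uses a pre-trained embedding model to produce features, which are then fed into a task-specific downstream model.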
arXiv, 2019-02-02T16:29:47Z. DOI: 10.48550/arXiv.1902.00751
Abstract:
Fine-tuning large pre-trained models is an effective transfer mechanism in NLP. However, in the presence of many downstream tasks, fine-tuning is parameter inefficient: an entire new model is required for every task. As an alternative, we propose transfer with adapter modules. Adapter modules yield a compact and extensible model; they add only a few trainable parameters per task, and new tasks can be added without revisiting previous ones. The parameters of the original network remain fixed, yielding a high degree of parameter sharing. To demonstrate adapter's effectiveness, we transfer the recently proposed BERT Transformer model to 26 diverse text classification tasks, including the GLUE benchmark. Adapters attain near state-of-the-art performance, whilst adding only a few parameters per task. On GLUE, we attain within 0.4% of the performance of full fine-tuning, adding only 3.6% parameters per task. By contrast, fine-tuning trains 100% of the parameters per task.
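For readers new to adapters, the Python sketch below shows the general shape of a bottleneck adapter module: project the hidden state down to a small dimension, apply a nonlinearity, project back up, and add the result to the input through a skip connection. It is a simplified illustration under stated assumptions, not the paper's code. ReLU stands in for the nonlinearity, and the sizes `d = 768` and `m = 64` in the parameter count are example values chosen here.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def adapter_forward(hidden, w_down, b_down, w_up, b_up):
    """Bottleneck adapter: hidden + up(relu(down(hidden)))."""
    # Down-projection to the bottleneck, followed by ReLU (an assumed choice).
    bottleneck = [
        max(0.0, s + b) for s, b in zip(matvec(w_down, hidden), b_down)
    ]
    # Up-projection back to the hidden size.
    delta = [s + b for s, b in zip(matvec(w_up, bottleneck), b_up)]
    # Skip connection: the adapter only adds a correction to its input.
    return [h + d for h, d in zip(hidden, delta)]


def adapter_param_count(d, m):
    """Trainable parameters of one adapter: two projections plus biases."""
    return 2 * d * m + d + m


# With a zero up-projection the adapter is an identity function, so a
# freshly added adapter leaves the frozen network's output unchanged.
hidden = [0.5, -1.0, 2.0]
w_down = [[0.1, 0.2, 0.3], [0.0, -0.1, 0.4]]   # 3 -> 2
b_down = [0.0, 0.0]
w_up = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]    # 2 -> 3, all zeros
b_up = [0.0, 0.0, 0.0]
output = adapter_forward(hidden, w_down, b_down, w_up, b_up)
print(output)
print(adapter_param_count(768, 64))
```

Only the adapter weights would be trained per task, while the original network stays frozen. That is why the per-task parameter cost stays small compared with full fine-tuning.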