Literature Collection and Sharing Platform

1.

尹志 (2025-06-30 23:17):

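#paper arXiv:2411.09131; Artificial Intelligence for Quantum Computing; 2024. A survey led by Yuri Alexeev that analyzes and looks ahead at several ways AI is being applied to quantum computing. It is not especially detailed, but if you want quantum computing to demonstrate an advantage on practical problems sooner, this is a survey you should not miss.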

arXiv, 2024-11-14T02:11:16Z. DOI: 10.48550/arXiv.2411.09131

Artificial Intelligence for Quantum Computing

Yuri Alexeev, Marwa H. Farag, Taylor L. Patti, Mark E. Wolf, Natalia Ares, Alán Aspuru-Guzik, Simon C. Benjamin, Zhenyu Cai, Zohim Chandani, Federico Fedele, ...

Abstract:

Artificial intelligence (AI) advancements over the past few years have had an unprecedented and revolutionary impact across everyday application areas. Its significance also extends to technical challenges within science and engineering, including the …

2.

刘馨云 (2025-06-30 20:34):

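#paper arXiv:2505.03729; Visual Imitation Enables Contextual Humanoid Control; UC Berkeley, 2025; link: https://videomimic.net. VIDEOMIMIC is a humanoid robot control method that learns context-aware skills from real-world videos. The paper proposes a real-to-sim-to-real training pipeline that, for the first time, trains and deploys a general control policy able to climb and descend stairs, sit down, stand up, and step over obstacles from everyday videos alone, with no task labels, no reward functions, and no MoCap. Core contributions: (1) It is the first to extract 4D human-scene geometry from monocular everyday videos for robot control learning: it jointly reconstructs human motion and environment geometry (meshes), and uses a human height prior to resolve scale ambiguity, producing environment and motion data usable in physics simulation. (2) It designs a multi-stage RL policy training pipeline that goes from video to a general policy: pretraining on MoCap data; adding a height map as an environment input for terrain awareness; and using DAgger distillation to remove the dependence on target angles, so that a single policy handles sitting, standing, stair climbing, and other tasks. (3) The learned policy runs on a real robot using only the robot's own state and a LiDAR height map: deployed on a Unitree G1, it works in a variety of indoor and outdoor scenes with stairs, grass, and chairs; in unseen environments it needs no task labels, and "terrain + direction" naturally triggers the appropriate behavior. (4) Compared with baseline methods, VIDEOMIMIC substantially improves reconstruction accuracy and generalization: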
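
The height-prior step mentioned in the post can be illustrated with a small sketch. This is not the paper's code: it only assumes that a monocular reconstruction is correct up to one global scale, and that dividing an assumed real height by the reconstructed height recovers that scale. The function names and all numbers are hypothetical.

```python
def metric_scale(assumed_height_m, reconstructed_height):
    """Scale factor that maps reconstruction units to metres."""
    if reconstructed_height <= 0:
        raise ValueError("reconstructed height must be positive")
    return assumed_height_m / reconstructed_height


def rescale_points(points, scale):
    """Apply one global scale to every 3D point (person and scene alike)."""
    return [tuple(coord * scale for coord in point) for point in points]


# Hypothetical values: the person is assumed to be 1.7 m tall but measures
# 0.85 units in the reconstruction, so every length must be doubled.
scale = metric_scale(1.7, 0.85)
stair_edge = rescale_points([(0.0, 0.0, 0.09)], scale)
```

Because the same factor is applied to the person and the scene, a stair that was 0.09 units high becomes about 0.18 m, which is the kind of metric geometry a physics simulator needs.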

arXiv, 2025-05-06T17:57:12Z. DOI: 10.48550/arXiv.2505.03729

Visual Imitation Enables Contextual Humanoid Control

Arthur Allshire, Hongsuk Choi, Junyi Zhang, David McAllister, Anthony Zhang, Chung Min Kim, Trevor Darrell, Pieter Abbeel, Jitendra Malik, Angjoo Kanazawa

Abstract:

How can we teach humanoids to climb staircases and sit on chairs using the surrounding environment context? Arguably, the simplest way is to just show them-casually capture a human motion video and …

3.

林海onrush (2025-06-07 13:27):

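#paper, Token-Importance Guided Direct Preference Optimization, DOI: https://arxiv.org/abs/2505.19653. Sharing my latest work on fine-tuning algorithms for large models. We propose a new method, TI-DPO, for better aligning large language models (LLMs) with human preferences. The widely used DPO (Direct Preference Optimization) removes the explicit reward model and optimizes the model directly on human preference data, but it ignores differences in importance among the tokens of a generated response. This can cause errors on key tokens and lead to outputs that do not match human values. TI-DPO addresses this with two innovations: 1. At the token level, it introduces token-importance weights based on gradient attribution, which dynamically identify and prioritize the tokens most critical to human preference; 2. It adds a triplet loss based on contrastive learning, which not only separates "good" and "bad" samples but also introduces an "intermediate" output, making the optimization finer-grained and helping the model generate content closer to what humans expect and farther from undesirable responses. Experiments show that TI-DPO performs well on multiple tasks (such as TruthfulQA and IFEval), exceeding DPO and other alignment methods in both accuracy and generation diversity. Ablation studies further confirm that the token-importance mechanism and the triplet loss are both necessary and effective. Theoretical analysis also shows that TI-DPO has a tighter lower bound on the loss and trains more stably. By focusing on key tokens at fine granularity and combining this with a triplet alignment structure, TI-DPO improves the alignment and output quality of large models and offers a new approach to AI safety and helpfulness in human-AI interaction.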
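
To make the token-weighting idea concrete, here is a minimal sketch of a DPO-style loss in which each token's log-ratio is scaled by an importance weight. It is an illustration under stated assumptions, not the TI-DPO implementation: the weights are passed in by hand rather than computed by gradient attribution, the triplet term is omitted, and all numbers are hypothetical.

```python
import math


def weighted_log_ratio(policy_logps, ref_logps, weights):
    """Sum of per-token log(pi / pi_ref), each scaled by its importance weight."""
    return sum(w * (p - r) for w, p, r in zip(weights, policy_logps, ref_logps))


def token_weighted_dpo_loss(chosen, rejected, beta=0.1):
    """DPO-style loss -log(sigmoid(margin)) built from token-weighted log-ratios.

    `chosen` and `rejected` are (policy_logps, ref_logps, weights) tuples.
    """
    margin = beta * (weighted_log_ratio(*chosen) - weighted_log_ratio(*rejected))
    # -log(sigmoid(m)) is the same as log(1 + exp(-m)).
    return math.log1p(math.exp(-margin))


# Hypothetical per-token log-probabilities for two-token responses.
rejected = ([-2.0, -1.0], [-1.5, -1.0], [1.0, 1.0])
uniform = token_weighted_dpo_loss(([-1.0, -0.5], [-1.0, -1.0], [1.0, 1.0]), rejected)
weighted = token_weighted_dpo_loss(([-1.0, -0.5], [-1.0, -1.0], [1.0, 2.0]), rejected)
```

With all weights equal to 1 this reduces to the usual sequence-level DPO loss. Raising the weight on the second chosen token, where the policy already beats the reference, widens the margin and lowers the loss.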

arXiv, 2025-05-26T08:11:24Z. DOI: 10.48550/arXiv.2505.19653

Token-Importance Guided Direct Preference Optimization

Ning Yang, Hai Lin, Yibo Liu, Baoliang Tian, Guoqing Liu, Haijun Zhang

Abstract:

Ensuring that large language models (LLMs) generate outputs aligned with human preferences is important for safe and effective AI interactions. While Direct Preference Optimization (DPO) employs an implicit reward function to optimize the …

4.

刘昊辰 (2025-06-03 16:34):

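#paper AlphaZero-Edu: Making AlphaZero Accessible to Everyone. AlphaZero-Edu is a lightweight reinforcement learning framework built on the mathematical framework of AlphaZero, designed for educational use and for Gomoku. It features a modular architecture (decoupling Monte Carlo tree search, self-play training, and the policy-value network), resource-efficient training (it runs on a single NVIDIA RTX 3090 GPU), and highly parallel self-play data generation (a 3.2x speedup with 8 processes). The state features are a 21-plane tensor (the current state plus 20 planes of history), the output consists of a policy probability distribution and a scalar value estimate, and rotation/flip data augmentation improves generalization. Training uses a cyclic learning-rate scheduler, under which both the policy loss and the value loss converge. In matches against 4 human players, the win rate was 100% at best and 60% at worst (with 20% draws). The framework is open source and provides an accessible baseline for academic research and industrial applications. Download: https://arxiv.org/pdf/2504.14636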
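
The rotation/flip augmentation mentioned above can be sketched as follows. This is a generic illustration of the eight symmetries of a square board, not code from the AlphaZero-Edu repository. The board is assumed to be a list of rows, and the helper names are made up.

```python
def rotate90(board):
    """Rotate a square board (list of rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*board[::-1])]


def flip_horizontal(board):
    """Mirror the board left to right."""
    return [row[::-1] for row in board]


def symmetries(board):
    """Return the 8 rotated and mirrored copies of a square board."""
    result = []
    current = [list(row) for row in board]
    for _ in range(4):
        result.append(current)
        result.append(flip_horizontal(current))
        current = rotate90(current)
    return result


# A tiny 3x3 position: stone values 1 and 2 on the top edge.
augmented = symmetries([[1, 2, 0], [0, 0, 0], [0, 0, 0]])
```

In a training pipeline the policy target has to go through the same transform as the board, so that each move probability stays attached to the right square.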

arXiv, 2025-04-20T14:29:39Z. DOI: 10.48550/arXiv.2504.14636

AlphaZero-Edu: Making AlphaZero Accessible to Everyone

Abstract:

Recent years have witnessed significant progress in reinforcement learning, especially with Zero-like paradigms, which have greatly boosted the generalization and reasoning abilities of large-scale language models. Nevertheless, existing frameworks are often plagued by …

5.

符毓 (2025-05-31 22:59):

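#paper doi: 10.48550/arXiv.2505.21906, 2025, Vision-Language-Action Model with Open-World Embodied Reasoning from Pretrained Knowledge. Vision-language-action (VLA) models have emerged as the next generation of models in robotics. However, although existing end-to-end VLA systems build on powerful pretrained vision-language models (VLMs), they often lose key capabilities during fine-tuning as the model adapts to specific robot tasks. We argue that a generalizable VLA model should retain and extend the core capabilities of the VLM: 1) open-world embodied reasoning: the VLA should inherit the VLM's knowledge, that is, recognize anything the VLM can recognize, solve math problems, and possess visual-spatial intelligence; 2) reasoning following: effectively turning open-world reasoning into steps the robot can execute. This paper introduces ChatVLA-2, which leverages, end to end, the innate reasoning and understanding abilities acquired by the pretrained VLM to give the VLA model the ability to perform a variety of tasks. The core contribution is a dynamic Mixture-of-Experts (MoE) module integrated on top of the pretrained vision-language backbone. The module handles differing task demands effectively: some experts share universal multimodal features, while others specialize in task-specific representations. The paper also proposes a two-stage training strategy: first, the VLA model is guided to connect pretrained multimodal knowledge with robot actions; then a reasoning-following stage is introduced so that the model can understand its reasoning output and effectively translate it into the corresponding actions.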
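
The split between shared and specialized experts can be pictured with a toy Mixture-of-Experts layer. This is a generic sketch, not the ChatVLA-2 module: the inputs are scalars, the experts are plain functions, and the gate and top-k routing rule are assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x, shared_experts, routed_experts, gate, top_k=1):
    """Always apply the shared experts, then add the top-k routed experts."""
    output = sum(expert(x) for expert in shared_experts)
    probs = softmax(gate(x))
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in top)
    output += sum(probs[i] / norm * routed_experts[i](x) for i in top)
    return output


# Hypothetical experts: one shared, two task-specific.
shared = [lambda x: 0.5 * x]
routed = [lambda x: x + 1.0, lambda x: -x]
gate = lambda x: [x, -x]  # positive inputs prefer expert 0, negative prefer expert 1

positive_case = moe_forward(2.0, shared, routed, gate)
negative_case = moe_forward(-2.0, shared, routed, gate)
```

The shared expert contributes to every input, while the gate decides which specialized expert also fires, so different inputs pass through different parameter subsets.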

arXiv, 2025-05-28T02:48:42Z. DOI: 10.48550/arXiv.2505.21906

Vision-Language-Action Model with Open-World Embodied Reasoning from Pretrained Knowledge

Zhongyi Zhou, Yichen Zhu, Junjie Wen, Chaomin Shen, Yi Xu

Abstract:

Vision-language-action (VLA) models have emerged as the next generation of models in robotics. However, despite leveraging powerful pre-trained Vision-Language Models (VLMs), existing end-to-end VLA systems often lose key capabilities during fine-tuning as the …

6.

刘馨云 (2025-05-31 21:32):

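#paper https://arxiv.org/pdf/2505.20290 Humans learn new tasks by observing others. Inspired by this, we propose EgoZero, a framework that learns closed-loop robot policies from first-person videos captured by humans wearing smart glasses. Smart glasses capture a rich, multimodal first-person view of human interaction: RGB video records the surrounding scene, the IMU (inertial measurement unit) provides head motion, and the microphone records conversation and ambient sound. Our method learns how to act solely by observing these first-person videos, without any robot demonstrations. Given a video of a human completing a task, EgoZero predicts a sequence of intermediate goals and language subgoals and uses them to execute the task on a real robot in closed loop. EgoZero compresses human observations into morphology-agnostic state representations that can be used for decision making and closed-loop control. The learned policies generalize well across robot morphologies, environments, and tasks. We validate on a real Franka Panda arm, and the results show that EgoZero achieves a 70% zero-shot success rate on a variety of challenging manipulation tasks, with only 20 minutes of data collection per task.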

arXiv, 2025-05-26T17:59:17Z. DOI: 10.48550/arXiv.2505.20290

EgoZero: Robot Learning from Smart Glasses

Vincent Liu, Ademi Adeniji, Haotian Zhan, Raunaq Bhirangi, Pieter Abbeel, Lerrel Pinto

Abstract:

Despite recent progress in general purpose robotics, robot policies still lag far behind basic human capabilities in the real world. Humans interact constantly with the physical world, yet this rich data resource …

7.

尹志 (2025-05-31 21:23):

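#paper https://doi.org/10.48550/arXiv.2012.07436 Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting. A classic AAAI 2021 paper on long-sequence time-series modeling. It improves on the standard Transformer and proposes a new model, Informer: by modifying self-attention, adding distilling, and building a generative-style decoder, it reduces the time and memory complexity that limit the standard Transformer. The model performs well on several datasets. These ideas have been used frequently in later time-series modeling work and are very instructive.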
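
The self-attention change can be sketched through the query-selection step. The code below is a simplified illustration, assuming the max-minus-mean sparsity score that Informer's ProbSparse attention is based on. It scores every query against every key, whereas the published model samples keys to stay efficient. Function names are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sparsity_scores(queries, keys):
    """Max-minus-mean of the scaled dot products, one score per query."""
    d = len(keys[0])
    scores = []
    for q in queries:
        scaled = [dot(q, k) / math.sqrt(d) for k in keys]
        scores.append(max(scaled) - sum(scaled) / len(scaled))
    return scores


def select_active_queries(queries, keys, u):
    """Indices of the u queries whose attention is least uniform."""
    m = sparsity_scores(queries, keys)
    ranked = sorted(range(len(queries)), key=lambda i: m[i], reverse=True)
    return sorted(ranked[:u])


# Toy data: the zero query attends uniformly, the second one is sharply peaked.
queries = [[0.0, 0.0], [4.0, 0.0], [0.1, 0.1]]
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
active = select_active_queries(queries, keys, u=1)
```

Only the selected queries would go through full attention. A query with a near-zero score attends almost uniformly, so its output can be approximated cheaply, for example by the mean of the values.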

arXiv, 2020-12-14T11:43:09Z. DOI: 10.48550/arXiv.2012.07436

Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting

Haoyi Zhou, Shanghang Zhang, Jieqi Peng, Shuai Zhang, Jianxin Li, Hui Xiong, Wancai Zhang

Abstract:

Many real-world applications require the prediction of long sequence time-series, such as electricity consumption planning. Long sequence time-series forecasting (LSTF) demands a high prediction capacity of the model, which is the ability to …

8.

符毓 (2025-04-30 22:15):

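#paper doi: arxiv.org/abs/2504.19193, 2025, Trajectory Planning with Model Predictive Control for Obstacle Avoidance Considering Prediction Uncertainty. This paper presents a new trajectory planner for autonomous robots that improves navigation by adding dynamic obstacle avoidance within the Robot Operating System 2 (ROS2) and Navigation 2 (Nav2) framework. The method uses model predictive control (MPC) and focuses on the uncertainty in predicting the motion of dynamic obstacles. Unlike existing Nav2 trajectory planners, which mainly handle static obstacles or react to the current position of dynamic obstacles, this planner predicts future obstacle positions so that the robot avoids regions where an obstacle may be present.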
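
The difference between reacting to an obstacle's current position and planning against its predicted positions can be shown with a small sketch. It is not the paper's planner and not a Nav2 API. It assumes constant-velocity prediction, a keep-out radius that grows linearly with the prediction time, and a choice among a few fixed candidate trajectories instead of a real MPC optimization.

```python
import math


def predict_obstacle(position, velocity, t):
    """Constant-velocity prediction of the obstacle position at time t."""
    return (position[0] + velocity[0] * t, position[1] + velocity[1] * t)


def is_safe(trajectory, dt, obstacle_pos, obstacle_vel, base_radius, growth_rate):
    """Check every step against a keep-out disc that grows with prediction time."""
    for k, (x, y) in enumerate(trajectory):
        t = k * dt
        ox, oy = predict_obstacle(obstacle_pos, obstacle_vel, t)
        keep_out = base_radius + growth_rate * t  # more uncertainty further ahead
        if math.hypot(x - ox, y - oy) <= keep_out:
            return False
    return True


def pick_trajectory(candidates, goal, **obstacle):
    """Among safe candidates, take the one that ends closest to the goal."""
    safe = [c for c in candidates if is_safe(c, **obstacle)]
    if not safe:
        return None
    return min(safe, key=lambda c: math.hypot(c[-1][0] - goal[0], c[-1][1] - goal[1]))


straight = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
detour = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 0.5)]
obstacle = dict(dt=1.0, obstacle_pos=(2.0, 2.0), obstacle_vel=(0.0, -1.0),
                base_radius=0.3, growth_rate=0.1)
best = pick_trajectory([straight, detour], goal=(3.0, 0.0), **obstacle)
```

The straight path looks clear if only the obstacle's current position is checked, but the obstacle is predicted to cross it two steps later, so the detour is chosen.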

arXiv, 2025-04-27T11:00:19Z. DOI: 10.48550/arXiv.2504.19193

Trajectory Planning with Model Predictive Control for Obstacle Avoidance Considering Prediction Uncertainty

Eric Schöneberg, Michael Schröder, Daniel Görges, Hans D. Schotten

Abstract:

This paper introduces a novel trajectory planner for autonomous robots, specifically designed to enhance navigation by incorporating dynamic obstacle avoidance within the Robot Operating System 2 (ROS2) and Navigation 2 (Nav2) framework. The …

9.

尹志 (2025-04-30 15:56):

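#paper doi:10.48550/arXiv.2407.20516, Machine Unlearning in Generative AI: A Survey. A very interesting direction; the Chinese rendering of the term is presumably 机器遗忘. As models keep growing, how to process a model so that specific information can be added and erased in a controlled way will be an important topic, for both privacy protection and model control.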

arXiv, 2024-07-30T03:26:09Z. DOI: 10.48550/arXiv.2407.20516

Machine Unlearning in Generative AI: A Survey

Zheyuan Liu, Guangyao Dou, Zhaoxuan Tan, Yijun Tian, Meng Jiang

Abstract:

Generative AI technologies have been deployed in many places, such as (multimodal) large language models and vision generative models. Their remarkable performance should be attributed to massive training data and emergent reasoning abilities. …

10.

Vincent (2025-03-31 16:09):

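#paper doi: https://doi.org/10.48550/arXiv.2503.00096 BixBench: a Comprehensive Benchmark for LLM-based Agents in Computational Biology. Large language models show significant potential for accelerating scientific discovery. Applications of LLM agents in bioinformatics currently lack systematic evaluation. This paper compiles nearly 50 real-world scenarios and about 300 open-ended questions to measure how well LLM-based agents solve complex bioinformatics problems. The authors test two frontier LLMs (GPT-4o and Claude 3.5 Sonnet) and find that both have low accuracy on open-ended questions and do no better than random choice on multiple-choice questions. The paper's contribution is the set of test cases and the evaluation framework, which lay the groundwork for building better-performing agents.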

arXiv, 2025-02-28T18:47:57Z. DOI: 10.48550/arXiv.2503.00096

BixBench: a Comprehensive Benchmark for LLM-based Agents in Computational Biology

Ludovico Mitchener, Jon M Laurent, Benjamin Tenmann, Siddharth Narayanan, Geemi P Wellawatte, Andrew White, Lorenzo Sani, Samuel G Rodriques

Abstract:

Large Language Models (LLMs) and LLM-based agents show great promise in accelerating scientific research. Existing benchmarks for measuring this potential and guiding future development continue to evolve from pure recall and rote knowledge …

11.

尹志 (2025-03-31 15:06):

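#paper doi: doi.org/10.48550/arXiv.2502.11974, Image Inversion: A Survey from GANs to Diffusion and Beyond (2025). A very recent survey of common image inversion algorithms and models. It mainly covers GANs and diffusion models, and also touches on the DiT and Rectified Flow frameworks. The core problem in image inversion involves the latent space, which matters for many other problems in generative AI.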

arXiv, 2025-02-17T16:20:48Z. DOI: 10.48550/arXiv.2502.11974

Image Inversion: A Survey from GANs to Diffusion and Beyond

Yinan Chen, Jiangning Zhang, Yali Bi, Xiaobin Hu, Teng Hu, Zhucun Xue, Ran Yi, Yong Liu, Ying Tai

Abstract:

Image inversion is a fundamental task in generative models, aiming to map images back to their latent representations to enable downstream applications such as editing, restoration, and style transfer. This paper provides …

12.

Kunji (2025-02-28 23:59):

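#paper, https://arxiv.org/pdf/2410.05273, HiRT: Enhancing Robotic Control with Hierarchical Robot Transformers. VLA models rely on VLMs with billions of parameters; they generalize well, but high compute cost and slow inference limit their use in dynamic tasks. To address these limitations, the paper proposes the HiRT framework (Hierarchical Robot Transformer framework), which borrows from the dual-process theory of human cognition and uses a dual-system architecture with asynchronous operation to balance control frequency against performance. Experiments in simulation and in the real world show clear improvements: in static tasks, control frequency doubles with a comparable success rate, and in real-world dynamic manipulation tasks that earlier VLA models struggled with, HiRT raises the success rate from 48% to 75%.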
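
The dual-system idea can be pictured as a control loop in which a slow model refreshes a cached latent only occasionally while a fast policy acts at every step. The sketch below is an illustration under simplifying assumptions, not HiRT's implementation: asynchrony is simulated with a fixed refresh period inside a single loop, and both models are stand-in functions.

```python
def run_dual_system(observations, slow_model, fast_policy, slow_period):
    """Act at every step; refresh the slow model's latent every `slow_period` steps."""
    latent = None
    actions = []
    slow_calls = 0
    for step, obs in enumerate(observations):
        if step % slow_period == 0:
            latent = slow_model(obs)  # expensive call, e.g. a large VLM
            slow_calls += 1
        actions.append(fast_policy(obs, latent))  # cheap call with the cached latent
    return actions, slow_calls


# Stand-in models: numbers instead of images and motor commands.
slow_model = lambda obs: obs * 10
fast_policy = lambda obs, latent: latent + obs

actions, slow_calls = run_dual_system(list(range(10)), slow_model, fast_policy, slow_period=5)
```

Ten control steps cost only two slow-model calls here. A real system would more likely run the slow model in its own thread or process and let the fast policy read whichever latent is newest.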

arXiv, 2024-09-12T09:18:09Z. DOI: 10.48550/arXiv.2410.05273

HiRT: Enhancing Robotic Control with Hierarchical Robot Transformers

Jianke Zhang, Yanjiang Guo, Xiaoyu Chen, Yen-Jen Wang, Yucheng Hu, Chengming Shi, Jianyu Chen

Abstract:

Large Vision-Language-Action (VLA) models, leveraging powerful pre-trained Vision-Language Models (VLMs) backends, have shown promise in robotic control due to their impressive generalization ability. However, the success comes at a cost. Their reliance …

13.

符毓 (2025-02-28 23:00):

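#paper doi.org/10.48550/arXiv.2411.13677, 2024, Bimanual Dexterity for Complex Tasks. Teleoperation is an important way for robots to acquire data. The paper presents a portable, low-cost (about $12k in total: $5k for the hands and $7k for the system; an optional pair of robot arms adds $16k), and highly precise teleoperation system for bimanual humanoid hands and arms. It demonstrates the system in tabletop and mobile settings and shows that it is more efficient on bimanual dexterous tasks than other approaches (such as SteamVR and Vision Pro). However, because there is no haptic feedback, the operator must rely on visual feedback alone and cannot feel what the robot arm senses.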

arXiv, 2024-11-20T19:53:35Z. DOI: 10.48550/arXiv.2411.13677

Bimanual Dexterity for Complex Tasks

Kenneth Shaw, Yulong Li, Jiahui Yang, Mohan Kumar Srirama, Ray Liu, Haoyu Xiong, Russell Mendonca, Deepak Pathak

Abstract:

To train generalist robot policies, machine learning methods often require a substantial amount of expert human teleoperation data. An ideal robot for humans collecting data is one that closely mimics them: bimanual …

14.

尹志 (2025-02-28 15:55):

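#paper doi:10.48550/arXiv.2205.15463 Few-Shot Diffusion Models. The paper proposes a technique for few-shot generation that combines a diffusion model with a set-based ViT. Experiments show that the model can generate samples from a new class given as few as 5 examples.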

arXiv, 2022-05-30T23:20:33Z. DOI: 10.48550/arXiv.2205.15463

Few-Shot Diffusion Models

Giorgio Giannone, Didrik Nielsen, Ole Winther

Abstract:

Denoising diffusion probabilistic models (DDPM) are powerful hierarchical latent variable models with remarkable sample generation quality and training stability. These properties can be attributed to parameter sharing in the generative hierarchy, as well …

15.

刘昊辰 (2025-02-25 22:38):

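#paper Playing Hex and Counter Wargames using Reinforcement Learning and Recurrent Neural Networks. A research paper on using reinforcement learning and recurrent neural networks (RNNs) to play hex and counter wargames. It proposes a new system that combines the AlphaZero reinforcement learning algorithm with recurrent neural networks to handle the strategic complexity of these games. The system generalizes across different terrain and tactical situations, and the paper explores how it scales to larger maps. With limited training resources and compute, the system performs well in complex hex and counter wargames, demonstrating its ability to generalize in complex scenarios. Download: https://arxiv.org/abs/2502.13918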

arXiv, 2025-02-19T17:52:45Z. DOI: 10.48550/arXiv.2502.13918

Playing Hex and Counter Wargames using Reinforcement Learning and Recurrent Neural Networks

Guilherme Palma, Pedro A. Santos, João Dias

Abstract:

Hex and Counter Wargames are adversarial two-player simulations of real military conflicts requiring complex strategic decision-making. Unlike classical board games, these games feature intricate terrain/unit interactions, unit stacking, large maps of varying sizes, …

16.

惊鸿 (2025-02-15 00:02):

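#paper DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model. Pub Date: 2024-05-07. DOI: arxiv-2405.04434. We present DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. It comprises 236B total parameters, of which 21B are activated for each token, and supports a context length of 128K tokens. DeepSeek-V2 adopts innovative architectures including Multi-head Latent Attention (MLA) and DeepSeekMoE. MLA guarantees efficient inference by significantly compressing the key-value (KV) cache into a latent vector, while DeepSeekMoE enables training strong models at an economical cost through sparse computation. Compared with DeepSeek 67B, DeepSeek-V2 achieves significantly stronger performance while saving 42.5% of training costs, reducing the KV cache by 93.3%, and boosting the maximum generation throughput to 5.76 times. We pretrain DeepSeek-V2 on a high-quality, multi-source corpus of 8.1T tokens, and further perform supervised fine-tuning (SFT) and reinforcement learning (RL) to fully unlock its potential. Evaluation results show that, even with only 21B activated parameters, DeepSeek-V2 and its chat versions still achieve top-tier performance among open-source models. The model checkpoints are available at "https://github.com/deepseek-ai/DeepSeek-V2".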
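
A quick back-of-the-envelope reading of the figures quoted above: the snippet only restates the numbers from the post in other units, and the variable names are made up.

```python
total_params_b = 236.0   # total parameters, in billions
active_params_b = 21.0   # parameters activated per token, in billions

# Share of the model that is actually used for each token.
active_fraction = active_params_b / total_params_b

# A 93.3% reduction leaves 6.7% of the original KV cache.
kv_remaining = 1.0 - 0.933
kv_shrink_factor = 1.0 / kv_remaining

print(f"active per token: {active_fraction:.1%}")
print(f"KV cache is about {kv_shrink_factor:.1f}x smaller")
```

So fewer than one in ten parameters are active for any given token, and the KV cache shrinks by roughly a factor of fifteen.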

arXiv, 2024-05-07T15:56:43Z. DOI: 10.48550/arXiv.2405.04434

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

DeepSeek-AI, Aixin Liu, Bei Feng, Bin Wang, Bingxuan Wang, Bo Liu, Chenggang Zhao, Chengqi Dengr, Chong Ruan, Damai Dai, ...

Abstract:

We present DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. It comprises 236B total parameters, of which 21B are activated for each token, and supports a context …

17.

林海onrush (2025-01-31 23:53):

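#paper, https://doi.org/10.48550/arXiv.2312.01156, Efficient Light Source Placement using Quantum Computing. A fun little problem: how to use quantum computing to solve the torch placement problem in Minecraft. The problem is cast as a quadratic unconstrained binary optimization (QUBO) problem, and the constraints are handled by iteratively learning Lagrange multipliers. Experiments show that the method finds valid torch placements within a reasonable number of iterations, although, given the limitations of current quantum hardware, classical methods perform somewhat better on larger maps. The torch placement problem is related to the set cover problem, illustrating the value of quantum computing for resource optimization problems.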
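
A toy version of the idea can be written down for a one-dimensional strip of tiles. This sketch is not the paper's formulation: it assumes a torch lights its own tile and both neighbours, uses a quadratic penalty that prefers exactly one torch per tile (which also penalizes over-lighting), replaces the quantum solver with brute force, and raises a tile's multiplier by a fixed step whenever that tile is left dark.

```python
from itertools import product


def coverage(x, tile, radius=1):
    """Number of torches that light `tile` (x[i] == 1 means a torch on tile i)."""
    return sum(x[i] for i in range(len(x)) if abs(i - tile) <= radius)


def energy(x, multipliers, radius=1):
    """Torch count plus weighted quadratic penalties: a quadratic function of binary x."""
    penalty = sum(lam * (1 - coverage(x, j, radius)) ** 2
                  for j, lam in enumerate(multipliers))
    return sum(x) + penalty


def solve_bruteforce(n, multipliers, radius=1):
    """Stand-in for a QUBO solver: try all 2**n assignments."""
    return min(product((0, 1), repeat=n), key=lambda x: energy(x, multipliers, radius))


def place_torches(n, step=1.0, max_iters=20, radius=1):
    """Raise the multiplier of every dark tile until the best assignment lights all tiles."""
    multipliers = [0.0] * n
    for iteration in range(1, max_iters + 1):
        x = solve_bruteforce(n, multipliers, radius)
        dark = [j for j in range(n) if coverage(x, j, radius) == 0]
        if not dark:
            return x, iteration
        for j in dark:
            multipliers[j] += step
    return None, max_iters


placement, iterations = place_torches(5)
```

With all multipliers at zero the cheapest answer is to place no torches at all. Once the dark tiles have been penalized, the minimizer switches to a placement that lights the whole strip.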

arXiv, 2023-12-02T15:28:59Z. DOI: 10.48550/arXiv.2312.01156

Efficient Light Source Placement using Quantum Computing

Sascha Mücke, Thore Gerlach

Abstract:

NP-hard problems regularly come up in video games, with interesting connections to real-world problems. In the game Minecraft, players place torches on the ground to light up dark areas. Placing them in …

18.

前进 (2025-01-31 22:31):

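#paper 10.48550/arxiv.2408.10234 The Unbearable Slowness of Being: Why do we live at 10 bits/s? arXiv:2408.10234v2 [q-bio.NC] Jieyu Zheng, Markus Meister. The paper examines the paradoxical slowness of information processing in human behavior. Although human sensory systems gather information at about 10⁹ bits/s, overall human information throughput is only about 10 bits/s. This huge gap has not been adequately explained and touches on many fundamental aspects of brain function. Through a range of experiments and case studies, the paper shows that the information rate of human behavior is about 10 bits/s and that this limit may be related to the brain's serial processing. Although the peripheral nervous system (for example, cone photoreceptors and the optic nerve) can process information at very high rates, the central brain appears to process information serially, attending to only one task at a time. This serial mode may have arisen during evolution, because the main function of early nervous systems was to control movement, and movement decisions are usually local and singular. The paper also proposes that the brain may operate in two modes, an "outer brain" and an "inner brain": the outer brain handles high-dimensional sensory input and motor output at very high information rates, while the inner brain handles a low-dimensional information stream for decision making and behavioral control at a very low rate (about 10 bits/s). This division of labor may be an important reason for the limited processing speed. The paper suggests that future research should further explore the differences between information processing in the inner and outer brain and how to improve information-processing efficiency.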

arXiv, 2024-08-03T22:56:45Z. DOI: 10.48550/arXiv.2408.10234

The Unbearable Slowness of Being: Why do we live at 10 bits/s?

Jieyu Zheng, Markus Meister

Abstract:

This article is about the neural conundrum behind the slowness of human behavior. The information throughput of a human being is about 10 bits/s. In comparison, our sensory systems gather data at …

19.

尹志 (2025-01-31 17:05):

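#paper https://doi.org/10.48550/arXiv.2403.07183 Monitoring AI-Modified Content at Scale: A Case Study on the Impact of ChatGPT on AI Conference Peer Reviews. A paper on how large language models are being used, with peer review at top AI conferences as the concrete case (ICLR 2024, NeurIPS 2023, CoRL 2023, and EMNLP 2023). The study finds that between 6.5% and 16.9% of the text in these reviews may have been substantially modified by an LLM, and that such reviews share a number of interesting traits: lower reported confidence, submission close to the deadline, and less willingness to respond to author rebuttals. See the paper for more interesting findings. The paper lists the adjectives that AI most likes to use, such as "commendable", "meticulous", and "intricate", which really do sound like AI, hahaha. Looks like reviewers will need to be more responsible to authors from now on.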
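
The abstract of this paper mentions a maximum likelihood estimate of the LLM-modified fraction. A toy version of that kind of estimate treats each word as a draw from a mixture of a "human" and an "AI" word distribution and searches for the mixing weight that best explains the observed counts. Everything below is a sketch under stated assumptions: the two word distributions and the counts are invented, and a grid search stands in for whatever optimizer the authors use.

```python
import math


def log_likelihood(alpha, counts, p_human, p_ai):
    """Log-likelihood of word counts under the mixture (1 - alpha) * human + alpha * AI."""
    return sum(c * math.log((1 - alpha) * p_human[w] + alpha * p_ai[w])
               for w, c in counts.items())


def estimate_ai_fraction(counts, p_human, p_ai, grid=1000):
    """Grid-search maximum likelihood estimate of alpha in [0, 1]."""
    candidates = [i / grid for i in range(grid + 1)]
    return max(candidates, key=lambda a: log_likelihood(a, counts, p_human, p_ai))


# Invented word frequencies: the AI distribution overuses two adjectives.
p_human = {"commendable": 0.01, "meticulous": 0.01, "other": 0.98}
p_ai = {"commendable": 0.10, "meticulous": 0.08, "other": 0.82}

# Invented corpus of 10,000 tokens built to match a 20% AI share exactly.
counts = {"commendable": 280, "meticulous": 240, "other": 9480}
alpha_hat = estimate_ai_fraction(counts, p_human, p_ai)
```

Note that the estimate describes the corpus as a whole. It does not say which individual reviews were modified.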

arXiv, 2024-03-11T21:51:39Z. DOI: 10.48550/arXiv.2403.07183

Monitoring AI-Modified Content at Scale: A Case Study on the Impact of ChatGPT on AI Conference Peer Reviews

Abstract:

We present an approach for estimating the fraction of text in a large corpus which is likely to be substantially modified or produced by a large language model (LLM). Our maximum likelihood …

20.

Vincent (2025-01-31 14:05):

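#paper https://doi.org/10.48550/arXiv.2111.06377 arXiv, 2021. Masked Autoencoders Are Scalable Vision Learners. A classic computer vision paper that proposes a simple, fast, and effective model, the masked autoencoder (MAE). The core idea is to randomly mask regions of an image and have the model reconstruct them. MAE consists of an asymmetric encoder and decoder: the encoder maps the visible regions of the image into a latent space, and the decoder uses the latent representation together with mask tokens to reconstruct the original image. Notably, even when 75% of the image is masked, the reconstruction still closely resembles the original, which also suggests that the information in images is quite sparse. Because the encoder only sees part of the original image, MAE trains much faster, and thanks to self-supervised learning and better representations, it also performs better on downstream tasks. It is worth noting that this "predict the masked region" technique has long been used in language models; this paper simply applies it to CV, showing that CV can also advance by borrowing research ideas from NLP.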
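
The random masking step can be sketched in a few lines. This is an illustration, not the MAE codebase: the image is assumed to be already split into numbered patches, and the 14 x 14 grid used in the example is a hypothetical size.

```python
import random


def random_mask(num_patches, mask_ratio=0.75, seed=None):
    """Shuffle patch indices, keep the first (1 - mask_ratio) share, mask the rest."""
    rng = random.Random(seed)
    indices = list(range(num_patches))
    rng.shuffle(indices)
    num_keep = int(num_patches * (1 - mask_ratio))
    visible = sorted(indices[:num_keep])
    masked = sorted(indices[num_keep:])
    return visible, masked


# Hypothetical 14 x 14 patch grid with the 75% mask ratio mentioned above.
visible, masked = random_mask(14 * 14, mask_ratio=0.75, seed=0)
```

Only the visible patches would be fed to the encoder, which is where the training speedup described in the post comes from. The decoder then receives the encoded patches plus mask tokens at the masked positions.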

arXiv, 2021-11-11T18:46:40Z. DOI: 10.48550/arXiv.2111.06377

Masked Autoencoders Are Scalable Vision Learners

Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, Ross Girshick

Abstract:

This paper shows that masked autoencoders (MAE) are scalable self-supervised learners for computer vision. Our MAE approach is simple: we mask random patches of the input image and reconstruct the missing pixels. …
