Literature shared by user 尹志.
54 shared items found in total; this page shows items 1 - 20.
1.
尹志 (2026-04-30 21:57):
#paper, Mechanistic insights into ASO-RNA complexation: Advancing antisense oligonucleotide design strategies. DOI: 10.1016/j.omtn.2024.102351. Using MD simulations, the article finds that ASO-mRNA binding depends not only on the conventional rule of base complementarity but also on the mRNA's three-dimensional structure, such as the dynamic accessibility of hairpin loops and the formation of base triples. This offers an inspiring set of ideas for future ASO design: a sequence that looks linear on paper is, at the actual molecular level, still three-dimensional. With computational structural biology for proteins flourishing as it is, more valuable work along these lines should appear.
2.
尹志 (2026-03-31 23:30):
#paper, Quantum-HPC hybrid computation of biomolecular excited-state energies. DOI: 10.48550/arXiv.2601.15677. Within the ONIOM framework, combined with the TE-QSCI algorithm, the authors compute the S0, S1, and T1 energies for the photoisomerization of retinal on a trapped-ion platform. A very good example of hybrid quantum + HPC computation.
arXiv, 2026-01-22T05:57:54Z. DOI: 10.48550/arXiv.2601.15677
Kentaro Yamamoto, Riku Masui, Takahito Nakajima, Miwako Tsuji, Mitsuhisa Sato, Peter Schow, Lukas Heidemann, Matthew Burke, Philipp Seitz, Oliver J. Backhouse, Juan W. Pedersen, John Children, Craig Holliman, Nathan Lysne, Daichi Okuno, Seyon Sivarajah, David Muñoz Ramo, Alex Chernoguzov, Ross Duncan
Abstract:
We develop a workflow within the ONIOM framework and demonstrate it on the hybrid computing system consisting of the supercomputer Fugaku and the Quantinuum Reimei trapped-ion quantum computer. This hybrid platform extends the layered approach for biomolecular chemical reactions to accurately treat the active site, such as a protein, and the large and often weakly correlated molecular environment. Our result marks a significant milestone in enabling scalable and accurate simulation of complex biomolecular reactions
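For context, the two-layer ONIOM extrapolation underlying such workflows combines a cheap method applied to the full (real) system with an expensive method applied only to the active-site model region (this is the standard textbook formula, not a quote from the paper):

```latex
E_{\text{ONIOM}} = E_{\text{low}}(\text{real}) - E_{\text{low}}(\text{model}) + E_{\text{high}}(\text{model})
```

In a hybrid setup like the one described, the quantum processor would presumably supply the expensive $E_{\text{high}}(\text{model})$ term while the supercomputer evaluates the two low-level terms.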
3.
尹志 (2026-02-28 23:14):
#paper, DOI: 10.48550/arXiv.2601.10144, Bridging Superconducting and Neutral-Atom Platforms for Efficient Fault-Tolerant Quantum Architectures. This paper proposes a hybrid quantum computing architecture that integrates superconducting and neutral-atom platforms, with fault tolerance as the goal. Inspiring and forward-looking: given the distinct characteristics of the different quantum computing modalities, hybrid schemes really do have a chance to bring valuable change. This year we will also explore hybrid architectures from a problem-domain perspective.
arXiv, 2026-01-15T07:39:05Z. DOI: 10.48550/arXiv.2601.10144
Xiang Fang, Jixuan Ruan, Sharanya Prabhu, Ang Li, Travis Humble, Dean Tullsen, Yufei Ding
Abstract:
The transition to the fault-tolerant era exposes the limitations of homogeneous quantum systems, where no single qubit modality simultaneously offers optimal operation speed, connectivity, and scalability. In this work, we propose a strategic approach to Heterogeneous Quantum Architectures (HQA) that synthesizes the distinct advantages of the superconducting (SC) and neutral atom (NA) platforms. We explore two architectural role assignment strategies based on hardware characteristics: (1) We offload the latency-critical Magic State Factory (MSF) to fast SC devices while performing computation on scalable NA arrays, a design we term MagicAcc, which effectively mitigates the resource-preparation bottleneck. (2) We explore a Memory-Compute Separation (MCSep) paradigm that utilizes NA arrays for high-density qLDPC memory storage and SC devices for fast surface-code processing. Our evaluation, based on a comprehensive end-to-end cost model, demonstrates that principled heterogeneity yields significant performance gains. Specifically, our designs achieve $752\times$ speedup over NA-only baselines on average and reduce the physical qubit footprint by over $10\times$ compared to SC-only systems. These results chart a clear pathway for leveraging cross-modality interconnects to optimize the space-time efficiency of future fault-tolerant quantum computers.
4.
尹志 (2026-01-31 23:53):
#paper https://arxiv.org/abs/2601.21571. arXiv 2026. Shaping capabilities with token-level data filtering. Moving from document-level to token-level filtering is indeed a very natural idea, but extracting insight through solid engineering is very much Alec Radford's style.
arXiv, 2026-01-29T11:34:01Z. DOI: 10.48550/arXiv.2601.21571
Neil Rathi, Alec Radford
Abstract:
Current approaches to reducing undesired capabilities in language models are largely post hoc, and can thus be easily bypassed by adversaries. A natural alternative is to shape capabilities during pretraining itself. On the proxy task of removing medical capabilities, we show that the simple intervention of filtering pretraining data is highly effective, robust, and inexpensive at scale. Inspired by work on data attribution, we show that filtering tokens is more effective than filtering documents, achieving the same hit to undesired capabilities at a lower cost to benign ones. Training models spanning two orders of magnitude, we then demonstrate that filtering gets more effective with scale: for our largest models, token filtering leads to a 7000x compute slowdown on the forget domain. We also show that models trained with token filtering can still be aligned on the forget domain. Along the way, we introduce a methodology for labeling tokens with sparse autoencoders and distilling cheap, high-quality classifiers. We also demonstrate that filtering can be robust to noisy labels with sufficient pretraining compute.
5.
尹志 (2025-12-31 23:41):
#paper DOI: 10.1016/j.future.2024.04.060. Quantum-centric Supercomputing for Materials Science: A Perspective on Challenges and Future Directions. A large survey of the quantum-centric computing paradigm in materials science, covering algorithms, applications, and future directions. It walks through multiple materials-science case studies, and the algorithms portion is also surveyed systematically. Arguably one of the best overview resources on quantum computing for materials science, and it offers useful lessons for similar fields such as drug discovery as well.
Yuri Alexeev, Maximilian Amsler, Marco Antonio Barroca, Sanzio Bassini, Torey Battelle, Daan Camps, David Casanova, Young Jay Choi, Frederic T. Chong, Charles Chung, Christopher Codella, Antonio D. Córcoles, James Cruise, Alberto Di Meglio, Ivan Duran, Thomas Eckl, Sophia Economou, Stephan Eidenbenz, Bruce Elmegreen, Clyde Fare, Ismael Faro, Cristina Sanz Fernández, Rodrigo Neumann Barros Ferreira, Keisuke Fuji, Bryce Fuller, Laura Gagliardi, Giulia Galli, Jennifer R. Glick, Isacco Gobbi, Pranav Gokhale, Salvador de la Puente Gonzalez, Johannes Greiner, Bill Gropp, Michele Grossi, Emanuel Gull, Burns Healy, Matthew R. Hermes, Benchen Huang, Travis S. Humble, Nobuyasu Ito, Artur F. Izmaylov, Ali Javadi-Abhari, Douglas Jennewein, Shantenu Jha, Liang Jiang, Barbara Jones, Wibe Albert de Jong, Petar Jurcevic, William Kirby, Stefan Kister, Masahiro Kitagawa, Joel Klassen, Katherine Klymko, Kwangwon Koh, Masaaki Kondo, Dog̃a Murat Kürkçüog̃lu, Krzysztof Kurowski, Teodoro Laino, Ryan Landfield, Matt Leininger, Vicente Leyton-Ortega, Ang Li, Meifeng Lin, Junyu Liu, Nicolas Lorente, Andre Luckow, Simon Martiel, Francisco Martin-Fernandez, Margaret Martonosi, Claire Marvinney, Arcesio Castaneda Medina, Dirk Merten, Antonio Mezzacapo, Kristel Michielsen, Abhishek Mitra, Tushar Mittal, Kyungsun Moon, Joel Moore, Sarah Mostame, Mario Motta, Young-Hye Na, Yunseong Nam, Prineha Narang, Yu-ya Ohnishi, Daniele Ottaviani, Matthew Otten, Scott Pakin, Vincent R. Pascuzzi, Edwin Pednault, Tomasz Piontek, Jed Pitera, Patrick Rall, Gokul Subramanian Ravi, Niall Robertson, Matteo A.C. Rossi, Piotr Rydlichowski, Hoon Ryu, Georgy Samsonidze, Mitsuhisa Sato, Nishant Saurabh, Vidushi Sharma, Kunal Sharma, Soyoung Shin, George Slessman, Mathias Steiner, Iskandar Sitdikov, In-Saeng Suh, Eric D. Switzer, Wei Tang, Joel Thompson, Synge Todo, Minh C. Tran, Dimitar Trenev, Christian Trott, Huan-Hsin Tseng, Norm M. Tubman, Esin Tureci, David García Valiñas, Sofia Vallecorsa, Christopher Wever, Konrad Wojciechowski, Xiaodi Wu, Shinjae Yoo, Nobuyuki Yoshioka, Victor Wen-zhe Yu, Seiji Yunoki, Sergiy Zhuk, Dmitry Zubarev
6.
尹志 (2025-11-30 19:01):
#paper DOI: 10.1101/2025.10.10.681530 Protein Hunter: exploiting structure hallucination within diffusion for protein design. Generative-AI-based protein design has become a highly productive computational paradigm, yet most of the criticism it draws today concerns accuracy, stability, and generalization. I strongly agree with this paper's position that we need not "demand too much" of generative protein design: even when hallucination occurs, the hallucination itself is a self-consistent pattern, and we can filter the hallucinated outputs with suitable metrics. Given that many excellent models already describe the regularities of proteins (or at least a subset of them) well, converting such models into generators and then filtering their outputs by other means lets us make fuller use of the large pool of existing models.
Yehlin Cho, Griffin Rangel, Gaurav Bhardwaj, Sergey Ovchinnikov
Abstract:
Interactions between proteins and other biomolecules underlie nearly all biological processes, yet designing such interactions de novo remains challenging. Capturing their specific interactions and co-optimizing sequence and structure are difficult and often require extensive computation. We present Protein Hunter, a fast, fine-tuning-free framework for de novo protein design. Starting from an all-X sequence, we find diffusion-based structure prediction models hallucinate reasonable looking structures that can be further improved through iterative sequence re-design and structure re-prediction. This lightweight strategy achieves high AlphaFold3 in silico success rates across both unconditional and conditional generation tasks, including binders to proteins, cyclic peptides, small molecules, DNA, and RNA. Protein Hunter also supports multi-motif scaffolding and partial redesign, providing a general and efficient platform for de novo protein design across diverse molecular targets.
7.
尹志 (2025-10-31 16:37):
#paper Quantum computing and chemistry. DOI: 10.1016/j.xcrp.2024.102105. The article surveys the current progress of quantum computing across the broader chemistry space at the hardware, software, and chemistry-application (mainly algorithms) levels; it is very comprehensive (259 references). However, it says relatively little about error-correction and error-mitigation algorithms. In my view, building useful quantum computing applications at least within the NISQ era requires cleverly combining application algorithms with error handling, so I look forward to more work in that direction.
Jared D. Weidman, Manas Sajjan, Camille Mikolas, Zachary J. Stewart, Johannes Pollanen, Sabre Kais, Angela K. Wilson
8.
尹志 (2025-09-30 22:39):
#paper Quantum computing and artificial intelligence: status and perspectives. DOI: 10.48550/arXiv.2505.23860. A fairly recent survey of quantum AI (QAI). It gives a rather detailed account of Quantum for AI and AI for Quantum, plus the foundational issues, and closes with some of the challenges the field currently faces. Two features are worth mentioning: first, it really is up to date, touching on essentially all of the basic QAI questions; second, it is a QAI survey written by an all-European roster of researchers, and the paper states its own position right at the opening, which is quite thought-provoking.
arXiv, 2025-05-29T08:15:23Z. DOI: 10.48550/arXiv.2505.23860
Giovanni Acampora, Andris Ambainis, Natalia Ares, Leonardo Banchi, Pallavi Bhardwaj, Daniele Binosi, G. Andrew D. Briggs, Tommaso Calarco, Vedran Dunjko, Jens Eisert, Olivier Ezratty, Paul Erker, Federico Fedele, Elies Gil-Fuster, Martin Gärttner, Mats Granath, Markus Heyl, Iordanis Kerenidis, Matthias Klusch, Anton Frisk Kockum, Richard Kueng, Mario Krenn, Jörg Lässig, Antonio Macaluso, Sabrina Maniscalco, Florian Marquardt, Kristel Michielsen, Gorka Muñoz-Gil, Daniel Müssig, Hendrik Poulsen Nautrup, Sophie A. Neubauer, Evert van Nieuwenburg, Roman Orus, Jörg Schmiedmayer, Markus Schmitt, Philipp Slusallek, Filippo Vicentini, Christof Weitenberg, Frank K. Wilhelm
Abstract:
This white paper discusses and explores the various points of intersection between quantum computing and artificial intelligence (AI). It describes how quantum computing could support the development of innovative AI solutions. It also examines use cases of classical AI that can empower research and development in quantum technologies, with a focus on quantum computing and quantum sensing. The purpose of this white paper is to provide a long-term research agenda aimed at addressing foundational questions about how AI and quantum computing interact and benefit one another. It concludes with a set of recommendations and challenges, including how to orchestrate the proposed theoretical work, align quantum AI developments with quantum hardware roadmaps, estimate both classical and quantum resources - especially with the goal of mitigating and optimizing energy consumption - advance this emerging hybrid software engineering discipline, and enhance European industrial competitiveness while considering societal implications.
9.
尹志 (2025-08-31 12:56):
#paper DOI: 10.48550/arXiv.2505.13683, ISCA, 2025, Genesis: A Compiler Framework for Hamiltonian Simulation on Hybrid CV-DV Quantum Computers. The authors introduce the first quantum compilation framework for Hamiltonian simulation on hybrid continuous-variable/discrete-variable quantum computing systems; very interesting work. The framework splits into an initial Hamiltonian decomposition followed by further mapping and routing, and it is validated on several common physical models. Quantum compilation is a key link in the quantum computing stack and deserves more attention and technical breakthroughs.
arXiv, 2025-05-19T19:32:06Z. DOI: 10.48550/arXiv.2505.13683
Zihan Chen, Jiakang Li, Minghao Guo, Henry Chen, Zirui Li, Joel Bierman, Yipeng Huang, Huiyang Zhou, Yuan Liu, Eddy Z. Zhang
Abstract:
This paper introduces Genesis, the first compiler designed to support Hamiltonian Simulation on hybrid continuous-variable (CV) and discrete-variable (DV) quantum computing systems. Genesis is a two-level compilation system. At the first level, it decomposes an input Hamiltonian into basis gates using the native instruction set of the target hybrid CV-DV quantum computer. At the second level, it tackles the mapping and routing of qumodes/qubits to implement long-range interactions for the gates decomposed from the first level. Rather than a typical implementation that relies on SWAP primitives similar to qubit-based (or DV-only) systems, we propose an integrated design of connectivity-aware gate synthesis and beamsplitter SWAP insertion tailored for hybrid CV-DV systems. We also introduce an OpenQASM-like domain-specific language (DSL) named CVDV-QASM to represent Hamiltonian in terms of Pauli-exponentials and basic gate sequences from the hybrid CV-DV gate set. Genesis has successfully compiled several important Hamiltonians, including the Bose-Hubbard model, $\mathbb{Z}_2-$Higgs model, Hubbard-Holstein model, Heisenberg model and Electron-vibration coupling Hamiltonians, which are critical in domains like quantum field theory, condensed matter physics, and quantum chemistry. Our implementation is available at Genesis-CVDV-Compiler(https://github.com/ruadapt/Genesis-CVDV-Compiler).
10.
尹志 (2025-07-31 23:59):
#paper DOI: 10.48550/arXiv.2507.06216 Unitary designs in nearly optimal depth. The paper designs a brand-new quantum circuit construction that efficiently builds unitary k-designs at close to the theoretically optimal depth. If the scheme proves effective enough, it will undoubtedly be very helpful for the design of future quantum algorithms.
arXiv, 2025-07-08T17:48:33Z. DOI: 10.48550/arXiv.2507.06216
Laura Cui, Thomas Schuster, Fernando Brandao, Hsin-Yuan Huang
Abstract:
We construct $\varepsilon$-approximate unitary $k$-designs on $n$ qubits in circuit depth $O(\log k \log \log n k / \varepsilon)$. The depth is exponentially improved over all known results in all three parameters $n$, $k$, $\varepsilon$. We further show that each dependence is optimal up to exponentially smaller factors. Our construction uses $\tilde{{O}}(nk)$ ancilla qubits and ${O}(nk)$ bits of randomness, which are also optimal up to $\log(n k)$ factors. An alternative construction achieves a smaller ancilla count $\tilde{{O}}(n)$ with circuit depth ${O}(k \log \log nk/\varepsilon)$. To achieve these efficient unitary designs, we introduce a highly-structured random unitary ensemble that leverages long-range two-qubit gates and low-depth implementations of random classical hash functions. We also develop a new analytical framework for bounding errors in quantum experiments involving many queries to random unitaries. As an illustration of this framework's versatility, we provide a succinct alternative proof of the existence of pseudorandom unitaries.
11.
尹志 (2025-06-30 23:17):
#paper arXiv:2411.09131, Artificial Intelligence for Quantum Computing, 2024. A survey led by Yuri Alexeev that analyzes, and offers an outlook on, the several ways AI can serve quantum computing. It is not especially detailed, but if you hope quantum computing will deliver practical advantage on real problems sooner, you clearly should not miss this survey.
arXiv, 2024-11-14T02:11:16Z. DOI: 10.48550/arXiv.2411.09131
Yuri Alexeev, Marwa H. Farag, Taylor L. Patti, Mark E. Wolf, Natalia Ares, Alán Aspuru-Guzik, Simon C. Benjamin, Zhenyu Cai, Zohim Chandani, Federico Fedele, Nicholas Harrigan, Jin-Sung Kim, Elica Kyoseva, Justin G. Lietz, Tom Lubowe, Alexander McCaskey, Roger G. Melko, Kouhei Nakaji, Alberto Peruzzo, Sam Stanwyck, Norm M. Tubman, Hanrui Wang, Timothy Costa
Abstract:
Artificial intelligence (AI) advancements over the past few years have had an unprecedented and revolutionary impact across everyday application areas. Its significance also extends to technical challenges within science and engineering, including the nascent field of quantum computing (QC). The counterintuitive nature and high-dimensional mathematics of QC make it a prime candidate for AI's data-driven learning capabilities, and in fact, many of QC's biggest scaling challenges may ultimately rest on developments in AI. However, bringing leading techniques from AI to QC requires drawing on disparate expertise from arguably two of the most advanced and esoteric areas of computer science. Here we aim to encourage this cross-pollination by reviewing how state-of-the-art AI techniques are already advancing challenges across the hardware and software stack needed to develop useful QC - from device design to applications. We then close by examining its future opportunities and obstacles in this space.
12.
尹志 (2025-05-31 21:23):
#paper https://doi.org/10.48550/arXiv.2012.07436 Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting. A classic AAAI 2021 work on long-sequence time-series modeling. The paper improves on the vanilla Transformer with a new model family, Informer: by modifying and distilling self-attention, and by building a generative-style decoder, it addresses the vanilla Transformer's problems in both time and space complexity. The work achieves strong performance on multiple datasets, and the ideas above have been reused frequently in later time-series modeling; very inspiring.
arXiv, 2020-12-14T11:43:09Z. DOI: 10.48550/arXiv.2012.07436
Haoyi Zhou, Shanghang Zhang, Jieqi Peng, Shuai Zhang, Jianxin Li, Hui Xiong, Wancai Zhang
Abstract:
Many real-world applications require the prediction of long sequence time-series, such as electricity consumption planning. Long sequence time-series forecasting (LSTF) demands a high prediction capacity of the model, which is the ability to capture precise long-range dependency coupling between output and input efficiently. Recent studies have shown the potential of Transformer to increase the prediction capacity. However, there are several severe issues with Transformer that prevent it from being directly applicable to LSTF, including quadratic time complexity, high memory usage, and inherent limitation of the encoder-decoder architecture. To address these issues, we design an efficient transformer-based model for LSTF, named Informer, with three distinctive characteristics: (i) a $ProbSparse$ self-attention mechanism, which achieves $O(L \log L)$ in time complexity and memory usage, and has comparable performance on sequences' dependency alignment. (ii) the self-attention distilling highlights dominating attention by halving cascading layer input, and efficiently handles extreme long input sequences. (iii) the generative style decoder, while conceptually simple, predicts the long time-series sequences at one forward operation rather than a step-by-step way, which drastically improves the inference speed of long-sequence predictions. Extensive experiments on four large-scale datasets demonstrate that Informer significantly outperforms existing methods and provides a new solution to the LSTF problem.
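The ProbSparse selection rule can be sketched in a few lines of plain Python. This is an illustrative simplification, not the paper's implementation: it scores every key for every query to show the selection rule, whereas Informer samples keys to reach the O(L log L) cost; the function name and data layout below are my own.

```python
import math

def probsparse_attention(Q, K, V, u):
    """Toy version of Informer's ProbSparse self-attention.

    Q, K, V are lists of equal-length float vectors. Scores every
    key for every query, so it only illustrates the query-selection
    rule, not the O(L log L) key-sampling trick.
    """
    d = len(Q[0])
    scale = 1.0 / math.sqrt(d)
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))

    # Sparsity measure M(q_i) = max_j s_ij - mean_j s_ij over the
    # scaled query-key scores s_ij: "active" queries have attention
    # far from uniform, so M is large.
    scores = [[dot(q, k) * scale for k in K] for q in Q]
    M = [max(row) - sum(row) / len(row) for row in scores]

    # The u queries with the largest measure get full softmax
    # attention; the remaining "lazy" queries just output mean(V).
    top_u = sorted(range(len(Q)), key=lambda i: M[i], reverse=True)[:u]
    mean_v = [sum(col) / len(V) for col in zip(*V)]

    out = [list(mean_v) for _ in Q]
    for i in top_u:
        m = max(scores[i])
        e = [math.exp(s - m) for s in scores[i]]
        z = sum(e)
        out[i] = [sum(w / z * v[c] for w, v in zip(e, V))
                  for c in range(len(V[0]))]
    return out
```

For example, with Q = K = V = [[1, 0], [0, 1], [1, 1]] and u = 1, only the third query (the one with the most peaked score distribution) receives full attention; the other two rows fall back to the mean of V.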
13.
尹志 (2025-04-30 15:56):
#paper DOI: 10.48550/arXiv.2407.20516, Machine Unlearning in Generative AI: A Survey. A very interesting direction; in Chinese the term presumably translates as 机器遗忘. As models grow ever larger, controllably adding and erasing specific information by operating on the model itself will be an important theme going forward, both for privacy protection and for model control.
arXiv, 2024-07-30T03:26:09Z. DOI: 10.48550/arXiv.2407.20516
Zheyuan Liu, Guangyao Dou, Zhaoxuan Tan, Yijun Tian, Meng Jiang
Abstract:
Generative AI technologies have been deployed in many places, such as (multimodal) large language models and vision generative models. Their remarkable performance should be attributed to massive training data and emergent reasoning abilities. However, the models would memorize and generate sensitive, biased, or dangerous information originated from the training data especially those from web crawl. New machine unlearning (MU) techniques are being developed to reduce or eliminate undesirable knowledge and its effects from the models, because those that were designed for traditional classification tasks could not be applied for Generative AI. We offer a comprehensive survey on many things about MU in Generative AI, such as a new problem formulation, evaluation methods, and a structured discussion on the advantages and limitations of different kinds of MU techniques. It also presents several critical challenges and promising directions in MU research. A curated list of readings can be found: https://github.com/franciscoliu/GenAI-MU-Reading.
14.
尹志 (2025-03-31 15:06):
#paper DOI: 10.48550/arXiv.2502.11974, Image Inversion: A Survey from GANs to Diffusion and Beyond (2025). Surveys the common algorithms and models for image inversion; very recent. It mainly covers GAN and diffusion models, and also touches on the DiT and Rectified Flow frameworks. The core problem of image inversion concerns the latent space, which is highly important for other generative-AI problems as well.
arXiv, 2025-02-17T16:20:48Z. DOI: 10.48550/arXiv.2502.11974
Yinan Chen, Jiangning Zhang, Yali Bi, Xiaobin Hu, Teng Hu, Zhucun Xue, Ran Yi, Yong Liu, Ying Tai
Abstract:
Image inversion is a fundamental task in generative models, aiming to map images back to their latent representations to enable downstream applications such as editing, restoration, and style transfer. This paper provides a comprehensive review of the latest advancements in image inversion techniques, focusing on two main paradigms: Generative Adversarial Network (GAN) inversion and diffusion model inversion. We categorize these techniques based on their optimization methods. For GAN inversion, we systematically classify existing methods into encoder-based approaches, latent optimization approaches, and hybrid approaches, analyzing their theoretical foundations, technical innovations, and practical trade-offs. For diffusion model inversion, we explore training-free strategies, fine-tuning methods, and the design of additional trainable modules, highlighting their unique advantages and limitations. Additionally, we discuss several popular downstream applications and emerging applications beyond image tasks, identifying current challenges and future research directions. By synthesizing the latest developments, this paper aims to provide researchers and practitioners with a valuable reference resource, promoting further advancements in the field of image inversion. We keep track of the latest works at https://github.com/RyanChenYN/ImageInversion.
15.
尹志 (2025-02-28 15:55):
#paper DOI: 10.48550/arXiv.2205.15463 Few-Shot Diffusion Models. The paper proposes a technique for few-shot generation built on a diffusion model with a set-based ViT. Experiments show the model can generate a new class from as few as 5 samples.
arXiv, 2022-05-30T23:20:33Z. DOI: 10.48550/arXiv.2205.15463
Giorgio Giannone, Didrik Nielsen, Ole Winther
Abstract:
Denoising diffusion probabilistic models (DDPM) are powerful hierarchical latent variable models with remarkable sample generation quality and training stability. These properties can be attributed to parameter sharing in the generative hierarchy, as well as a parameter-free diffusion-based inference procedure. In this paper, we present Few-Shot Diffusion Models (FSDM), a framework for few-shot generation leveraging conditional DDPMs. FSDMs are trained to adapt the generative process conditioned on a small set of images from a given class by aggregating image patch information using a set-based Vision Transformer (ViT). At test time, the model is able to generate samples from previously unseen classes conditioned on as few as 5 samples from that class. We empirically show that FSDM can perform few-shot generation and transfer to new datasets. We benchmark variants of our method on complex vision datasets for few-shot learning and compare to unconditional and conditional DDPM baselines. Additionally, we show how conditioning the model on patch-based input set information improves training convergence.
16.
尹志 (2025-01-31 17:05):
#paper https://doi.org/10.48550/arXiv.2403.07183 Monitoring AI-Modified Content at Scale: A Case Study on the Impact of ChatGPT on AI Conference Peer Reviews. A paper examining how large language models are actually being used, with the concrete example of reviews at top AI conferences (ICLR 2024, NeurIPS 2023, CoRL 2023, and EMNLP 2023). The study finds that 6.5% to 16.9% of those reviews may have been substantially modified by LLMs, and such reviews share several interesting traits: lower reported confidence, submission close to the deadline, and less willingness to respond to author rebuttals. See the paper for more curiosities. It also lists the adjectives AI most favors, such as "commendable", "meticulous", and "intricate"; they really do sound AI-generated, haha. It seems reviewers will have to be more accountable to authors from now on.
arXiv, 2024-03-11T21:51:39Z. DOI: 10.48550/arXiv.2403.07183
Weixin Liang, Zachary Izzo, Yaohui Zhang, Haley Lepp, Hancheng Cao, Xuandong Zhao, Lingjiao Chen, Haotian Ye, Sheng Liu, Zhi Huang, Daniel A. McFarland, James Y. Zou
Abstract:
We present an approach for estimating the fraction of text in a large corpus which is likely to be substantially modified or produced by a large language model (LLM). Our maximum likelihood model leverages expert-written and AI-generated reference texts to accurately and efficiently examine real-world LLM-use at the corpus level. We apply this approach to a case study of scientific peer review in AI conferences that took place after the release of ChatGPT: ICLR 2024, NeurIPS 2023, CoRL 2023 and EMNLP 2023. Our results suggest that between 6.5% and 16.9% of text submitted as peer reviews to these conferences could have been substantially modified by LLMs, i.e. beyond spell-checking or minor writing updates. The circumstances in which generated text occurs offer insight into user behavior: the estimated fraction of LLM-generated text is higher in reviews which report lower confidence, were submitted close to the deadline, and from reviewers who are less likely to respond to author rebuttals. We also observe corpus-level trends in generated text which may be too subtle to detect at the individual level, and discuss the implications of such trends on peer review. We call for future interdisciplinary work to examine how LLM use is changing our information and knowledge practices.
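The corpus-level estimator can be illustrated with a toy version: treat each observed token as a draw from the mixture alpha * P_AI + (1 - alpha) * P_human over a shared vocabulary, and pick the alpha that maximizes the corpus log-likelihood. The function name, the grid search, and the example distributions below are invented for illustration; the paper's estimator is more elaborate.

```python
import math

def estimate_ai_fraction(corpus_counts, p_human, p_ai, grid=1000):
    """Grid-search MLE for the fraction alpha of LLM-generated text.

    corpus_counts maps token -> count; p_human and p_ai are token
    probability distributions (all entries strictly positive).
    """
    def log_lik(alpha):
        return sum(
            n * math.log(alpha * p_ai[w] + (1 - alpha) * p_human[w])
            for w, n in corpus_counts.items()
        )

    # Evaluate the (concave) log-likelihood on a grid and return
    # the best alpha in [0, 1].
    return max((i / grid for i in range(grid + 1)), key=log_lik)
```

For instance, if AI text uses "commendable" with probability 0.10 versus 0.01 for human text, a corpus whose observed frequency of "commendable" is 0.037 is best explained by alpha = 0.3.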
17.
尹志 (2024-12-29 21:50):
#paper doi: 10.1021/acs.jcim.4c01107 Journal of Chemical Information and Modeling, 2024, Diffusion Models in De Novo Drug Design. Yet another review of diffusion models for drug design, very recent and comprehensive. The field is developing remarkably fast, with new methods appearing constantly; with the boost from this year's Nobel Prize, we should see more practical results over the next few years.
18.
尹志 (2024-11-30 22:05):
#paper https://doi.org/10.48550/arXiv.1701.08223 2017, The Python-based Simulations of Chemistry Framework (PySCF). An introduction to PySCF, a very important quantum chemistry tool. The project started in 2014 with only a handful of functions, and now offers solid support for a wide range of quantum chemistry calculations; its ease of use and extensibility have won recognition from the community. That trait was in fact baked in when the software was released in 2015: almost all functionality is implemented in Python, and only the especially time-critical parts are written in C. As a result, a large number of quantum chemistry libraries now depend on PySCF, which has become a strong open-source competitor to Gaussian.
arXiv, 2017-01-27T23:57:43Z. DOI: 10.48550/arXiv.1701.08223
Qiming Sun, Timothy C. Berkelbach, Nick S. Blunt, George H. Booth, Sheng Guo, Zhendong Li, Junzi Liu, James McClain, Elvira R. Sayfutyarova, Sandeep Sharma, Sebastian Wouters, Garnet Kin-Lic Chan
Abstract:
PySCF is a general-purpose electronic structure platform designed from the ground up to emphasize code simplicity, both to aid new method development, as well as for flexibility in computational workflow. The package provides a wide range of tools to support simulations of finite size systems, extended systems with periodic boundary conditions, low dimensional periodic systems, and custom Hamiltonians, using mean-field and post-mean-field methods with standard Gaussian basis functions. To ensure ease of extensibility, PySCF uses the Python language to implement almost all its features, while computationally critical paths are implemented with heavily optimized C routines. Using this combined Python/C implementation, the package is as efficient as the best existing C or Fortran based quantum chemistry programs. In this paper we document the capabilities and design philosophy of the current version of the PySCF package.
19.
尹志 (2024-10-31 13:55):
#paper doi.org/10.1038/sdata.2014.22 Quantum chemistry structures and properties of 134 kilo molecules, Scientific Data 1, 140022, 2014. This is the original paper for the well-known QM9 dataset; I have been doing related computational work recently, so I reread it carefully. A very important piece of work that provides a highly convenient benchmark for subsequent quantum chemistry calculations. Using DFT at the B3LYP/6-31G(2df,p) level, the authors computed a range of quantum chemical properties for 134k small molecules, such as energies, dipole moments, and polarizabilities.
Raghunathan Ramakrishnan, Pavlo O. Dral, Matthias Rupp, O. Anatole von Lilienfeld
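Each QM9 record is an extended-XYZ block: an atom count, a property line (a tag and index followed by the 15 scalar properties in the paper's documented order: rotational constants A, B, C, dipole moment mu, polarizability alpha, HOMO, LUMO, gap, electronic spatial extent, zero-point energy, U0, U, H, G, Cv), then one line per atom with coordinates and a Mulliken charge. A stdlib-only parsing sketch follows; the function name is my own, and it assumes a cleaned record (real files also append vibrational frequencies and SMILES/InChI lines, skipped here).

```python
# Sketch parser for one QM9-style extended-XYZ record.
PROPS = ["A", "B", "C", "mu", "alpha", "homo", "lumo", "gap",
         "r2", "zpve", "U0", "U", "H", "G", "Cv"]

def parse_qm9_record(text):
    lines = text.strip().splitlines()
    natoms = int(lines[0])
    fields = lines[1].split()            # tag, molecule index, 15 properties
    props = dict(zip(PROPS, map(float, fields[2:2 + len(PROPS)])))
    atoms = []
    for line in lines[2:2 + natoms]:     # element, x, y, z, Mulliken charge
        sym, x, y, z, q = line.split()
        atoms.append((sym, float(x), float(y), float(z), float(q)))
    return props, atoms
```

Keeping the property order in one list makes it easy to pull out a single benchmark target (say, `props["gap"]` or `props["U0"]`) across the whole dataset.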
20.
尹志 (2024-09-30 23:02):
#paper https://doi.org/10.48550/arXiv.2405.20328 mRNA secondary structure prediction using utility-scale quantum computers. This is a collaboration between IBM and Moderna from this year. The authors predicted mRNA secondary structure with a CVaR-based VQE algorithm. RNA is notoriously hard to predict because of its flexible single-stranded nature, which is also exactly why the problem is naturally cast as combinatorial optimization; designing dedicated quantum algorithms to accelerate the search for the optimal structure therefore seems a natural fit. The paper ran on IBM's Eagle and Heron quantum processors, and the results agree with the classical solver CPLEX. That said, given the NISQ setting, the paper does not go into much detail on device calibration and error suppression, presumably taking it for granted that Eagle and Heron already handle them. Even so, this is a nice demonstration of VQC algorithms (including VQE and QAOA) on combinatorial optimization, and amply shows the flexibility of variational algorithms.
arXiv, 2024-05-30T17:58:17Z. DOI: 10.48550/arXiv.2405.20328
Dimitris Alevras, Mihir Metkar, Takahiro Yamamoto, Vaibhaw Kumar, Triet Friedhoff, Jae-Eun Park, Mitsuharu Takeori, Mariana LaDue, Wade Davis, Alexey Galda
Abstract:
Recent advancements in quantum computing have opened new avenues for tackling long-standing complex combinatorial optimization problems that are intractable for classical computers. Predicting secondary structure of mRNA is one such notoriously difficult problem that can benefit from the ever-increasing maturity of quantum computing technology. Accurate prediction of mRNA secondary structure is critical in designing RNA-based therapeutics as it dictates various steps of an mRNA life cycle, including transcription, translation, and decay. The current generation of quantum computers have reached utility-scale, allowing us to explore relatively large problem sizes. In this paper, we examine the feasibility of solving mRNA secondary structures on a quantum computer with sequence length up to 60 nucleotides representing problems in the qubit range of 10 to 80. We use Conditional Value at Risk (CVaR)-based VQE algorithm to solve the optimization problems, originating from the mRNA structure prediction problem, on the IBM Eagle and Heron quantum processors. To our encouragement, even with "minimal" error mitigation and fixed-depth circuits, our hardware runs yield accurate predictions of minimum free energy (MFE) structures that match the results of the classical solver CPLEX. Our results provide sufficient evidence for the viability of solving mRNA structure prediction problems on a quantum computer and motivate continued research in this direction.
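The CVaR objective at the heart of this approach is easy to state: instead of minimizing the mean measured energy over all shots, the optimizer minimizes the average over only the best alpha-fraction of measurement outcomes, which is far less washed out by noisy high-energy bitstrings. The stdlib sketch below is my own illustrative re-implementation, not the authors' code, and `alpha=0.25` is just an example value.

```python
def cvar(sampled_energies, alpha=0.25):
    # Conditional Value at Risk of a list of per-shot energies: the mean
    # of the lowest alpha-fraction of samples. Minimizing this instead of
    # the plain mean rewards circuits that put *some* probability mass on
    # low-energy bitstrings, which suits combinatorial problems where any
    # single optimal sample is a valid solution.
    best = sorted(sampled_energies)
    k = max(1, int(round(alpha * len(best))))  # number of shots to keep
    return sum(best[:k]) / k
```

With alpha = 1.0 this reduces to the ordinary sample mean used by standard VQE; smaller alpha focuses the optimization ever more sharply on the tail of good outcomes.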