Papers shared by user 尹志.
37 shared papers found in total; this page shows entries 1 - 20.
1.
尹志 (2024-11-30 22:05):
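#paper https://doi.org/10.48550/arXiv.1701.08223 2017, The Python-based Simulations of Chemistry Framework (PySCF). An introduction to PySCF, a very important quantum chemistry tool. The project started in 2014 and has grown from a handful of functions into a package with solid support for a wide range of quantum chemistry calculations; its ease of use and extensibility have been recognized by the community. These traits were in fact set as design goals when the software was released in 2015. For that reason almost all functionality is implemented in Python, and only particularly time-critical parts are written in C. This design is also why a large number of quantum chemistry libraries now depend on PySCF, which has become a strong open-source competitor to Gaussian.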
arXiv, 2017-01-27T23:57:43Z. DOI: 10.48550/arXiv.1701.08223
Abstract:
PySCF is a general-purpose electronic structure platform designed from the ground up to emphasize code simplicity, both to aid new method development, as well as for flexibility in computational workflow. The package provides a wide range of tools to support simulations of finite size systems, extended systems with periodic boundary conditions, low dimensional periodic systems, and custom Hamiltonians, using mean-field and post-mean-field methods with standard Gaussian basis functions. To ensure ease of extensibility, PySCF uses the Python language to implement almost all its features, while computationally critical paths are implemented with heavily optimized C routines. Using this combined Python/C implementation, the package is as efficient as the best existing C or Fortran based quantum chemistry programs. In this paper we document the capabilities and design philosophy of the current version of the PySCF package.
2.
尹志 (2024-10-31 13:55):
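#paper doi.org/10.1038/sdata.2014.22 Quantum chemistry structures and properties of 134 kilo molecules, Scientific Data 1, 140022, 2014. This is the original paper for the well-known QM9 dataset; I have been doing related calculations recently, so I reread it carefully. It is a very important piece of work that gave later quantum chemistry calculations a particularly convenient benchmark. It uses DFT (B3LYP/6-31G(2df,p)) to compute various quantum chemical properties, such as energies, dipole moments, and polarizabilities, for 134k small molecules.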
3.
尹志 (2024-09-30 23:02):
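#paper https://doi.org/10.48550/arXiv.2405.20328 mRNA secondary structure prediction using utility-scale quantum computers. A collaboration between IBM and Moderna from this year. The authors use a CVaR-based VQE algorithm to predict the secondary structure of mRNA. Because RNA is single-stranded and highly flexible, its structure is very hard to predict. For the same reason, the problem is naturally cast as a combinatorial optimization problem, so designing dedicated algorithms on a quantum computer to speed up the search for the optimal structure is a natural step. The paper runs on IBM's Eagle and Heron quantum processors, and the results agree with those of the classical solver CPLEX. Since this is a NISQ approach, the paper does not go into much detail on how machine calibration and error suppression are ensured; presumably these are taken as already handled on Eagle and Heron. Still, it is a good demonstration of VQC algorithms (including VQE and QAOA) applied to combinatorial optimization, and it shows the flexibility of variational algorithms well.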
arXiv, 2024-05-30T17:58:17Z. DOI: 10.48550/arXiv.2405.20328
Abstract:
Recent advancements in quantum computing have opened new avenues for tackling long-standing complex combinatorial optimization problems that are intractable for classical computers. Predicting secondary structure of mRNA is one such notoriously difficult problem that can benefit from the ever-increasing maturity of quantum computing technology. Accurate prediction of mRNA secondary structure is critical in designing RNA-based therapeutics as it dictates various steps of an mRNA life cycle, including transcription, translation, and decay. The current generation of quantum computers have reached utility-scale, allowing us to explore relatively large problem sizes. In this paper, we examine the feasibility of solving mRNA secondary structures on a quantum computer with sequence length up to 60 nucleotides representing problems in the qubit range of 10 to 80. We use Conditional Value at Risk (CVaR)-based VQE algorithm to solve the optimization problems, originating from the mRNA structure prediction problem, on the IBM Eagle and Heron quantum processors. To our encouragement, even with "minimal" error mitigation and fixed-depth circuits, our hardware runs yield accurate predictions of minimum free energy (MFE) structures that match the results of the classical solver CPLEX. Our results provide sufficient evidence for the viability of solving mRNA structure prediction problems on a quantum computer and motivate continued research in this direction.
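The CVaR objective mentioned in this entry can be illustrated with a small sketch. It assumes the usual definition for minimization problems (average only the lowest alpha fraction of the sampled energies); the function name `cvar` and the sample values are hypothetical and are not taken from the paper.

```python
import math

def cvar(energies, alpha):
    """Mean of the lowest ceil(alpha * N) sampled energies, for 0 < alpha <= 1."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    ordered = sorted(energies)
    k = max(1, math.ceil(alpha * len(ordered)))
    return sum(ordered[:k]) / k

# Hypothetical energies, one per measured bitstring (shot).
samples = [-3.0, -1.0, 0.5, 2.0, -2.0, 1.0, 0.0, -0.5]
print(cvar(samples, 1.0))   # alpha = 1 reduces to the plain sample mean
print(cvar(samples, 0.25))  # alpha = 0.25 averages only the best quarter of shots
```

With alpha below 1, the optimizer is rewarded for producing a few very good bitstrings rather than a good average, which suits problems where only the best sampled solution matters.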
4.
尹志 (2024-08-31 23:47):
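#paper doi: 10.1038/s41586-019-1923-7, Nature volume 577, pages 706–710 (2020), Improved protein structure prediction using potentials from deep learning. The original AlphaFold 1 paper. It was a very important breakthrough at the time and put deep learning in the spotlight in biology; the many deep-learning-based improvements that followed pushed AI + biology to the forefront. The performance of AlphaFold 1 can no longer compete with current versions or similar models, but its innovative introduction of deep learning, together with its multi-scale modeling of protein sequence information, secondary structure, and 3D conformation, became the baseline for later data-driven work on protein folding. In hindsight, handling the problem through a fairly physical construct such as a potential of mean force may have been a useful attempt to capture the physics of the problem, while its use of data-driven methods was comparatively cautious. Compared with later work that relies more and more on brute-force scaling, I also lean toward capturing the essence and the dynamical mechanism of folding through physical description.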
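To make the "potential of mean force" idea concrete, here is a minimal sketch of turning a predicted distance histogram into a per-bin potential by taking a negative log-ratio against a reference distribution. The function name, the `floor` parameter, and the numbers are assumptions for illustration; this is not the paper's actual pipeline.

```python
import math

def distance_potential(pred_probs, ref_probs, floor=1e-8):
    """Per-bin potential -[log P(d) - log P_ref(d)] for one residue pair."""
    return [
        -(math.log(max(p, floor)) - math.log(max(q, floor)))
        for p, q in zip(pred_probs, ref_probs)
    ]

# Hypothetical 3-bin distance histogram for one residue pair.
predicted = [0.7, 0.2, 0.1]
reference = [1 / 3, 1 / 3, 1 / 3]
potential = distance_potential(predicted, reference)
print(potential)  # the most probable bin gets the lowest potential
```

Summing such terms over residue pairs gives a smooth function of the coordinates that can be minimized, which is the sense in which a learned distribution plays the role of a physical potential.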
5.
尹志 (2024-07-31 15:46):
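#paper Machine learning-aided generative molecular design, Nature Machine Intelligence, DOI: 10.1038/s42256-024-00843-5. The article reviews generative models for molecular design, summarizing the field very clearly in terms of representations, generative methods, and optimization strategies. Interested readers can go straight to the tables in the paper, which are very helpful for getting an overview of the field and for finding research questions to start from.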
6.
尹志 (2024-06-30 17:56):
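#paper DOI: 10.1038/s41534-017-0048-9 Coherent Ising machines—optical neural networks operating at the quantum limit. npj Quantum Inf 3, 49 (2017). This work introduces a novel quantum computing scheme, the coherent Ising machine (CIM). Unlike conventional quantum computing schemes, the CIM has distinct practical advantages; for example, it places no requirement on decoherence time, which follows from its design. From the first theoretical proposal in 2011 to today's working hardware, the basic principle has not changed. Conventional quantum computing performs logic operations on superposition states and then does a single readout at the end. The CIM instead reads out iteratively from the start: after each round of computation (evolution) it performs a measurement, computes feedback from the system state (for example, the phases), and after enough iterations obtains a result with relatively high fidelity. The scheme is particularly well suited to the many kinds of optimization problems of current interest. If conventional (mostly circuit-based) quantum computing is like waterfall development, then the CIM is iterative, agile development. The CIM is a really interesting idea, and I hope to do some exploration that combines AI with this quantum computing paradigm.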
Abstract:
In this article, we will introduce the basic concept and the quantum feature of a novel computing system, coherent Ising machines, and describe their theoretical and experimental performance. We start with the discussion how to construct such physical devices as the quantum analog of classical neuron and synapse, and end with the performance comparison against various classical neural networks implemented in CPU and supercomputers.
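For readers new to the kind of optimization problem a CIM targets, the sketch below writes down an Ising energy and finds its ground state by brute force. It is only a classical reference for tiny instances and does not model the measurement-feedback loop described above. The sign convention H = -sum J_ij s_i s_j is one common choice, and all names are hypothetical.

```python
from itertools import product

def ising_energy(spins, couplings):
    """H = -sum_{i<j} J_ij * s_i * s_j for spins in {-1, +1}."""
    return -sum(j * spins[a] * spins[b] for (a, b), j in couplings.items())

def brute_force_ground_state(n, couplings):
    """Enumerate all 2**n spin configurations; only feasible for tiny n."""
    return min(product((-1, 1), repeat=n), key=lambda s: ising_energy(s, couplings))

# Antiferromagnetic 4-cycle (J = -1 on every edge), equivalent to MAX-CUT on a square.
square = {(0, 1): -1, (1, 2): -1, (2, 3): -1, (0, 3): -1}
best = brute_force_ground_state(4, square)
print(best, ising_energy(best, square))
```

The enumeration cost doubles with every added spin, which is why heuristic hardware and algorithms for Ising problems are of interest in the first place.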
7.
尹志 (2024-05-30 15:52):
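#paper Protein Conformation Generation via Force-Guided SE(3) Diffusion Models https://doi.org/10.48550/arXiv.2403.14088 A new work from ByteDance. It is again protein conformation generation and again an SE(3) diffusion model, but unlike the usual generation of static conformations, this work proposes generating dynamic conformations. That is far more meaningful, since real-world protein conformations are dynamic and form a conformational distribution. The paper introduces physical information as guidance, which is an interesting idea: it respects the prior of the physical system while avoiding the performance cost of a purely model-based computation such as MD. In effect, the MD computation is abstracted into a prior that serves as guidance, and a generative model then does the generation.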
Abstract:
The conformational landscape of proteins is crucial to understanding their functionality in complex biological processes. Traditional physics-based computational methods, such as molecular dynamics (MD) simulations, suffer from rare event sampling and long equilibration time problems, hindering their applications in general protein systems. Recently, deep generative modeling techniques, especially diffusion models, have been employed to generate novel protein conformations. However, existing score-based diffusion methods cannot properly incorporate important physical prior knowledge to guide the generation process, causing large deviations in the sampled protein conformations from the equilibrium distribution. In this paper, to overcome these limitations, we propose a force-guided SE(3) diffusion model, ConfDiff, for protein conformation generation. By incorporating a force-guided network with a mixture of data-based score models, ConfDiff can generate protein conformations with rich diversity while preserving high fidelity. Experiments on a variety of protein conformation prediction tasks, including 12 fast-folding proteins and the Bovine Pancreatic Trypsin Inhibitor (BPTI), demonstrate that our method surpasses the state-of-the-art method.
8.
尹志 (2024-04-30 22:48):
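#paper doi:https://doi.org/10.48550/arXiv.2211.07697, NeurIPS 2022 Workshop on Symmetry and Geometry in Neural Representations, 2022. Do Neural Networks Trained with Topological Features Learn Different Internal Representations? The authors examine how the representations learned by a neural network trained on topological features differ from those learned by training directly on the raw data. The conclusions are interesting. As one might guess, the two do differ, especially under the metrics the authors chose, which also speaks to the value of topological machine learning. But the authors also find that in some cases the two sets of representations can be reconciled by a simple transformation, so a simple mapping can stand in for training on topological features. How to extract, in a concrete data setting, topological features that differ markedly from what can be learned from raw data remains an open question.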
Abstract:
There is a growing body of work that leverages features extracted via topological data analysis to train machine learning models. While this field, sometimes known as topological machine learning (TML), has seen some notable successes, an understanding of how the process of learning from topological features differs from the process of learning from raw data is still limited. In this work, we begin to address one component of this larger issue by asking whether a model trained with topological features learns internal representations of data that are fundamentally different than those learned by a model trained with the original raw data. To quantify "different", we exploit two popular metrics that can be used to measure the similarity of the hidden representations of data within neural networks, neural stitching and centered kernel alignment. From these we draw a range of conclusions about how training with topological features does and does not change the representations that a model learns. Perhaps unsurprisingly, we find that structurally, the hidden representations of models trained and evaluated on topological features differ substantially compared to those trained and evaluated on the corresponding raw data. On the other hand, our experiments show that in some cases, these representations can be reconciled (at least to the degree required to solve the corresponding task) using a simple affine transformation. We conjecture that this means that neural networks trained on raw data may extract some limited topological features in the process of making predictions.
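One of the two similarity metrics named in the abstract, centered kernel alignment, has a simple linear form that can be sketched in plain Python. This is the standard linear CKA formula as I understand it, not code from the paper; the helper names and the toy activation matrices are hypothetical, and the function assumes neither input has zero variance.

```python
import math

def center_columns(m):
    """Subtract each column's mean (rows are examples, columns are features)."""
    n = len(m)
    means = [sum(col) / n for col in zip(*m)]
    return [[v - mu for v, mu in zip(row, means)] for row in m]

def cross_gram_sq_norm(a, b):
    """Squared Frobenius norm of a^T b for row-major a (n x p) and b (n x q)."""
    total = 0.0
    for col_a in zip(*a):
        for col_b in zip(*b):
            dot = sum(x * y for x, y in zip(col_a, col_b))
            total += dot * dot
    return total

def linear_cka(x, y):
    """Linear CKA: ||Yc^T Xc||_F^2 / (||Xc^T Xc||_F * ||Yc^T Yc||_F)."""
    xc, yc = center_columns(x), center_columns(y)
    num = cross_gram_sq_norm(xc, yc)
    den = math.sqrt(cross_gram_sq_norm(xc, xc) * cross_gram_sq_norm(yc, yc))
    return num / den

# Hypothetical activations: 4 examples, 2 hidden units.
acts = [[1.0, 2.0], [3.0, 1.0], [0.0, 4.0], [2.0, 2.0]]
rotated = [[-2.0 * b, 2.0 * a] for a, b in acts]   # rotate by 90 degrees and scale by 2
stretched = [[a, 10.0 * b] for a, b in acts]       # anisotropic rescaling
print(linear_cka(acts, rotated), linear_cka(acts, stretched))
```

Linear CKA is unchanged by rotations and uniform scaling but not by arbitrary linear maps, which is one reason a stitching-style test with a learned affine map can give a different picture from CKA.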
9.
尹志 (2024-03-31 10:33):
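#paper A roadmap for the computation of persistent homology. doi: 10.1140/epjds/s13688-017-0109-5 A classic, tutorial-style introduction to computing persistent homology. Persistent homology is a basic concept in topological data analysis and topological deep learning, and the data representations, algorithms, and software tools built around it are many and varied. The paper surveys all of these. Although it is written in mathematical language, it is not obscure and is easy to follow, which makes it convenient for researchers and learners without a topology background to learn and use persistent homology.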
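As a small taste of what such a computation looks like, the sketch below computes 0-dimensional persistence (connected components) of a point cloud with a union-find structure. It assumes a Vietoris-Rips-style filtration in which an edge appears at a scale equal to its length (some tools use half that value); the function name and the points are hypothetical, and higher-dimensional homology needs the dedicated tools the paper surveys.

```python
import math
from itertools import combinations

def h0_persistence(points):
    """Finite death scales of connected components; every component is born at 0."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    deaths = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:            # this edge merges two components, so one class dies
            parent[ri] = rj
            deaths.append(length)
    return deaths               # n - 1 finite deaths; one component never dies

# Two well-separated clusters: three short-lived classes and one long-lived class.
cloud = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
print(h0_persistence(cloud))
```

The large gap between the short deaths and the last one is the persistence signal: it says the cloud has two clusters at most scales.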
10.
尹志 (2024-02-28 22:09):
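#paper An introduction to Topological Data Analysis: fundamental and practical aspects for data scientists doi: https://doi.org/10.48550/arXiv.1710.04019 Generative AI is riding high and Sora is all the hype. I cannot produce results like that myself (yes, I am jealous), but I do not think it is the ultimate solution, especially for the physical world and biological systems. The emphasis on, even faith in, scaling in The Bitter Lesson has its value in domains such as language and video. But living systems have billions of years of history (and physical systems go back to the very beginning), and the principles humans have accumulated over centuries of studying the evolution of life and the foundations of physical systems should be a better prior. Anyway, back to the paper. Topological data analysis is a tool that brings a system's topological and geometric properties into analysis and modeling in order to understand the system more deeply. This survey explains the tool in detail and covers its application areas in a tutorial style. It also gives a brief but careful introduction to the mathematical prerequisites, mainly algebraic topology and computational geometry. The reason for the rambling above is that, based on some recent hands-on work, I have come to feel that combining abstract mathematical tools such as topology and geometry with generative AI may be a more efficient way to model biological systems and the physical world than today's brute-force compute, and one that gets closer to the physics of the system. If you also believe that physical systems and the living world are simple and efficient, beautiful and concise, I suggest trying these newer techniques. By the way, the revision information for this survey is [Submitted on 11 Oct 2017 (v1), last revised 25 Feb 2021 (this version, v2)]; doesn't that say something?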
Abstract:
Topological Data Analysis is a recent and fast growing field providing a set of new topological and geometric tools to infer relevant features for possibly complex data. This paper is a brief introduction, through a few selected topics, to basic fundamental and practical aspects of TDA for non experts.
11.
尹志 (2024-01-31 10:39):
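#paper doi: https://doi.org/10.48550/arXiv.2304.02643 Segment Anything. A 2023 work from Meta that proposes a foundation model for computer vision. The goal is clear: general-purpose segmentation driven by prompts. Although the online hype has died down, its impact on the CV community is still large. Related and follow-up works such as Grounding-DINO and Grounded-SAM perform well and offer a different way of thinking about later CV tasks. The work as a whole leans toward engineering; there are few original highlights in terms of ideas, and the network architecture borrows heavily from Transformer-based innovations. What is worth noting is precisely the engineering approach. Meta proposes a novel task, namely how to solve image segmentation through one general task, and then designs the training procedure and the corresponding loss. Along the way they built an effective data annotation engine that produces labeled data efficiently, which is highly instructive for industrial applications. From a research perspective, how to make full use of the pretrained SAM model, and how to extract the priors in a large model to support downstream tasks in specific domains, is an important direction.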
Abstract:
We introduce the Segment Anything (SA) project: a new task, model, and dataset for image segmentation. Using our efficient model in a data collection loop, we built the largest segmentation dataset to date (by far), with over 1 billion masks on 11M licensed and privacy respecting images. The model is designed and trained to be promptable, so it can transfer zero-shot to new image distributions and tasks. We evaluate its capabilities on numerous tasks and find that its zero-shot performance is impressive -- often competitive with or even superior to prior fully supervised results. We are releasing the Segment Anything Model (SAM) and corresponding dataset (SA-1B) of 1B masks and 11M images at https://segment-anything.com to foster research into foundation models for computer vision.
12.
尹志 (2023-12-31 14:32):
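#paper Consistency Models https://doi.org/10.48550/arXiv.2303.01469 Diffusion models are by now a core technique in generative AI, but their iterative generation makes sampling slow, which is an obstacle in practical applications. Consistency models (CM), an efficient improvement on standard diffusion models, build on the PF (probability flow) ODE trajectory and define a mapping on that trajectory (which can be viewed as the steps of the iterative evolution) that takes any point on the trajectory, i.e. any timestep, back to the starting point, i.e. the original image. CMs raise the quality of one-step sampling, which has driven many practical applications, including image editing and image inpainting. Many practical diffusion-based applications now use CMs. This is work from early this year by Yang Song together with Ilya Sutskever; all four authors are heavyweights from OpenAI.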
Abstract:
Diffusion models have significantly advanced the fields of image, audio, and video generation, but they depend on an iterative sampling process that causes slow generation. To overcome this limitation, we propose consistency models, a new family of models that generate high quality samples by directly mapping noise to data. They support fast one-step generation by design, while still allowing multistep sampling to trade compute for sample quality. They also support zero-shot data editing, such as image inpainting, colorization, and super-resolution, without requiring explicit training on these tasks. Consistency models can be trained either by distilling pre-trained diffusion models, or as standalone generative models altogether. Through extensive experiments, we demonstrate that they outperform existing distillation techniques for diffusion models in one- and few-step sampling, achieving the new state-of-the-art FID of 3.55 on CIFAR-10 and 6.20 on ImageNet 64x64 for one-step generation. When trained in isolation, consistency models become a new family of generative models that can outperform existing one-step, non-adversarial generative models on standard benchmarks such as CIFAR-10, ImageNet 64x64 and LSUN 256x256.
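The "map any trajectory point back to the start" idea requires the model to act as the identity at the smallest time. A minimal sketch of one way to enforce this is a skip-connection parameterization f(x, t) = c_skip(t) x + c_out(t) F(x, t). The coefficient formulas and the constants below are written from my recollection of the paper and should be treated as assumptions, and `dummy_network` is a hypothetical stand-in for a trained network.

```python
import math

SIGMA_DATA = 0.5   # assumed data standard deviation
EPS = 0.002        # assumed smallest time on the ODE trajectory

def c_skip(t):
    return SIGMA_DATA**2 / ((t - EPS) ** 2 + SIGMA_DATA**2)

def c_out(t):
    return SIGMA_DATA * (t - EPS) / math.sqrt(SIGMA_DATA**2 + t**2)

def consistency_fn(x, t, network):
    """f(x, t) = c_skip(t) * x + c_out(t) * F(x, t), applied elementwise."""
    raw = network(x, t)
    return [c_skip(t) * xi + c_out(t) * ri for xi, ri in zip(x, raw)]

def dummy_network(x, t):
    # Hypothetical stand-in for a trained network; any function works here.
    return [math.sin(xi) + t for xi in x]

x = [0.3, -1.2, 2.0]
print(consistency_fn(x, EPS, dummy_network))   # boundary condition: returns x itself
print(consistency_fn(x, 80.0, dummy_network))  # at large t the network output dominates
```

Because c_skip(EPS) = 1 and c_out(EPS) = 0 hold by construction, the boundary condition is satisfied for any network, so training only has to enforce agreement between points on the same trajectory.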
13.
尹志 (2023-11-30 16:36):
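#paper Hamed Khakzad, Ilia Igashov, Arne Schneuing, et al. A new age in protein design empowered by deep learning. Cell Systems 14, 925–939 (2023). https://doi.org/10.1016/j.cels.2023.10.006. Proteins are major components of the cell and take part in all kinds of life processes, including enzymatic reactions and signal transduction, so their importance is beyond doubt. How to design specific proteins artificially, and thereby address life-science problems such as disease treatment and drug development, has long been a goal for scientists. Advances in AI, and in deep learning in particular, have brought enormous progress here. This recent review introduces several paradigms and SOTA methods for protein design with deep learning, and its coverage of methods is very comprehensive. Interestingly, the impact that generative models have had on AI has now carried over to protein design and taken on a flavor of its own. Graph neural networks, physics-inspired models, adaptations of language models, and deep generative models all perform well in protein design. In particular, when geometric priors are combined with deep learning through mathematical tools such as group theory, the intricate and subtle structural information of proteins can often be captured well. Given the tight coupling among sequence, structure, and function in protein design, how to coordinate sequence modeling, structure modeling, and other methods is a key question for the future. The paper's discussion of data and benchmarks is also valuable. There are plenty of problems too. The most frustrating is that protein design is inherently a life-science problem, so the final result must be validated experimentally or even in real use: however good a design is computationally, it still needs wet-lab and clinical validation. I hope that as the technology advances, automated agent techniques will bring a new paradigm to this field.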
IF:9.000Q1 Cell systems, 2023-11-15. DOI: 10.1016/j.cels.2023.10.006 PMID: 37972559
Abstract:
The rapid progress in the field of deep learning has had a significant impact on protein design. Deep learning methods have recently produced a breakthrough in protein structure prediction, leading to the availability of high-quality models for millions of proteins. Along with novel architectures for generative modeling and sequence analysis, they have revolutionized the protein design field in the past few years remarkably by improving the accuracy and ability to identify novel protein sequences and structures. Deep neural networks can now learn and extract the fundamental features of protein structures, predict how they interact with other biomolecules, and have the potential to create new effective drugs for treating disease. As their applicability in protein design is rapidly growing, we review the recent developments and technology in deep learning methods and provide examples of their performance to generate novel functional proteins.
14.
尹志 (2023-10-31 19:35):
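#paper https://doi.org/10.1063/5.0006074 J. Chem. Phys. 153, 024109 (2020) Recent developments in the PySCF program package. An introductory article on PySCF written by its core team. It covers PySCF's goals, features, and application areas, and, more importantly, the authors explain the design philosophy of the library in detail, which should appeal to anyone interested in scientific computing. PySCF is a Python-based quantum chemistry library that is very friendly for first-principles simulation of molecules and solids. Since the author created the library in 2014, more and more people working on quantum simulation and electronic structure calculation have contributed to it, and PySCF now has a place not only in quantum chemistry but also in data science, machine learning, and quantum computing. The article is detailed and stresses the team's design goals: PySCF should be loosely coupled, driven by small structures, and usable as scaffolding for larger projects. These ideas have led more and more quantum chemistry projects to prefer PySCF, and larger projects to adopt it as a core component. Beyond usability, the team's attention to performance also makes PySCF a strong candidate among quantum chemistry packages. The article illustrates these points with many examples, which makes it very readable and useful as a reference: for example, customizing the Hamiltonian with post-HF methods, and implementing orbital-optimized MP2 with the general CASSCF solver. Each example takes about 20-30 lines of code yet explains things more clearly than many textbooks. Finally, the authors look ahead to PySCF's development in machine learning, quantum computing, and other areas. Given my own good experience with PySCF, I recommend that interested readers read this article and try PySCF. Also, the author of PySCF is something of a legend: he really does run a quant fund while developing quantum chemistry software, hahaha.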
IF:3.100Q1 The Journal of chemical physics, 2020-Jul-14. DOI: 10.1063/5.0006074 PMID: 32668948
Abstract:
PySCF is a Python-based general-purpose electronic structure platform that supports first-principles simulations of molecules and solids as well as accelerates the development of new methodology and complex computational workflows. This paper explains the design and philosophy behind PySCF that enables it to meet these twin objectives. With several case studies, we show how users can easily implement their own methods using PySCF as a development environment. We then summarize the capabilities of PySCF for molecular and solid-state simulations. Finally, we describe the growing ecosystem of projects that use PySCF across the domains of quantum chemistry, materials science, machine learning, and quantum information science.
15.
尹志 (2023-09-30 22:52):
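#paper https://doi.org/10.1073/pnas.2112677119 Thoughts on how to think (and talk) about RNA structure. The arrival of mRNA vaccines has rekindled biologists' enthusiasm for studying RNA structure and function. The paper stresses the importance of re-examining RNA and carrying out more research, reflects on possible misconceptions in the current understanding of RNA structure and function, and, drawing on the authors' experience, offers six points worth attention in RNA research; these provide a valuable direction for future work. The complexity of RNA, and how open the field still is, is a big draw for computational researchers. The traditional picture of RNA as a floppy noodle probably does not describe its structure well. RNA folding prediction still faces very large challenges or, put differently, leaves plenty of room for research. As far as I know, RNA folding cannot yet be done well even at the secondary structure level, so what about tertiary structure? Even when structure determination is relatively easy, can computational researchers keep up? The authors repeatedly stress the problem with describing RNA as "unstructured"; the so-called lack of structure in fact makes structure prediction harder: "Inherently Structured" Does Not Mean "Static". From a computational point of view, RNA has fewer backbone constraints and a flatter free-energy landscape, which raises many interesting optimization questions. Faced with a large number of local optima, is there a better-suited optimization algorithm? RNA is also dynamically sensitive to its environment: how should these factors be considered in actual prediction, and how should environmental effects be modeled? How should the role of non-Watson–Crick pairing in RNA function and regulation be taken into account? And so on. As the authors urge at the end: RNA has gone mainstream, so let's make sure RNA structure properties return to the front seat.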
Abstract:
Recent events have pushed RNA research into the spotlight. Continued discoveries of RNA with unexpected diverse functions in healthy and diseased cells, such as the role of RNA as both the source and countermeasure to a severe acute respiratory syndrome coronavirus 2 infection, are igniting a new passion for understanding this functionally and structurally versatile molecule. Although RNA structure is key to function, many foundational characteristics of RNA structure are misunderstood, and the default state of RNA is often thought of and depicted as a single floppy strand. The purpose of this perspective is to help adjust mental models, equipping the community to better use the fundamental aspects of RNA structural information in new mechanistic models, enhance experimental design to test these models, and refine data interpretation. We discuss six core observations focused on the inherent nature of RNA structure and how to incorporate these characteristics to better understand RNA structure. We also offer some ideas for future efforts to make validated RNA structural information available and readily used by all researchers.
16.
尹志 (2023-08-31 22:11):
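#paper https://doi.org/10.48550/arXiv.1812.07907 PnP-AdaNet: Plug-and-Play Adversarial Domain Adaptation Network at Unpaired Cross-Modality Cardiac Segmentation. I came across this paper while surveying efficient generative models and found it rather interesting. It proposes a network architecture, PnP-AdaNet, that achieves unsupervised domain adaptation for segmentation across modalities. As an older paper from 2018, its ideas of swapping out part of the network and using adversarial learning are now fairly common, but I think the idea of replacing network components has deeper implications in today's era of large models. One of my own research topics follows this line, and some of the experimental results so far look quite good.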
Abstract:
Deep convolutional networks have demonstrated the state-of-the-art performance on various medical image computing tasks. Leveraging images from different modalities for the same analysis task holds clinical benefits. However, the generalization capability of deep models on test data with different distributions remain as a major challenge. In this paper, we propose the PnPAdaNet (plug-and-play adversarial domain adaptation network) for adapting segmentation networks between different modalities of medical images, e.g., MRI and CT. We propose to tackle the significant domain shift by aligning the feature spaces of source and target domains in an unsupervised manner. Specifically, a domain adaptation module flexibly replaces the early encoder layers of the source network, and the higher layers are shared between domains. With adversarial learning, we build two discriminators whose inputs are respectively multi-level features and predicted segmentation masks. We have validated our domain adaptation method on cardiac structure segmentation in unpaired MRI and CT. The experimental results with comprehensive ablation studies demonstrate the excellent efficacy of our proposed PnP-AdaNet. Moreover, we introduce a novel benchmark on the cardiac dataset for the task of unsupervised cross-modality domain adaptation. We will make our code and database publicly available, aiming to promote future studies on this challenging yet important research topic in medical imaging.
17.
尹志 (2023-07-31 22:52):
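#paper doi: https://doi.org/10.48550/arXiv.2210.13695 Structure-based Drug Design with Equivariant Diffusion Models. I reread this paper. Using equivariant diffusion models for structure-based drug design is indeed an effective approach, and more and more work keeps demonstrating its value. This is a fairly classic paper (although it was apparently rejected by ICLR). It generates molecules conditioned on protein pockets using an SE(3)-equivariant diffusion model. Extensive experiments show that it generates novel and diverse drug molecules efficiently and effectively. The paper also discusses using the method to optimize existing molecules and to design molecules by inpainting. The results still have many shortcomings, but these ideas are very valuable for small-molecule drug design and for improving existing methods.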
Abstract:
Structure-based drug design (SBDD) aims to design small-molecule ligands that bind with high affinity and specificity to pre-determined protein targets. In this paper, we formulate SBDD as a 3D-conditional generation problem and present DiffSBDD, an SE(3)-equivariant 3D-conditional diffusion model that generates novel ligands conditioned on protein pockets. Comprehensive in silico experiments demonstrate the efficiency and effectiveness of DiffSBDD in generating novel and diverse drug-like ligands with competitive docking scores. We further explore the flexibility of the diffusion framework for a broader range of tasks in drug design campaigns, such as off-the-shelf property optimization and partial molecular design with inpainting.
18.
尹志 (2023-06-30 21:30):
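#paper https://doi.org/10.1002/ange.202210001, Angewandte Chemie 134.40 (2022), A Carbon-Carbon Bond Cleavage-Based Prodrug Activation Strategy Applied to β-Lapachone for Cancer-Specific Targeting. I skimmed through it; a particularly interesting piece of work. The paper proposes a new prodrug design strategy that uses C-C bond cleavage to release the parent drug. Here the parent drug is β-lapachone, a drug molecule with targeted activity against pancreatic and lung cancer. Prodrug design is a modern approach to targeted drug design. The idea is that many drugs are highly toxic to patients when taken or administered directly, so the therapeutic window is narrow. In the prodrug strategy, the parent drug is packaged as a prodrug molecule, which is taken into the body and, on reaching the target, is converted in some way into the parent drug and thereby activated, producing pharmacological activity. This reduces the impact of drug toxicity and widens the therapeutic window. Traditionally, prodrugs are activated by cleaving a C-N/C-O bond, but many parent drugs lack the modifiable groups needed to build such a bond. The authors' new strategy cleaves a C-C bond instead to release the parent drug and produce the therapeutic effect. A bit of free association: early this year there was a paper from Feng Zhang's group, which I wrote about earlier in the paper group, that uses a nanomachine called the extracellular contractile injection system to deliver various protein payloads. It feels a lot like the idea behind prodrug design: both take an indirect route to sidestep some problem and reach the final effect, much like building a delivery system or delivery strategy. That is well worth borrowing. All I can say is that drug design and biological design are seriously cool.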
Abstract:
Prodrugs are one of the most common strategies for the design of targeted anticancer agents. However, their application is often hampered by the modifiable groups available on parent drugs. Herein, a carbon-carbon (C−C) bond cleavage-based prodrug activation strategy is reported, which was successfully used to design prodrugs of β-lapachone (β-lap), an ortho-quinone natural product without traditional modifiable groups for the construction of C−N/C−O bond cleavage-based prodrugs. The designed β-lap prodrug with a reactive oxygen species-specific trigger was quickly activated, releasing β-lap. It exerted anticancer efficacy via NAD(P)H:quinone oxidoreductase 1 (NQO1)-mediated futile redox cycling, resulting in potent cytotoxicity that was highly selective for NQO1-rich cancer cells over normal cells both in vitro and in vivo. This significantly amplified the therapeutic window of β-lap. This study provides a practical strategy for the design of prodrugs for parent drugs that do not contain traditional modifiable groups.
19.
尹志 (2023-05-31 22:12):
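#paper doi: https://doi.org/10.1016/j.drudis.2021.05.019 Drug Discovery Today, 2021, De novo molecular design and generative models. Written by industry researchers at BenevolentAI, the article reviews de novo molecular design. It classifies methods mainly by granularity, discussing molecular design under three views of molecular representation: atom-based, fragment-based, and reaction-based. For optimization in molecular design, it separates gradient-free from gradient-based methods; the former are mainly evolutionary algorithms and swarm intelligence algorithms, while the latter are the current mainstream built on deep generative models. The article also stresses the importance of establishing suitable evaluation criteria and benchmarks, though given the practical nature of molecular design, many problems here still urgently need solving. The summary is clearly organized, but the field moves so fast that a 2021 review is clearly dated. In recent years many molecular design methods based on deep generative models have become quite practical, so I recommend reading the latest papers, though this review still works well as a starting thread.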
Abstract:
Molecular design strategies are integral to therapeutic progress in drug discovery. Computational approaches for de novo molecular design have been developed over the past three decades and, recently, thanks in part to advances in machine learning (ML) and artificial intelligence (AI), the drug discovery field has gained practical experience. Here, we review these learnings and present de novo approaches according to the coarseness of their molecular representation: that is, whether molecular design is modeled on an atom-based, fragment-based, or reaction-based paradigm. Furthermore, we emphasize the value of strong benchmarks, describe the main challenges to using these methods in practice, and provide a viewpoint on further opportunities for exploration and challenges to be tackled in the upcoming years.
20.
尹志 (2023-04-30 10:32):
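#paper Broadly applicable and accurate protein design by integrating structure prediction networks and diffusion generative models doi: https://doi.org/10.1101/2022.12.09.519842 This paper proposes a new protein design method called RFdiffusion, which uses deep generative learning to generate entirely new protein structures. It mainly uses a diffusion model. Given the complex geometry of protein backbones and the complex relationship between amino acid sequence and structure, protein generation has long been very challenging. The diffusion model is used as follows: 1. RoseTTAFold serves as the denoising network; since it comes from the Baker group, whose Rosetta tools have long been used for (more physics-based) protein design, this choice of denoising network is quite clever; 2. the noising and denoising process operates mainly on the coordinates of the alpha carbons, so RFdiffusion first generates the backbone structure; 3. the full protein structure is then obtained through backbone tracking, which can be understood as adding the missing bonds and atoms to the predicted alpha carbons based on geometric constraints, bond lengths, angles, and similar parameters; 4. side chains are placed using rotamers: a rotamer library holds precomputed conformations for each amino acid residue and lets you select the side-chain conformation with the most favorable energy. The whole generation process can therefore be viewed as deep generative model + physical constraints + post-processing (precomputation). The paper also reports many experiments validating the designs. The Baker group has since used RFdiffusion in follow-up design work, including De novo design of high-affinity protein binders to bioactive helical peptides, and recently open-sourced the RFdiffusion code. Many protein design researchers have begun trying RFdiffusion-based designs extensively and validating them in wet-lab experiments, so this is definitely a pioneering piece of work that deserves everyone's attention.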
Abstract:
There has been considerable recent progress in designing new proteins using deep learning methods [1–9]. Despite this progress, a general deep learning framework for protein design that enables solution of a wide range of design challenges, including de novo binder design and design of higher order symmetric architectures, has yet to be described. Diffusion models [10,11] have had considerable success in image and language generative modeling but limited success when applied to protein modeling, likely due to the complexity of protein backbone geometry and sequence-structure relationships. Here we show that by fine tuning the RoseTTAFold structure prediction network on protein structure denoising tasks, we obtain a generative model of protein backbones that achieves outstanding performance on unconditional and topology-constrained protein monomer design, protein binder design, symmetric oligomer design, enzyme active site scaffolding, and symmetric motif scaffolding for therapeutic and metal-binding protein design. We demonstrate the power and generality of the method, called RoseTTAFold Diffusion (RFdiffusion), by experimentally characterizing the structures and functions of hundreds of new designs. In a manner analogous to networks which produce images from user-specified inputs, RFdiffusion enables the design of diverse, complex, functional proteins from simple molecular specifications.
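The "add noise to alpha-carbon coordinates" step can be sketched with the generic DDPM forward process x_t = sqrt(alpha_bar_t) x_0 + sqrt(1 - alpha_bar_t) eps. This is a textbook Gaussian noising sketch, not RFdiffusion's actual code: the linear beta schedule, the function names, and the toy coordinates are all assumptions, and the real method also has to handle residue orientations.

```python
import math
import random

def alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule."""
    out, prod = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        prod *= 1.0 - beta
        out.append(prod)
    return out

def noise_coords(coords, alpha_bar, rng):
    """Sample x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * eps per coordinate."""
    a, s = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [tuple(a * c + s * rng.gauss(0.0, 1.0) for c in xyz) for xyz in coords]

# Hypothetical C-alpha trace (centred and rescaled, arbitrary units).
ca_trace = [(-1.5, 0.0, 0.0), (-0.5, 0.4, 0.1), (0.5, 0.1, -0.2), (1.5, -0.5, 0.1)]
schedule = alpha_bars(1000)
rng = random.Random(0)
slightly_noised = noise_coords(ca_trace, schedule[10], rng)   # still close to the input
nearly_gaussian = noise_coords(ca_trace, schedule[-1], rng)   # almost pure noise
print(slightly_noised[0], nearly_gaussian[0])
```

A denoising network is then trained to undo this corruption step by step, which is the role the post assigns to RoseTTAFold.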