李翛然 (2025-07-28 13:44):
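#paper doi:10.1126/science.adv9817, Science, Sarah Lewis https://orcid.org/0009-0009-6484-0352, et al. Scalable emulation of protein equilibrium ensembles with generative deep learning

Protein function depends on dynamic conformational changes (such as domain motions and local unfolding), but existing techniques hit bottlenecks:
Limits of static models: AlphaFold and similar tools predict only a single structure and cannot capture dynamic processes.
Shortcomings of traditional methods: experimental techniques (cryo-EM, single-molecule experiments) are low-throughput; molecular dynamics (MD) simulation is extremely expensive (a millisecond-scale simulation takes months of GPU time).

BioEmu's core innovation
The Microsoft team presents BioEmu, a system based on a generative diffusion model that simulates protein conformational ensembles efficiently and with high accuracy:
Architecture: combines AlphaFold's Evoformer encoder with a diffusion model; it takes a protein sequence as input and generates an ensemble of 3D conformations in 30–50 denoising steps.
Three-stage training strategy:
Pretraining: learns conformational diversity from a clustered version of the AlphaFold database;
Fine-tuning: integrates more than 200 milliseconds of all-atom MD data (covering 1,100+ CATH domains) to approximate the thermodynamic equilibrium distribution;
Refinement: introduces the PPFT algorithm, which uses 500,000 experimental stability measurements (ΔG/ΔΔG) to improve the model's agreement with experiment.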
Science, 2025-7-10. DOI: 10.1126/science.adv9817
Scalable emulation of protein equilibrium ensembles with generative deep learning
Abstract:
Following the sequence and structure revolutions, predicting functionally relevant protein structure changes at scale remains an outstanding challenge. We introduce BioEmu, a deep learning system that emulates protein equilibrium ensembles by generating thousands of statistically independent structures per hour on a single GPU. BioEmu integrates over 200 milliseconds of molecular dynamics (MD) simulations, static structures and experimental protein stabilities using novel training algorithms. It captures diverse functional motions—including cryptic pocket formation, local unfolding, and domain rearrangements—and predicts relative free energies with 1 kcal/mol accuracy compared to millisecond-scale MD and experimental data. BioEmu provides mechanistic insights by jointly modelling structural ensembles and thermodynamic properties. This approach amortizes the cost of MD and experimental data generation, demonstrating a scalable path toward understanding and designing protein function.
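To make the link between an equilibrium ensemble and a free energy concrete, here is a minimal sketch of the standard two-state relation ΔG = -RT ln(p_folded / p_unfolded), applied to counts of independent samples. It is not taken from the paper or from BioEmu's code: the function names, the 300 K default, and the assumption that each sampled structure has already been labeled folded or unfolded are all illustrative.

```python
import math

# Molar gas constant in kcal/(mol*K).
R_KCAL = 0.0019872041


def folding_free_energy(n_folded, n_unfolded, temperature_k=300.0):
    """Two-state folding free energy (kcal/mol) from sample counts.

    Hypothetical helper: assumes the samples are statistically independent
    draws from the equilibrium ensemble and were labeled folded/unfolded
    beforehand by some criterion (not specified here).
    Negative values mean the folded state is favored.
    """
    if n_folded <= 0 or n_unfolded <= 0:
        raise ValueError("both states need at least one sample")
    return -R_KCAL * temperature_k * math.log(n_folded / n_unfolded)


def ddg_of_mutation(wild_type_counts, mutant_counts, temperature_k=300.0):
    """ΔΔG = ΔG(mutant) - ΔG(wild type); positive means destabilizing."""
    dg_wt = folding_free_energy(*wild_type_counts, temperature_k=temperature_k)
    dg_mut = folding_free_energy(*mutant_counts, temperature_k=temperature_k)
    return dg_mut - dg_wt


if __name__ == "__main__":
    # Made-up counts for illustration only: (folded, unfolded).
    print(round(folding_free_energy(900, 100), 2))
    print(round(ddg_of_mutation((900, 100), (500, 500)), 2))
```

Because the estimate depends only on the ratio of populations, a state visited by fewer than a handful of samples gives a noisy ΔG, which is one reason generating thousands of independent structures matters.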