Paper-Hub

2022, bioRxiv. DOI: 10.1101/2022.12.09.519842

Broadly applicable and accurate protein design by integrating structure prediction networks and diffusion generative models

Joseph L. Watson, David Juergens, Nathaniel R. Bennett, Brian L. Trippe, Jason Yim, Helen E. Eisenach, Woody Ahern, Andrew J. Borst, Robert J. Ragotte, Lukas F. Milles, Basile I. M. Wicky, Nikita Hanikel, Samuel J. Pellock, Alexis Courbet, William Sheffler, Jue Wang, Preetham Venkatesh, Isaac Sappington, Susana Vázquez Torres, Anna Lauko, Valentin De Bortoli, Emile Mathieu, Regina Barzilay, Tommi S. Jaakkola, Frank DiMaio, Minkyung Baek, David Baker

Abstract:

There has been considerable recent progress in designing new proteins using deep learning methods [1–9]. Despite this progress, a general deep learning framework for protein design that enables solution of a wide range of design challenges, including de novo binder design and design of higher order symmetric architectures, has yet to be described. Diffusion models [10,11] have had considerable success in image and language generative modeling but limited success when applied to protein modeling, likely due to the complexity of protein backbone geometry and sequence-structure relationships. Here we show that by fine tuning the RoseTTAFold structure prediction network on protein structure denoising tasks, we obtain a generative model of protein backbones that achieves outstanding performance on unconditional and topology-constrained protein monomer design, protein binder design, symmetric oligomer design, enzyme active site scaffolding, and symmetric motif scaffolding for therapeutic and metal-binding protein design. We demonstrate the power and generality of the method, called RoseTTAFold Diffusion (RFdiffusion), by experimentally characterizing the structures and functions of hundreds of new designs. In a manner analogous to networks which produce images from user-specified inputs, RFdiffusion enables the design of diverse, complex, functional proteins from simple molecular specifications.

Related Links:

https://syndication.highwire.org/content/doi/10.1101/2022.12.09.519842

2023-04-30 10:32:00

尹志:

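#paper Broadly applicable and accurate protein design by integrating structure prediction networks and diffusion generative models doi: https://doi.org/10.1101/2022.12.09.519842

This paper proposes a new protein design method, RFdiffusion, which uses deep generative learning to produce entirely new protein structures. It is built on a diffusion model. Protein generation has long been a hard task because of the complex geometry of protein backbones and the complex relationship between amino acid sequence and structure. The paper applies the diffusion model as follows:

1. RoseTTAFold serves as the denoising network. RoseTTAFold is the Baker group's own structure prediction network (the group's earlier design tools were mostly physics-based), so this is a clever choice of denoiser.
2. The noising and denoising process operates mainly on the coordinates of the alpha carbons, so RFdiffusion generates the backbone structure first.
3. The full protein structure is then obtained by backbone tracing, which can be understood as using geometric constraints, such as bond lengths and bond angles, to add the missing bonds and atoms around the predicted alpha carbons.
4. Side chains are placed using rotamers. A rotamer library holds precomputed conformations for each amino acid residue, and the energetically most favorable side-chain conformation is selected from it.

The whole generation process can therefore be seen as a deep generative model plus physical constraints plus post-processing (precomputed). The paper also reports many experiments that validate the designs. The Baker group has since used RFdiffusion in follow-up design work, including "De novo design of high-affinity protein binders to bioactive helical peptides", and not long ago open-sourced the RFdiffusion code. Many protein design researchers have started trying RFdiffusion-based designs and validating them in wet-lab experiments. This is a pioneering paper that deserves everyone's attention.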
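The noising and denoising of alpha-carbon coordinates described in the comment above can be illustrated with a minimal sketch. It is a generic DDPM-style forward step in pure Python, not the RFdiffusion implementation: the linear schedule, its `beta_start` and `beta_end` values, the noise-prediction form of the denoiser, and all function names are assumptions made for illustration. The true noise stands in for the output of a trained network such as RoseTTAFold.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Hypothetical linear noise schedule (values are not from the paper)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    out, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        out.append(running)
    return out


def add_noise(ca_coords, alpha_bar_t, rng):
    """Forward step: x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps."""
    signal = math.sqrt(alpha_bar_t)
    sigma = math.sqrt(1.0 - alpha_bar_t)
    noisy, noise = [], []
    for xyz in ca_coords:
        eps = [rng.gauss(0.0, 1.0) for _ in xyz]
        noise.append(eps)
        noisy.append([signal * c + sigma * e for c, e in zip(xyz, eps)])
    return noisy, noise


def estimate_clean(noisy, predicted_noise, alpha_bar_t):
    """Invert the forward step given an estimate of the noise."""
    signal = math.sqrt(alpha_bar_t)
    sigma = math.sqrt(1.0 - alpha_bar_t)
    return [
        [(x - sigma * e) / signal for x, e in zip(xyz, eps)]
        for xyz, eps in zip(noisy, predicted_noise)
    ]


# Toy input: five alpha carbons on a straight line, 3.8 angstroms apart.
ca_trace = [[3.8 * i, 0.0, 0.0] for i in range(5)]
betas = linear_beta_schedule(num_steps=50)
alpha_bars = cumulative_alphas(betas)
rng = random.Random(0)
noisy_trace, true_noise = add_noise(ca_trace, alpha_bars[-1], rng)
# With the exact noise in hand the clean trace is recovered; a trained
# network would only supply an estimate of it.
recovered = estimate_clean(noisy_trace, true_noise, alpha_bars[-1])
```

In a real sampler the noise (or the clean structure) is predicted by the network at every step, and generation starts from pure noise and applies many small reverse steps instead of a single inversion.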