李翛然
(2025-08-30 11:09):
#paper Atom level enzyme active site scaffolding using RFdiffusion2 doi://10.1101/2025.04.09.648075
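RFdiffusion2 is a protein design model from David Baker's group that targets atom-level construction of enzyme active sites, generating functional enzyme structures end to end from a description of the catalytic mechanism. Its core capabilities and its main improvements over the first-generation model (RFdiffusion) are summarized below: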
------
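Core capabilities
1. Atom-level active site design
◦ Takes as direct input the coordinates of the key atoms in the catalytic reaction (e.g., side-chain functional groups, metal ions, or substrate) and generates a complete protein scaffold that holds the active site, without residue types, sequence positions, or side-chain conformations (rotamers) having to be specified in advance.
◦ Supports "partial ligand input": given coordinates for only some of the substrate atoms, the model fills in the unspecified conformation, and the burial depth of the small molecule can be controlled through per-atom RASA (relative accessible surface area) conditioning.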
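2. Diverse enzyme generation
◦ Starting from a minimal description of the reaction mechanism (e.g., a DFT-optimized transition-state geometry), it generates structurally novel and functionally diverse enzymes; in experimental validation, active catalysts were identified by testing fewer than 96 designs in each case.
3. Broad applicability
◦ Applied successfully to the design of retro-aldolases, cysteine hydrolases, and metallohydrolases; the zinc hydrolase reaches a catalytic efficiency (kcat/KM) of 53,000 M⁻¹s⁻¹, several orders of magnitude higher than earlier designs.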
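To get a feel for why it matters that residue positions and rotamers no longer have to be specified up front, the following back-of-the-envelope sketch counts how many indexed motif specifications a residue-level method would in principle have to choose among. The chain length (150), the number of catalytic residues (3), and the rotamer count per residue (10) are illustrative assumptions, not figures from the paper, and the function names are hypothetical.

```python
from math import perm


def indexed_placements(chain_length: int, n_catalytic: int) -> int:
    """Ways to assign n_catalytic distinct motif residues to distinct
    sequence positions in a chain of chain_length residues."""
    return perm(chain_length, n_catalytic)


def placements_with_rotamers(chain_length: int, n_catalytic: int,
                             rotamers_per_residue: int) -> int:
    """Same count, multiplied by an independent rotamer choice per residue."""
    return (indexed_placements(chain_length, n_catalytic)
            * rotamers_per_residue ** n_catalytic)


# Illustrative numbers only (assumptions, not taken from the paper).
print(indexed_placements(150, 3))            # 150 * 149 * 148
print(placements_with_rotamers(150, 3, 10))  # times 10**3 rotamer combinations
```

Each additional catalytic residue multiplies the count by roughly the chain length times the rotamer count, which is the enumeration an atom-level, position-free specification avoids.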
------
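Five key advances over RFdiffusion
1. Atom-level input replaces residue-level input
◦ RFdiffusion only accepts motifs specified as residue backbones (N-Cα-C), so side-chain rotamers and sequence positions must be enumerated by hand, which is computationally expensive and restricts the design space.
◦ RFdiffusion2 accepts atomic coordinates directly (e.g., the ND1 atom of a His) and infers residue types, rotamers, and sequence positions automatically, which greatly widens the design space.
2. Unindexed motif support
◦ The sequence indices of the catalytic residues need not be fixed in advance; the model assigns positions itself, avoiding the search over index assignments that grows exponentially with the number of catalytic residues in conventional approaches.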
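3. Flow matching framework
◦ Replaces the conventional diffusion formulation; training is more stable and inference more efficient, and atomic coordinates are generated jointly with the protein structure.
4. Stronger conditioning controls
◦ Adds RASA conditioning (controls how exposed each ligand atom is) and ORI conditioning (specifies the position of the active-site centroid), allowing precise control of active-site burial depth and orientation.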
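5. Markedly higher benchmark success rate
◦ On the Atomic Motif Enzyme (AME) benchmark, RFdiffusion2 produced valid structures for 41/41 challenge tasks, whereas RFdiffusion succeeded on only 16/41.
◦ The generated structures have low similarity to native proteins (TM-score ≤ 0.4), indicating substantial structural novelty.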
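As a rough illustration of the flow matching idea named in point 3, here is a minimal toy sketch in plain Euclidean coordinates. It is not RFdiffusion2's implementation: the straight-line path, the single-point "data distribution", and the oracle velocity that stands in for a trained network are all simplifying assumptions, and every function name is hypothetical.

```python
import random


def interpolate(x0, x1, t):
    """Point on the straight-line path from noise x0 (t=0) to data x1 (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0, x1):
    """Regression target for a velocity network: constant along the path."""
    return [b - a for a, b in zip(x0, x1)]


def oracle_velocity(x, t, x1):
    """Exact velocity field when the data is the single point x1.
    Stands in for a trained network in this toy example."""
    return [(b - a) / (1.0 - t) for a, b in zip(x, x1)]


def euler_sample(x0, x1, n_steps=50):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with explicit Euler steps."""
    x = list(x0)
    dt = 1.0 / n_steps
    for i in range(n_steps):
        t = i * dt  # largest t used is (n_steps - 1) / n_steps, so 1 - t > 0
        v = oracle_velocity(x, t, x1)
        x = [a + dt * b for a, b in zip(x, v)]
    return x


rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(3)]  # starting "atom" position
target = [1.5, -2.0, 0.25]                       # hypothetical target coordinates
sample = euler_sample(noise, target)
print(sample)
```

In training, a network would be regressed onto `target_velocity` at points produced by `interpolate`; at inference the learned field is integrated as in `euler_sample`. With the oracle field used here, the integration ends at the target up to floating-point error.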
bioRxiv,
2025-4-10.
DOI: 10.1101/2025.04.09.648075
Atom level enzyme active site scaffolding using RFdiffusion2
Abstract:
De novo enzyme design starts from ideal active site descriptions consisting of constellations of catalytic residue functional groups around reaction transition state(s), and seeks to generate protein structures that can accurately hold the site in place. Highly active enzymes have been designed starting from such descriptions using the generative AI method RFdiffusion [1–3], but there are two current methodological limitations. First, the geometry of the active site can only be specified at the residue level, so for each catalytic residue functional group placed around the reaction transition state, the possible locations of the residue backbone must be enumerated by building side chain rotamers back from the functional group. Second, the location of the catalytic residues along the sequence must be specified in advance, which considerably limits the space of solutions which can be sampled. Here we describe a new deep generative method, Rosetta Fold diffusion 2 (RFdiffusion2), that solves both problems, enabling enzymes to be designed from sequence agnostic descriptions of functional group locations without inverse rotamer generation. We first evaluate RFdiffusion2 on an in silico enzyme design benchmark of 41 diverse active sites and find that it is able to successfully build proteins scaffolding all 41 sites, compared to 16/41 with prior state-of-the-art deep learning methods. Next, we design enzymes around three diverse catalytic sites and characterize the designs experimentally; in each case we identify active catalysts in testing less than 96 sequences. RFdiffusion2 demonstrates the potential of atomic resolution generative models for the design of de novo enzymes directly from their reaction mechanisms.