李翛然
(2025-03-31 10:04):
#paper doi:doi.org/10.1038/s41467-025-58038-4 Robust enzyme discovery and engineering with deep learning using CataPro.
Deep learning for enzyme engineering: the CataPro model
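1. Background and challenges
Enzymes are efficient biocatalysts with wide industrial use, but wild-type enzymes often underperform, and conventional engineering methods are costly and slow. Existing deep learning models for predicting enzyme kinetic parameters (such as kcat and Km) suffer from data bias and poor generalization, which holds back rational design.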
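2. Model innovations and advantages
The team's CataPro model combines pre-trained language models (such as ProtT5 and MolT5) with molecular fingerprints, markedly improving prediction accuracy for enzyme kinetic parameters. Its key advance is an unbiased ten-fold cross-validation dataset, partitioned by clustering on sequence similarity, which keeps the model from overfitting by "memorizing" its training data. Its generalization is better than that of existing tools.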
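To make the cluster-based split concrete, here is a minimal sketch of one way to build such folds. It is not the authors' code. It assumes each sequence already has a cluster label from some sequence-similarity clustering tool; the function names `cluster_kfold` and `cv_splits` and the greedy balancing rule are illustrative choices, not taken from the paper.

```python
from collections import defaultdict


def cluster_kfold(cluster_ids, n_folds=10):
    """Assign sample indices to folds so that no cluster spans two folds.

    cluster_ids[i] is the (assumed, precomputed) cluster label of sample i.
    """
    members = defaultdict(list)
    for idx, cid in enumerate(cluster_ids):
        members[cid].append(idx)

    folds = [[] for _ in range(n_folds)]
    # Greedy balancing: place the largest clusters first, each into
    # whichever fold currently holds the fewest samples.
    for cid in sorted(members, key=lambda c: (-len(members[c]), str(c))):
        smallest = min(range(n_folds), key=lambda f: len(folds[f]))
        folds[smallest].extend(members[cid])
    return folds


def cv_splits(folds):
    """Yield (train_indices, test_indices), holding out one fold at a time."""
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test
```

Because whole clusters move together, every test fold contains only sequences whose cluster never appears in the corresponding training set. A random per-sample split cannot guarantee this.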
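3. Validation in a real application
In a vanillin biosynthesis case study, CataPro was used to identify SsCSO, an enzyme with higher activity, and its predictions then guided mutation design, yielding a mutant with activity improved 3.34-fold. This result shows the model's practical value for directed evolution and for screening industrial enzyme libraries, giving biomanufacturing an efficient tool.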
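4. Limitations and future directions
The current model still represents complex catalytic mechanisms poorly, and its kcat prediction accuracy is limited by data coverage. Future work should incorporate more physicochemical and mechanistic features and expand the data to more reaction types to improve generality.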
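5. Summary and assessment
By pairing deep learning with an unbiased data strategy, CataPro gives enzyme engineering a more trustworthy prediction tool and helps move biocatalysis from experience-driven to data-driven practice. Its success case opens a new route to efficient enzyme design in green chemistry, synthetic biology, and related fields, and reflects how far AI has reached into biomanufacturing.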
Nature Communications,
2025-3-20.
DOI: 10.1038/s41467-025-58038-4
Robust enzyme discovery and engineering with deep learning using CataPro
Abstract:
Accurate prediction of enzyme kinetic parameters is crucial for enzyme exploration and modification. Existing models face the problem of either low accuracy or poor generalization ability due to overfitting. In this work, we first developed unbiased datasets to evaluate the actual performance of these methods and proposed a deep learning model, CataPro, based on pre-trained models and molecular fingerprints to predict turnover number (kcat), Michaelis constant (Km), and catalytic efficiency (kcat/Km). Compared with previous baseline models, CataPro demonstrates clearly enhanced accuracy and generalization ability on the unbiased datasets. In a representational enzyme mining project, by combining CataPro with traditional methods, we identified an enzyme (SsCSO) with 19.53 times increased activity compared to the initial enzyme (CSO2) and then successfully engineered it to improve its activity by 3.34 times. This reveals the high potential of CataPro as an effective tool for future enzyme discovery and modification.
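For readers less familiar with the three quantities named in the abstract, the sketch below shows how they relate under the standard Michaelis-Menten rate law, v = kcat·[E]·[S] / (Km + [S]). This is textbook kinetics, not part of CataPro. The function names and the numbers in the usage note are illustrative assumptions.

```python
def michaelis_menten_rate(kcat, km, enzyme_conc, substrate_conc):
    """Initial reaction rate v = kcat * [E] * [S] / (Km + [S]).

    kcat is the turnover number (1/s). Km is the Michaelis constant, in the
    same concentration units as substrate_conc.
    """
    return kcat * enzyme_conc * substrate_conc / (km + substrate_conc)


def catalytic_efficiency(kcat, km):
    """Catalytic efficiency kcat/Km, the effective rate constant at low [S]."""
    return kcat / km
```

With hypothetical values kcat = 10 and Km = 2, the rate at [S] = Km is half the maximum, kcat·[E]/2. When [S] is far below Km, the rate approaches (kcat/Km)·[E]·[S], which is why kcat/Km is used to compare enzymes at low substrate concentration.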