颜林林 (2022-07-23 22:05):
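#paper doi:10.1101/2022.07.21.500999 bioRxiv, 2022, High-resolution de novo structure prediction from primary sequence. This preprint presents OmegaFold, a tool that predicts a protein's tertiary structure from the primary sequence of that single protein alone. Current mainstream methods all depend on evolutionary information, that is, they use multiple sequence alignments (MSAs) as auxiliary input when predicting the folded structure. The authors argue that a protein, once translated and synthesized, folds spontaneously from its primary sequence into its tertiary structure, so this evolutionary information should not be necessary for structure prediction. The deep model relies on a set of pretrained models that help identify which amino acids in the primary sequence matter more (that is, they assign different attention weights), and it uses BERT-based language-model techniques to help train the folding model. The resulting method can effectively handle structure prediction for orphan proteins (proteins for which current structure databases lack similar proteins to use as a reference). According to the abstract, its accuracy exceeds that of RoseTTAFold and is comparable to that of AlphaFold2 on recently released structures.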
High-resolution de novo structure prediction from primary sequence
Abstract:
Recent breakthroughs have used deep learning to exploit evolutionary information in multiple sequence alignments (MSAs) to accurately predict protein structures. However, MSAs of homologous proteins are not always available, such as with orphan proteins and fast-evolving proteins like antibodies, and a protein typically folds in a natural setting from its primary amino acid sequence into its three-dimensional structure, suggesting that evolutionary information and MSAs should not be necessary to predict a protein's folded form. Here, we introduce OmegaFold, the first computational method to successfully predict high-resolution protein structure from a single primary sequence alone. Using a new combination of a protein language model that allows us to make predictions from single sequences and a geometry-inspired transformer model trained on protein structures, OmegaFold outperforms RoseTTAFold and achieves similar prediction accuracy to AlphaFold2 on recently released structures. OmegaFold enables accurate predictions on orphan proteins that do not belong to any functionally characterized protein family and antibodies that tend to have noisy MSAs due to fast evolution. Our study fills a much-needed structure prediction gap and brings us a step closer to understanding protein folding in nature.