Vincent (2024-02-29 17:06):
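#paper Transfer learning enables predictions in network biology. Nature. 2023. doi: https://doi.org/10.1038/s41586-023-06139-9. Learning gene interaction networks usually requires large amounts of data. For biological studies where data are scarce, transfer learning and pretrained models can substantially reduce the amount of data needed. This paper proposes Geneformer, a transformer-based deep learning model pretrained with self-supervised learning on a large collection of single-cell datasets. During training, Geneformer does not use the raw expression values of genes. Instead it learns gene networks from each gene's expression rank, which in effect denoises the data. For downstream tasks, fine-tuning the model on a small amount of data is enough to improve predictive accuracy. The paper gives examples of Geneformer applied to gene dosage, chromatin, and gene network tasks, and in each case predictive accuracy is clearly higher than with traditional machine learning models.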
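To make the rank idea concrete, here is a minimal sketch of turning one cell's expression values into a rank-ordered gene list. It is only an illustration, not the paper's code: the function name `rank_encode`, the optional `medians` normalization, the `max_len` cutoff, and the toy expression numbers are all assumptions made up for this example.

```python
def rank_encode(expression, medians=None, max_len=2048):
    """Order genes from highest to lowest expression within one cell.

    expression: dict mapping gene name -> expression value in this cell.
    medians:    optional dict mapping gene name -> typical expression of that
                gene across many cells (assumed normalization step).
    max_len:    hypothetical cap on the number of genes kept.
    """
    scored = []
    for gene, value in expression.items():
        if value <= 0:
            continue  # undetected genes are left out of the ranking
        scale = medians.get(gene, 1.0) if medians else 1.0
        scored.append((value / scale, gene))
    # Highest score first; ties broken by gene name so the result is deterministic
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [gene for _, gene in scored[:max_len]]


# Toy values, invented for illustration only
cell = {"GAPDH": 120.0, "TTN": 30.0, "MYH7": 45.0, "NKX2-5": 0.0}
tokens = rank_encode(cell)

# With per-gene scaling, a ubiquitously high gene such as GAPDH drops in rank
toy_medians = {"GAPDH": 100.0, "TTN": 5.0, "MYH7": 15.0}
tokens_scaled = rank_encode(cell, toy_medians)
```

Because only the order survives, the absolute scale of the counts no longer matters, which is one way to read the "denoising" remark above.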
IF: 50.500 (Q1). Nature, 2023-Jun. DOI: 10.1038/s41586-023-06139-9 PMID: 37258680
Transfer learning enables predictions in network biology
Abstract:
Mapping gene networks requires large amounts of transcriptomic data to learn the connections between genes, which impedes discoveries in settings with limited data, including rare diseases and diseases affecting clinically inaccessible tissues. Recently, transfer learning has revolutionized fields such as natural language understanding and computer vision by leveraging deep learning models pretrained on large-scale general datasets that can then be fine-tuned towards a vast array of downstream tasks with limited task-specific data. Here, we developed a context-aware, attention-based deep learning model, Geneformer, pretrained on a large-scale corpus of about 30 million single-cell transcriptomes to enable context-specific predictions in settings with limited data in network biology. During pretraining, Geneformer gained a fundamental understanding of network dynamics, encoding network hierarchy in the attention weights of the model in a completely self-supervised manner. Fine-tuning towards a diverse panel of downstream tasks relevant to chromatin and network dynamics using limited task-specific data demonstrated that Geneformer consistently boosted predictive accuracy. Applied to disease modelling with limited patient data, Geneformer identified candidate therapeutic targets for cardiomyopathy. Overall, Geneformer represents a pretrained deep learning model from which fine-tuning towards a broad range of downstream applications can be pursued to accelerate discovery of key network regulators and candidate therapeutic targets.