颜林林 (2024-02-29 09:02):
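#paper doi:10.1038/s41592-024-02201-0. Nature Methods, 2024, scGPT: toward building a foundation model for single-cell multi-omics using generative AI. This paper applies a large generative AI model to the analysis of single-cell sequencing data. The authors did not collect samples or perform any sequencing themselves; model training, tool development, and performance validation all rely on previously published data or data from public databases, which makes this a typical pure-bioinformatics paper. Riding the current enthusiasm for generative AI, and backed by strong performance results, it was published in Nature Methods, and it is well worth studying and emulating for bioinformatics practitioners. The paper was posted as a preprint on bioRxiv more than nine months earlier, at which point it integrated data from 10 million cells; in the formally published version the number of integrated cells has grown to 33 million, and model performance has improved further. The model, named scGPT, is a single-cell foundation model built on a generative pretrained transformer architecture and is designed to process and interpret large-scale single-cell data. scGPT shows strong performance on a range of downstream tasks, including cell type annotation, genetic perturbation response prediction, multi-batch integration, and multi-omic integration. The novelty of the study lies in applying the foundation-model concept to single-cell biology for the first time, using self-supervised pretraining followed by task-specific fine-tuning to capture the complex biological relationships among cells and genes. scGPT uses its learning capacity to reveal condition-specific gene-gene interactions, and it exhibits scaling and context effects in transfer learning. Compared with traditional machine learning models, large models can capture finer-grained and more comprehensive biological features, especially long-range dependencies and complex relationships in the data, such as unknown cell types or cell-cell interactions hidden in the data; this is probably an important motivation for applying such a model to single-cell data analysis.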
IF: 36.100 (Q1). Nature Methods, 2024-Aug. DOI: 10.1038/s41592-024-02201-0. PMID: 38409223
scGPT: toward building a foundation model for single-cell multi-omics using generative AI
Abstract:
Generative pretrained models have achieved remarkable success in various domains such as language and computer vision. Specifically, the combination of large-scale diverse datasets and pretrained transformers has emerged as a promising approach for developing foundation models. Drawing parallels between language and cellular biology (in which texts comprise words; similarly, cells are defined by genes), our study probes the applicability of foundation models to advance cellular biology and genetic research. Using burgeoning single-cell sequencing data, we have constructed a foundation model for single-cell biology, scGPT, based on a generative pretrained transformer across a repository of over 33 million cells. Our findings illustrate that scGPT effectively distills critical biological insights concerning genes and cells. Through further adaptation of transfer learning, scGPT can be optimized to achieve superior performance across diverse downstream applications. This includes tasks such as cell type annotation, multi-batch integration, multi-omic integration, perturbation response prediction and gene network inference.