徐炳祥
(2023-11-28 11:05):
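#paper doi: 10.1186/s13059-023-03088-4 Genome Biology, 2023, CHESS 3: an improved, comprehensive catalog of human genes and transcripts based on large-scale expression data, phylogenetic analysis, and protein structure. This paper presents a comprehensive annotation of human genes and transcripts, with a version for the latest complete human genome sequence (the T2T genome, CHM13). The authors collected and analyzed nearly 10,000 RNA-seq datasets from 54 tissue sites and assembled all candidate transcripts. They then quality-controlled these candidates by combining sequence-feature-based and machine-learning-based models of coding potential with the tissue specificity of transcript expression and the plausibility of the encoded proteins' three-dimensional structures (as predicted by AlphaFold2), arriving at a final catalog of 41,356 genes and 158,377 transcripts. The results are an important reference resource for genomics research, and the methods are a useful reference for RNA-seq-based studies.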
IF: 10.100, Q1
Genome Biology,
2023-10-30.
DOI: 10.1186/s13059-023-03088-4
PMID: 37904256
PMCID: PMC10614308
CHESS 3: an improved, comprehensive catalog of human genes and transcripts based on large-scale expression data, phylogenetic analysis, and protein structure
Abstract:
CHESS 3 represents an improved human gene catalog based on nearly 10,000 RNA-seq experiments across 54 body sites. It significantly improves current genome annotation by integrating the latest reference data and algorithms, machine learning techniques for noise filtering, and new protein structure prediction methods. CHESS 3 contains 41,356 genes, including 19,839 protein-coding genes and 158,377 transcripts, with 14,863 protein-coding transcripts not in other catalogs. It includes all MANE transcripts and at least one transcript for most RefSeq and GENCODE genes. On the CHM13 human genome, the CHESS 3 catalog contains an additional 129 protein-coding genes. CHESS 3 is available at http://ccb.jhu.edu/chess .