小年
(2024-05-30 11:08):
#paper A deep catalogue of protein-coding variation in 983,578 individuals. Nature. 2024 May 29. doi: 10.1038/s41586-024-07556-0.
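In this paper, the authors build a catalogue of human protein-coding variation from exome sequencing of 983,578 individuals across diverse populations. Of the data, 23% come from individuals of non-European ancestry, including African, East Asian, Indigenous American, Middle Eastern and South Asian ancestry. The catalogue contains more than 10.4 million missense variants and 1.1 million predicted loss-of-function (pLOF) variants. The authors identify individuals carrying rare biallelic pLOF variants in 4,848 genes, 1,751 of which had not been reported before. They also identify 3,988 LOF-intolerant genes, including 86 previously assessed as tolerant and 1,153 that lack established disease annotation. By sequencing the exomes of a large and diverse cohort, the study deepens our understanding of human protein-coding variation and provides a valuable resource for precision medicine. In particular, it highlights how gene constraint and variant frequency differ across populations and reveals the complex relationship between gene function and disease risk, especially for identifying and interpreting rare deleterious variants. One limitation, however, is that the study relies mainly on short-read sequencing data, which may be less accurate for some variant types.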
A deep catalogue of protein-coding variation in 983,578 individuals
Abstract:
Rare coding variants that substantially affect function provide insights into the biology of a gene. However, ascertaining the frequency of such variants requires large sample sizes. Here we present a catalogue of human protein-coding variation, derived from exome sequencing of 983,578 individuals across diverse populations. In total, 23% of the Regeneron Genetics Center Million Exome (RGC-ME) data come from individuals of African, East Asian, Indigenous American, Middle Eastern and South Asian ancestry. The catalogue includes more than 10.4 million missense and 1.1 million predicted loss-of-function (pLOF) variants. We identify individuals with rare biallelic pLOF variants in 4,848 genes, 1,751 of which have not been previously reported. From precise quantitative estimates of selection against heterozygous loss of function (LOF), we identify 3,988 LOF-intolerant genes, including 86 that were previously assessed as tolerant and 1,153 that lack established disease annotation. We also define regions of missense depletion at high resolution. Notably, 1,482 genes have regions that are depleted of missense variants despite being tolerant of pLOF variants. Finally, we estimate that 3% of individuals have a clinically actionable genetic variant, and that 11,773 variants reported in ClinVar with unknown significance are likely to be deleterious cryptic splice sites. To facilitate variant interpretation and genetics-informed precision medicine, we make this resource of coding variation from the RGC-ME dataset publicly accessible through a variant allele frequency browser.