小年
(2023-09-30 08:20):
#paper draft human pangenome reference, Nature, 10 May 2023. doi.org/10.1038/s41586-023-05896-x.
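In this paper, the Human Pangenome Reference Consortium presents the first draft of a human pangenome reference. The pangenome is built from 47 phased, diploid genome assemblies from a genetically diverse cohort. These assemblies cover more than 99% of the expected sequence in each genome and are more than 99% accurate at both the structural and base-pair levels. From alignments of the assemblies, the authors generate a draft pangenome that captures known variants and haplotypes and reveals new alleles at structurally complex loci. Relative to the existing reference GRCh38, the pangenome adds 119 million base pairs of euchromatic polymorphic sequence and 1,115 gene duplications; roughly 90 million of the additional base pairs come from structural variation. Compared with GRCh38-based workflows, analysing short-read data against the draft pangenome reduced small-variant discovery errors by 34% and increased the number of structural variants detected per haplotype by 104%, which enabled typing of the vast majority of structural variant alleles per sample.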
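Thoughts: The human reference genome in common use today (GRCh38) is a linear reference assembled from the DNA of multiple donors, mainly of African and European ancestry, with little Asian ancestry represented. Because populations around the world carry a large amount of genetic polymorphism, GRCh38 cannot represent all of the variation within each population. This paper produces 47 haplotype-phased genome assemblies from populations around the world and uses them to build a first draft of the human pangenome. Analysis of short-read sequencing data shows that the pangenome-based workflow outperforms the GRCh38-based workflow, with fewer small-variant errors and more structural variants detected. However, the draft pangenome is built mainly from populations of the Americas and Africa, and Asian samples make up only 13% of it, so it may not represent the genetic diversity of Asian populations well.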
A draft human pangenome reference
Abstract:
Here the Human Pangenome Reference Consortium presents a first draft of the human pangenome reference. The pangenome contains 47 phased, diploid assemblies from a cohort of genetically diverse individuals. These assemblies cover more than 99% of the expected sequence in each genome and are more than 99% accurate at the structural and base pair levels. Based on alignments of the assemblies, we generate a draft pangenome that captures known variants and haplotypes and reveals new alleles at structurally complex loci. We also add 119 million base pairs of euchromatic polymorphic sequences and 1,115 gene duplications relative to the existing reference GRCh38. Roughly 90 million of the additional base pairs are derived from structural variation. Using our draft pangenome to analyse short-read data reduced small variant discovery errors by 34% and increased the number of structural variants detected per haplotype by 104% compared with GRCh38-based workflows, which enabled the typing of the vast majority of structural variant alleles per sample.