Literature Collection and Sharing Platform

颜林林 (2022-07-29 08:21):

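#paper doi:10.1093/nar/gkac586 Nucleic Acids Research, 2022, De novo assembly of human genome at single-cell levels. The authors previously developed a technique called SMOOTH-seq. Roughly, it works like this. Genomic DNA is randomly fragmented by Tn5 transposon insertion. The fragments then undergo strand displacement and amplification with barcoded primers. Each end of the double-stranded fragment is ligated to a linker sequence so that the fragment circularizes, and the circle is copied by rolling-circle amplification. The result is a long molecule suitable for long-read sequencing that carries multiple copies of the original fragment, so individual base calls can be corrected accurately. In this paper the authors improve on that method: they sequenced single cells from two cell lines, K562 and HG002, on both PacBio HiFi and Oxford Nanopore Technologies (ONT) platforms, and for the first time achieved a highly contiguous human genome assembly at the single-cell level. The results: 95 K562 cells with a total sequencing depth of about 37x (if I read this correctly, that is about 37/95 ≈ 0.4x per cell) and an NG50 of about 2 Mb; 30 HG002 cells with about 1 Gb of data per cell (roughly 0.33x) and an NG50 of about 1.3 Mb. In the abstract's words, the study "opened a new chapter on the practice of single-cell genome de novo assembly".

The topic looks highly novel, but on closer inspection some questions arise. Single-cell genome sequencing has difficulty telling cell subpopulations apart, so assembly should in principle be done cell by cell; pooling many cells from different subpopulations into one assembly would defeat the original purpose. Yet a single cell's genome coverage can never be very complete (the paper reports an average coverage of about 41.7% for the HG002 cells on ONT, and I suspect that adding more sequencing data would not improve this much). That largely limits the assembly itself, so in the end one can only focus on the structural variants it identifies. Moreover, single-cell genome results are hard to validate: the results from other cells cannot easily be used to judge whether the cell being studied was assembled correctly, which looks like a fundamental logical weakness. So, apart from the single-cell genome sequencing data for the two cell lines, the paper's main contribution is probably the optimization of the experimental protocol and the establishment of the technical method, although its data-analysis workflow is also worth consulting.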
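The base-correction claim in the commentary rests on each rolling-circle product carrying several copies of the same insert. As a rough illustration only, the sketch below collapses such copies into a single sequence by per-position majority vote. It assumes the long read has already been split at the circularization linker and that the copies are pre-aligned to equal length with no indels. `majority_consensus` is a hypothetical helper, and this simplified vote is not the SMOOTH-seq authors' actual consensus pipeline.

```python
from collections import Counter


def majority_consensus(copies: list[str]) -> str:
    """Per-position majority vote over pre-aligned copies of one insert.

    Hypothetical helper: assumes the rolling-circle read has already been
    split into its repeated copies and that they are gap-free and equal length.
    """
    if not copies:
        raise ValueError("need at least one copy")
    length = len(copies[0])
    if any(len(c) != length for c in copies):
        raise ValueError("copies must be pre-aligned to the same length")
    consensus = []
    for column in zip(*copies):
        # most_common(1) returns [(base, count)]; ties go to the first-seen base
        base, _count = Counter(column).most_common(1)[0]
        consensus.append(base)
    return "".join(consensus)


copies = [
    "ACGTACGTAC",
    "ACGTTCGTAC",  # simulated error at position 4
    "ACGTACGAAC",  # simulated error at position 7
]
print(majority_consensus(copies))  # ACGTACGTAC
```

With three copies, a single error at any position is outvoted by the two correct bases, which is the intuition behind why multiple copies per molecule improve accuracy.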

IF: 16.600 (Q1). Nucleic Acids Research, 2022-07-22. DOI: 10.1093/nar/gkac586 PMID: 35819189 PMCID: PMC9303314

De novo assembly of human genome at single-cell levels


Haoling Xie, Wen Li, Yuqiong Hu, Cheng Yang, Jiansen Lu, Yuqing Guo, Lu Wen, Fuchou Tang

Abstract:

Genome assembly has benefited from long-read sequencing technologies with higher accuracy and higher continuity. However, most human genome assemblies require large amounts of DNA from homogeneous cell lines without preserving cell heterogeneity, since cell heterogeneity could profoundly affect haplotype assembly results. Herein, using single-cell genome long-read sequencing technology (SMOOTH-seq), we have sequenced K562 and HG002 cells on PacBio HiFi and Oxford Nanopore Technologies (ONT) platforms and conducted de novo genome assembly. For the first time, we have completed the human genome assembly with high continuity (with NG50 of ∼2 Mb using 95 individual K562 cells) at single-cell levels, and explored the impact of different assemblers and sequencing strategies on genome assembly. With sequencing data from 30 diploid individual HG002 cells of relatively high genome coverage (average coverage ∼41.7%) on ONT platform, the NG50 can reach over 1.3 Mb. Furthermore, with the assembled genome from K562 single-cell dataset, a more complete and accurate set of insertion events and complex structural variations could be identified. This study opened a new chapter on the practice of single-cell genome de novo assembly.
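The abstract reports contiguity as NG50. A minimal sketch of the usual definition: sort contigs from longest to shortest and return the length at which their cumulative sum first reaches half of an estimated genome size (N50 uses the assembly's own total length instead). `ng50` is a hypothetical helper, the contig lengths below are toy values rather than data from the paper, and the genome size is an input the caller must choose.

```python
def ng50(contig_lengths: list[int], genome_size: int) -> int:
    """Return the NG50 of an assembly.

    NG50 is the length L such that contigs of length >= L together cover
    at least half of genome_size. Returns 0 if the whole assembly covers
    less than half of the genome.
    """
    target = genome_size / 2
    total = 0
    for length in sorted(contig_lengths, reverse=True):
        total += length
        if total >= target:
            return length
    return 0


contigs = [5, 4, 3, 2, 1]  # toy contig lengths in Mb (hypothetical)
print(ng50(contigs, genome_size=20))            # 3
print(ng50(contigs, genome_size=sum(contigs)))  # 4, equal to N50 here
```

Because the threshold is tied to genome size rather than assembly size, a sparse single-cell assembly cannot inflate its NG50 simply by being small, which makes NG50 the more conservative choice when coverage per cell is incomplete.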


Related Links:

https://academic.oup.com/nar/article-pdf/50/13/7479/45034411/gkac586.pdf