半面阳光 (2022-07-26 14:25):
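#paper DOI: 10.1073/pnas.2019768118, 2021 Feb 2;118(5):e2019768118. Genome-wide detection of cytosine methylation by single molecule real-time sequencing.

This is not a newly published paper: it is a research article that Yuk-Ming Dennis Lo's (卢煜明) team at The Chinese University of Hong Kong published in PNAS in 2021. I recently heard Professor Lo present the related results at an academic conference, so I picked it up for a close read. The core idea is to detect DNA methylation by combining PacBio's SMRT third-generation sequencing with a convolutional neural network.

Cytosine methylation, 5-methylcytosine (5mC), is one of the most important types of epigenetic modification. The most widely used method for detecting CpG methylation is bisulfite sequencing (BS-seq), but BS-seq has several shortcomings. Bisulfite treatment degrades DNA, and it converts unmethylated cytosines (C) to uracil, which is read as thymine (T) after amplification, and this complicates downstream alignment. Genuine C>T point mutations in the original sequence, meanwhile, are not acted on by bisulfite at all, so they are hard to tell apart from converted cytosines. For these reasons the authors used single molecule real-time (SMRT) sequencing to develop a method that detects 5mC directly.

The method feeds two key kinds of information from SMRT sequencing into a convolutional neural network (CNN), and the resulting detector is called the holistic kinetic (HK) model. The first input is the kinetic signal of the DNA polymerase: how long the fluorescence pulse of a single base lasts (pulse width) and the time between two consecutive bases (interpulse duration). The second is the "sequence context", that is, the sequence of a fixed-length stretch of DNA around the site being examined. This fixed-length stretch is called a "measurement window".

The authors first built an unmethylated dataset by whole-genome amplification (the negative dataset, in which almost no sequence is methylated), and a methylated dataset by treating DNA samples with the M.SssI methyltransferase (the positive dataset; M.SssI can methylate all CpG sites on double-stranded DNA). Half of each dataset was used to train the CNN, and the remaining data were used to validate the HK model. The AUC for distinguishing methylation states with the HK model reached up to 0.97, and the sensitivity and specificity of genome-wide 5mC detection at single-base resolution reached 90% and 94%, respectively. The results also show that the performance of the HK model changes with the measurement window size and the sequencing depth. To balance downstream data analysis against accuracy, the authors settled on 21 nt as the default window size and 10× as the default sequencing depth.

The authors then used human-mouse hybrid fragments to show that the HK model can handle "hybrid methylation" sequences, meaning a single fragment that contains both methylated and unmethylated CpG sites. They also ran a brief comparison between BS-seq and the HK model.

My takeaways from this paper: first, the sheer amount of work; second, how thoroughly the authors understand and apply both the underlying molecular biology and the characteristics of the sequencing technology. The overall research framework is also very much in line with the earlier work from Dennis Lo's team: a thorough grasp of the basic theory, combined with flexible use of sequencing technology, aimed at hard problems in clinical testing.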
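To make the "measurement window" idea concrete for myself, here is a minimal sketch of how the input for one CpG site could be assembled. It assumes per-base interpulse duration (`ipd`) and pulse width (`pw`) values are already available for a single read. The function name `cpg_windows`, the six-column row layout, and the single-strand simplification are my own illustrative assumptions, not the encoding used in the paper (which analyzes both strands of each molecule).

```python
BASES = "ACGT"
WINDOW = 21          # default window size chosen in the paper
HALF = WINDOW // 2   # 10 nt on either side of the CpG cytosine


def cpg_windows(seq, ipd, pw):
    """Return (position, rows) for every CpG cytosine that has a full window.

    rows has WINDOW entries, one per nucleotide, each laid out as
    [is_A, is_C, is_G, is_T, interpulse_duration, pulse_width].
    This layout is a hypothetical encoding for illustration only.
    """
    seq = seq.upper()
    if not (len(seq) == len(ipd) == len(pw)):
        raise ValueError("seq, ipd and pw must have the same length")

    windows = []
    # Only consider centers that leave HALF bases on both sides.
    for i in range(HALF, len(seq) - HALF):
        if seq[i:i + 2] != "CG":
            continue  # the window must be centered on the C of a CpG site
        rows = []
        for j in range(i - HALF, i + HALF + 1):
            one_hot = [1.0 if seq[j] == b else 0.0 for b in BASES]
            rows.append(one_hot + [float(ipd[j]), float(pw[j])])
        windows.append((i, rows))
    return windows
```

Each returned `rows` is a 21 × 6 table that could be stacked into an array and passed to a CNN; CpG sites too close to the ends of the read are simply skipped in this sketch.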
Genome-wide detection of cytosine methylation by single molecule real-time sequencing
Abstract:
5-Methylcytosine (5mC) is an important type of epigenetic modification. Bisulfite sequencing (BS-seq) has limitations, such as severe DNA degradation. Using single molecule real-time sequencing, we developed a methodology to directly examine 5mC. This approach holistically examined kinetic signals of a DNA polymerase (including interpulse duration and pulse width) and sequence context for every nucleotide within a measurement window, termed the holistic kinetic (HK) model. The measurement window of each analyzed double-stranded DNA molecule comprised 21 nucleotides with a cytosine in a CpG site in the center. We used amplified DNA (unmethylated) and M.SssI-treated DNA (methylated) (M.SssI being a CpG methyltransferase) to train a convolutional neural network. The area under the curve for differentiating methylation states using such samples was up to 0.97. The sensitivity and specificity for genome-wide 5mC detection at single-base resolution reached 90% and 94%, respectively. The HK model was then tested on human-mouse hybrid fragments in which each member of the hybrid had a different methylation status. The model was also tested on human genomic DNA molecules extracted from various biological samples, such as buffy coat, placental, and tumoral tissues. The overall methylation levels deduced by the HK model were well correlated with those by BS-seq (r = 0.99; P < 0.0001) and allowed the measurement of allele-specific methylation patterns in imprinted genes. Taken together, this methodology has provided a system for simultaneous genome-wide genetic and epigenetic analyses.
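For readers less familiar with the metrics quoted in the abstract, the sketch below shows how sensitivity, specificity, and an overall methylation level could be computed from per-CpG model scores. The 0.5 cutoff and the function names are assumptions made for illustration; the abstract does not state which threshold or code was used.

```python
def call_methylation(scores, threshold=0.5):
    """Turn per-CpG scores into 0/1 calls (threshold is an assumed cutoff)."""
    return [1 if s >= threshold else 0 for s in scores]


def sensitivity_specificity(scores, labels, threshold=0.5):
    """Return (sensitivity, specificity) against known 0/1 methylation labels."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    calls = call_methylation(scores, threshold)
    tp = sum(1 for c, y in zip(calls, labels) if c == 1 and y == 1)
    fn = sum(1 for c, y in zip(calls, labels) if c == 0 and y == 1)
    tn = sum(1 for c, y in zip(calls, labels) if c == 0 and y == 0)
    fp = sum(1 for c, y in zip(calls, labels) if c == 1 and y == 0)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one methylated and one unmethylated site")
    # Sensitivity: methylated sites called methylated.
    # Specificity: unmethylated sites called unmethylated.
    return tp / (tp + fn), tn / (tn + fp)


def methylation_level(scores, threshold=0.5):
    """Fraction of CpG sites called methylated."""
    calls = call_methylation(scores, threshold)
    if not calls:
        raise ValueError("no CpG sites to summarize")
    return sum(calls) / len(calls)
```

In the paper's setup the labels would come from the two training sources: sites from M.SssI-treated DNA labeled 1 and sites from amplified DNA labeled 0.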