笑对人生
(2022-07-31 09:15):
#paper Phasing analysis of lung cancer genomes using a long read sequencer. Nat Commun. 2022 Jun 16;13(1):3464. doi: 10.1038/s41467-022-31133-6
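Background: An SNV (single nucleotide variant) is a position in the genome where a single base has changed. An SNP (single nucleotide polymorphism) is a DNA sequence polymorphism in the genome caused by variation at a single nucleotide. SNV describes a base change in an individual genome, whereas SNP is more of a population-level property. A more accessible description in English: A haplotype is a physical grouping of genomic variants (or polymorphisms) that tend to be inherited together. A specific haplotype typically reflects a unique combination of variants that reside near each other on a chromosome. In other words, a haplotype is a combination of associated allelic variant sites located in one region of a single chromosome, and each distinct combination represents one haplotype. Determining haplotypes, that is, working out whether variants come from the same chromosome, is called phasing (also known as haplotype estimation). The typing or variation mentioned here is usually the outcome of a comparison; in population genetics that comparison may be between one individual and the rest of the population, or between offspring and parents, and it describes the result of evolution or variation (my own understanding). An allele (also called allelomorph) generally refers to one of a pair of genes that sit at the same position on a pair of homologous chromosomes (one from the father, one from the mother) and control different states of the same trait. English definitions: Any one of a series of two or more different genes that may occupy the same locus on a specific chromosome; An allele is a variant form of a gene. With current second- or third-generation sequencing, each read comes from a single chromosome copy, but the read by itself does not reveal whether that copy is of paternal or maternal origin. Compared with second-generation sequencing, however, third-generation sequencing has the advantage of long reads that can span most neighboring SNP sites, which allows more accurate haplotype phasing. In addition, third-generation sequencing can accurately detect copy number variants (CNVs), and it captures base modification information such as methylation at the same time as the sequence is phased.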
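To make the idea of read-based phasing concrete, here is a minimal toy sketch. It is not the pipeline used in the paper. It assumes biallelic heterozygous sites (0 = reference allele, 1 = alternate allele), error-free reads, and reads ordered so that each one shares at least one site with the reads before it. The function name `phase_reads` and the example positions are invented for illustration.

```python
from typing import Dict, List, Tuple

Read = Dict[int, int]  # position of a heterozygous site -> observed allele (0 or 1)


def phase_reads(reads: List[Read]) -> Tuple[Dict[int, int], Dict[int, int]]:
    """Greedy toy phasing (illustrative assumption, not a real phasing tool).

    Each read is assigned to whichever haplotype it agrees with at more
    already-phased sites, then extends that haplotype with its new sites.
    Because sites are assumed biallelic and heterozygous, the other
    haplotype receives the opposite allele at every new site.
    """
    hap_a: Dict[int, int] = {}
    hap_b: Dict[int, int] = {}
    for read in reads:
        agree_a = sum(1 for pos, allele in read.items() if hap_a.get(pos) == allele)
        agree_b = sum(1 for pos, allele in read.items() if hap_b.get(pos) == allele)
        # Ties (including the very first read) go to haplotype A.
        target, other = (hap_b, hap_a) if agree_b > agree_a else (hap_a, hap_b)
        for pos, allele in read.items():
            target.setdefault(pos, allele)
            other.setdefault(pos, 1 - allele)
    return hap_a, hap_b


# Four made-up reads that overlap pairwise and chain four sites together.
example_reads = [
    {100: 0, 250: 1},
    {250: 0, 400: 0},
    {250: 1, 400: 1, 900: 0},
    {400: 0, 900: 1},
]
hap_a, hap_b = phase_reads(example_reads)
print(hap_a)
print(hap_b)
```

The longer a read is, the more heterozygous sites it links in one step, which is why long reads tend to give longer phased blocks. A read that shares no site with earlier reads would start a new, independent block in a real tool; this sketch does not handle that case.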
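Objective: Previous detection of SNVs, indels and CNVs in tumors has mostly relied on second-generation sequencing. Because of its short reads, however, second-generation sequencing cannot accurately resolve high-GC regions, repetitive regions, or large chromosomal rearrangements. The ultra-long reads of third-generation sequencing should therefore help reveal the variation events in tumors more comprehensively. The recently announced ONT Q20+ chemistry achieves >99% raw-read (simplex) accuracy, or roughly Q30 duplex accuracy. The aim of this study is to use ONT nanopore technology to carry out phasing analysis and to study copy number variation, chromothripsis and related events in non-small cell lung cancer at the tissue and cell level.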
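The Q20 and Q30 figures use the standard Phred scale, where Q = -10 * log10(error probability). The short sketch below shows the conversion in both directions; the function names are my own and chosen only for illustration.

```python
import math


def phred_to_accuracy(q: float) -> float:
    """Per-base accuracy implied by a Phred quality score."""
    return 1.0 - 10.0 ** (-q / 10.0)


def accuracy_to_phred(accuracy: float) -> float:
    """Phred quality score implied by a per-base accuracy below 1."""
    return -10.0 * math.log10(1.0 - accuracy)


for q in (20, 30):
    print(f"Q{q}: accuracy = {phred_to_accuracy(q):.4f}")
```

So Q20 corresponds to one error per 100 bases (99% accuracy), and Q30 to one error per 1,000 bases (99.9% accuracy).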
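Methods: Normal tissue from 20 patients with non-small cell lung cancer underwent whole-genome sequencing on both second- and third-generation platforms, while tumor tissue was sequenced only on the third-generation platform. In addition, methylation signals from the sequencing data and ONT-based full-length transcriptome sequencing were used to explore the relationship between variants and phenotype. To further examine the clonal structure of the tumor cells, ONT-based scDNAseq was also performed on 2 samples.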
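Results: By using second-generation sequencing to correct the third-generation data, the study obtained phased blocks with an N50 length of 834 kb, in which SNV calls were close to 99% concordant with public second-generation WGS databases. Combining methylation data with full-length transcriptome sequencing, the authors found that variation in phased blocks (haplotype variation) correlated with methylation and transcriptional regulation in only two samples. In addition, analysis of large chromosomal rearrangements showed that EGFR mutation-positive lung adenocarcinoma tissue carries characteristic chromothripsis-like events, suggesting that aberrations in the EGFR pathway may affect telomerase activity.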
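For readers unfamiliar with N50: it is the block length L such that blocks of length L or longer together cover at least half of the total phased length. The sketch below computes it for a list of block lengths. The example lengths (in kb) are invented and are not data from the paper.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of block (or contig) lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("n50() needs at least one length")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Stop at the first block where the cumulative sum reaches half the total.
        if running * 2 >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


# Hypothetical phased block lengths in kb, for illustration only.
block_lengths_kb = [900, 834, 500, 300, 100]
print(n50(block_lengths_kb))
```

N50 is more robust than the mean here because a large number of tiny blocks barely changes it, whereas they would pull the mean down sharply.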
Phasing analysis of lung cancer genomes using a long read sequencer
Abstract:
Chromosomal backgrounds of cancerous mutations still remain elusive. Here, we conduct the phasing analysis of non-small cell lung cancer specimens of 20 Japanese patients. By the combinatory use of short and long read sequencing data, we obtain long phased blocks of 834 kb in N50 length with >99% concordance rate. By analyzing the obtained phasing information, we reveal that several cancer genomes harbor regions in which mutations are unevenly distributed to either of two haplotypes. Large-scale chromosomal rearrangement events, which resemble chromothripsis events but have smaller scales, occur on only one chromosome, and these events account for the observed biased distributions. Interestingly, the events are characteristic of EGFR mutation-positive lung adenocarcinomas. Further integration of long read epigenomic and transcriptomic data reveal that haploid chromosomes are not always at equivalent transcriptomic/epigenomic conditions. Distinct chromosomal backgrounds are responsible for later cancerous aberrations in a haplotype-specific manner.
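One simple way to think about "mutations unevenly distributed to either of two haplotypes" is as a coin-flip question: if somatic mutations landed on each haplotype with equal probability, how surprising would the observed split be? The sketch below uses an exact two-sided binomial test for this. It is only an illustration of the reasoning, not the statistical method used in the paper, and the counts are invented.

```python
from math import comb


def binom_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value.

    Sums the probabilities of all outcomes that are no more likely
    than the observed count k under Binomial(n, p).
    """
    probs = [comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # Small relative tolerance so float rounding does not drop equal-probability outcomes.
    p_value = sum(pr for pr in probs if pr <= observed * (1.0 + 1e-9))
    return min(1.0, p_value)


# Hypothetical region: 18 of 20 phased mutations fall on haplotype 1.
print(binom_two_sided_p(18, 20))
```

A small p-value for a region would indicate that the mutations there are concentrated on one haplotype more than an even split would predict, which is the kind of bias the abstract attributes to rearrangement events occurring on a single chromosome.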