响马读paper

An academic community that requires every member to read at least one paper per month and check in.

Community paper-sharing list

Papers shared by user 颜林林.

113 papers have been shared in total; this page shows papers 1 - 20.
1.
颜林林 (2024-06-19 06:01):
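#paper doi:10.1038/s41559-024-02420-w, Nature Ecology & Evolution, 2024, African elephants address one another with individually specific name-like calls. This is a fascinating study. By analyzing the vocalizations and behavior of African elephants, the authors confirm for the first time that African elephants use individually specific calls to identify and address other elephants. The study grew out of the author's observation, as an ecologist, that African elephants communicate vocally over a wide range and maintain rich social relationships, which led him to suspect that elephants might give each other names. The team recorded elephant calls and used machine learning methods such as random forests to link each call to its intended receiver, reaching a prediction accuracy of 27.5%, significantly better than the randomized control. More interestingly, they played the calls back to the elephants and observed their reactions, confirming that an elephant hearing "its own name" produced louder calls and moved toward the loudspeaker more quickly. The authors regard the study as "a very promising start" that will "lead to a whole range of other questions that can be studied", such as whether elephants also name places or even talk about one another in the third person; this social need to address individual conspecifics may well be a precursor to the origin of language.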
Abstract:
Personal names are a universal feature of human language, yet few analogues exist in other species. While dolphins and parrots address conspecifics by imitating the calls of the addressee, human names are not imitations of the sounds typically made by the named individual. Labelling objects or individuals without relying on imitation of the sounds made by the referent radically expands the expressive power of language. Thus, if non-imitative name analogues were found in other species, this could have important implications for our understanding of language evolution. Here we present evidence that wild African elephants address one another with individually specific calls, probably without relying on imitation of the receiver. We used machine learning to demonstrate that the receiver of a call could be predicted from the call's acoustic structure, regardless of how similar the call was to the receiver's vocalizations. Moreover, elephants differentially responded to playbacks of calls originally addressed to them relative to calls addressed to a different individual. Our findings offer evidence for individual addressing of conspecifics in elephants. They further suggest that, unlike other non-human animals, elephants probably do not rely on imitation of the receiver's calls to address one another.
2.
颜林林 (2024-05-25 18:44):
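#paper doi:10.1126/science.adh2602, Science, 2024, Molecular cascades and cell type-specific signatures in ASD revealed by single-cell genomics. This week's Science published a large batch of single-cell and spatial transcriptomics papers on brain science, and this is one of them. It combines single-nucleus RNA-seq, single-nucleus ATAC-seq, and spatial transcriptomics to study how genetic variation affects the human brain in health and disease, particularly in autism spectrum disorder (ASD). It is the largest single-cell analysis of an ASD cohort to date, covering 33 ASD cases and 31 controls (64 in total), with the experiments performed on postmortem cortical tissue from these subjects. The analysis reveals cell-state transitions from homeostatic to reactive in ASD and identifies differential expression involving thousands of genes, together with transcription factor networks, providing strong causal anchors and molecular phenotypes for understanding the molecular changes of ASD in the human brain.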
Abstract:
Genomic profiling in postmortem brain from autistic individuals has consistently revealed convergent molecular changes. What drives these changes and how they relate to genetic susceptibility in this complex condition are not well understood. We performed deep single-nucleus RNA sequencing (snRNA-seq) to examine cell composition and transcriptomics, identifying dysregulation of cell type-specific gene regulatory networks (GRNs) in autism spectrum disorder (ASD), which we corroborated using single-nucleus assay for transposase-accessible chromatin with sequencing (snATAC-seq) and spatial transcriptomics. Transcriptomic changes were primarily cell type specific, involving multiple cell types, most prominently interhemispheric and callosal-projecting neurons, interneurons within superficial laminae, and distinct glial reactive states involving oligodendrocytes, microglia, and astrocytes. Autism-associated GRN drivers and their targets were enriched in rare and common genetic risk variants, connecting autism genetic susceptibility and cellular and circuit alterations in the human brain.
3.
颜林林 (2024-04-28 14:25):
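#paper doi:10.1101/2022.11.29.518309, bioRxiv, 2024, NanoTrans: an integrated computational framework for comprehensive transcriptome analysis with Nanopore direct-RNA sequencing. This preprint presents NanoTrans, an analysis pipeline for Nanopore direct RNA sequencing (DRS) data that performs comprehensive transcriptome analysis, including clustering and quantification of genes and their transcripts, poly-A tail length profiling, RNA modification analysis, and fusion gene detection. The paper offers no particular technical innovation, but it integrates the various analysis steps fairly completely, wraps them into a one-stop package, and outputs the results as a single HTML report, which is friendly and useful for users. The paper also tests the pipeline on several real datasets (yeast, Arabidopsis, human embryonic kidney, and cancer cell lines) to show that it suits different biological application scenarios. Personally, I think pipeline-development work like this is hard to publish well (yet we often have to spend a great deal of time on it). Raising its value further requires deeper improvement and optimization for specific scenarios rather than simply aiming for completeness. But optimizing for the data of a specific scenario in turn narrows the applicability of the pipeline software, and if the results are then unremarkable (for example, no novel findings), the final value is likewise very limited.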
Abstract:
Nanopore direct RNA sequencing (DRS) provides the direct access to native RNA strands with full-length information, shedding light on rich qualitative and quantitative properties of gene expression profiles. Here with NanoTrans, we present an integrated computational framework that comprehensively covers all major DRS-based application scopes, including isoform clustering and quantification, poly(A) tail length estimation, RNA modification profiling, and fusion gene detection. In addition to its merit in providing such a streamlined one-stop solution, NanoTrans also shines in its workflow-orientated modular design, batch processing capability, all-in-one tabular and graphic report output, as well as automatic installation and configuration supports. Finally, by applying NanoTrans to real DRS datasets of yeast, Arabidopsis, as well as human embryonic kidney and cancer cell lines, we further demonstrated its utility, effectiveness, and efficacy across a wide range of DRS-based application settings.
4.
颜林林 (2024-03-13 05:35):
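#paper doi:10.1101/2024.02.18.580107, 2024, FECDO-Flexible and Efficient Coding for DNA Odyssey. This paper proposes a new encoding method for DNA data storage, FECDO (short for Flexible and Efficient Coding for DNA Odyssey), which aims to reduce DNA synthesis cost through efficient data compression and a flexible encoding strategy, and so make DNA data storage more practical. The method first uses deep learning (trying both standalone neural networks without any prior knowledge and pre-trained language models) to extract features of the data, converting the data to be stored from a one-hot encoded tensor into a sequence of marginal probabilities, which accomplishes the compression. The probability sequence is then mapped to a base sequence over the four letters A, C, G, and T, and a hierarchical finite state machine excludes encodings unsuitable for DNA storage (such as runs of identical bases or particular secondary structures). Tested on real text and image data, the method improves the compression ratio by 12%-26% compared with bzip2; this gain translates into a substantial reduction in DNA synthesis cost, which is a key issue for DNA storage technology. The paper also actually synthesized the encoded result of one text sample as DNA (for storage), then amplified the target fragments by PCR, sequenced them with Nanopore, and decoded them to recover the original data, validating the method across the whole workflow. Because the paper is still a bioRxiv preprint (submitted version v2) that provides only the main text and main figures, without supplementary materials, method descriptions, or source code, many implementation and result details remain unpublished. I am personally rather doubtful about the method's error tolerance and real-world performance, and the compression results for non-English text and images shown in the main figures do not look very good either; these questions await answers once the paper is formally published.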
Abstract:
DNA has been pursued as a compelling medium for digital data storage during the past decade. While large-scale data storage and random access have been achieved in artificial DNA, the synthesis cost keeps hindering DNA data storage from popularizing into daily life. In this study, we proposed a more efficient paradigm for digital data compressing to DNA, while excluding arbitrary sequence constraints. Both standalone neural networks and pre-trained language models were used to extract the intrinsic patterns of data, and generated probabilistic portrayal, which was then transformed into constraint-free nucleotide sequences with a hierarchical finite state machine. Utilizing these methods, a 12%-26% improvement of compression ratio was realized for various data, which directly translated to up to 26% reduction in DNA synthesis cost. Combined with the progress in DNA synthesis, our methods are expected to facilitate the realization of practical DNA data storage.
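To make the "no runs of identical bases" constraint in the note above concrete, here is a minimal sketch. It is not FECDO's hierarchical finite state machine, whose details the preprint does not provide; it swaps in a simple rotating code known from earlier DNA storage work, in which each ternary digit selects one of the three bases that differ from the previous base. The function names and the virtual start base are assumptions made for illustration.

```python
from itertools import groupby

BASES = "ACGT"

def longest_homopolymer(seq: str) -> int:
    """Return the length of the longest run of identical bases in seq."""
    return max((len(list(run)) for _, run in groupby(seq)), default=0)

def encode_trits(trits, start="A"):
    """Rotating code (illustrative, not FECDO): each trit in {0, 1, 2}
    picks one of the three bases that differ from the previous base,
    so the output never contains two identical bases in a row."""
    prev, out = start, []
    for t in trits:
        choices = [b for b in BASES if b != prev]  # always three options
        prev = choices[t]
        out.append(prev)
    return "".join(out)

def decode_trits(seq, start="A"):
    """Invert encode_trits by replaying the same state transitions."""
    prev, trits = start, []
    for base in seq:
        choices = [b for b in BASES if b != prev]
        trits.append(choices.index(base))
        prev = base
    return trits

message = [0, 0, 0, 2, 1, 1]
dna = encode_trits(message)
```

The previous base acts as the state of a tiny finite state machine: the allowed next symbols depend only on where the encoder currently is, which is the general idea behind constraint-aware DNA codes.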
5.
颜林林 (2024-02-29 09:02):
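#paper doi:10.1038/s41592-024-02201-0. Nature Methods, 2024, scGPT: toward building a foundation model for single-cell multi-omics using generative AI. This paper uses a large generative AI model to analyze single-cell sequencing data. The authors did not collect samples or sequence anything themselves; they relied solely on published public data or data from public databases for model training, tool development, and performance validation, which makes this a typical pure-bioinformatics paper. Riding the popularity of generative AI and backed by good performance, it was published in Nature Methods and is well worth studying and emulating for bioinformatics practitioners. The paper was posted as a preprint on bioRxiv more than nine months earlier, when it integrated data from 10 million cells; in the formally published version the number of integrated cells has grown to 33 million, and model performance has improved further. The model, named scGPT, is a single-cell foundation model based on the generative pretrained transformer architecture, designed to process and interpret large-scale single-cell data. scGPT shows excellent performance in a range of downstream tasks such as cell type annotation, genetic perturbation response prediction, multi-batch integration, and multi-omic data integration. The innovation of the study lies in applying the foundation model concept to single-cell biology for the first time, using self-supervised pretraining and task-specific fine-tuning to capture the complex biological relationships among cells and genes. scGPT uses its strong learning capacity to reveal gene-gene interactions under specific conditions, and it demonstrates scaling and in-context effects in transfer learning. Compared with traditional machine learning models, large models can capture more detailed and comprehensive biological features, especially long-range dependencies and complex data relationships, such as unknown cell types or cell-cell interactions hidden in the data, which is probably an important motivation for applying them to single-cell data analysis.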
Abstract:
Generative pretrained models have achieved remarkable success in various domains such as language and computer vision. Specifically, the combination of large-scale diverse datasets and pretrained transformers has emerged as a promising approach for developing foundation models. Drawing parallels between language and cellular biology (in which texts comprise words; similarly, cells are defined by genes), our study probes the applicability of foundation models to advance cellular biology and genetic research. Using burgeoning single-cell sequencing data, we have constructed a foundation model for single-cell biology, scGPT, based on a generative pretrained transformer across a repository of over 33 million cells. Our findings illustrate that scGPT effectively distills critical biological insights concerning genes and cells. Through further adaptation of transfer learning, scGPT can be optimized to achieve superior performance across diverse downstream applications. This includes tasks such as cell type annotation, multi-batch integration, multi-omic integration, perturbation response prediction and gene network inference.
6.
颜林林 (2024-01-30 09:58):
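#paper doi:10.1101/2023.12.20.570816. bioRxiv, 2024, Neoantigen Cancer Vaccines and Different Immune Checkpoint Therapies Each Utilize Both Converging and Distinct Mechanisms that in Combination Enable Synergistic Therapeutic Efficacy. This paper uses methylcholanthrene (MCA, a chemical carcinogen) to induce tumors in genetically engineered mice, building a sarcoma animal model, and uses it as the study system to compare the efficacy of neoantigen vaccines with that of anti-CTLA-4/anti-PD-1. Both therapies promote the expansion of specific intratumoral CD8 T cells, with the neoantigen vaccine having the more pronounced effect. Using single-cell transcriptome sequencing and single-cell immune repertoire sequencing, the paper analyzes the changes in the immune microenvironment caused by the different therapies and shows that the clonotype expansion of these cells relates to the phenotypes and functional states associated with the specific immunotherapy. Combining the neoantigen vaccine with immune checkpoint therapy (ICT) is more effective than either treatment alone, which supports combining cancer vaccines with immune checkpoint therapy.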
Abstract:
The goal of therapeutic cancer vaccines and immune checkpoint therapy (ICT) is to eliminate cancer by expanding and/or sustaining T cells with anti-tumor capabilities. However, whether cancer vaccines and ICT enhance anti-tumor immunity by distinct or overlapping mechanisms remains unclear. Here, we compared effective therapeutic tumor-specific mutant neoantigen (NeoAg) cancer vaccines with anti-CTLA-4 and/or anti-PD-1 ICT in preclinical models. Both NeoAg vaccines and ICT induce expansion of intratumoral NeoAg-specific CD8 T cells, though the degree of expansion and acquisition of effector activity was much more substantial following NeoAg vaccination. Further, we found that NeoAg vaccines are particularly adept at inducing proliferating and stem-like NeoAg-specific CD8 T cells. Single cell T cell receptor (TCR) sequencing revealed that TCR clonotype expansion and diversity of NeoAg-specific CD8 T cells relates to their phenotype and functional state associated with specific immunotherapies employed. Effective NeoAg vaccines and ICT required both CD8 and CD4 T cells. While NeoAg vaccines and anti-PD-1 affected the CD4 T cell compartment, it was to less of an extent than observed with anti-CTLA-4, which notably induced ICOS+Bhlhe40+ Th1-like CD4 T cells and, when combined with anti-PD-1, a small subset of Th2-like CD4 T cells. Although effective NeoAg vaccines or ICT expanded intratumoral M1-like iNOS+ macrophages, NeoAg vaccines expanded rather than suppressed (as observed with ICT) M2-like CX3CR1+CD206+ macrophages, associated with the vaccine adjuvant. Further, combining NeoAg vaccination with ICT induced superior efficacy compared to either therapy in isolation, highlighting the utility of combining these modalities to eliminate cancer.
7.
颜林林 (2023-12-27 12:42):
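#paper doi:10.1101/2023.10.04.560604. bioRxiv, 2023, Federated Learning for multi-omics: a performance evaluation in Parkinson's disease. This paper is based on two Parkinson's disease study datasets (PPMI and PDBP), each of which enrolled hundreds of patients and healthy controls and performed both WGS and RNA-seq, yielding multi-omics features. PPMI is split into K folds; one fold is held out and the remaining K-1 folds are used for model training, and the model is then tested and evaluated on the held-out PPMI fold and on PDBP. Models were built both with centralized machine learning and with federated learning, in which the data are split across multiple nodes (sites), using several federated learning strategies. The results show that although the dispersion of samples across sites and the federated learning strategy both affect final performance, the best federated result is comparable to centralized training. The paper also evaluates training time and finds that federated learning takes at least an order of magnitude longer than the centralized approach. Even so, because federated learning avoids sharing and transferring large-scale data between sites, it still has an advantage for integrating broader data and improving model performance. The paper offers an in-depth analysis of federated learning applied to multi-omics, and to Parkinson's disease prediction in particular, showing its potential and challenges as a collaborative tool for handling large-scale heterogeneous data.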
Abstract:
While machine learning (ML) research has recently grown more in popularity, its application in the omics domain is constrained by access to sufficiently large, high-quality datasets needed to train ML models. Federated Learning (FL) represents an opportunity to enable collaborative curation of such datasets among participating institutions. We compare the simulated performance of several models trained using FL against classically trained ML models on the task of multi-omics Parkinson's Disease prediction. We find that FL model performance tracks centrally trained ML models, where the most performant FL model achieves an AUC-PR of 0.876 ± 0.009, 0.014 ± 0.003 less than its centrally trained variation. We also determine that the dispersion of samples within a federation plays a meaningful role in model performance. Our study implements several open source FL frameworks and aims to highlight some of the challenges and opportunities when applying these collaborative methods in multi-omics studies.
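As a rough illustration of the idea in the note above, that sites train locally and only model parameters are shared, the following sketch simulates federated averaging (FedAvg) on a toy linear model. It is only an assumption-laden toy: the paper evaluates several strategies and open source frameworks on real multi-omics data, while the data, model, learning rate, and site split here are all made up for illustration.

```python
import random

def local_update(w, b, data, lr=0.1, epochs=5):
    """Run a few epochs of gradient descent for y ~ w*x + b on one site's data."""
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / len(data)
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / len(data)
        w, b = w - lr * grad_w, b - lr * grad_b
    return w, b

def fed_avg(sites, rounds=200):
    """FedAvg: every round, each site starts from the global model, trains
    locally, and the server averages the results weighted by site size.
    Raw data points never leave their site; only (w, b) is exchanged."""
    w, b = 0.0, 0.0
    total = sum(len(d) for d in sites)
    for _ in range(rounds):
        updates = [(local_update(w, b, d), len(d)) for d in sites]
        w = sum(u[0] * n for u, n in updates) / total
        b = sum(u[1] * n for u, n in updates) / total
    return w, b

# Synthetic, noise-free data from y = 2x + 1, split unevenly across three sites.
rng = random.Random(0)
points = [(x, 2.0 * x + 1.0) for x in (rng.uniform(-1, 1) for _ in range(60))]
sites = [points[:10], points[10:30], points[30:]]
w, b = fed_avg(sites)
```

The extra rounds of local training and aggregation also hint at why federated training can take much longer than a single centralized fit, as the note reports.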
8.
颜林林 (2023-11-25 09:20):
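#paper doi:10.1016/S2589-7500(23)00219-4. The Lancet Digital Health. 2023, Operational greenhouse-gas emissions of deep learning in digital pathology: a modelling study. Given how widely deep learning is now used in digital pathology, this paper analyzes the energy consumption of different models and analysis strategies. For the assessment, it uses the server's IP address to identify the corresponding local electricity supply, combines this with information such as the share of renewable energy in that region's supply, and converts the energy use into carbon dioxide emissions as a measure of environmental sustainability impact. The results suggest that widespread worldwide use of deep learning in pathology could cause substantial greenhouse gas emissions, up to 16 megatons of CO2 equivalent, requiring about 86,590 square kilometers (0.22%) of the world's forest to absorb, and the authors call for attention to the issue. As for the comparison between models, the result is unsurprisingly that upgraded or simpler models reduce energy consumption and so help with environmental sustainability. Personally, I find it hard to agree with extrapolating energy consumption to carbon emissions: too many factors lie in between, and oversimplifying them makes it easy to manipulate and steer the conclusions. That said, the energy-consumption comparison in this paper does have merit.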
Abstract:
BACKGROUND: Deep learning is a promising way to improve health care. Image-processing medical disciplines, such as pathology, are expected to be transformed by deep learning. The first clinically applicable deep-learning diagnostic support tools are already available in cancer pathology, and their number is increasing. However, data on the environmental sustainability of these tools are scarce. We aimed to conduct an environmental-sustainability analysis of a theoretical implementation of deep learning in patient-care pathology. METHODS: For this modelling study, we first assembled and calculated relevant data and parameters of a digital-pathology workflow. Data were breast and prostate specimens from the university clinic at the Institute of Pathology of the Rheinisch-Westfälische Technische Hochschule Aachen (Aachen, Germany), for which commercially available deep learning was already available. Only specimens collected between Jan 1 and Dec 31, 2019 were used, to omit potential biases due to the COVID-19 pandemic. Our final selection was based on 2 representative weeks outside holidays, covering different types of specimens. To calculate carbon dioxide (CO2) or CO2 equivalent (CO2 eq) emissions of deep learning in pathology, we gathered relevant data for exact numbers and sizes of whole-slide images (WSIs), which were generated by scanning histopathology samples of prostate and breast specimens. We also evaluated different data input scenarios (including all slide tiles, only tiles containing tissue, or only tiles containing regions of interest). To convert estimated energy consumption from kWh to CO2 eq, we used the internet protocol address of the computational server and the Electricity Maps database to obtain information on the sources of the local electricity grid (ie, renewable vs non-renewable), and estimated the number of trees and proportion of the local and world's forests needed to sequester the CO2 eq emissions. 
We calculated the computational requirements and CO2 eq emissions of 30 deep-learning models that varied in task and size. The first scenario represented the use of one commercially available deep-learning model for one task in one case (1-task), the second scenario considered two deep-learning models for two tasks per case (2-task), the third scenario represented a future, potentially automated workflow that could handle 7 tasks per case (7-task), and the fourth scenario represented the use of a single potential, large, computer-vision model that could conduct multiple tasks (multitask). We also compared the performance (ie, accuracy) and CO2 eq emissions of different deep-learning models for the classification of renal cell carcinoma on WSIs, also from Rheinisch-Westfälische Technische Hochschule Aachen. We also tested other approaches to reducing CO2 eq emissions, including model pruning and an alternative method for histopathology analysis (pathomics). FINDINGS: The pathology database contained 35 552 specimens (237 179 slides), 6420 of which were prostate specimens (10 115 slides) and 11 801 of which were breast specimens (19 763 slides). We selected and subsequently digitised 140 slides from eight breast-cancer cases and 223 slides from five prostate-cancer cases. Applying large deep-learning models on all WSI tiles of prostate and breast pathology cases would result in yearly CO2 eq emissions of 7·65 metric tons (t; 95% CI 7·62-7·68) with the use of a single deep-learning model per case; yearly CO2 eq emissions were up to 100·56 t (100·21-100·99) with the use of seven deep-learning models per case. CO2 eq emissions for different deep-learning model scenarios, data inputs, and deep-learning model sizes for all slides varied from 3·61 t (3·59-3·63) to 2795·30 t (1177·51-6482·13).
For the estimated number of overall pathology cases worldwide, the yearly CO2 eq emissions varied, reaching up to 16 megatons (Mt) of CO2 eq, requiring up to 86 590 km2 (0·22%) of world forest to sequester the CO2 eq emissions. Use of the 7-task scenario and small deep-learning models on slides containing tissue only could substantially reduce CO2 eq emissions worldwide by up to 141 times (0·1 Mt, 95% CI 0·1-0·1). Considering the local environment in Aachen, Germany, the maximum CO2 eq emission from the use of deep learning in digital pathology only would require 32·8% (95% CI 13·8-76·6) of the local forest to sequester the CO2 eq emissions. A single pathomics run on a tissue could provide information that was comparable to or even better than the output of multitask deep-learning models, but with 147 times reduced CO2 eq emissions. INTERPRETATION: Our findings suggest that widespread use of deep learning in pathology might have considerable global-warming potential. The medical community, policy decision makers, and the public should be aware of this potential and encourage the use of CO2 eq emissions reduction strategies where possible. FUNDING: German Research Foundation, European Research Council, German Federal Ministry of Education and Research, Health, Economic Affairs and Climate Action, and the Innovation Fund of the Federal Joint Committee.
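The conversion chain discussed in the note above (energy use, then grid carbon intensity, then forest needed for sequestration) can be sketched in a few lines. This is only a back-of-the-envelope sketch: the function names and every numeric value below are hypothetical placeholders, not figures from the paper, which derived its grid mix from the Electricity Maps database.

```python
def co2_eq_kg(energy_kwh: float, grid_intensity_g_per_kwh: float) -> float:
    """Convert energy use (kWh) to kg CO2 eq for a given grid carbon
    intensity expressed in g CO2 eq per kWh."""
    return energy_kwh * grid_intensity_g_per_kwh / 1000.0

def trees_needed(co2_kg: float, kg_per_tree_per_year: float) -> float:
    """Number of trees needed to sequester co2_kg within one year, given an
    assumed per-tree uptake (a hypothetical input, not a paper value)."""
    return co2_kg / kg_per_tree_per_year

# Hypothetical example: the same 1000 kWh workload on two different grids.
fossil_heavy = co2_eq_kg(1000, 400)  # assumed 400 g CO2 eq/kWh
low_carbon = co2_eq_kg(1000, 50)     # assumed 50 g CO2 eq/kWh
trees = trees_needed(fossil_heavy, 20.0)  # assumed 20 kg CO2 per tree per year
```

With these made-up inputs the same workload differs eightfold in estimated emissions depending only on the assumed grid intensity, which shows how sensitive the extrapolation is to its inputs.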
9.
颜林林 (2023-10-27 12:22):
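#paper doi:10.1038/s41592-023-02043-2. Nature Methods, 2023, Comprehensive benchmarking and guidelines of mosaic variant calling strategies. This is a methods benchmarking paper that systematically evaluates 11 mosaic variant calling tools (including MosaicHunter, which I took part in during my PhD). Mosaic variants are somatic mutations that arise early in an individual's development, after sperm and egg fuse to form the zygote; as development and organ formation proceed, they are carried along and distributed to different parts of the body. The paper takes cell lines whose germline variants are known in advance and mixes them stepwise to simulate mosaic variants arising at different early stages, producing a set of reference samples with ground-truth mosaic variants at different frequencies for testing and evaluating the tools (over the past few years we have also used this way of preparing reference materials in developing cancer genetic testing products, for technical validation of somatic variant calling). The evaluation shows that mosaic variant calling depends largely on the research goal (and the assumptions that follow from it), and that the tools and parameters chosen for different goals can strongly affect the results. Based on the evaluation, the paper describes the characteristics of the different tools, offering guidance and recommendations for future mosaic variant studies and for tool development.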
Abstract:
Rapid advances in sequencing and analysis technologies have enabled the accurate detection of diverse forms of genomic variants represented as heterozygous, homozygous and mosaic mutations. However, the best practices for mosaic variant calling remain disorganized owing to the technical and conceptual difficulties faced in evaluation. Here we present our benchmark of 11 feasible mosaic variant detection approaches based on a systematically designed whole-exome-level reference standard that mimics mosaic samples, supported by 354,258 control positive mosaic single-nucleotide variants and insertion-deletion mutations and 33,111,725 control negatives. We identified not only the best practice for mosaic variant detection but also the condition-dependent strengths and weaknesses of the current methods. Furthermore, feature-level evaluation and their combinatorial usage across multiple algorithms direct the way for immediate to prolonged improvements in mosaic variant detection. Our results will guide researchers in selecting suitable calling algorithms and suggest future strategies for developers.
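The reason mixing cell lines with known germline genotypes yields a ground truth, as the note above describes, is that the expected variant allele fraction (VAF) follows directly from the mixing proportions. The sketch below shows that arithmetic under simplifying assumptions (diploid loci, equal DNA contribution per cell, no copy-number changes); the function name and the example proportions are illustrative, not taken from the paper.

```python
def expected_vaf(fractions, alt_copies, ploidy=2):
    """Expected variant allele fraction at one locus in a mixture.

    fractions  -- mixing proportion of each cell line (must sum to 1)
    alt_copies -- number of alternate-allele copies each line carries
                  at the locus (0, 1, or 2 for a diploid genome)
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("mixing fractions must sum to 1")
    return sum(f * c for f, c in zip(fractions, alt_copies)) / ploidy

# A line carrying a heterozygous variant, spiked at 10% into a line without it:
low_vaf = expected_vaf([0.1, 0.9], [1, 0])
# The same spike-in with a homozygous variant doubles the expected VAF:
high_vaf = expected_vaf([0.1, 0.9], [2, 0])
```

Varying the spike-in proportion therefore gives a series of reference samples with known low-frequency variants against which callers can be scored.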
10.
颜林林 (2023-09-27 09:43):
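#paper doi:10.1038/s41467-023-41690-z. 2023, Nature Communications, Genome-wide enhancer-gene regulatory maps link causal variants to target genes underlying human cancer risk. This paper applies a computational method called Activity-by-Contact (ABC) to published multi-omics sequencing data from 20 cancer types and identifies more than 540,000 enhancer-gene regulation pairs, providing a basis for interpreting the function of the non-coding variants involved. It then enrolls clinical samples from 10 colorectal cancer (CRC) patients, performs multi-omics assays and the same identification of regulatory pairs, and validates the findings in large-scale populations of tens of thousands of individuals. In addition, for rs4810856, a regulatory-region variant found to be associated with CRC risk, it performs functional validation at the gene expression and protein expression levels using cell lines, mouse models, and other systems. Logically the paper does not hang together particularly well, although the workload is substantial; it reads more like a story pieced together after first enrolling clinical samples from 10 cancer patients, running multi-omics sequencing, and then continually extending and filling out the analysis.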
Abstract:
Genome-wide association studies have identified numerous variants associated with human complex traits, most of which reside in the non-coding regions, but biological mechanisms remain unclear. However, assigning function to the non-coding elements is still challenging. Here we apply Activity-by-Contact (ABC) model to evaluate enhancer-gene regulation effect by integrating multi-omics data and identified 544,849 connections across 20 cancer types. ABC model outperforms previous approaches in linking regulatory variants to target genes. Furthermore, we identify over 30,000 enhancer-gene connections in colorectal cancer (CRC) tissues. By integrating large-scale population cohorts (23,813 cases and 29,973 controls) and multipronged functional assays, we demonstrate an ABC regulatory variant rs4810856 associated with CRC risk (Odds Ratio = 1.11, 95%CI = 1.05-1.16, P = 4.02 × 10) by acting as an allele-specific enhancer to distally facilitate PREX1, CSE1L and STAU1 expression, which synergistically activate p-AKT signaling. Our study provides comprehensive regulation maps and illuminates a single variant regulating multiple genes, providing insights into cancer etiology.
11.
颜林林 (2023-08-30 08:09):
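#paper doi:10.1016/j.crmeth.2023.100547. Cell Reports Methods, 2023, An introduction to representation learning for single-cell data analysis. The performance of machine learning methods often depends on data quality and also on the chosen features (that is, how the data are represented). Representation learning lets the model itself learn the representation of the data, which works very well when enough data are available, and single-cell sequencing data analysis is exactly such a setting. This paper reviews how representation learning methods are applied across the stages of single-cell sequencing data analysis (including data pre-processing, hyperparameter optimization, downstream analysis, and biological validation) and the key points to watch for.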
Abstract:
Single-cell-resolved systems biology methods, including omics- and imaging-based measurement modalities, generate a wealth of high-dimensional data characterizing the heterogeneity of cell populations. Representation learning methods are routinely used to analyze these complex, high-dimensional data by projecting them into lower-dimensional embeddings. This facilitates the interpretation and interrogation of the structures, dynamics, and regulation of cell heterogeneity. Reflecting their central role in analyzing diverse single-cell data types, a myriad of representation learning methods exist, with new approaches continually emerging. Here, we contrast general features of representation learning methods spanning statistical, manifold learning, and neural network approaches. We consider key steps involved in representation learning with single-cell data, including data pre-processing, hyperparameter optimization, downstream analysis, and biological validation. Interdependencies and contingencies linking these steps are also highlighted. This overview is intended to guide researchers in the selection, application, and optimization of representation learning strategies for current and future single-cell research applications.
12.
颜林林 (2023-07-25 00:17):
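#paper doi:10.1038/s41588-023-01452-5. Nature Genetics, 2023, Crosstalk between RNA m6A and DNA methylation regulates transposable element chromatin activation and cell fate in human pluripotent stem cells. This paper develops a method called CARGO-BioID, based on CRISPR and proximity labeling (BioID), that captures proteins bound to specific transposable element (TE) regions of the genome and then identifies and quantifies these proteins through experiments such as mass spectrometry and ChIP-seq. Targeting LTR7/HERV-H, a primate-specific TE sequence, the method identifies its binding proteins, among them YTHDC2 and TET1: the former is a reader of RNA m6A methylation, and the latter is a demethylase for DNA 5mC methylation. A series of cell experiments then confirms the biological roles of these two proteins at this genomic region, including the crosstalk between the corresponding RNA methylation and DNA methylation, their regulation of TE activity, and their effect on the differentiation fate of hPSCs (human pluripotent stem cells).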
Abstract:
Transposable elements (TEs) are parasitic DNA sequences accounting for over half of the human genome. Tight control of the repression and activation states of TEs is critical for genome integrity, development, immunity and diseases, including cancer. However, precisely how this regulation is achieved remains unclear. Here we develop a targeted proteomic proximity labeling approach to capture TE-associated proteins in human embryonic stem cells (hESCs). We find that the RNA N6-methyladenosine (m6A) reader, YTHDC2, occupies genomic loci of the primate-specific TE, LTR7/HERV-H, specifically through its interaction with m6A-modified HERV-H RNAs. Unexpectedly, YTHDC2 recruits the DNA 5-methylcytosine (5mC)-demethylase, TET1, to remove 5mC from LTR7/HERV-H and prevent epigenetic silencing. Functionally, the YTHDC2/LTR7 axis inhibits neural differentiation of hESCs. Our results reveal both an underappreciated crosstalk between RNA m6A and DNA 5mC, the most abundant regulatory modifications of RNA and DNA in eukaryotes, and the fact that in hESCs this interplay controls TE activity and cell fate.
13.
颜林林 (2023-06-24 21:59):
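#paper doi:10.1093/nar/gkad526 Nucleic Acids Research, 2023, Precise characterization of somatic complex structural variations from tumor/control paired long-read sequencing data with nanomonsv. This is a bioinformatics paper in which the authors develop nanomonsv, a tool that calls structural variations (SVs) from long-read (third-generation) sequencing data of paired tumor and control samples. The program has two modules, the Canonical SV module and the Single breakend SV module. The former looks for multiple supporting reads that span a breakpoint; the latter first assembles the sequence on one side of the breakpoint and then uses the soft-clipped portion to search for the sequence on the other side (which may be missing from the reference genome or hard to place). Implementing, optimizing, and integrating these two strategies improves SV calling performance. The tool is tested and evaluated on long-read data from three tumor cell lines (and their matched controls), and some of the results are validated by PCR. The paper also analyzes sequence features such as methylation, repeats, mobile elements, and viral sequence integration to further round out its content.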
Abstract:
We present our novel software, nanomonsv, for detecting somatic structural variations (SVs) using tumor and matched control long-read sequencing data with a single-base resolution. The current version of nanomonsv includes two detection modules, Canonical SV module, and Single breakend SV module. Using tumor/control paired long-read sequencing data from three cancer and their matched lymphoblastoid lines, we demonstrate that Canonical SV module can identify somatic SVs that can be captured by short-read technologies with higher precision and recall than existing methods. In addition, we have developed a workflow to classify mobile element insertions while elucidating their in-depth properties, such as 5' truncations, internal inversions, as well as source sites for 3' transductions. Furthermore, Single breakend SV module enables the detection of complex SVs that can only be identified by long-reads, such as SVs involving highly-repetitive centromeric sequences, and LINE1- and virus-mediated rearrangements. In summary, our approaches applied to cancer long-read sequencing data can reveal various features of somatic SVs and will lead to a better understanding of mutational processes and functional consequences of somatic SVs.
14.
颜林林 (2023-05-11 22:05):
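#paper doi:10.3389/fneur.2023.1036453 Frontiers in Neurology, 2023, Mechanism of Qihuang needle therapy in the management of tic disorders: a clinical trial protocol. This paper describes a registered clinical trial of an acupuncture therapy (registered at the Chinese Clinical Trial Registry, number ChiCTR2200057723, URL: https://www.chictr.org.cn/showproj.html?proj=161252). The trial runs from March 2022 to September 2023 and uses Qihuang needle therapy as an intervention for children with tic disorders (planned enrollment of 20 each in the intervention and control groups), with 12 weeks of follow-up. Besides assessment with the relevant clinical scales, peripheral blood is collected to measure serum metabolites and proteins by mass spectrometry and ELISA, and stool is collected for 16S rRNA microbiome analysis, in the hope that these multi-omics data will help explain how the acupuncture treatment works. The trial has not yet finished; the paper was submitted in September 2022 (and accepted in April 2023, half a year later) to publicly disclose the detailed trial procedures in advance, so it contains only method descriptions and no result data. It is hard to imagine how metabolites, proteins, and gut microbiota could be linked to the mechanism by which acupuncture treats neurological disorders. But traditional Chinese medicine emphasizes a holistic, systems view in which everything is connected, and since a direct link is hard to establish at the visible anatomical level, observing and exploring indirectly through different omics approaches may be a reasonable attempt. Let us look forward to the publication of the data analysis once the trial is complete.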
Abstract:
Background: Qihuang needle therapy is a newly developed acupuncture therapy to treat tic disorders in clinical practice. However, the mechanism to reduce tic severity remains unknown. Changes in intestinal flora and circulation metabolites are perhaps the potential pathogenesis of tic disorders. As a result, we present a protocol for a controlled clinical trial using multi-omics analysis to probe the mechanism of the Qihuang needle in managing tic disorders. Methods: This is a matched-pairs design, controlled, clinical trial for patients with tic disorders. Participants will be allocated to either an experimental group or a healthy control group. The main acupoints are Baihui (GV20), Yintang (EX-HN3), and Jueyinshu (BL14). The experimental group will receive Qihuang needle therapy for a month, while the control group will receive no interventions. Expected outcomes: The change in the severity of the tic disorder is set as the main outcome. Secondary outcomes include gastrointestinal severity index and recurrence rate, which will be calculated after a 12-week follow-up. Gut microbiota, measured by 16S rRNA gene sequencing; serum metabolomics, assessed via LC/MS; and serum zonulin, assessed by enzyme-linked immunosorbent assay (ELISA), will be used as biological specimen analysis outcomes. The present study will investigate the possible interactions between intestinal flora and serum metabolites and the improvement of clinical profiles, which may elucidate the mechanism of Qihuang needle therapy for tic disorders. Trial registration: This trial is registered at the Chinese Clinical Trial Registry (http://www.chictr.org.cn/). Registration number: ChiCTR2200057723, Date: 2022-04-14.
翻译
15.
颜林林 (2023-05-06 00:47):
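#paper doi:10.1056/NEJMoa2212856 The New England Journal of Medicine, 2023, Interrupting Endocrine Therapy to Attempt Pregnancy after Breast Cancer. After surgery, breast cancer patients with hormone receptor (HR)-positive disease go on to several years of adjuvant endocrine therapy to consolidate the treatment effect and lower the risk of recurrence. A patient who wants to become pregnant during this period has to stop the planned endocrine therapy, and whether doing so causes serious harm has not been established. This study is a registered clinical trial (NCT02308085) that addresses exactly this question. It enrolled 518 eligible patients between 2014 and 2019 (the abstract reports results for 516 women), all of whom interrupted endocrine therapy to attempt pregnancy, and followed them for breast cancer recurrence events and pregnancy outcomes. The trial has now reached its planned primary analysis: the number of recurrence events did not exceed the prespecified safety threshold, and the data were locked and analyzed. Compared with an external cohort of 1,499 breast cancer patients who did not interrupt endocrine therapy, the incidence of recurrence events showed no clear difference. The results give preliminary support to the view that pausing treatment to attempt pregnancy has no obvious short-term negative effect. Follow-up of these patients continues so that the long-term effects can be assessed later.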
Abstract:
BACKGROUND: Prospective data on the risk of recurrence among women with hormone receptor-positive early breast cancer who temporarily discontinue endocrine therapy to attempt pregnancy are lacking. METHODS: We conducted a single-group trial in which we evaluated the temporary interruption of adjuvant endocrine therapy to attempt pregnancy in young women with previous breast cancer. Eligible women were 42 years of age or younger; had had stage I, II, or III disease; had received adjuvant endocrine therapy for 18 to 30 months; and desired pregnancy. The primary end point was the number of breast cancer events (defined as local, regional, or distant recurrence of invasive breast cancer or new contralateral invasive breast cancer) during follow-up. The primary analysis was planned to be performed after 1600 patient-years of follow-up. The prespecified safety threshold was the occurrence of 46 breast cancer events during this period. Breast cancer outcomes in this treatment-interruption group were compared with those in an external control cohort consisting of women who would have met the entry criteria for the current trial. RESULTS: Among 516 women, the median age was 37 years, the median time from breast cancer diagnosis to enrollment was 29 months, and 93.4% had stage I or II disease. Among 497 women who were followed for pregnancy status, 368 (74.0%) had at least one pregnancy and 317 (63.8%) had at least one live birth. In total, 365 babies were born. At 1638 patient-years of follow-up (median follow-up, 41 months), 44 patients had a breast cancer event, a result that did not exceed the safety threshold. The 3-year incidence of breast cancer events was 8.9% (95% confidence interval [CI], 6.3 to 11.6) in the treatment-interruption group and 9.2% (95% CI, 7.6 to 10.8) in the control cohort.
CONCLUSIONS: Among select women with previous hormone receptor-positive early breast cancer, temporary interruption of endocrine therapy to attempt pregnancy did not confer a greater short-term risk of breast cancer events, including distant recurrence, than that in the external control cohort. Further follow-up is critical to inform longer-term safety. (Funded by ETOP IBCSG Partners Foundation and others; POSITIVE ClinicalTrials.gov number, NCT02308085.).
16.
颜林林 (2023-04-30 10:31):
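#paper doi:10.1109/TNB.2023.3254514 IEEE transactions on nanobioscience, 2023, RBS: A Rotational Coding Based on Blocking Strategy for DNA Storage. Building data storage schemes on DNA as the medium has been a hot topic in recent years. Many research institutes and companies are racing to work on it, though their investment and progress are uneven. It is also one of the directions I am personally interested in, so I noticed this recently published paper and am commenting on it here. The paper is not especially impressive and makes no major breakthrough, but it is a purely algorithmic proof of concept with no wet-lab experiments, which makes it a good model for an interested amateur like me to imitate. Storing data in DNA runs into a range of practical problems. For example, GC content has to stay within a certain range, because GC content that is too high or too low causes problems in both synthesis and sequencing; likewise, consecutive repeated segments must be avoided. For this reason, data cannot be encoded into DNA letters with just any simple one-to-one mapping; the various molecular constraints have to be taken into account at the same time. The paper proposes a data encoding and decoding algorithm, RBS, tests it on text and image data, and evaluates metrics such as GC content, number of repeated segments, Hamming distance, and free energy to confirm the feasibility and efficiency of the algorithm for DNA storage.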
Abstract:
The data volume of global information has grown exponentially in recent years, but the development of silicon-based memory has entered a bottleneck period. Deoxyribonucleic acid (DNA) storage is drawing attention owing to its advantages of high storage density, long storage time, and easy maintenance. However, the base utilization and information density of existing DNA storage methods are insufficient. Therefore, this study proposes a rotational coding based on blocking strategy (RBS) for encoding digital information such as text and images in DNA data storage. This strategy satisfies multiple constraints and produces low error rates in synthesis and sequencing. To illustrate the superiority of the proposed strategy, it was compared and analyzed with existing strategies in terms of entropy value change, free energy size, and Hamming distance. The experimental results show that the proposed strategy has higher information storage density and better coding quality in DNA storage, so it will improve the efficiency, practicality, and stability of DNA storage.
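As a rough illustration of the sequence-level checks mentioned in the note above (GC content, repeats, Hamming distance), here is a minimal Python sketch. It is not the RBS algorithm from the paper. The function names, the 40%-60% GC window, and the maximum run length of 3 are assumptions chosen for illustration, and a homopolymer run is used as one simple form of the "repeat" constraint.

```python
def gc_content(seq: str) -> float:
    """Return the fraction of bases in seq that are G or C."""
    if not seq:
        raise ValueError("sequence must not be empty")
    return sum(base in "GC" for base in seq) / len(seq)


def longest_homopolymer(seq: str) -> int:
    """Return the length of the longest run of one repeated base."""
    if not seq:
        raise ValueError("sequence must not be empty")
    longest = run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        longest = max(longest, run)
    return longest


def hamming_distance(a: str, b: str) -> int:
    """Count the positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def passes_constraints(seq: str, gc_min: float = 0.4, gc_max: float = 0.6,
                       max_run: int = 3) -> bool:
    """Check the illustrative (assumed) GC window and run-length limit."""
    return (gc_min <= gc_content(seq) <= gc_max
            and longest_homopolymer(seq) <= max_run)
```

An encoder would typically run checks like these on every candidate strand and re-encode or reject strands that fail; the actual thresholds depend on the synthesis and sequencing platform.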
17.
颜林林 (2023-03-12 15:29):
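#paper doi:10.1016/j.celrep.2023.112230 Cell Reports, 2023, FAM193A is a positive regulator of p53 activity. This is a typical cell biology study of drug sensitivity mechanisms: it uses molecular and cellular biology methods together with high-throughput screening to find a new regulatory gene and confirm its function. The most famous gene in cancer research is probably TP53 (its protein is called p53), a tumor suppressor that is often mutated or abnormally regulated in cancer tissue. Compound inhibitors designed against its negative regulators (such as MDM2 and MDM4) can activate or enhance p53 function and thereby treat cancer. Nutlin is one such candidate drug molecule. However, Nutlin performs very differently across cell lines and patients, and its mechanism of action still needs deeper study. By analyzing drug sensitivity databases and running high-throughput CRISPR screens in several different cell lines, this paper identifies the protein FAM193A as closely associated with Nutlin sensitivity. A series of further evidence shows that FAM193A acts as a positive regulator in the p53 pathway, which opens a new direction for follow-up mechanistic studies and drug development.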
Abstract:
Inactivation of the p53 tumor suppressor, either by mutations or through hyperactivation of repressors such as MDM2 and MDM4, is a hallmark of cancer. Although many inhibitors of the p53-MDM2/4 interaction have been developed, such as Nutlin, their therapeutic value is limited by highly heterogeneous cellular responses. We report here a multi-omics investigation of the cellular response to MDM2/4 inhibitors, leading to identification of FAM193A as a widespread regulator of p53 function. CRISPR screening identified FAM193A as necessary for the response to Nutlin. FAM193A expression correlates with Nutlin sensitivity across hundreds of cell lines. Furthermore, genetic codependency data highlight FAM193A as a component of the p53 pathway across diverse tumor types. Mechanistically, FAM193A interacts with MDM4, and FAM193A depletion stabilizes MDM4 and inhibits the p53 transcriptional program. Last, FAM193A expression is associated with better prognosis in multiple malignancies. Altogether, these results identify FAM193A as a positive regulator of p53.
18.
颜林林 (2023-03-02 07:38):
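#paper doi:10.1016/j.csbj.2023.02.016 Computational and Structural Biotechnology Journal, 2023, DNAsmart: Multiple attribute ranking tool for DNA data storage systems. Using DNA as a storage medium has gradually become a popular research direction. Reading (sequencing) and writing (synthesizing) DNA are both error-prone, because they are affected by the properties of DNA itself and by various factors in the surrounding system. This study provides a web tool, DNAsmart, that interactively visualizes attributes of nucleic acid fragments such as GC content and Hamming distance. It helps researchers explore how to use and balance these attributes so that they can design better encoding and decoding schemes for DNA storage.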
Abstract:
In an ever-growing need for data storage capacity, the Deoxyribonucleic Acid (DNA) molecule gains traction as a new storage medium with a larger capacity, higher density, and a longer lifespan over conventional storage media. To effectively use DNA for data storage, it is important to understand the different methods of encoding information in DNA and compare their effectiveness. This requires evaluating which decoded DNA sequences carry the most encoded information based on various attributes. However, navigating the field of coding theory requires years of experience and domain expertise. For instance, domain experts rely on various mathematical functions and attributes to score and evaluate their encodings. To enable such analytical tasks, we provide an interactive and visual analytical framework for multi-attribute ranking in DNA storage systems. Our framework follows a three-step view with user-settable parameters. It enables users to find the optimal en-/de-coding approaches by setting different weights and combining multiple attributes. We assess the validity of our work through a task-specific user study on domain experts by relying on three tasks. Results indicate that all participants completed their tasks successfully under two minutes, then rated the framework for design choices, perceived usefulness, and intuitiveness. In addition, two real-world use cases are shared and analyzed as direct applications of the proposed tool. DNAsmart enables the ranking of decoded sequences based on multiple attributes. In sum, this work unveils the evaluation of en-/de-coding approaches accessible and tractable through visualization and interactivity to solve comparison and ranking tasks.
19.
颜林林 (2023-02-27 21:12):
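#paper doi:10.3390/ijms24043588 International Journal of Molecular Sciences, 2023, A DNA Finite-State Machine Based on the Programmable Allosteric Strategy of DNAzyme. This study exploits the properties of DNAzymes (DNA molecules with a specific sequence and conformation that can themselves catalyze the cleavage of particular nucleic acid fragments) to build a nanomachine system with distinct states. Adding different nucleic acid molecules as inputs triggers strand displacement reactions in the system, which reversibly switch the engineered DNAzyme between states. Each state is confirmed by the fluorescence signal emitted when the DNAzyme cleaves a reporter nucleic acid, which experimentally demonstrates that a finite-state machine can be implemented with DNA molecules. In addition to monitoring the fluorescence of the reaction in real time, the study also confirms the nucleic acid species present in the system by electrophoresis. The authors implement a two-state and a five-state finite-state machine, showing in principle that the number of states can be extended by adding nucleic acid molecules with different sequences, and that more complex DNA nanomachines can be developed on this basis.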
Abstract:
Living organisms can produce corresponding functions by responding to external and internal stimuli, and this irritability plays a pivotal role in nature. Inspired by such natural temporal responses, the development and design of nanodevices with the ability to process time-related information could facilitate the development of molecular information processing systems. Here, we proposed a DNA finite-state machine that can dynamically respond to sequential stimuli signals. To build this state machine, a programmable allosteric strategy of DNAzyme was developed. This strategy performs the programmable control of DNAzyme conformation using a reconfigurable DNA hairpin. Based on this strategy, we first implemented a finite-state machine with two states. Through the modular design of the strategy, we further realized the finite-state machine with five states. The DNA finite-state machine endows molecular information systems with the ability of reversible logic control and order detection, which can be extended to more complex DNA computing and nanomachines to promote the development of dynamic nanotechnology.
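To make the finite-state machine idea in the note above more concrete, here is a minimal software sketch of a reversible two-state machine driven by a sequence of inputs. It only models the abstract logic. The state names ("OFF", "ON") and input names ("activator", "blocker") are hypothetical and are not taken from the paper, whose actual states correspond to DNAzyme conformations switched by strand displacement.

```python
from typing import Dict, List, Tuple

# Hypothetical transition table: (current state, input strand) -> next state.
# Inputs that do not match the current state leave the machine unchanged.
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("OFF", "activator"): "ON",
    ("ON", "blocker"): "OFF",
}


def step(state: str, strand: str) -> str:
    """Apply one input strand and return the resulting state."""
    return TRANSITIONS.get((state, strand), state)


def run(inputs: List[str], start: str = "OFF") -> List[str]:
    """Return the list of states visited, including the start state."""
    state = start
    history = [state]
    for strand in inputs:
        state = step(state, strand)
        history.append(state)
    return history
```

Extending the table with more (state, input) pairs is the software analogue of adding nucleic acid molecules with new sequences to reach more states.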
20.
颜林林 (2023-01-01 22:47):
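#paper doi:10.1186/s13059-022-02816-6 Genome Biology, 2022, Structural variant analysis of a cancer reference cell line sample using multiple sequencing technologies. Structural variant (SV) detection has long been one of the challenging tasks in genomics. This paper comes from the SEQC2 (Sequencing Quality Control Phase 2) consortium. Cell lines were established from breast cancer tissue and a control sample (peripheral blood leukocytes) from the same donor and used as the study material. The samples were sequenced with Illumina short reads, 10x linked reads, PacBio and Nanopore long reads, and Hi-C, and the results were integrated to identify 1788 SVs. A subset of these calls was then validated with independent methods, including PCR, arrays, Bionano optical mapping, and fusion breakpoints identified from RNA-seq, and the performance of each platform for SV detection was evaluated. The paper delivers a reference SV call set that can be used to benchmark SV detection methods.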
Abstract:
Background: The cancer genome is commonly altered with thousands of structural rearrangements including insertions, deletions, translocation, inversions, duplications, and copy number variations. Thus, structural variant (SV) characterization plays a paramount role in cancer target identification, oncology diagnostics, and personalized medicine. As part of the SEQC2 Consortium effort, the present study established and evaluated a consensus SV call set using a breast cancer reference cell line and matched normal control derived from the same donor, which were used in our companion benchmarking studies as reference samples. Results: We systematically investigated somatic SVs in the reference cancer cell line by comparing to a matched normal cell line using multiple NGS platforms including Illumina short-read, 10X Genomics linked reads, PacBio long reads, Oxford Nanopore long reads, and high-throughput chromosome conformation capture (Hi-C). We established a consensus SV call set of a total of 1788 SVs including 717 deletions, 230 duplications, 551 insertions, 133 inversions, 146 translocations, and 11 breakends for the reference cancer cell line. To independently evaluate and cross-validate the accuracy of our consensus SV call set, we used orthogonal methods including PCR-based validation, Affymetrix arrays, Bionano optical mapping, and identification of fusion genes detected from RNA-seq. We evaluated the strengths and weaknesses of each NGS technology for SV determination, and our findings provide an actionable guide to improve cancer genome SV detection sensitivity and accuracy. Conclusions: A high-confidence consensus SV call set was established for the reference cancer cell line. A large subset of the variants identified was validated by multiple orthogonal methods.