Papers from the journal Nucleic Acids Research.
7 shared papers found so far.
1.
颜林林 (2023-06-24 21:59):
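#paper doi:10.1093/nar/gkad526 Nucleic Acids Research, 2023, Precise characterization of somatic complex structural variations from tumor/control paired long-read sequencing data with nanomonsv. This is a bioinformatics paper. The authors developed a tool, nanomonsv, that identifies structural variations (SVs) from long-read (third-generation) sequencing data of paired tumor and control samples. The program has two modules, the Canonical SV module and the Single breakend SV module. The former looks for multiple supporting reads that span a breakpoint; the latter first merges the sequences on one side of the breakpoint, then uses the soft-clipped portion to search for the sequence on the other side (which may be missing from the genome or hard to place). Implementing, optimizing, and integrating these two strategies improves SV detection performance. The paper tests and evaluates the tool on long-read data from three tumor cell lines (and their matched controls), and validates some of the results by PCR. It also analyzes sequence features such as methylation, repeats, mobile elements, and viral sequence integration to further round out the paper.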
IF:16.600Q1 Nucleic acids research, 2023-08-11. DOI: 10.1093/nar/gkad526 PMID: 37336583
Abstract:
We present our novel software, nanomonsv, for detecting somatic structural variations (SVs) using tumor and matched control long-read sequencing data with a single-base resolution. The current version of nanomonsv includes two detection modules, Canonical SV module, and Single breakend SV module. Using tumor/control paired long-read sequencing data from three cancer and their matched lymphoblastoid lines, we demonstrate that Canonical SV module can identify somatic SVs that can be captured by short-read technologies with higher precision and recall than existing methods. In addition, we have developed a workflow to classify mobile element insertions while elucidating their in-depth properties, such as 5' truncations, internal inversions, as well as source sites for 3' transductions. Furthermore, Single breakend SV module enables the detection of complex SVs that can only be identified by long-reads, such as SVs involving highly-repetitive centromeric sequences, and LINE1- and virus-mediated rearrangements. In summary, our approaches applied to cancer long-read sequencing data can reveal various features of somatic SVs and will lead to a better understanding of mutational processes and functional consequences of somatic SVs.
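The Canonical SV module's strategy as summarized in the comment above (multiple reads spanning the same breakpoint, found in the tumor but not in the matched control) can be illustrated with a toy sketch. This is not nanomonsv's code or API: the `Breakpoint` record, the `call_somatic` function, the read-count thresholds, and the 50 bp tolerance are all hypothetical choices made only for illustration.

```python
from collections import namedtuple

# Hypothetical record: one read's evidence for a junction between two positions.
Breakpoint = namedtuple("Breakpoint", "chrom1 pos1 chrom2 pos2")

def same_junction(a, b, tol=50):
    """Treat two breakpoints as one junction if both ends agree within tol bp."""
    return (a.chrom1 == b.chrom1 and a.chrom2 == b.chrom2
            and abs(a.pos1 - b.pos1) <= tol and abs(a.pos2 - b.pos2) <= tol)

def call_somatic(tumor, control, min_tumor_reads=3, max_control_reads=0, tol=50):
    """Greedily cluster tumor breakpoints, then filter against control reads."""
    clusters = []  # each cluster is a list of Breakpoint records from tumor reads
    for bp in sorted(tumor):
        for cluster in clusters:
            if same_junction(cluster[0], bp, tol):
                cluster.append(bp)
                break
        else:
            clusters.append([bp])
    calls = []
    for cluster in clusters:
        rep = cluster[0]  # first member serves as the cluster representative
        n_control = sum(same_junction(rep, c, tol) for c in control)
        if len(cluster) >= min_tumor_reads and n_control <= max_control_reads:
            calls.append((rep, len(cluster)))
    return calls

# Made-up example reads (not real data).
tumor_reads = [
    Breakpoint("chr1", 1000, "chr5", 20000),
    Breakpoint("chr1", 1004, "chr5", 20010),
    Breakpoint("chr1", 998, "chr5", 19995),
    Breakpoint("chr2", 500, "chr2", 9000),   # also present in the control
    Breakpoint("chr2", 503, "chr2", 9002),
    Breakpoint("chr2", 499, "chr2", 8998),
]
control_reads = [Breakpoint("chr2", 501, "chr2", 9001)]

somatic_calls = call_somatic(tumor_reads, control_reads)
print(somatic_calls)
```

In this toy input only the chr1/chr5 junction survives, because the chr2 junction is also supported by a control read. A real caller would additionally handle strand, insertion sequences, and alignment quality, none of which are modeled here.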
2.
白鸟 (2023-05-30 09:20):
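#paper https://doi.org/10.1093/nar/gkaa1027 Open Targets Platform: supporting systematic drug–target identification and prioritisation 1. Target-disease knowledgebase: (1) evidence for target-disease relationships from 20 different data sources; (2) new evidence from key datasets: genome-wide CRISPR knockout screen data and GWAS/UK BioBank statistical genetic analysis evidence; (3) information on adverse effects of known drugs: evaluation of post-marketing adverse drug reactions, plus new curated information on target tractability and safety. 2. Improved evidence scoring: the evidence scoring framework was revised to improve target identification. 3. Open Targets Platform development: 10 releases, with the user interface and backend technologies developed to improve performance and usability.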
IF:16.600Q1 Nucleic acids research, 2021-01-08. DOI: 10.1093/nar/gkaa1027 PMID: 33196847
Abstract:
The Open Targets Platform (https://www.targetvalidation.org/) provides users with a queryable knowledgebase and user interface to aid systematic target identification and prioritisation for drug discovery based upon underlying evidence. It is publicly available and the underlying code is open source. Since our last update two years ago, we have had 10 releases to maintain and continuously improve evidence for target-disease relationships from 20 different data sources. In addition, we have integrated new evidence from key datasets, including prioritised targets identified from genome-wide CRISPR knockout screens in 300 cancer models (Project Score), and GWAS/UK BioBank statistical genetic analysis evidence from the Open Targets Genetics Portal. We have evolved our evidence scoring framework to improve target identification. To aid the prioritisation of targets and inform on the potential impact of modulating a given target, we have added evaluation of post-marketing adverse drug reactions and new curated information on target tractability and safety. We have also developed the user interface and backend technologies to improve performance and usability. In this article, we describe the latest enhancements to the Platform, to address the fundamental challenge that developing effective and safe drugs is difficult and expensive.
3.
颜林林 (2022-07-29 08:21):
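#paper doi:10.1093/nar/gkac586 Nucleic Acids Research, 2022, De novo assembly of human genome at single-cell levels. The authors previously developed a technique called SMOOTH-seq. Roughly, it works as follows: the Tn5 transposon is inserted into genomic DNA to fragment it at random; barcoded primers then perform strand displacement and amplification on the fragments; a sequence is ligated to each end of the double strand to circularize it; and rolling-circle amplification yields long fragments suitable for long-read sequencing. Each long fragment carries multiple copies of the original fragment, so its bases can be corrected accurately. This paper improves on that method and uses two sequencing platforms, PacBio HiFi and Oxford Nanopore Technologies (ONT), to sequence single cells from two cell lines, K562 and HG002. It achieves, for the first time, a highly contiguous human genome assembly at the single-cell level. The results include: 95 K562 cells with a total sequencing depth of about 37x (if I understand correctly, that is about 37/95 = 0.4x per cell) and an NG50 of about 2 Mb; and 30 HG002 cells with about 1G of sequence per cell (equivalent to 0.33x) and an NG50 of about 1.3 Mb. In the abstract's words, this "opened a new chapter on the practice of single-cell genome de novo assembly". The topic looks highly novel, but on closer inspection it raises some questions. Single-cell genome sequencing can hardly distinguish cells from different populations, so assembly should presumably be done separately for each cell; otherwise, assembling a mixture of many cells from different populations defeats the original purpose. Yet the genome coverage of a single cell cannot be very complete (the paper reports an average coverage of about 41.7%, and I suspect more sequencing data would not improve this much), which severely limits the assembly itself, so in the end one can only focus on the structural variation calls. In addition, single-cell genome results are hard to validate: results from other cells can hardly tell you whether the result for the cell being measured is accurate, which seems to be a logical flaw. So, apart from the single-cell genome sequencing data for the two cell lines, the paper's main contribution probably lies in exploring and optimizing the experimental protocol and establishing the technique. Its data analysis workflow is, of course, also worth consulting.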
IF:16.600Q1 Nucleic acids research, 2022-07-22. DOI: 10.1093/nar/gkac586 PMID: 35819189 PMCID:PMC9303314
De novo assembly of the human genome at the single-cell level
Abstract:
Genome assembly has been benefited from long-read sequencing technologies with higher accuracy and higher continuity. However, most human genome assembly require large amount of DNAs from homogeneous cell lines without keeping cell heterogeneities, since cell heterogeneity could profoundly affect haplotype assembly results. Herein, using single-cell genome long-read sequencing technology (SMOOTH-seq), we have sequenced K562 and HG002 cells on PacBio HiFi and Oxford Nanopore Technologies (ONT) platforms and conducted de novo genome assembly. For the first time, we have completed the human genome assembly with high continuity (with NG50 of ∼2 Mb using 95 individual K562 cells) at single-cell levels, and explored the impact of different assemblers and sequencing strategies on genome assembly. With sequencing data from 30 diploid individual HG002 cells of relatively high genome coverage (average coverage ∼41.7%) on ONT platform, the NG50 can reach over 1.3 Mb. Furthermore, with the assembled genome from K562 single-cell dataset, more complete and accurate set of insertion events and complex structural variations could be identified. This study opened a new chapter on the practice of single-cell genome de novo assembly.
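The commenter's back-of-the-envelope per-cell depth estimates can be checked with a few lines of arithmetic. This sketch assumes a haploid human genome size of roughly 3 Gb (a figure not stated in the post) and reads "1G" as about 1 Gb of sequence per cell; the function names are made up for illustration.

```python
# Assumed haploid human genome size in bases (approximate; not from the post).
GENOME_SIZE_BP = 3.0e9

def per_cell_depth_from_total(total_depth, n_cells):
    """Average per-cell depth when total_depth (in x) is pooled over n_cells."""
    return total_depth / n_cells

def depth_from_bases(bases, genome_size=GENOME_SIZE_BP):
    """Depth (in x) implied by a given number of sequenced bases."""
    return bases / genome_size

k562_depth = per_cell_depth_from_total(37, 95)   # K562: ~37x pooled over 95 cells
hg002_depth = depth_from_bases(1e9)              # HG002: ~1 Gb per cell

print(f"K562 per-cell depth:  {k562_depth:.2f}x")
print(f"HG002 per-cell depth: {hg002_depth:.2f}x")
```

Under these assumptions the two values come out at about 0.39x and 0.33x, in line with the rough figures quoted in the comment.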
4.
尹志 (2022-06-28 22:16):
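#paper doi:10.1093/nar/gkac010 Nucleic Acids Research, Volume 50, Issue 8, 6 May 2022, AggMapNet: enhanced and explainable low-sample omics deep learning with feature-aggregated multi-channel networks. Learning from omics-based biomedical data usually involves high-dimensional features and small sample sizes, which is a challenge for current mainstream deep learning methods. The paper first proposes an unsupervised feature aggregation technique, AggMap, which aggregates omics features and maps them into multi-channel 2D spatially correlated feature maps (Fmaps) based on the features' intrinsic correlations. On a benchmark dataset, AggMap shows strong feature reconstruction ability compared with existing algorithms. The paper then feeds AggMap's multi-channel Fmaps into a multi-channel deep learning model, AggMapNet, which outperforms the state of the art on 18 low-sample omics benchmark datasets. AggMapNet is also robust on noisy data and disease classification. On interpretability, AggMapNet's explanation module, Simply-explainer, can identify key metabolites and proteins for COVID-19 detection and severity prediction. Overall, the paper proposes a pipeline for modeling low-sample omics data: the feature-restructuring ability of the unsupervised AggMap algorithm plus the supervised, explainable AggMapNet deep learning model. Some takeaways: this work learns from low-sample omics data through a pipeline that can be understood as feature re-representation (AggMap) + a DL network (AggMapNet). The process is not end-to-end; instead it makes full use of the re-representation of features and exploits the representational power of a new feature space. It is something of a return to basics, but because the data are high-dimensional and features are hard to construct by hand, the feature stage uses many unsupervised clustering methods: for example, UMAP, a manifold learning method based on pairwise correlation distances, embeds the omics data points into 2D space, and agglomerative hierarchical clustering groups the omics data points into multiple feature clusters. Interestingly, these are existing general-purpose unsupervised algorithms. My impression is that manifold-based clustering algorithms of this kind can reduce dimensionality while preserving the metric, extracting useful features for downstream tasks. For small samples, such methods seem to work fairly well. One idea, then: could we synthesize data generatively, learn this embedding representation, and then do the downstream tasks? I'm tempted to try, but the thought of competing on 18 benchmark datasets is a bit exhausting.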
IF:16.600Q1 Nucleic acids research, 2022-05-06. DOI: 10.1093/nar/gkac010 PMID: 35100418
Abstract:
Omics-based biomedical learning frequently relies on data of high-dimensions (up to thousands) and low-sample sizes (dozens to hundreds), which challenges efficient deep learning (DL) algorithms, particularly for low-sample omics investigations. Here, an unsupervised novel feature aggregation tool AggMap was developed to Aggregate and Map omics features into multi-channel 2D spatial-correlated image-like feature maps (Fmaps) based on their intrinsic correlations. AggMap exhibits strong feature reconstruction capabilities on a randomized benchmark dataset, outperforming existing methods. With AggMap multi-channel Fmaps as inputs, newly-developed multi-channel DL AggMapNet models outperformed the state-of-the-art machine learning models on 18 low-sample omics benchmark tasks. AggMapNet exhibited better robustness in learning noisy data and disease classification. The AggMapNet explainable module Simply-explainer identified key metabolites and proteins for COVID-19 detections and severity predictions. The unsupervised AggMap algorithm of good feature restructuring abilities combined with supervised explainable AggMapNet architecture establish a pipeline for enhanced learning and interpretability of low-sample omics data.
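The feature-grouping idea mentioned in the comment above (pairwise correlation distances between features, followed by agglomerative hierarchical clustering into feature clusters) can be sketched in plain Python. This is not the AggMap package or its API: the function names and the toy feature values are made up, average linkage is an assumed choice, and the UMAP embedding into 2D is omitted entirely.

```python
from math import sqrt

def correlation_distance(u, v):
    """1 - Pearson correlation between two feature vectors."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sqrt(sum((a - mu) ** 2 for a in u))
    sv = sqrt(sum((b - mv) ** 2 for b in v))
    return 1.0 - cov / (su * sv)

def agglomerate(features, n_clusters):
    """Average-linkage agglomerative clustering of features by correlation distance."""
    names = list(features)
    dist = {(a, b): correlation_distance(features[a], features[b])
            for a in names for b in names if a != b}
    clusters = [[name] for name in names]
    while len(clusters) > n_clusters:
        best = None
        # Find the pair of clusters with the smallest average pairwise distance.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = sum(dist[(a, b)] for a in clusters[i] for b in clusters[j])
                d /= len(clusters[i]) * len(clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]

# Toy matrix: each feature measured across 5 samples (made-up values).
features = {
    "gene_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "gene_b": [2.1, 3.9, 6.2, 8.0, 9.8],   # tracks gene_a
    "gene_c": [5.0, 3.0, 4.0, 1.0, 2.0],
    "gene_d": [9.0, 6.5, 8.2, 2.1, 4.0],   # tracks gene_c
}
channels = agglomerate(features, n_clusters=2)
print(channels)
```

Here the correlated pairs end up in the same group, which is the sense in which each cluster could serve as one "channel" of a multi-channel feature map. In practice a library implementation of hierarchical clustering would replace the quadratic loop above.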
5.
吴增丁 (2022-05-30 14:45):
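#paper DOI: 10.1093/nar/gkaa379 Sharing this paper published in NAR in 2020: NetMHCpan-4.1 and NetMHCIIpan-4.0: improved predictions of MHC antigen presentation by concurrent motif deconvolution and integration of MS MHC eluted ligand data. It is an update to the NetMHCpan family of prediction software from the Technical University of Denmark. An important principle of how the human immune system works is that cells use the major histocompatibility complex (MHC) to present peptides, produced by protease degradation inside the cell, on the cell surface, where T cells recognize them and trigger an immune cascade. By the source of the presented epitopes, MHC is divided into MHC-I, which presents endogenous peptides, and MHC-II, which presents exogenous peptides. With the rise of cancer immunotherapy, the design of the antigen sequence is critical in therapeutic vaccine design. But will a designed antigen actually elicit an immune response? Predicting antigen presentation is therefore crucial, and this is the core motivation for continually refining the antigen presentation algorithms. The paper improves on the previous version in two ways: 1. it upgrades the machine learning framework, replacing the previous core framework NNAlign with NNAlign_MA, which is better suited to mass spectrometry training data; 2. it enlarges the training dataset and relabels the data. With these updates, it achieves a better PPV than the previous version and other similar software.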
IF:16.600Q1 Nucleic acids research, 2020-07-02. DOI: 10.1093/nar/gkaa379 PMID: 32406916 PMCID:PMC7319546
Abstract:
Major histocompatibility complex (MHC) molecules are expressed on the cell surface, where they present peptides to T cells, which gives them a key role in the development of T-cell immune responses. MHC molecules come in two main variants: MHC Class I (MHC-I) and MHC Class II (MHC-II). MHC-I predominantly present peptides derived from intracellular proteins, whereas MHC-II predominantly presents peptides from extracellular proteins. In both cases, the binding between MHC and antigenic peptides is the most selective step in the antigen presentation pathway. Therefore, the prediction of peptide binding to MHC is a powerful utility to predict the possible specificity of a T-cell immune response. Commonly MHC binding prediction tools are trained on binding affinity or mass spectrometry-eluted ligands. Recent studies have however demonstrated how the integration of both data types can boost predictive performances. Inspired by this, we here present NetMHCpan-4.1 and NetMHCIIpan-4.0, two web servers created to predict binding between peptides and MHC-I and MHC-II, respectively. Both methods exploit tailored machine learning strategies to integrate different training data types, resulting in state-of-the-art performance and outperforming their competitors. The servers are available at http://www.cbs.dtu.dk/services/NetMHCpan-4.1/ and http://www.cbs.dtu.dk/services/NetMHCIIpan-4.0/.
6.
洪媛媛 (2022-05-25 18:31):
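#paper doi: 10.1093/nar/gkx345 Nucleic Acids Research, 2017, Vol. 45, No. 13 7655–7665. APOBEC3A efficiently deaminates methylated, but not TET-oxidized, cytosine bases in DNA. Why I recommend it: this paper studies the human APOBEC3A (A3A) deaminase of the AID/APOBEC family and its ability to deaminate cytosine bases at different oxidation states, namely C, 5mC, 5hmC, 5fC, and 5caC. It also examines in detail how the sequence features of the DNA substrate affect enzyme activity. The study finds that APOBEC3A deaminates C most efficiently, followed by 5mC; its activity on 5hmC and 5fC is only about 1/5600 and 1/3700 of that on C, respectively, and it has almost no effect on 5caC. With excess APOBEC3A, all C and 5mC can be deaminated regardless of the neighboring bases; when the enzyme is limiting, the base at the -1 position relative to C or 5mC affects deamination most, followed by the base at the -2 position.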
IF:16.600Q1 Nucleic acids research, 2017-Jul-27. DOI: 10.1093/nar/gkx345 PMID: 28472485 PMCID:PMC5570014
Abstract:
AID/APOBEC family enzymes are best known for deaminating cytosine bases to uracil in single-stranded DNA, with characteristic sequence preferences that can produce mutational signatures in targets such as retroviral and cancer cell genomes. These deaminases have also been proposed to function in DNA demethylation via deamination of either 5-methylcytosine (mC) or TET-oxidized mC bases (ox-mCs), which include 5-hydroxymethylcytosine, 5-formylcytosine and 5-carboxylcytosine. One specific family member, APOBEC3A (A3A), has been shown to readily deaminate mC, raising the prospect of broader activity on ox-mCs. To investigate this claim, we developed a novel assay that allows for parallel profiling of activity on all modified cytosines. Our steady-state kinetic analysis reveals that A3A discriminates against all ox-mCs by >3700-fold, arguing that ox-mC deamination does not contribute substantially to demethylation. A3A is, by contrast, highly proficient at C/mC deamination. Under conditions of excess enzyme, C/mC bases can be deaminated to completion in long DNA segments, regardless of sequence context. Interestingly, under limiting A3A, the sequence preferences observed with targeting unmodified cytosine are further exaggerated when deaminating mC. Our study informs how methylation, oxidation, and deamination can interplay in the genome and suggests A3A's potential utility as a biotechnological tool to discriminate between cytosine modification states.
7.
吴增丁 (2022-01-20 17:31):
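#paper doi: 10.1093/nar/gkt178. This paper was published in Nucleic Acids Research in 2013, titled "Translating mRNAs strongly correlate to proteins in a multivariate manner and their translation ratios are phenotype specific". Core selling point: using a method called RNC-seq, it shows that RNC-mRNA abundance correlates strongly with proteome quantification (R2=0.94). Significance: 1. It attempts to explore the quantitative relationships in the central dogma. Qualitatively we all know that DNA goes to RNA goes to protein, but earlier studies found that some mRNAs are expressed, even at fairly high levels, yet do not show up at the protein level. Why? Others had tried correlating total mRNA with the proteome, with very unsatisfactory results. The author, 张弓 (Zhang Gong), found that the correlation between RNC-mRNA and the proteome characterized by SILAC-based MS reaches an R2 of 0.94 once mRNA length is introduced as a variable. 2. It develops an NGS-based research method, RNC-seq (mRNAs bound to ribosome-nascent chain complex). I think point 1 is very significant: it amounts to finding, at the RNA level, a substitute for proteomics, which greatly simplifies research, especially at a time when translational medicine wants detection methods to be as simple to operate as possible. But that raises a question: why have so few people followed up on this technique?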
IF:16.600Q1 Nucleic acids research, 2013-May. DOI: 10.1093/nar/gkt178 PMID: 23519614 PMCID:PMC3643591
Abstract:
As a well-known phenomenon, total mRNAs poorly correlate to proteins in their abundances as reported. Recent findings calculated with bivariate models suggested even poorer such correlation, whereas focusing on the translating mRNAs (ribosome nascent-chain complex-bound mRNAs, RNC-mRNAs) subset. In this study, we analysed the relative abundances of mRNAs, RNC-mRNAs and proteins on genome-wide scale, comparing human lung cancer A549 and H1299 cells with normal human bronchial epithelial (HBE) cells, respectively. As discovered, a strong correlation between RNC-mRNAs and proteins in their relative abundances could be established through a multivariate linear model by integrating the mRNA length as a key factor. The R(2) reached 0.94 and 0.97 in A549 versus HBE and H1299 versus HBE comparisons, respectively. This correlation highlighted that the mRNA length significantly contributes to the translational modulation, especially to the translational initiation, favoured by its correlation with the mRNA translation ratio (TR) as observed. We found TR is highly phenotype specific, which was substantiated by both pathway analysis and biased TRs of the splice variants of BDP1 gene, which is a key transcription factor of transfer RNAs. These findings revealed, for the first time, the intrinsic and genome-wide translation modulations at translatomic level in human cells at steady-state, which are tightly correlated to the protein abundance and functionally relevant to cellular phenotypes.
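The point made in the comment and abstract above, that adding mRNA length as a second predictor can raise R2 well beyond a bivariate fit, can be illustrated with an ordinary least-squares sketch. Everything here is an assumption for illustration: the data are synthetic (generated in the code, not taken from the paper), the coefficients and noise level are arbitrary, and the paper's exact model form and variable transformations are not reproduced.

```python
import random

def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def ols_r2(columns, y):
    """Fit y ~ intercept + columns via the normal equations and return R^2."""
    X = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(X[0])
    XtX = [[sum(row[a] * row[b] for row in X) for b in range(p)] for a in range(p)]
    Xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(p)]
    beta = solve_linear(XtX, Xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, row)) for row in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

# SYNTHETIC data only: hypothetical log-ratios and log-lengths for 200 "genes".
rng = random.Random(0)
rnc = [rng.uniform(-2, 2) for _ in range(200)]          # RNC-mRNA log-ratio
length = [rng.uniform(2.5, 4.0) for _ in range(200)]    # log10 mRNA length
protein = [0.8 * r - 1.5 * l + rng.gauss(0, 0.1) for r, l in zip(rnc, length)]

r2_bivariate = ols_r2([rnc], protein)             # protein ~ RNC-mRNA
r2_multivariate = ols_r2([rnc, length], protein)  # protein ~ RNC-mRNA + length
print(f"bivariate R^2:    {r2_bivariate:.3f}")
print(f"multivariate R^2: {r2_multivariate:.3f}")
```

Because the synthetic protein values were built to depend on length, the second fit explains much more variance than the first. The sketch shows only the mechanics of comparing nested linear models and says nothing about the size of the effect in real data.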