Papers from the journal Nature biotechnology.
14 shared papers found.
1.
翁凯 (2024-05-31 22:29):
#paper doi: 10.1038/s41587-021-01033-z. Differential abundance testing on single-cell data using k-nearest neighbor graphs. This study steps outside the cell-clustering framework: it starts from each cell's neighbors and compares differences in cell proportions between groups.
IF: 33.100 Q1 Nature biotechnology, 2022-02. DOI: 10.1038/s41587-021-01033-z PMID: 34594043
Abstract:
Current computational workflows for comparative analyses of single-cell datasets typically use discrete clusters as input when testing for differential abundance among experimental conditions. However, clusters do not always provide the appropriate resolution and cannot capture continuous trajectories. Here we present Milo, a scalable statistical framework that performs differential abundance testing by assigning cells to partially overlapping neighborhoods on a k-nearest neighbor graph. Using simulations and single-cell RNA sequencing (scRNA-seq) data, we show that Milo can identify perturbations that are obscured by discretizing cells into clusters, that it maintains false discovery rate control across batch effects and that it outperforms alternative differential abundance testing strategies. Milo identifies the decline of a fate-biased epithelial precursor in the aging mouse thymus and identifies perturbations to multiple lineages in human cirrhotic liver. As Milo is based on a cell-cell similarity structure, it might also be applicable to single-cell data other than scRNA-seq. Milo is provided as an open-source R software package at https://github.com/MarioniLab/miloR.
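The neighborhood idea in the comment and abstract above can be illustrated with a minimal Python sketch. It is not the miloR implementation: the function names, the toy 2D coordinates and the condition labels are all hypothetical, the kNN search is brute force, and the sketch stops at counting cells per condition in each neighborhood. The statistical test and false discovery rate control described in the abstract are left out.

```python
import math

def knn(points, k):
    """Indices of the k nearest neighbours of every point (Euclidean, brute force)."""
    result = []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        result.append([j for _, j in dists[:k]])
    return result

def neighbourhood_counts(points, conditions, index_cells, k):
    """For each index cell, count cells per condition among itself and its k neighbours."""
    neighbours = knn(points, k)
    counts = {}
    for i in index_cells:
        members = [i] + neighbours[i]
        tally = {}
        for m in members:
            tally[conditions[m]] = tally.get(conditions[m], 0) + 1
        counts[i] = tally
    return counts

# Hypothetical toy embedding: the left blob is mostly control, the right blob mostly treated.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1)]
conditions = ["ctrl", "ctrl", "ctrl", "treated",
              "treated", "treated", "treated", "ctrl"]

counts = neighbourhood_counts(points, conditions, index_cells=[0, 4], k=3)
print(counts)
```

Each neighborhood yields a small table of counts per condition, and those counts are what a differential abundance test would then compare between groups, without ever assigning the cells to discrete clusters.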
2.
盼盼 (2024-04-30 22:50):
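#paper doi:https://doi.org/10.1038/s41587-021-00830-w Robust decomposition of cell type mixtures in spatial transcriptomics. A limitation of spatial transcriptomics technologies is that the measurement at each spot may contain contributions from several cells, which hampers the discovery of cell-type-specific and spatial expression patterns. The authors developed a fairly robust tool, RCTD, which uses cell-type-specific expression profiles learned from single-cell data to predict the cell types present in each spot and to estimate the weight of each cell type. RCTD accurately reproduces known cell type and subtype localization patterns in mouse Slide-seq and Visium datasets. However, the reliability of its results depends on the quality of the annotated single-cell reference, so it is important to choose a high-quality single-cell dataset, or one with accurate cell annotations that matches the spatial data well. Using RCTD to define the spatial components of cell types at each spot helps uncover new principles of cellular organization in biological tissue.
IF: 33.100 Q1 Nature biotechnology, 2022-04. DOI: 10.1038/s41587-021-00830-w PMID: 33603203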
Abstract:
A limitation of spatial transcriptomics technologies is that individual measurements may contain contributions from multiple cells, hindering the discovery of cell-type-specific spatial patterns of localization and expression. Here, we develop robust cell type decomposition (RCTD), a computational method that leverages cell type profiles learned from single-cell RNA-seq to decompose cell type mixtures while correcting for differences across sequencing technologies. We demonstrate the ability of RCTD to detect mixtures and identify cell types on simulated datasets. Furthermore, RCTD accurately reproduces known cell type and subtype localization patterns in Slide-seq and Visium datasets of the mouse brain. Finally, we show how RCTD's recovery of cell type localization enables the discovery of genes within a cell type whose expression depends on spatial environment. Spatial mapping of cell types with RCTD enables the spatial components of cellular identity to be defined, uncovering new principles of cellular organization in biological tissue. RCTD is publicly available as an open-source R package at https://github.com/dmcable/RCTD.
3.
cellsarts (2023-04-30 23:11):
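#paper SignalP 6.0 predicts all five types of signal peptides using protein language models https://doi.org/10.1038/s41587-021-01156-3 Signal peptides (SPs) are short amino acid sequences that control protein secretion and translocation in all living organisms. SPs can be predicted from sequence data, but existing algorithms cannot detect all known SP types. The paper introduces SignalP 6.0, a machine learning model that detects all five SP types and is applicable to metagenomic data. SPs are short N-terminal amino acid sequences that target proteins to the secretory (Sec) pathway in eukaryotes and drive translocation across the plasma (inner) membrane in prokaryotes. Because comprehensive experimental identification of SPs is impractical, computational SP prediction is highly relevant to cell biology research. SP prediction tools can identify proteins that follow the general secretory or twin-arginine translocation (Tat) pathway and predict where in the sequence a signal peptidase (SPase) cleaves the SP. SignalP 5.0 can predict Sec substrates cleaved by SPase I (Sec/SPI) or SPase II (Sec/SPII, prokaryotic lipoproteins) and Tat substrates cleaved by SPase I (Tat/SPI). However, owing to a lack of annotated data, SignalP 5.0 cannot detect Tat substrates cleaved by SPase II or Sec substrates processed by SPase III (prepilin peptidase, sometimes called SPase IV). Such Sec/SPIII SPs control the translocation of type IV pilin-like proteins, which play key roles in adhesion, motility and DNA uptake in prokaryotes. In addition, SignalP 5.0 is agnostic to SP structure: it cannot delineate the subregions that underlie an SP's biological function (the N-terminal n-region, the hydrophobic h-region and the C-terminal c-region). SignalP 6.0 is based on protein language models (LMs), which use information from millions of unannotated protein sequences across all domains of life. LMs create semantic representations of proteins that capture their biological properties and structure. Using these representations, SignalP 6.0 can predict additional SP types that previous versions could not detect, while extrapolating better to proteins distantly related to those used to build the model and to metagenomic data of unknown origin. It can also identify the subregions of SPs.
IF: 33.100 Q1 Nature biotechnology, 2022-07. DOI: 10.1038/s41587-021-01156-3 PMID: 34980915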
Abstract:
Signal peptides (SPs) are short amino acid sequences that control protein secretion and translocation in all living organisms. SPs can be predicted from sequence data, but existing algorithms are unable to detect all known types of SPs. We introduce SignalP 6.0, a machine learning model that detects all five SP types and is applicable to metagenomic data.
4.
James (2023-04-21 10:41):
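#paper Ali Madani, Ben Krause, Eric R Greene, Subu Subramanian, Benjamin P Mohr, James M Holton, Jose Luis Olmos Jr, Caiming Xiong, Zachary Z Sun, Richard Socher, James S Fraser, Nikhil Naik. Large language models generate functional protein sequences across diverse families. PMID: 36702895 DOI: 10.1038/s41587-022-01618-2. The paper trains on 280 million protein sequences from more than 19,000 families to build ProGen, a deep learning model similar to an LLM. ProGen can be further fine-tuned on curated sequences and tags to improve controllable generation for proteins from families with enough homologous samples. Artificial proteins generated after fine-tuning on five distinct lysozyme families showed catalytic efficiencies similar to those of natural lysozymes, with sequence identity to natural proteins as low as 31.4%. On the same day the paper appeared in Nature Biotechnology, Profluent Bio, the company founded by first author Ali Madani, announced a US$9 million seed round led by Insight Partners. The funding will be used to build a wet lab in Berkeley, California, so that Profluent can create a tight feedback loop between experimentally generated data and its AI systems, provide strong validation for designing any protein, and keep improving its AI.
IF: 33.100 Q1 Nature biotechnology, 2023-08. DOI: 10.1038/s41587-022-01618-2 PMID: 36702895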
Abstract:
Deep-learning language models have shown promise in various biotechnological applications, including protein design and engineering. Here we describe ProGen, a language model that can generate protein sequences with a predictable function across large protein families, akin to generating grammatically and semantically correct natural language sentences on diverse topics. The model was trained on 280 million protein sequences from >19,000 families and is augmented with control tags specifying protein properties. ProGen can be further fine-tuned to curated sequences and tags to improve controllable generation performance of proteins from families with sufficient homologous samples. Artificial proteins fine-tuned to five distinct lysozyme families showed similar catalytic efficiencies as natural lysozymes, with sequence identity to natural proteins as low as 31.4%. ProGen is readily adapted to diverse protein families, as we demonstrate with chorismate mutase and malate dehydrogenase.
5.
白鸟 (2023-02-28 21:23):
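#paper doi:https://doi.org/10.1038/s41587-021-00895-7, 2021, Computational principles and challenges in single-cell data integration. Single-cell multimodal assays: a range of experimental techniques now profile different molecular features in the same cell simultaneously, so that thousands of cells are measured at high resolution across a growing number of molecular dimensions, including the genome, transcriptome and epigenetic modifications. No single "do-everything" technique can fully capture the complex molecular machinery, but these data offer a chance to move from descriptive "snapshots" of basic biological processes toward a mechanistic understanding of gene regulation. Significance: the development of single-cell multimodal assays provides a powerful tool for studying multiple dimensions of cellular heterogeneity, giving new insights into development, tissue homeostasis and disease. By incorporating prior knowledge about the hierarchical relationships between molecular layers (that is, the central dogma of biology), multimodal analysis will play an important role in identifying causal chains of events in gene regulatory networks. Challenge: devising appropriate strategies to link data across modalities. The term "data integration" is used for this task, and it is defined broadly, ranging from batch correction of individual omics datasets to associating chromatin accessibility and genetic variation with transcription. Three types of data integration strategy: genomic features as anchors (horizontal integration); cells as anchors (vertical integration); no anchors in the high-dimensional space (diagonal integration). Outlook: the review covers the established principles and limitations of data integration strategies. Although existing strategies exploit similar mathematical ideas, they usually have different goals and rely on different principles and assumptions. New definitions and concepts are therefore needed to put existing single-cell integration methods in context and to enable new methods to be developed.
IF: 33.100 Q1 Nature biotechnology, 2021-10. DOI: 10.1038/s41587-021-00895-7 PMID: 33941931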
Abstract:
The development of single-cell multimodal assays provides a powerful tool for investigating multiple dimensions of cellular heterogeneity, enabling new insights into development, tissue homeostasis and disease. A key challenge in the analysis of single-cell multimodal data is to devise appropriate strategies for tying together data across different modalities. The term 'data integration' has been used to describe this task, encompassing a broad collection of approaches ranging from batch correction of individual omics datasets to association of chromatin accessibility and genetic variation with transcription. Although existing integration strategies exploit similar mathematical ideas, they typically have distinct goals and rely on different principles and assumptions. Consequently, new definitions and concepts are needed to contextualize existing methods and to enable development of new methods.
6.
笑对人生 (2023-01-31 23:51):
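#paper Tan J, et al. Cell-type-specific prediction of 3D chromatin organization enables high-throughput in silico genetic screening. Nat Biotechnol. 2023 Jan 9. doi: 10.1038/s41587-022-01612-8. Advances in 3D genomics have greatly broadened our understanding of how chromatin spatial structure and the associated conformational changes affect gene expression. However, because of time and technical cost, studying chromatin remodeling events in specific cell types remains very challenging. Using seven public mouse and human Hi-C datasets, this study builds C.Origami, a Transformer-based multimodal deep learning framework that takes DNA sequence, CTCF binding and ATAC-seq signal density (not peak calls) as input and outputs a 2D Hi-C matrix, giving a cell-type-specific model for predicting 3D genome conformation. The model not only predicts genome structure de novo at different levels for a given cell type, but can also predict DNA elements likely to affect chromatin conformation and uncover chromatin remodeling regulatory events that lead to disease.
IF: 33.100 Q1 Nature biotechnology, 2023-08. DOI: 10.1038/s41587-022-01612-8 PMID: 36624151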
Abstract:
Investigating how chromatin organization determines cell-type-specific gene expression remains challenging. Experimental methods for measuring three-dimensional chromatin organization, such as Hi-C, are costly and have technical limitations, restricting their broad application particularly in high-throughput genetic perturbations. We present C.Origami, a multimodal deep neural network that performs de novo prediction of cell-type-specific chromatin organization using DNA sequence and two cell-type-specific genomic features, CTCF binding and chromatin accessibility. C.Origami enables in silico experiments to examine the impact of genetic changes on chromatin interactions. We further developed an in silico genetic screening approach to assess how individual DNA elements may contribute to chromatin organization and to identify putative cell-type-specific trans-acting regulators that collectively determine chromatin architecture. Applying this approach to leukemia cells and normal T cells, we demonstrate that cell-type-specific in silico genetic screening, enabled by C.Origami, can be used to systematically discover novel chromatin regulation circuits in both normal and disease-related biological systems.
7.
白鸟 (2022-10-27 09:36):
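#paper doi:https://doi.org/10.1038/s41587-022-01468-y Haplotype-aware analysis of somatic copy number variations from single-cell transcriptomes. Genome instability and aberrant alterations of transcriptional programs both play important roles in cancer. Single-cell RNA sequencing (scRNA-seq) can investigate both genetic and nongenetic sources of tumor heterogeneity in a single assay. Although many tools identify CNVs from exome and whole-genome sequencing data, methods for detecting CNVs in scRNA-seq data are scarce. The commonly used inferCNV and CopyKAT infer CNVs from gene expression alone. Recently, researchers at Harvard Medical School proposed a computational method, Numbat, which combines haplotype information obtained from population-based phasing with allele and expression signals to accurately infer allele-specific CNVs in single cells and reconstruct their lineage relationships. In other words, it performs joint inference from two lines of evidence, gene expression and allelic signal, to avoid CNV miscalls. Numbat exploits the evolutionary relationships between subclones to iteratively infer single-cell copy number profiles and the tumor clonal phylogeny. In benchmarks against other tools, analysis of 22 tumor samples, including multiple myeloma, gastric, breast and thyroid cancers, shows that Numbat can reconstruct tumor copy number profiles and accurately identify malignant cells in the tumor microenvironment. Numbat requires neither sample-matched DNA data nor a priori genotyping, and is applicable to a wide range of experimental settings and cancer types. In short, Numbat extends scRNA-seq data to probe the CNV landscape of cells alongside the transcriptomic landscape. One point to consider is that phased haplotype panels from more populations with different genetic backgrounds may be needed to support inference. In addition, estimating baseline tumor ploidy remains a challenging problem in copy number analysis.
IF: 33.100 Q1 Nature biotechnology, 2023-03. DOI: 10.1038/s41587-022-01468-y PMID: 36163550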
Abstract:
Genome instability and aberrant alterations of transcriptional programs both play important roles in cancer. Single-cell RNA sequencing (scRNA-seq) has the potential to investigate both genetic and nongenetic sources of tumor heterogeneity in a single assay. Here we present a computational method, Numbat, that integrates haplotype information obtained from population-based phasing with allele and expression signals to enhance detection of copy number variations from scRNA-seq. Numbat exploits the evolutionary relationships between subclones to iteratively infer single-cell copy number profiles and tumor clonal phylogeny. Analysis of 22 tumor samples, including multiple myeloma, gastric, breast and thyroid cancers, shows that Numbat can reconstruct the tumor copy number profile and precisely identify malignant cells in the tumor microenvironment. We identify genetic subpopulations with transcriptional signatures relevant to tumor progression and therapy resistance. Numbat requires neither sample-matched DNA data nor a priori genotyping, and is applicable to a wide range of experimental settings and cancer types.
8.
张贝 (2022-09-30 18:50):
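#paper doi: 10.1038/s41587-020-0548-6. PICRUSt2 for prediction of metagenome functions. Nat Biotechnol. 2020. PICRUSt2 is a tool that predicts metagenome functional abundances from marker gene sequences (usually 16S rRNA). Since its publication in 2020, the paper has been cited nearly 1,200 times (Google Scholar). PICRUSt2 upgrades the original PICRUSt1: it updates the gene family and reference genome databases (more than 10-fold larger), is interoperable with any OTU-picking or denoising algorithm, and can make phenotype predictions. Benchmarking shows that PICRUSt2 is overall more accurate than PICRUSt and other competing methods. PICRUSt2 also allows custom reference databases to be added.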
9.
徐炳祥 (2022-07-27 21:51):
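#paper International Conference on Learning Representations, 2020, Hyper-SAGNN: a self-attention based graph neural network for hypergraphs. Representation learning on hypergraphs with higher-order connections is a necessary step for extracting useful patterns in many real-world problems, but the hypergraph representation learning algorithms available at the time (2020) could not handle hypergraphs with hyperedges of varying size well. The authors designed a graph neural network architecture based on self-attention, called Hyper-SAGNN, that handles learning on hypergraphs with variable hyperedge sizes. The network first uses a single-layer neural network to map input features to "static embeddings", then uses a multi-head attention structure to map the nodes within the same hyperedge to "dynamic embeddings". It then uses a Hadamard product to characterize the similarity between the static and dynamic embeddings, passes the result to a single-layer neural network, and finally predicts the probability that the hyperedge exists. The model outperformed the prevailing models of the time on common benchmark datasets and also did well on single-cell Hi-C representation and cell classification. In 2022, the authors published Higashi, a single-cell Hi-C representation method based on this architecture, in Nature biotechnology (doi: 10.1038/s41587-021-01034-y).
IF: 33.100 Q1 Nature biotechnology, 2022-02. DOI: 10.1038/s41587-021-01034-y PMID: 34635838 PMCID:PMC8843812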
Abstract:
Single-cell Hi-C (scHi-C) can identify cell-to-cell variability of three-dimensional (3D) chromatin organization, but the sparseness of measured interactions poses an analysis challenge. Here we report Higashi, an algorithm based on hypergraph representation learning that can incorporate the latent correlations among single cells to enhance overall imputation of contact maps. Higashi outperforms existing methods for embedding and imputation of scHi-C data and is able to identify multiscale 3D genome features in single cells, such as compartmentalization and TAD-like domain boundaries, allowing refined delineation of their cell-to-cell variability. Moreover, Higashi can incorporate epigenomic signals jointly profiled in the same cell into the hypergraph representation learning framework, as compared to separate analysis of two modalities, leading to improved embeddings for single-nucleus methyl-3C data. In an scHi-C dataset from human prefrontal cortex, Higashi identifies connections between 3D genome features and cell-type-specific gene regulation. Higashi can also potentially be extended to analyze single-cell multiway chromatin interactions and other multimodal single-cell omics data.
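The static/dynamic embedding pipeline described in the comment above can be sketched in plain Python. This is a rough illustration, not the authors' code: it uses a single attention head instead of multi-head attention, random untrained weights, and assumes the similarity step is the element-wise (Hadamard) square of the difference between dynamic and static embeddings. All names (`hyperedge_probability`, `W_s`, `W_q`, `W_k`, `W_v`, `w_o`, `b_o`) are hypothetical.

```python
import math
import random

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def hyperedge_probability(features, params):
    """Score one candidate hyperedge from the feature vectors of its nodes (two or more)."""
    # Static embedding: one position-wise layer, independent of the other nodes.
    static = [[math.tanh(v) for v in matvec(params["W_s"], x)] for x in features]
    queries = [matvec(params["W_q"], x) for x in features]
    keys = [matvec(params["W_k"], x) for x in features]
    values = [matvec(params["W_v"], x) for x in features]
    dim = len(values[0])
    node_scores = []
    for i in range(len(features)):
        others = [j for j in range(len(features)) if j != i]
        # Dynamic embedding: node i attends to the other nodes of the same candidate hyperedge.
        weights = softmax([sum(q * k for q, k in zip(queries[i], keys[j])) for j in others])
        mixed = [sum(w * values[j][d] for w, j in zip(weights, others)) for d in range(dim)]
        dynamic = [math.tanh(v) for v in mixed]
        # Element-wise square of the difference, followed by a one-layer readout and sigmoid.
        diff_sq = [(d - s) ** 2 for d, s in zip(dynamic, static[i])]
        logit = sum(w * v for w, v in zip(params["w_o"], diff_sq)) + params["b_o"]
        node_scores.append(1.0 / (1.0 + math.exp(-logit)))
    # Averaging over nodes makes the score independent of hyperedge size.
    return sum(node_scores) / len(node_scores)

rng = random.Random(0)
in_dim, emb_dim = 4, 3
params = {
    "W_s": random_matrix(emb_dim, in_dim, rng),
    "W_q": random_matrix(emb_dim, in_dim, rng),
    "W_k": random_matrix(emb_dim, in_dim, rng),
    "W_v": random_matrix(emb_dim, in_dim, rng),
    "w_o": [rng.uniform(-0.5, 0.5) for _ in range(emb_dim)],
    "b_o": 0.0,
}
triple = [[rng.random() for _ in range(in_dim)] for _ in range(3)]
pair = triple[:2]
p3 = hyperedge_probability(triple, params)
p2 = hyperedge_probability(pair, params)
print(p3, p2)
```

The same function scores a pair and a triple without any change, which is the property the comment highlights: the attention step and the final averaging do not depend on a fixed hyperedge size.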
10.
颜林林 (2022-06-24 21:32):
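#paper doi:10.1038/s41587-022-01294-2 Nature Biotechnology, 2022, The clinical progress of mRNA vaccines and immunotherapies. This is a long review of mRNA vaccines. The concept of using mRNA as a vehicle for vaccines dates back to 1990: the vaccine borrows the recipient's own protein translation machinery to produce the target protein, rather than injecting the (inactivated or attenuated) pathogen or the target protein directly. This brings a series of advantages, such as simple design, inherent immunogenicity and rapid scale-up of production. It also has drawbacks and challenges, such as poor stability and difficulty delivering the vaccine to the target site in the body. In the three years since the COVID-19 outbreak, increased funding and emergency use authorizations have greatly accelerated the development, production and use of mRNA vaccines. The review covers these developments in considerable detail, including delivery methods, the development, use and optimization of vaccines against infectious diseases, vaccine approaches for cancer treatment, and the use of mRNA in protein and cellular immunotherapies, and on that basis discusses current problems and future directions. Reading it through gives a fairly deep understanding of mRNA vaccines and their technical routes, and makes clear that this is an important technology platform with great potential that is worth further exploration and development.
IF: 33.100 Q1 Nature biotechnology, 2022-06. DOI: 10.1038/s41587-022-01294-2 PMID: 35534554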
Abstract:
The emergency use authorizations (EUAs) of two mRNA-based severe acute respiratory syndrome coronavirus (SARS-CoV)-2 vaccines approximately 11 months after publication of the viral sequence highlights the transformative potential of this nucleic acid technology. Most clinical applications of mRNA to date have focused on vaccines for infectious disease and cancer for which low doses, low protein expression and local delivery can be effective because of the inherent immunostimulatory properties of some mRNA species and formulations. In addition, work on mRNA-encoded protein or cellular immunotherapies has also begun, for which minimal immune stimulation, high protein expression in target cells and tissues, and the need for repeated administration have led to additional manufacturing and formulation challenges for clinical translation. Building on this momentum, the past year has seen clinical progress with second-generation coronavirus disease 2019 (COVID-19) vaccines, Omicron-specific boosters and vaccines against seasonal influenza, Epstein-Barr virus, human immunodeficiency virus (HIV) and cancer. Here we review the clinical progress of mRNA therapy as well as provide an overview and future outlook of the transformative technology behind these mRNA-based drugs.
11.
小擎子 (2022-05-31 23:21):
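#paper doi:10.1038/nbt.2579 Nat Biotechnol., 2013, Genome sequences of rare, uncultured bacteria obtained by differential coverage binning of multiple metagenomes. The paper uses a new differential coverage binning method to recover the genomes of multiple low-abundance species from environmental metagenomes, providing a way to recover high-quality MAGs from metagenomes. The data and methods are fully documented and worth consulting.
IF: 33.100 Q1 Nature biotechnology, 2013-06. DOI: 10.1038/nbt.2579 PMID: 23707974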
Abstract:
Reference genomes are required to understand the diverse roles of microorganisms in ecology, evolution, human and animal health, but most species remain uncultured. Here we present a sequence composition-independent approach to recover high-quality microbial genomes from deeply sequenced metagenomes. Multiple metagenomes of the same community, which differ in relative population abundances, were used to assemble 31 bacterial genomes, including rare (<1% relative abundance) species, from an activated sludge bioreactor. Twelve genomes were assembled into complete or near-complete chromosomes. Four belong to the candidate bacterial phylum TM7 and represent the most complete genomes for this phylum to date (relative abundances, 0.06-1.58%). Reanalysis of published metagenomes reveals that differential coverage binning facilitates recovery of more complete and higher fidelity genome bins than other currently used methods, which are primarily based on sequence composition. This approach will be an important addition to the standard metagenome toolbox and greatly improve access to genomes of uncultured microorganisms.
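The idea behind differential coverage binning can be illustrated with a toy Python sketch: contigs from the same genome should have similar coverage in each of two metagenomes of the same community, so they land close together when the two coverages are plotted against each other. The greedy grouping rule, the `tolerance` threshold, the contig names and the coverage values below are all assumptions made for illustration and are not taken from the paper.

```python
import math

def bin_by_coverage(contigs, tolerance=0.25):
    """Greedily group contigs whose log10 coverage in both samples lies within `tolerance`.

    Each bin is anchored at the coverage point of its first contig; the anchor is not updated.
    """
    bins = []  # each bin: {"center": (x, y), "members": [names]}
    for name, cov_a, cov_b in contigs:
        point = (math.log10(cov_a), math.log10(cov_b))
        for b in bins:
            if math.dist(point, b["center"]) <= tolerance:
                b["members"].append(name)
                break
        else:
            # No existing bin is close enough: start a new one.
            bins.append({"center": point, "members": [name]})
    return [b["members"] for b in bins]

# Hypothetical contigs: (name, coverage in sample A, coverage in sample B).
contigs = [
    ("c1", 100.0, 10.0),
    ("c2", 110.0, 9.0),
    ("c3", 5.0, 50.0),
    ("c4", 4.5, 55.0),
    ("c5", 95.0, 11.0),
]
bins = bin_by_coverage(contigs)
print(bins)
```

In this toy data, one population is abundant in sample A and rare in sample B, and the other shows the opposite pattern, so the two groups separate on coverage alone without using sequence composition.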
12.
十年 (2022-03-25 12:29):
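#paper 10.1038/s41587-020-0740-8 Systematic classification of unknown metabolites using high-resolution fragmentation mass spectra. Another tool for metabolite identification in metabolomics data processing. It belongs to the SIRIUS family and still relies mainly on the fragmentation tree strategy. This time the model is built with a DNN, with a reported cross-validation accuracy as high as 99.7%. Many leading groups work on mass spectral fragmentation prediction, but accuracy has never been as high as hoped; with the recent boom in machine learning, hopefully it can do better.
IF: 33.100 Q1 Nature biotechnology, 2021-04. DOI: 10.1038/s41587-020-0740-8 PMID: 33230292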
Abstract:
Metabolomics using nontargeted tandem mass spectrometry can detect thousands of molecules in a biological sample. However, structural molecule annotation is limited to structures present in libraries or databases, restricting analysis and interpretation of experimental data. Here we describe CANOPUS (class assignment and ontology prediction using mass spectrometry), a computational tool for systematic compound class annotation. CANOPUS uses a deep neural network to predict 2,497 compound classes from fragmentation spectra, including all biologically relevant classes. CANOPUS explicitly targets compounds for which neither spectral nor structural reference data are available and predicts classes lacking tandem mass spectrometry training data. In evaluation using reference data, CANOPUS reached very high prediction performance (average accuracy of 99.7% in cross-validation) and outperformed four baseline methods. We demonstrate the broad utility of CANOPUS by investigating the effect of microbial colonization in the mouse digestive system, through analysis of the chemodiversity of different Euphorbia plants and regarding the discovery of a marine natural product, revealing biological insights at the compound class level.
13.
笑对人生 (2022-02-28 19:28):
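#paper Cell types of origin of the cell-free transcriptome. Nat Biotechnol. 2022 Feb 7. doi: 10.1038/s41587-021-01188-9 In 2017, Professor Yuk Ming Dennis Lo (卢煜明) combined single-cell transcriptome sequencing with plasma cfRNA sequencing to comprehensively dissect the heterogeneity of human embryonic cells, and for the first time showed that cfRNA can reveal functional abnormalities of villous trophoblast cells in the preeclamptic placenta. cfRNA-based liquid biopsy has been shown to trace different organs of origin, but can plasma cfRNA also be used to infer the cell types from different tissues of origin and their pathological state? Using the human pan-tissue single-cell transcriptome atlas Tabula Sapiens released in 2021, HPA transcriptome data, GTEx and four cfRNA sequencing datasets, this study shows that plasma cfRNA can detect cell-type-specific pathological differences between healthy individuals and patients across several diseases; the cell types that are easiest to predict come from the brain, lung, intestine, liver and kidney. It also finds that the main contributors to cfRNA are immune and hematopoietic cells. Interestingly, the paper uses an economics metric, the Gini coefficient, to measure whether a gene is cell-type-specific. Overall, this is a good example of applying single-cell transcriptome sequencing in a clinical setting.
IF: 33.100 Q1 Nature biotechnology, 2022-06. DOI: 10.1038/s41587-021-01188-9 PMID: 35132263 PMCID:PMC9200634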
Abstract:
Cell-free RNA from liquid biopsies can be analyzed to determine disease tissue of origin. We extend this concept to identify cell types of origin using the Tabula Sapiens transcriptomic cell atlas as well as individual tissue transcriptomic cell atlases in combination with the Human Protein Atlas RNA consensus dataset. We define cell type signature scores, which allow the inference of cell types that contribute to cell-free RNA for a variety of diseases.
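The Gini coefficient mentioned in the comment above can be computed in a few lines of Python. This sketch shows the standard formula on sorted values; the expression values and cell type counts are hypothetical, and the way the paper applies and thresholds the coefficient is not reproduced here.

```python
def gini(values):
    """Gini coefficient of non-negative values: 0 means perfectly even, values near 1 mean concentrated."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Standard formula on sorted data: G = sum_i (2i - n - 1) * x_i / (n * sum(x)), with i = 1..n.
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1))
    return weighted / (n * total)

# Hypothetical mean expression of two genes across five cell types.
broad_gene = [10.0, 10.0, 10.0, 10.0, 10.0]    # expressed evenly everywhere
specific_gene = [0.0, 0.0, 0.0, 0.0, 50.0]     # expressed in a single cell type

print(gini(broad_gene))
print(gini(specific_gene))
```

A gene expressed evenly across cell types scores 0, while a gene expressed in only one of n cell types scores (n - 1) / n, so a higher coefficient indicates stronger cell type specificity.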
14.
小W (2022-02-22 18:02):
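#paper doi 10.1038 : A knowledge graph to interpret clinical proteomics data. Nat Biotechnol (2022). https://doi.org/10.1038/s41587-021-01145-6 This paper releases the Clinical Knowledge Graph (CKG), an open-source platform that currently contains close to 20 million nodes and 220 million relationships, covering relevant experimental data, public databases and literature. The CKG graph structure provides a flexible data model that is easily extended with new nodes and relationships as new databases become available. The CKG incorporates statistical and machine learning algorithms that speed up the analysis and interpretation of typical proteomics workflows. The CKG code and database files were already open-sourced in early 2021; when I tested the analysis scripts then, they still had considerable problems, and since the paper came out I have some new, not yet fully formed thoughts. Separately, a knowledge graph paper from AstraZeneca is quite rewarding for bioinformaticians: doi 10.1101 Biological Insights Knowledge Graph: an integrated knowledge graph to support drug development
IF: 33.100 Q1 Nature biotechnology, 2022-05. DOI: 10.1038/s41587-021-01145-6 PMID: 35102292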
Abstract:
Implementing precision medicine hinges on the integration of omics data, such as proteomics, into the clinical decision-making process, but the quantity and diversity of biomedical data, and the spread of clinically relevant knowledge across multiple biomedical databases and publications, pose a challenge to data integration. Here we present the Clinical Knowledge Graph (CKG), an open-source platform currently comprising close to 20 million nodes and 220 million relationships that represent relevant experimental data, public databases and literature. The graph structure provides a flexible data model that is easily extendable to new nodes and relationships as new databases become available. The CKG incorporates statistical and machine learning algorithms that accelerate the analysis and interpretation of typical proteomics workflows. Using a set of proof-of-concept biomarker studies, we show how the CKG might augment and enrich proteomics data and help inform clinical decision-making.