徐炳祥
(2026-02-28 20:05):
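#paper doi: 10.1038/s41592-023-02139-9 Nature Methods, 2024, A fast, scalable and versatile tool for analysis of single-cell omics data. This paper presents SnapATAC2, an end-to-end tool for single-cell data analysis. Its core innovation is a spectral embedding algorithm that is not distance-based: it uses the Lanczos method to compute the eigenvectors of the Laplacian matrix implicitly, which completely avoids the memory bottleneck of conventional spectral embedding (the need to construct a cell-by-cell similarity matrix) and gives time and space complexity that is linear in the number of cells. The algorithm performs very well on a large number of synthetic and real datasets and natively supports joint embedding of single-cell multi-omics data such as scRNA-seq, scHi-C and single-cell DNA methylation. In accuracy, robustness and scalability it outperforms existing linear and nonlinear methods across the board, and it requires neither a GPU nor tedious hyperparameter tuning.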
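To make the "implicit" eigenvector computation concrete, here is a minimal sketch of the general idea. It is not SnapATAC2's code or API: the function name `matrix_free_spectral_embedding`, the choice of cosine similarity, and keeping each cell's self-similarity in the degree are all assumptions made for illustration, and SciPy's `eigsh` (ARPACK's implicitly restarted Lanczos) stands in for whatever solver the package uses. The key point is that the cell-by-cell similarity matrix S = X Xᵀ is only ever applied to a vector, as X (Xᵀ v), so nothing of size n × n is stored.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, eigsh


def matrix_free_spectral_embedding(X, n_components=2):
    """Illustrative sketch (hypothetical helper, not the SnapATAC2 API).

    X: (n_cells, n_features) non-negative array; every cell is assumed to
    have at least one non-zero feature.
    Returns the leading non-trivial eigenvalues and eigenvectors of the
    normalized similarity matrix D^{-1/2} S D^{-1/2}, where S = X X^T after
    row normalization. These are the eigenvectors of the normalized
    Laplacian I - D^{-1/2} S D^{-1/2} with the smallest eigenvalues.
    """
    X = np.asarray(X, dtype=np.float64)
    n_cells = X.shape[0]

    # L2-normalize rows so that X @ X.T would be the cosine similarity matrix.
    X = X / np.linalg.norm(X, axis=1, keepdims=True)

    # Degree vector d = S @ 1, computed as X @ (X.T @ 1) without forming S.
    # Assumption: self-similarity (the diagonal of S) is kept for simplicity.
    degree = X @ (X.T @ np.ones(n_cells))
    inv_sqrt_deg = 1.0 / np.sqrt(degree)

    def matvec(v):
        # Apply D^{-1/2} S D^{-1/2} to v using two thin matrix products.
        v = np.ravel(v)
        return inv_sqrt_deg * (X @ (X.T @ (inv_sqrt_deg * v)))

    op = LinearOperator((n_cells, n_cells), matvec=matvec, dtype=np.float64)

    # Lanczos-type solver: it only calls matvec, never sees an n x n matrix.
    eigvals, eigvecs = eigsh(op, k=n_components + 1, which="LA")

    # Sort in descending order and drop the trivial pair
    # (eigenvalue 1, eigenvector proportional to sqrt(degree)).
    order = np.argsort(eigvals)[::-1]
    return eigvals[order][1:], eigvecs[:, order][:, 1:]
```

Calling `matrix_free_spectral_embedding(X, n_components=30)` on a cells-by-features matrix returns an `(n_cells, 30)` array of eigenvectors that can be used as the embedding. In this sketch each matrix-vector product costs time proportional to the size of `X` (or to its number of non-zeros if it were stored sparsely), which is where the linear scaling in the number of cells comes from.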
Nature Methods,
2024-2.
DOI: 10.1038/s41592-023-02139-9
A fast, scalable and versatile tool for analysis of single-cell omics data
Abstract:
Single-cell omics technologies have revolutionized the study of gene regulation in complex tissues. A major computational challenge in analyzing these datasets is to project the large-scale and high-dimensional data into low-dimensional space while retaining the relative relationships between cells. This low dimension embedding is necessary to decompose cellular heterogeneity and reconstruct cell-type-specific gene regulatory programs. Traditional dimensionality reduction techniques, however, face challenges in computational efficiency and in comprehensively addressing cellular diversity across varied molecular modalities. Here we introduce a nonlinear dimensionality reduction algorithm, embodied in the Python package SnapATAC2, which not only achieves a more precise capture of single-cell omics data heterogeneities but also ensures efficient runtime and memory usage, scaling linearly with the number of cells. Our algorithm demonstrates exceptional performance, scalability and versatility across diverse single-cell omics datasets, including single-cell assay for transposase-accessible chromatin using sequencing, single-cell RNA sequencing, single-cell Hi-C and single-cell multi-omics datasets, underscoring its utility in advancing single-cell analysis.
Related Links: