徐炳祥 (2025-12-31 20:22):
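#paper doi: 10.1038/s41467-025-64186-4 Nature Communications, 2025, A comprehensive benchmark of single-cell Hi-C embedding tools. Low-dimensional embedding is the core step of single-cell Hi-C (scHi-C) analysis and determines whether an analysis can uncover the cell-to-cell heterogeneity in chromatin conformation contained in the data. Using a large number of datasets from the published literature, together with a custom-built testing framework and evaluation metrics, this paper systematically evaluates the performance of the mainstream scHi-C embedding tools available at the time of submission. The authors point out that no single tool currently has a consistent advantage across all datasets. Traditional embedding techniques based on random-walk models tend to rely more on long-range interaction information, whereas recently published deep-learning models show the opposite tendency. Deep-learning models can complete the embedding task at higher resolution with lower sequencing depth. Finally, the authors note that multimodal single-cell sequencing methods such as mc3C-seq can distinguish similar cell types in finer detail. This paper is not only a systematic summary of existing scHi-C embedding algorithms; it also explains the causes of their performance differences and points the way for applications and for the development of new algorithms.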
A comprehensive benchmark of single-cell Hi-C embedding tools
Abstract:
Embedding is the key step in single-cell Hi-C (scHi-C) analysis which relies on capturing biological meaningful heterogeneity at various levels of genome architecture. To understand the strength and limitations of existing tools in various applications, here we use ten scHi-C datasets to benchmark thirteen embedding tools including Va3DE, a new convolutional neural network model that can accommodate large cell numbers. We built a software framework to decouple the preprocessing options of existing tools and found that no single tool works best across all datasets under default settings. The difficulty levels and preferred resolutions are different between benchmark datasets, and the choice of data representation and preprocessing strongly impact the embedding performance. Embedding cells from early embryonic stages relies on long-range compartment-scale contacts, but resolving cell cycle phases and complex tissue requires short-range loop-scale contacts. Both random-walk and inverse document frequency (IDF) transformation prefers long-range “compartment-scale” over short-range “loop-scale” embedding, while deep-learning methods better overcome sparsity at both scales and are more versatile with different resolutions. Finally, “diagonal integration” with independent data modal is a promising approach to distinguish similar cell subpopulations. Our findings underscore the significance of appropriate priors for scHi-C embedding and also offer insights into genome architecture heterogeneity.
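For readers unfamiliar with the two preprocessing transforms named in the abstract, here is a minimal sketch of what random-walk smoothing and IDF weighting can look like on toy data. It is not the paper's framework or any benchmarked tool's implementation: the function names, the restart probability, the iteration count, and the `log(1 + N / df)` form of IDF are all assumptions chosen for illustration.

```python
import numpy as np


def random_walk_with_restart(contacts, restart=0.5, n_iter=50):
    """Smooth one cell's contact matrix by random walk with restart.

    `restart` and `n_iter` are illustrative defaults, not values from the paper.
    """
    a = np.asarray(contacts, dtype=float)
    row_sums = a.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0          # avoid division by zero for empty bins
    p = a / row_sums                       # row-normalized transition matrix
    identity = np.eye(a.shape[0])
    q = identity.copy()
    for _ in range(n_iter):
        # Take one walk step, then return to the starting bin with prob. `restart`.
        q = (1.0 - restart) * (q @ p) + restart * identity
    return q


def idf_weights(cell_by_feature):
    """Assumed IDF variant: log(1 + n_cells / n_cells_where_feature_is_nonzero)."""
    m = np.asarray(cell_by_feature)
    n_cells = m.shape[0]
    doc_freq = (m > 0).sum(axis=0)
    return np.log(1.0 + n_cells / np.maximum(doc_freq, 1))


# Toy 3-bin contact matrix for a single cell (hypothetical counts).
toy_contacts = np.array([[0, 2, 1],
                         [2, 0, 3],
                         [1, 3, 0]])
smoothed = random_walk_with_restart(toy_contacts)

# Toy cells-by-contacts matrix: column 0 is seen in every cell, column 2 in one.
toy_cells = np.array([[1, 1, 0],
                      [1, 0, 0],
                      [1, 0, 0],
                      [1, 1, 1]])
weights = idf_weights(toy_cells)
```

The random walk spreads each bin's signal to its neighbors, which fills in sparse matrices, while IDF down-weights contacts present in most cells and up-weights rare ones.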