颜林林
(2022-07-18 06:00):
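#paper doi:10.1101/2022.07.14.500036 bioRxiv, 2022, Trade-off between conservation of biological variation and batch effect removal in deep generative modeling for single-cell transcriptomics. Analysis of single-cell transcriptome sequencing data requires removing batch effects. This is usually done by reducing the high-dimensional data to a low-dimensional space that reflects the structure of the data more readily, and then correcting the data there according to batch information. The process can easily damage biologically meaningful features of the data, and those biological differences are exactly what single-cell sequencing sets out to study. Removing batch effects and preserving biologically relevant variation are two conflicting goals, and single-cell analysis tools typically pick one of them as the priority to optimize, depending on their own strategy. In this paper, the authors use a multi-objective optimization technique called Pareto multi-task learning (Pareto MTL) to jointly evaluate and weigh multiple metrics related to both goals, aiming for a better overall result. They also propose measuring batch effect with a neural-network-based estimator called Mutual Information Neural Estimation (MINE), which helps in choosing the balance point. The paper evaluates the method on public datasets including TM-MARROW and MACAQUE-RETINA, and reports that MINE does outperform the commonly used MMD measure.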
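For readers unfamiliar with MMD, the baseline batch-effect measure mentioned above, here is a minimal sketch of the standard biased estimator of squared Maximum Mean Discrepancy with an RBF kernel. It is an illustration only, not the paper's implementation: the function names, the bandwidth value, and the toy latent vectors are all assumptions made for this example.

```python
import math

def rbf(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel between two equal-length vectors."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))

def mmd2(xs, ys, bandwidth=1.0):
    """Biased estimate of squared MMD between two samples of vectors."""
    def mean_kernel(us, vs):
        total = sum(rbf(u, v, bandwidth) for u in us for v in vs)
        return total / (len(us) * len(vs))
    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)

# Hypothetical 2-D latent codes for cells from two batches.
batch_a = [(0.0, 0.0), (0.1, -0.1), (-0.1, 0.1)]
batch_b = [(2.0, 2.0), (2.1, 1.9), (1.9, 2.1)]

print(mmd2(batch_a, batch_a))  # identical samples: no discrepancy
print(mmd2(batch_a, batch_b))  # shifted samples: clearly positive
```

A value near zero suggests the two batches are well mixed in latent space; a larger value suggests residual batch effect. The result depends on the kernel bandwidth, which is a free choice in this sketch.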
bioRxiv, 2022. DOI: 10.1101/2022.07.14.500036
Trade-off between conservation of biological variation and batch effect removal in deep generative modeling for single-cell transcriptomics
Abstract:
Single-cell RNA sequencing (scRNA-seq) technology has contributed significantly to diverse research areas in biology, from cancer to development. Since scRNA-seq data is high-dimensional, a common strategy is to learn low-dimensional latent representations to better understand overall structure in the data. In this work, we build upon scVI, a powerful deep generative model which can learn biologically meaningful latent representations, but which has limited explicit control of batch effects. Rather than prioritizing batch effect removal over conservation of biological variation, or vice versa, our goal is to provide a bird's-eye view of the trade-offs between these two conflicting objectives. Specifically, using the well-established concept of Pareto front from economics and engineering, we seek to learn the entire trade-off curve between conservation of biological variation and removal of batch effects. A multi-objective optimisation technique known as Pareto multi-task learning (Pareto MTL) is used to obtain the Pareto front between conservation of biological variation and batch effect removal. Our results indicate Pareto MTL can obtain a better Pareto front than the naive scalarization approach typically encountered in the literature. In addition, we propose to measure batch effect by applying a neural-network-based estimator called Mutual Information Neural Estimation (MINE) and show benefits over the more standard Maximum Mean Discrepancy (MMD) measure. The Pareto front between conservation of biological variation and batch effect removal is a valuable tool for researchers in computational biology. Our results demonstrate the efficacy of applying Pareto MTL to estimate the Pareto front in conjunction with applying MINE to measure the batch effect.
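To make the notion of a Pareto front concrete, the sketch below filters a set of candidate models down to the non-dominated ones, and shows how a single scalarization weight picks just one of them. It is not Pareto MTL itself, which operates during training. The function names and the (batch-effect measure, biological-conservation loss) pairs are hypothetical values chosen for illustration, with both objectives assumed to be minimized.

```python
def dominates(q, p):
    """True if q is no worse than p on every objective and strictly better on one."""
    return all(qk <= pk for qk, pk in zip(q, p)) and any(qk < pk for qk, pk in zip(q, p))

def pareto_front(points):
    """Return the points that no other point dominates, sorted by first objective."""
    front = [
        p for i, p in enumerate(points)
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i)
    ]
    return sorted(front)

def scalarized_best(points, weight):
    """Naive scalarization: minimize a fixed weighted sum of the two objectives."""
    return min(points, key=lambda p: weight * p[0] + (1.0 - weight) * p[1])

# Hypothetical (batch_effect, conservation_loss) pairs, one per trained model.
candidates = [(0.9, 1.0), (0.5, 1.2), (0.6, 1.5), (0.2, 2.0), (0.3, 2.5)]

front = pareto_front(candidates)
print(front)                             # the full trade-off curve
print(scalarized_best(candidates, 0.5))  # one point for one fixed weight
```

Each weight in the scalarized version yields a single compromise, whereas the front exposes every non-dominated trade-off at once, which is the "bird's-eye view" the abstract describes.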