1,063 paper shares found in total; this page shows entries 821 to 840.
821.
muton (2022-07-28 11:58):
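#paper DOI: 10.1371/journal.pcbi.1009267 Unveiling functions of the visual cortex using task-specific deep neural networks. Human visual perception is a complex cognitive ability regulated by different regions of the cerebral cortex. However, the exact functions of these regions are still incompletely understood, and so is the way they coordinate visual perception. The current view is that visual information is transformed through hierarchical computations across functionally distinct regions, commonly summarized as the ventral and dorsal visual streams. Discovering the exact function of each visual cortical region and reproducing those functions with computational models are both challenging, yet both are our ultimate goals. Deep neural networks (DNNs) are a promising approach for modeling and predicting the responses of visual regions. This paper correlated fMRI responses to scene images with a set of DNNs, each optimized for a different visual task. The authors used 18 DNNs trained on the Taskonomy dataset, each optimized for one of 18 tasks on indoor scene images. The comparison revealed a structured mapping of visual information along the ventral and dorsal streams: low-level visual tasks mapped onto early visual cortex, 3D scene perception tasks onto the dorsal stream, and semantic tasks onto the ventral stream. The paper's main highlight is arguably its strategy of inferring which brain regions contribute to which tasks by measuring the similarity between models and real human brain data.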
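The summary does not spell out how model and brain data are compared. One common approach is representational similarity analysis (RSA). RSA computes a representational dissimilarity matrix (RDM) over the same images for a brain region and for a DNN, then rank-correlates the two. The sketch below assumes RSA with 1 − Pearson r as the dissimilarity and Spearman correlation between RDMs. The paper's exact pipeline, which reports explained variance, may differ, and all function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def rdm(responses):
    """Condensed representational dissimilarity matrix (upper triangle).

    responses[i] is the response pattern to image i (fMRI voxels or DNN units);
    dissimilarity is 1 - Pearson r between two images' patterns."""
    n = len(responses)
    return [1.0 - pearson(responses[i], responses[j])
            for i in range(n) for j in range(i + 1, n)]


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            out[order[k]] = (start + end) / 2.0 + 1.0
        start = end + 1
    return out


def rsa_score(brain_responses, dnn_responses):
    """Spearman correlation between the brain RDM and the DNN RDM."""
    return pearson(ranks(rdm(brain_responses)), ranks(rdm(dnn_responses)))


# Toy responses of one "region" to four images (made up for illustration).
brain = [[1, 2, 3, 4], [4, 3, 2, 1], [1, 3, 2, 4], [2, 2, 3, 5]]
scaled = [[2 * v + 1 for v in row] for row in brain]  # same geometry, rescaled
shuffled = [brain[1], brain[0], brain[3], brain[2]]   # image labels permuted
print(rsa_score(brain, scaled), rsa_score(brain, shuffled))
```

RSA compares only the geometry of the responses, so it needs no voxel-to-unit correspondence. That is why a rescaled model scores perfectly while a label-permuted one does not.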
Abstract:
The human visual cortex enables visual perception through a cascade of hierarchical computations in cortical regions with distinct functionalities. Here, we introduce an AI-driven approach to discover the functional mapping of the visual cortex. We related human brain responses to scene images measured with functional MRI (fMRI) systematically to a diverse set of deep neural networks (DNNs) optimized to perform different scene perception tasks. We found a structured mapping between DNN tasks and brain regions along the ventral and dorsal visual streams. Low-level visual tasks mapped onto early brain regions, 3-dimensional scene perception tasks mapped onto the dorsal stream, and semantic tasks mapped onto the ventral stream. This mapping was of high fidelity, with more than 60% of the explainable variance in nine key regions being explained. Together, our results provide a novel functional mapping of the human visual cortex and demonstrate the power of the computational approach.
822.
前进 (2022-07-28 11:54):
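#paper doi: 10.1109/TMI.2019.2953788 IEEE Transactions on Medical Imaging, 2019, Progressively trained convolutional neural networks for deformable image registration. Existing deep-learning-based registration algorithms often perform poorly when the deformations are large. Existing remedies for large deformations fall into two main categories: (1) pre-aligning the images with a conventional (affine or rigid) method before registration; (2) cascading several networks that deform the image step by step until a large deformation field is produced. Both have drawbacks. (1) Conventional methods are slow, eroding the speed advantage of deep-learning registration. (2) Cascaded networks interpolate the moving image several times, and the accumulated interpolation error degrades the final deformation field. The authors therefore propose registering large deformations with a single network trained progressively. Progressive training was first used to raise the resolution of GAN-generated images; the authors transfer it to registration. In short, the network is first trained on low-resolution data. Once it has converged, new layers for the next, higher resolution are added and training continues, until the full-resolution deformation field is produced. The paper makes three contributions: 1. a progressive learning model that learns deformations at different image scales within one convolutional network; 2. a demonstration that no pre-alignment is needed before registering two images with a neural network; 3. a demonstration that a network trained with supervision from synthetic deformation fields generalizes to real registration problems.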
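Drawback (2) of cascades, repeated interpolation, can be seen in a toy 1-D example that assumes linear interpolation and a pure translation. Two half-sample shifts resample the signal twice and smear a spike. The composed one-sample shift, applied once, keeps the spike sharp. This illustrates only how interpolation error accumulates; it is not the paper's network, and the function names are hypothetical.

```python
import math


def _sample(signal, k):
    """Value at integer index k, treating out-of-range samples as 0."""
    return signal[k] if 0 <= k < len(signal) else 0.0


def warp_1d(signal, shift):
    """Translate a 1-D signal by `shift` samples using linear interpolation:
    out[i] = signal(i - shift), as in backward warping of a moving image."""
    out = []
    for i in range(len(signal)):
        x = i - shift
        x0 = math.floor(x)
        t = x - x0
        out.append((1.0 - t) * _sample(signal, x0) + t * _sample(signal, x0 + 1))
    return out


spike = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
cascaded = warp_1d(warp_1d(spike, 0.5), 0.5)  # two stages, interpolated twice
composed = warp_1d(spike, 1.0)                # one composed displacement
print(cascaded)  # [0.0, 0.0, 0.25, 0.5, 0.25, 0.0]
print(composed)  # [0.0, 0.0, 0.0, 1.0, 0.0, 0.0]
```

Both results move the spike by one sample in total, but the cascaded version has lost half its peak. This blurring is what a single network predicting one deformation field avoids.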
Abstract:
Deep learning-based methods for deformable image registration are attractive alternatives to conventional registration methods because of their short registration times. However, these methods often fail to estimate larger displacements in complex deformation fields, for which a multi-resolution strategy is required. In this article, we propose to train neural networks progressively to address this problem. Instead of training a large convolutional neural network on the registration task all at once, we initially train smaller versions of the network on lower resolution versions of the images and deformation fields. During training, we progressively expand the network with additional layers that are trained on higher resolution data. We show that this way of training allows a network to learn larger displacements without sacrificing registration accuracy and that the resulting network is less sensitive to large misregistrations compared to training the full network all at once. We generate a large number of ground truth example data by applying random synthetic transformations to a training set of images, and test the network on the problem of intrapatient lung CT registration. We analyze the learned representations in the progressively growing network to assess how the progressive learning strategy influences training. Finally, we show that a progressive training procedure leads to improved registration accuracy when learning large and complex deformations.
823.
芝麻 (2022-07-28 09:52):
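#paper doi: 10.1016/j.tranon.2021.101016. Epub 2021 Jan 16. PMID: 33465745; PMCID: PMC7815805. Machine learning analysis using 77,044 genomic and transcriptomic profiles to accurately predict tumor type. Transl Oncol. Metastasis is one of the main causes of death in cancer patients. In some patients with metastases, the primary site cannot be determined from morphology alone; clinically these are called cancers of unknown primary (CUP). CUP is highly metastatic and invasive and has no identifiable site of origin, so choosing a treatment is difficult, which makes precision treatment of CUP a clinical challenge. In 2021, Jim Abraham and colleagues trained machine-learning models on tens of thousands of cancer cases (genomic data from 34,352 cases and genomic plus transcriptomic data from 23,137 cases). The models combined two data types, genomic mutations and transcriptomic expression profiles, and the authors tried more than 300 different models in succession. On an independent validation set of 19,555 cases, the final model (MI GPSai) predicted tumor type with over 94% accuracy on the 93% of cases for which it made a call. Accuracy rose to 97% when the second-highest prediction was also counted.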
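The headline figures combine accuracy with coverage. The model calls a tumor type for only part of the cases (93% in the abstract), and accuracy is then reported for the top-1 and top-2 predictions. The sketch below shows how such numbers can be computed, assuming a simple confidence threshold for abstaining. The threshold rule and the function name are hypothetical, not MI GPSai's actual calling logic.

```python
def top_k_accuracy(prob_rows, labels, k=1, min_confidence=0.0):
    """Accuracy and coverage of a multi-class tumor-type classifier.

    prob_rows: one {tumor_type: probability} dict per case.
    A case counts as correct if its true label is among the k most probable
    types. Cases whose top probability is below min_confidence are not
    called (the model abstains). Returns (accuracy_on_called, coverage)."""
    called = correct = 0
    for probs, truth in zip(prob_rows, labels):
        ranked = sorted(probs, key=probs.get, reverse=True)
        if probs[ranked[0]] < min_confidence:
            continue  # abstain on low-confidence cases
        called += 1
        if truth in ranked[:k]:
            correct += 1
    coverage = called / len(labels) if labels else 0.0
    accuracy = correct / called if called else 0.0
    return accuracy, coverage


# Hypothetical toy predictions, not data from the paper.
cases = [
    {"lung": 0.80, "breast": 0.15, "colon": 0.05},
    {"lung": 0.50, "breast": 0.40, "colon": 0.10},
    {"lung": 0.10, "breast": 0.20, "colon": 0.70},
    {"lung": 0.36, "breast": 0.34, "colon": 0.30},
]
truth = ["lung", "breast", "colon", "breast"]
print(top_k_accuracy(cases, truth, k=1))  # (0.5, 1.0)
print(top_k_accuracy(cases, truth, k=2))  # (1.0, 1.0)
print(top_k_accuracy(cases, truth, k=1, min_confidence=0.45))
```

Raising the threshold trades coverage for accuracy, which is why the two numbers should always be reported together.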
Abstract:
Cancer of Unknown Primary (CUP) occurs in 3-5% of patients when standard histological diagnostic tests are unable to determine the origin of metastatic cancer. Typically, a CUP diagnosis is treated empirically and has very poor outcomes, with median overall survival less than one year. Gene expression profiling alone has been used to identify the tissue of origin but struggles with low neoplastic percentage in metastatic sites which is where identification is often most needed. MI GPSai, a Genomic Prevalence Score, uses DNA sequencing and whole transcriptome data coupled with machine learning to aid in the diagnosis of cancer. The algorithm trained on genomic data from 34,352 cases and genomic and transcriptomic data from 23,137 cases and was validated on 19,555 cases. MI GPSai predicted the tumor type in the labeled data set with an accuracy of over 94% on 93% of cases while deliberating amongst 21 possible categories of cancer. When also considering the second highest prediction, the accuracy increases to 97%. Additionally, MI GPSai rendered a prediction for 71.7% of CUP cases. Pathologist evaluation of discrepancies between submitted diagnosis and MI GPSai predictions resulted in change of diagnosis in 41.3% of the time. MI GPSai provides clinically meaningful information in a large proportion of CUP cases and inclusion of MI GPSai in clinical routine could improve diagnostic fidelity. Moreover, all genomic markers essential for therapy selection are assessed in this assay, maximizing the clinical utility for patients within a single test.
824.
王昊 (2022-07-28 09:51):
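#paper doi:10.48550/arXiv.2207.04630 Yi Ma, Doris Tsao, and Heung-Yeung Shum. 2022. On the Principles of Parsimony and Self-Consistency for the Emergence of Intelligence. In this paper, Yi Ma, an author with a strong mathematical background, works with neuroscientist Doris Tsao to set out what they regard as two fundamental principles of AI. The paper proposes a new framework for understanding deep neural networks, compressive closed-loop transcription, and uses it to answer two questions. First, what is the objective of learning from data, and how can it be measured (information and coding theory)? Second, how can that objective be achieved through efficient and effective computation (control)? The two fundamental principles it proposes for understanding AI are parsimony and self-consistency.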
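For "what to learn", the parsimony principle is made measurable through coding rate. The formulas below are recalled from the authors' earlier maximal coding rate reduction (MCR²) work; treat them as an assumption and check the paper for the exact form it uses. The symbols are: m features Z ∈ ℝ^{d×m}, distortion ε, and diagonal class-membership matrices Π_j (j = 1, …, k). The coding rate and the rate reduction to be maximized are:

```latex
R(\mathbf{Z},\epsilon) = \frac{1}{2}\log\det\left(\mathbf{I} + \frac{d}{m\epsilon^{2}}\mathbf{Z}\mathbf{Z}^{\top}\right)

\Delta R(\mathbf{Z},\boldsymbol{\Pi},\epsilon) = R(\mathbf{Z},\epsilon)
  - \sum_{j=1}^{k}\frac{\operatorname{tr}(\boldsymbol{\Pi}_j)}{2m}
    \log\det\left(\mathbf{I} + \frac{d}{\operatorname{tr}(\boldsymbol{\Pi}_j)\,\epsilon^{2}}\mathbf{Z}\boldsymbol{\Pi}_j\mathbf{Z}^{\top}\right)
```

A large ΔR means the features as a whole occupy a large volume while each class is compressed into a small one. This is the sense in which the learned representation is "parsimonious".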
Abstract:
Ten years into the revival of deep networks and artificial intelligence, we propose a theoretical framework that sheds light on understanding deep networks within a bigger picture of Intelligence in general. We introduce two fundamental principles, Parsimony and Self-consistency, that address two fundamental questions regarding Intelligence: what to learn and how to learn, respectively. We believe the two principles are the cornerstones for the emergence of Intelligence, artificial or natural. While these two principles have rich classical roots, we argue that they can be stated anew in entirely measurable and computable ways. More specifically, the two principles lead to an effective and efficient computational framework, compressive closed-loop transcription, that unifies and explains the evolution of modern deep networks and many artificial intelligence practices. While we mainly use modeling of visual data as an example, we believe the two principles will unify understanding of broad families of autonomous intelligent systems and provide a framework for understanding the brain.
825.
李欣 (2022-07-28 09:45):
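#paper Fertil Steril. 2022 Apr;117(4):792-800. doi: 10.1016/j.fertnstert.2021.12.025. Epub 2022 Jan 31. PMID: 35109980. Endometrial thickness is measured routinely in IVF cycles. A thin endometrium is associated with increased risks of miscarriage, ectopic pregnancy, placenta previa, low birth weight and other obstetric complications. Previous studies showed that in fresh embryo transfer cycles, greater endometrial thickness is associated with better pregnancy outcomes. In frozen embryo transfer (FET) cycles, the reported relationship between endometrial thickness and IVF outcomes is inconsistent, and some studies concluded that thickness does not predict live birth rates in FET cycles. It is therefore unclear whether pregnancy and live birth rates plateau at some thickness or keep rising as thickness increases. It also remains to be shown whether the optimal thickness is the same for FET and fresh ET. This study examined whether an optimal endometrial thickness exists in fresh and frozen cycles. Objectives: the primary aim was to determine, for fresh IVF-ET and FET cycles, whether there is a thickness at which the live birth rate peaks, and whether there is one beyond which it declines. The study also examined whether patient age, embryo stage at transfer and the number of oocytes retrieved modify the relationship between thickness and live birth rate. Data: records came from the Canadian Assisted Reproductive Technology Registry Plus (CARTR Plus) database and covered 96,760 autologous cycles between January 2013 and December 2019, comprising 43,383 fresh and 53,377 frozen transfers. Design: a retrospective cohort study that analysed fresh and frozen cycles separately to identify the appropriate thickness for each. In fresh cycles thickness was recorded on the day of the trigger. In frozen cycles it was mostly recorded before progesterone was started or before the LH surge or hCG trigger. This is the largest study to date comparing the effect of endometrial thickness on live birth rates in fresh and frozen-thawed IVF cycles. In fresh cycles, greater thickness was significantly associated with more oocytes retrieved, higher mean peak estradiol and more usable embryos. Thickness may therefore be confounded with good patient prognosis when it is linked to better outcomes. The "optimal" thickness appears to differ between fresh and frozen cycles, possibly because controlled ovarian hyperstimulation (COH) affects the endometrium. Conclusion: in fresh embryo transfer cycles, live birth rates increased significantly up to an endometrial thickness of 10-12 mm, whereas in FET cycles they plateaued after 7-10 mm.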
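The "plateau" conclusions rest on how live birth rates change across thickness categories. Below is a minimal sketch of that crude tabulation, assuming half-open millimetre bins; the bin edges, the toy records and the function name are invented. It does not adjust for the confounders raised above, such as oocyte yield or patient age.

```python
def live_birth_rate_by_bin(cycles, edges):
    """Crude live birth rate per endometrial-thickness bin.

    cycles: (thickness_mm, live_birth) pairs, live_birth being True/False.
    edges: ascending bin edges in mm; bin i covers [edges[i], edges[i + 1]).
    Returns (lower_mm, upper_mm, n_cycles, live_birth_rate) per bin; the
    rate is NaN for an empty bin. No adjustment for age or other confounders."""
    rows = []
    for lower, upper in zip(edges, edges[1:]):
        outcomes = [birth for thickness, birth in cycles if lower <= thickness < upper]
        rate = sum(outcomes) / len(outcomes) if outcomes else float("nan")
        rows.append((lower, upper, len(outcomes), rate))
    return rows


# Invented toy records for illustration only.
toy_cycles = [(5.5, False), (5.8, False), (7.5, True), (8.0, False),
              (10.5, True), (11.0, True), (12.5, False), (13.0, True)]
rows = live_birth_rate_by_bin(toy_cycles, [0, 6, 8, 10, 12, 20])
for row in rows:
    print(row)
```

A plateau shows up as consecutive bins with similar rates. With real data, each bin needs enough cycles, and the rates need confounder adjustment, before such a reading is meaningful.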
Abstract:
OBJECTIVE: To study the effect of increasing endometrial thickness on live birth rates in fresh and frozen-thaw embryo transfer (FET) cycles. DESIGN: Retrospective cohort study. SETTING: National data from autologous in vitro fertilization (IVF) embryo transfer and FET cycles in Canada from the Canadian Assisted Reproductive Technology Registry Plus (CARTR Plus) database for records between January 2013 and December 2019. PATIENTS: Thirty-three Canadian clinics participated in voluntary reporting of IVF and pregnancy outcomes to the Canadian Assisted Reproductive Technology Registry Plus database, and a total of 43,383 fresh and 53,377 frozen transfers were included. INTERVENTION(S): None. MAIN OUTCOME MEASURE(S): Clinical pregnancy, pregnancy loss, and live birth rates. RESULTS: In fresh IVF-embryo transfer cycles, increasing endometrial thickness is associated with significant increases in the mean number of oocytes retrieved, peak estradiol levels, number of usable embryos, clinical pregnancy rates, live birth rates, and mean term singleton birth weights, and a decrease in pregnancy loss rates. However, live birth rates plateau after 10-12 mm. In contrast, in FET cycles live birth rates plateau after the endometrium measures 7-10 mm. The improvement in live birth rates with increasing endometrial thickness was independent of patient age, timing of embryo transfer (e.g., cleavage stage vs. blastocyst stage), or the number of oocytes at retrieval. CONCLUSIONS: In cycles with a fresh embryo transfer, live birth rates increase significantly until an endometrial thickness of 10-12 mm, while in FET cycles live birth rates plateau after 7-10 mm. However, an endometrial thickness <6 mm was associated clearly with a dramatic reduction in live birth rates in fresh and frozen embryo transfer cycles.
826.
颜林林 (2022-07-28 08:50):
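#paper doi:10.1093/bioinformatics/btac137 Bioinformatics, 2022, BWA-MEME: BWA-MEM emulated with a machine learning approach. When I saw Heng Li retweet this paper on Twitter, I assumed he had upgraded bwa-mem2 again. It turned out to be someone else's work that had simply earned his endorsement. A successor to a well-known tool must improve substantially in some respect, and here the improvement is mainly in performance. Short-read aligners for high-throughput sequencing data usually first find exact-match seeds (almost always by table lookup in constant time) and then extend the matches. Choosing the seed length is tricky. Seeds that are too short produce too many repeated hits. Seeds that are too long leave many words with no match anywhere in the genome, yet those words still occupy the dictionary and make it too large. Some earlier algorithms therefore used variable-length seeds (I once considered this strategy myself but, regrettably, never put it into practice). Variable-length seeds, however, involve memory blocks of varying size and frequent memory accesses, which creates a performance bottleneck. This paper uses machine learning to preprocess the seed index at build time. The index then adapts to the genome sequence, and the number of memory accesses becomes fixed for seeds of different lengths, which yields the performance gain. In the final benchmarks, BWA-MEME produced output identical to bwa-mem2 with up to 3.45× higher seeding throughput. The algorithm deserves closer study.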
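A learned index replaces a long chain of memory lookups with a model that predicts where a key sits in a sorted array. A short search, bounded by the model's worst-case error, then finishes the lookup. The toy below assumes a single least-squares line over 2-bit-encoded k-mers. As we understand it, BWA-MEME instead builds a multi-stage model over suffix-array entries for SMEM search, so this only illustrates the lookup idea. All names are hypothetical.

```python
import bisect

BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode_kmer(kmer):
    """Pack a DNA k-mer into an integer, 2 bits per base (A<C<G<T order)."""
    value = 0
    for base in kmer:
        value = value * 4 + BASE_CODE[base]
    return value


class LinearLearnedIndex:
    """Toy learned index over sorted integer keys.

    A least-squares line maps key -> predicted position; the worst-case
    prediction error over the stored keys bounds a small window in which a
    binary search finishes the lookup."""

    def __init__(self, sorted_keys):
        self.keys = sorted_keys
        n = len(sorted_keys)
        mean_k = sum(sorted_keys) / n
        mean_p = (n - 1) / 2
        var = sum((k - mean_k) ** 2 for k in sorted_keys)
        cov = sum((k - mean_k) * (p - mean_p) for p, k in enumerate(sorted_keys))
        self.slope = cov / var if var else 0.0
        self.intercept = mean_p - self.slope * mean_k
        self.max_err = max(abs(self._predict(k) - p) for p, k in enumerate(sorted_keys))

    def _predict(self, key):
        return int(round(self.slope * key + self.intercept))

    def lookup(self, key):
        """Position of key in the sorted array, or -1 if it is absent."""
        guess = self._predict(key)
        n = len(self.keys)
        # Clamp the search window to [0, n] so that 0 <= lo <= hi <= n.
        lo = min(max(0, guess - self.max_err), n)
        hi = min(max(lo, guess + self.max_err + 1), n)
        i = bisect.bisect_left(self.keys, key, lo, hi)
        return i if i < n and self.keys[i] == key else -1


reference = "ACGTTGCAACGGTACCTGA"
k = 4
keys = sorted({encode_kmer(reference[i:i + k]) for i in range(len(reference) - k + 1)})
index = LinearLearnedIndex(keys)
print(index.lookup(encode_kmer("ACGG")), index.lookup(encode_kmer("AAAA")))
```

The binary search touches only about log₂(2·max_err + 1) entries instead of log₂(n). The better the model fits the key distribution, the fewer memory accesses a lookup needs, which is the effect the paper exploits.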
Abstract:
MOTIVATION: The growing use of next-generation sequencing and enlarged sequencing throughput require efficient short-read alignment, where seeding is one of the major performance bottlenecks. The key challenge in the seeding phase is searching for exact matches of substrings of short reads in the reference DNA sequence. Existing algorithms, however, present limitations in performance due to their frequent memory accesses. RESULTS: This article presents BWA-MEME, the first full-fledged short read alignment software that leverages learned indices for solving the exact match search problem for efficient seeding. BWA-MEME is a practical and efficient seeding algorithm based on a suffix array search algorithm that solves the challenges in utilizing learned indices for SMEM search which is extensively used in the seeding phase. Our evaluation shows that BWA-MEME achieves up to 3.45× speedup in seeding throughput over BWA-MEM2 by reducing the number of instructions by 4.60×, memory accesses by 8.77× and LLC misses by 2.21×, while ensuring the identical SAM output to BWA-MEM2. AVAILABILITY AND IMPLEMENTATION: The source code and test scripts are available for academic use at https://github.com/kaist-ina/BWA-MEME/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
827.
徐炳祥 (2022-07-27 21:51):
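#paper International Conference on Learning Representations, 2020, Hyper-SAGNN: a self-attention based graph neural network for hypergraphs. Representation learning on hypergraphs, whose edges can connect more than two nodes, is a necessary step for extracting useful patterns in many real-world problems. Yet the hypergraph representation-learning algorithms available at the time (2020) could not handle hypergraphs whose hyperedges vary in size. Building on self-attention, the authors designed a graph neural network called Hyper-SAGNN that handles this variable-size case well. The architecture first maps input features to "static embeddings" with a single-layer neural network. It then uses multi-head attention to map the nodes within the same hyperedge to "dynamic embeddings". A Hadamard product measures the similarity between the static and dynamic representations, and the result is fed into a single-layer neural network that predicts the probability that the hyperedge exists. The model outperformed the mainstream models of the time on standard benchmark datasets. It also performed well at representing single-cell Hi-C data and classifying cells. In 2022 the group published Higashi, a single-cell Hi-C representation method based on this architecture, in Nature Biotechnology (doi: 10.1038/s41587-021-01034-y).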
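The scoring pipeline described above can be sketched as follows, under several assumptions. The sketch uses a single attention head instead of multi-head attention, random untrained weights, and tanh for both embeddings. It reads the "Hadamard product" as the element-wise square of the static/dynamic difference (as we recall from the paper), passes it through a sigmoid per node, and averages over nodes. Class and parameter names are hypothetical. The point is that one set of weights scores hyperedges of any size.

```python
import math
import random


def matvec(W, x):
    """Matrix (given as a list of rows) times vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def random_matrix(rng, rows, cols, scale=0.5):
    """rows x cols matrix with entries drawn uniformly from [-scale, scale]."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class HyperedgeScorer:
    """Untrained, single-head sketch of Hyper-SAGNN-style hyperedge scoring."""

    def __init__(self, in_dim, hid_dim, seed=0):
        rng = random.Random(seed)
        self.hid_dim = hid_dim
        self.W_static = random_matrix(rng, hid_dim, in_dim)
        self.W_q = random_matrix(rng, hid_dim, in_dim)
        self.W_k = random_matrix(rng, hid_dim, in_dim)
        self.W_v = random_matrix(rng, hid_dim, in_dim)
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(hid_dim)]
        self.b_out = 0.0

    def score(self, nodes):
        """nodes: feature vectors of one candidate hyperedge (any size >= 2).
        Returns the predicted probability that this hyperedge exists."""
        # Static embedding: depends only on the node itself.
        static = [[math.tanh(h) for h in matvec(self.W_static, x)] for x in nodes]
        queries = [matvec(self.W_q, x) for x in nodes]
        keys = [matvec(self.W_k, x) for x in nodes]
        values = [matvec(self.W_v, x) for x in nodes]
        node_probs = []
        for i in range(len(nodes)):
            others = [j for j in range(len(nodes)) if j != i]
            # Dynamic embedding: scaled dot-product attention over the other members.
            logits = [sum(a * b for a, b in zip(queries[i], keys[j])) / math.sqrt(self.hid_dim)
                      for j in others]
            top = max(logits)
            weights = [math.exp(s - top) for s in logits]
            total = sum(weights)
            dynamic = [math.tanh(sum(w * values[j][d] for w, j in zip(weights, others)) / total)
                       for d in range(self.hid_dim)]
            # Element-wise (Hadamard) square of the static/dynamic difference.
            diff_sq = [(s - t) ** 2 for s, t in zip(static[i], dynamic)]
            z = sum(w * e for w, e in zip(self.w_out, diff_sq)) + self.b_out
            node_probs.append(1.0 / (1.0 + math.exp(-z)))
        # Hyperedge probability: mean of the per-node probabilities.
        return sum(node_probs) / len(node_probs)


scorer = HyperedgeScorer(in_dim=3, hid_dim=4)
pair = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
quad = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.5, 0.5, 0.0]]
print(scorer.score(pair), scorer.score(quad))
```

Because each node attends to the set of the other members and the per-node outputs are averaged, the score does not depend on the order in which nodes are listed. It also does not depend on a fixed hyperedge size.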
IF: 33.100 (Q1) Nature Biotechnology, 2022-02. DOI: 10.1038/s41587-021-01034-y PMID: 34635838 PMCID: PMC8843812
Abstract:
Single-cell Hi-C (scHi-C) can identify cell-to-cell variability of three-dimensional (3D) chromatin organization, but the sparseness of measured interactions poses an analysis challenge. Here we report Higashi, an algorithm based on hypergraph representation learning that can incorporate the latent correlations among single cells to enhance overall imputation of contact maps. Higashi outperforms existing methods for embedding and imputation of scHi-C data and is able to identify multiscale 3D genome features in single cells, such as compartmentalization and TAD-like domain boundaries, allowing refined delineation of their cell-to-cell variability. Moreover, Higashi can incorporate epigenomic signals jointly profiled in the same cell into the hypergraph representation learning framework, as compared to separate analysis of two modalities, leading to improved embeddings for single-nucleus methyl-3C data. In an scHi-C dataset from human prefrontal cortex, Higashi identifies connections between 3D genome features and cell-type-specific gene regulation. Higashi can also potentially be extended to analyze single-cell multiway chromatin interactions and other multimodal single-cell omics data.
828.
cellsarts (2022-07-27 20:16):
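Use of endogenous signal peptide sequences for rapid production and efficient secretion of heterologous proteins by cells of the moss Physcomitrella patens. BMC Biotechnol. 2005 Nov 7;5:30. doi: 10.1186/1472-6750-5-30. Andreas Schaaf†, Stefanie Tintelnot†, Armin Baur, Ralf Reski, Gilbert Gorr and Eva L Decker*. Affiliation 1: Department of Plant Biotechnology, Faculty of Biology, University of Freiburg, Schaenzlestr. 1, 79104 Freiburg, Germany. Abstract. Background: efficiently targeting recombinant proteins to the appropriate organelle is one of the bottlenecks in producing recombinant proteins in plant systems. A common practice is to use the native secretion signal peptide of the foreign protein being produced. The general features of secretion signals are conserved between plants and animals. Even so, the broad sequence variability among signal peptides suggests that different signal peptides are recognized with different efficiency. Results: to improve secretion in Physcomitrella patens protonema bioreactors, we quantitatively compared the efficiency of two human signal peptides with signals from six recently isolated moss (Physcomitrella patens) proteins. To do so, we fused the different signals to heterologous reporter sequences and transiently transfected moss cells. We then measured the extracellular and intracellular accumulation of the recombinant proteins rhVEGF and GST, respectively. Our data show that endogenous moss signal peptides give up to fivefold higher secretion efficiency than the two human signal peptides used. Conclusion: the extracellular and intracellular distribution of the recombinant proteins admits several possible explanations for the reduced extracellular accumulation with the human signals. Of these, we consider translational inhibition during the signal recognition particle cycle (SRP cycle) the most probable. In this work we report that, in the Physcomitrella bioreactor system, moss secretion signals yield more secretion than the recombinant proteins' own native signal peptides. Although the molecular details of this effect remain to be elucidated, our results will contribute to the improvement of molecular farming systems.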
IF: 3.500 (Q2) BMC Biotechnology, 2005. DOI: 10.1186/1472-6750-5-30
Abstract:
Background: Efficient targeting to appropriate cell organelles is one of the bottlenecks for the production of recombinant proteins in plant systems. A common practice is to use the native secretory signal peptide of the heterologous protein to be produced. Though general features of secretion signals are conserved between plants and animals, the broad sequence variability among signal peptides suggests differing efficiency of signal peptide recognition. Results: Aiming to improve secretion in moss bioreactors, we quantitatively compared the efficiency of two human signal peptides and six signals from recently isolated moss (Physcomitrella patens) proteins. We therefore used fusions of the different signals to heterologous reporter sequences for transient transfection of moss cells and measured the extra- and intracellular accumulation of the recombinant proteins rhVEGF and GST, respectively. Our data demonstrates an up to fivefold higher secretion efficiency with endogenous moss signals compared to the two utilised human signal peptides. Conclusion: From the distribution of extra- and intracellular recombinant proteins, we suggest translational inhibition during the signal recognition particle-cycle (SRP-cycle) as the most probable of several possible explanations for the decreased extracellular accumulation with the human signals. In this work, we report on the supremacy of moss secretion signals over the utilised heterologous ones within the moss-bioreactor system. Though the molecular details of this effect remain to be elucidated, our results will contribute to the improvement of molecular farming systems.
829.
颜林林 (2022-07-26 23:37):
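#paper doi:10.1002/jbio.202100389 Journal of Biophotonics, 2022, Skin's green autofluorescence at dorsal centremetacarpus may become a novel biomarker for diagnosis of lung cancer. Early cancer screening is one of the hottest R&D directions today. It is so overheated that companies seem to be starting layoffs, because everyone is pursuing similar, undifferentiated approaches (such as methylation sequencing). This paper from Shanghai Jiao Tong University takes a different path: it measures the skin's autofluorescence and tries to use it for early screening and diagnosis of lung cancer. This is a genuinely non-invasive new test. The rationale is that a keratin molecule in the spinous layer of the epidermis fluoresces under blue light, and the intensity of this fluorescence is related to disease status. The study included clinical cases and an allograft mouse tumor model. Distinguishing lung cancer from pulmonary infection and from healthy controls gave AUCs of 0.871 and 0.813, respectively. These results suggest a potential biomarker for early screening and diagnosis of lung cancer.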
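An AUC such as 0.871 is the probability that a randomly chosen lung cancer patient has a higher green-AF reading than a randomly chosen control. Below is a minimal sketch computing it via the Mann-Whitney statistic, with made-up intensities; the function name is hypothetical.

```python
def auc(positive_scores, negative_scores):
    """ROC AUC via the Mann-Whitney statistic: the probability that a randomly
    chosen positive scores higher than a randomly chosen negative (ties = 1/2)."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical green-AF intensities (arbitrary units), not the paper's data.
lung_cancer = [5.0, 6.0, 7.0, 4.0]
controls = [3.0, 4.0, 5.5, 2.0]
print(auc(lung_cancer, controls))  # 0.84375
```

An AUC of 0.5 means the reading does not separate the groups at all. The figure says nothing about where a clinical cut-off should sit.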
IF: 2.000 (Q3) Journal of Biophotonics, 2022-05. DOI: 10.1002/jbio.202100389 PMID: 35075788
Abstract:
It is critical to discover novel biomarkers of lung cancer for establishing economical technology for diagnosis of lung cancer. Our study has suggested that the autofluorescence (AF) of the skin may become a novel biomarker of this type: First, development of lung cancer led to a significant increase in the skin's green AF in a mouse model of lung cancer; second, lung cancer patients had significantly higher skin's green AF at certain positions compared with healthy volunteers and pulmonary infection patients; and third, using the skin's green AF intensity at dorsal centremetacarpus as the variable, the areas under curve (AUC) for differentiating lung cancer patients and pulmonary infection patients and for differentiating lung cancer patients and healthy volunteers was 0.871 and 0.813, respectively. Collectively, our study has indicated that the skin's green AF at dorsal centremetacarpus may become a novel biomarker for establishing a ground-breaking diagnostic strategy for lung cancer.
830.
半面阳光 (2022-07-26 14:25):
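#paper DOI: 10.1073/pnas.2019768118, 2021 Feb 2;118(5):e2019768118. Genome-wide detection of cytosine methylation by single molecule real-time sequencing. This is not a recent paper; it was published in PNAS in 2021 by Yuk Ming Dennis Lo's team at the Chinese University of Hong Kong. I read it because Prof. Lo recently presented related results at an academic meeting. The core of the paper is detecting DNA methylation by combining PacBio SMRT third-generation sequencing with a convolutional neural network. Cytosine methylation (5-methylcytosine, 5mC) is the most important type of epigenetic modification. The most widely used method for profiling CpG methylation is bisulfite sequencing (BS-seq), which has drawbacks. Bisulfite degrades DNA. It also converts unmethylated cytosines (C) to uracil, which is read as thymine (T) after amplification, and this complicates downstream alignment. Moreover, genuine C->T point mutations in the original sequence cannot be distinguished from bisulfite conversion. The authors therefore used single molecule real-time (SMRT) sequencing to develop a method that detects 5mC directly. The method takes two key kinds of information from SMRT sequencing as input and feeds them to a convolutional neural network (CNN); the resulting detector is called the holistic kinetic (HK) model. The first input is the kinetic signal of the DNA polymerase: the pulse width (how long each base's fluorescence signal lasts) and the interpulse duration between two consecutive bases. The second is the sequence context, a fixed-length stretch of DNA around the site being tested, called the "measurement window". The authors first used whole-genome amplification to build an unmethylated (negative) dataset in which virtually no molecule is methylated. They then treated DNA with the CpG methyltransferase M.SssI, which methylates every CpG site on double-stranded DNA, to build a methylated (positive) dataset. Half of each dataset was used to train the CNN, and the rest to validate the HK model. The HK model distinguished methylation states with an AUC of up to 0.97. The sensitivity and specificity of genome-wide 5mC detection at single-base resolution reached 90% and 94%, respectively. Window size and sequencing depth both affected the model's performance. To balance downstream analysis against accuracy, the authors chose 21 nt as the default window and 10× as the default depth. Next, they used human-mouse hybrid fragments, whose human and mouse parts carry different methylation states, to show that the HK model can detect molecules containing both methylated and unmethylated CpGs. They also briefly compared the performance of BS-seq and the HK model. My impressions: first, the workload is large; second, the work shows the authors' thorough understanding of molecular biology and of the characteristics of the sequencing technology, and how well they apply both. The overall framework also feels continuous in spirit with the team's earlier work: a thorough grasp of fundamental theory and flexible use of sequencing technology to solve hard problems in clinical testing.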
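The model input described above can be sketched as a per-position feature matrix for one 21-nt window centred on the C of a CpG. The layout here (one-hot base plus raw IPD and PW channels from a single strand) is an assumption. The paper works with kinetics from double-stranded molecules and its own preprocessing, and the function name is hypothetical.

```python
BASES = "ACGT"


def hk_window_features(seq, ipd, pw, center, half_width=10):
    """Per-position input matrix for one CpG-centred measurement window.

    seq: read sequence; ipd / pw: per-base interpulse durations and pulse
    widths (same length as seq); center: index of the C of a CpG.
    Returns (2 * half_width + 1) rows of [A, C, G, T one-hot, IPD, PW]."""
    if seq[center:center + 2] != "CG":
        raise ValueError("window must be centred on the C of a CpG")
    start, end = center - half_width, center + half_width + 1
    if start < 0 or end > len(seq):
        raise ValueError("window runs off the end of the read")
    rows = []
    for i in range(start, end):
        one_hot = [1.0 if seq[i] == b else 0.0 for b in BASES]
        rows.append(one_hot + [float(ipd[i]), float(pw[i])])
    return rows


# Made-up read and kinetics, only to show the shapes involved.
read = "ATTGCATGCATG" + "CG" + "TTACGATCGAT"
ipd = [0.1 * i for i in range(len(read))]
pw = [1.0] * len(read)
window = hk_window_features(read, ipd, pw, center=12)
print(len(window), len(window[0]))  # 21 6
```

A CNN would take many such 21 × 6 matrices, labelled from the amplified (unmethylated) and M.SssI-treated (methylated) samples, as training examples.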
Abstract:
5-Methylcytosine (5mC) is an important type of epigenetic modification. Bisulfite sequencing (BS-seq) has limitations, such as severe DNA degradation. Using single molecule real-time sequencing, we developed a methodology to directly examine 5mC. This approach holistically examined kinetic signals of a DNA polymerase (including interpulse duration and pulse width) and sequence context for every nucleotide within a measurement window, termed the holistic kinetic (HK) model. The measurement window of each analyzed double-stranded DNA molecule comprised 21 nucleotides with a cytosine in a CpG site in the center. We used amplified DNA (unmethylated) and M.SssI-treated DNA (methylated) (M.SssI being a CpG methyltransferase) to train a convolutional neural network. The area under the curve for differentiating methylation states using such samples was up to 0.97. The sensitivity and specificity for genome-wide 5mC detection at single-base resolution reached 90% and 94%, respectively. The HK model was then tested on human-mouse hybrid fragments in which each member of the hybrid had a different methylation status. The model was also tested on human genomic DNA molecules extracted from various biological samples, such as buffy coat, placental, and tumoral tissues. The overall methylation levels deduced by the HK model were well correlated with those by BS-seq (r = 0.99; P < 0.0001) and allowed the measurement of allele-specific methylation patterns in imprinted genes. Taken together, this methodology has provided a system for simultaneous genome-wide genetic and epigenetic analyses.
831.
颜林林 (2022-07-25 07:28):
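#paper doi:10.1038/s41380-022-01661-0 Molecular Psychiatry, 2022, The serotonin theory of depression: a systematic umbrella review of the evidence. This is a meta-analysis, and a negative one at that; many "insiders" would dismiss such a paper as filler or be embarrassed to mention it. The paper asks whether serotonin (5-hydroxytryptamine) is involved in the cause of depression. Most of the public and many professional researchers believe that lowered serotonin is associated with depression. The paper takes an umbrella-review approach, drawing on large bodies of research on the serotonin system from several different areas, so that its conclusion rests on the highest level of evidence available. The six areas covered are: (1) whether serotonin and its metabolite 5-HIAA (5-hydroxyindoleacetic acid) are lower in the body fluids of people with depression; (2) whether serotonin receptor binding is lower in people with depression; (3) whether serotonin transporter (SERT) levels are higher in people with depression; (4) whether depletion of tryptophan (the precursor of serotonin) induces depression; (5) whether SERT gene variants are associated with depression; (6) whether the SERT gene interacts with stress in depression. The review was registered with PROSPERO (CRD42020207203) and included 17 studies: 12 systematic reviews and meta-analyses, 1 collaborative meta-analysis, 1 meta-analysis of large cohort studies, 1 systematic review and narrative synthesis, 1 genetic association study and 1 umbrella review. In each of the six areas, the review drew on the largest samples available (from hundreds to tens of thousands of participants). It found no consistent evidence of an association between markers of serotonin activity and depression, and concluded that "it is time to acknowledge that the serotonin theory of depression is not empirically substantiated". Clearly, reaching a definite negative conclusion is no easy task either.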
IF: 9.600 (Q1) Molecular Psychiatry, 2023-Aug. DOI: 10.1038/s41380-022-01661-0 PMID: 35854107
Abstract:
The serotonin hypothesis of depression is still influential. We aimed to synthesise and evaluate evidence on whether depression is associated with lowered serotonin concentration or activity in a systematic umbrella review of the principal relevant areas of research. PubMed, EMBASE and PsycINFO were searched using terms appropriate to each area of research, from their inception until December 2020. Systematic reviews, meta-analyses and large data-set analyses in the following areas were identified: serotonin and serotonin metabolite, 5-HIAA, concentrations in body fluids; serotonin 5-HT receptor binding; serotonin transporter (SERT) levels measured by imaging or at post-mortem; tryptophan depletion studies; SERT gene associations and SERT gene-environment interactions. Studies of depression associated with physical conditions and specific subtypes of depression (e.g. bipolar depression) were excluded. Two independent reviewers extracted the data and assessed the quality of included studies using the AMSTAR-2, an adapted AMSTAR-2, or the STREGA for a large genetic study. The certainty of study results was assessed using a modified version of the GRADE. We did not synthesise results of individual meta-analyses because they included overlapping studies. The review was registered with PROSPERO (CRD42020207203). 17 studies were included: 12 systematic reviews and meta-analyses, 1 collaborative meta-analysis, 1 meta-analysis of large cohort studies, 1 systematic review and narrative synthesis, 1 genetic association study and 1 umbrella review. Quality of reviews was variable with some genetic studies of high quality. Two meta-analyses of overlapping studies examining the serotonin metabolite, 5-HIAA, showed no association with depression (largest n = 1002). One meta-analysis of cohort studies of plasma serotonin showed no relationship with depression, and evidence that lowered serotonin concentration was associated with antidepressant use (n = 1869). 
Two meta-analyses of overlapping studies examining the 5-HT receptor (largest n = 561), and three meta-analyses of overlapping studies examining SERT binding (largest n = 1845) showed weak and inconsistent evidence of reduced binding in some areas, which would be consistent with increased synaptic availability of serotonin in people with depression, if this was the original, causal abnormality. However, effects of prior antidepressant use were not reliably excluded. One meta-analysis of tryptophan depletion studies found no effect in most healthy volunteers (n = 566), but weak evidence of an effect in those with a family history of depression (n = 75). Another systematic review (n = 342) and a sample of ten subsequent studies (n = 407) found no effect in volunteers. No systematic review of tryptophan depletion studies has been performed since 2007. The two largest and highest quality studies of the SERT gene, one genetic association study (n = 115,257) and one collaborative meta-analysis (n = 43,165), revealed no evidence of an association with depression, or of an interaction between genotype, stress and depression. The main areas of serotonin research provide no consistent evidence of there being an association between serotonin and depression, and no support for the hypothesis that depression is caused by lowered serotonin activity or concentrations. Some evidence was consistent with the possibility that long-term antidepressant use reduces serotonin concentration.
832.
白义民 (2022-07-24 17:54):
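#paper "A Study of Longchenpa of the Nyingma School". Longchenpa was the great systematizer of the Dzogchen (Great Perfection, ati-yoga) teachings of Tibetan tantric Buddhism. This work introduces Longchenpa in two parts: his biography and path of practice, and his writings. Tantric teachings are usually presented vaguely. By contrast, this doctoral dissertation gives a fairly accurate and plain overview of the Dzogchen teachings from the academic standpoint of ethnic and religious studies. For readers interested in Dzogchen practice, the guidance of an academic work like this can help them avoid unnecessary detours.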
833.
颜林林 (2022-07-24 05:55):
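#paper doi:10.1186/s12864-022-08762-8 BMC Genomics, 2022, Poly(a) selection introduces bias and undue noise in direct RNA-sequencing. In whole-transcriptome sequencing experiments, poly(A) selection is often applied after the initial RNA extraction to enrich for mRNA. This paper performed direct RNA sequencing on the Oxford Nanopore (ONT) platform and processed the same samples in parallel with and without poly(A) selection. The results indicate that omitting this step is appropriate. Omitting it may slightly reduce library complexity, but it avoids the other drawbacks of selection more effectively: the need for more input RNA, the preferential capture of mRNAs with longer poly(A) tails, and less stable differential-expression results.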
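A toy capture model shows why selection skews tail-length estimates. It assumes that the probability of oligo(dT) capture rises with tail length; the saturating curve, its half-maximum and the function names are invented. Under that assumption, the mean tail length among captured molecules exceeds the true mean.

```python
def capture_weighted_mean(tail_lengths, capture_probability):
    """Mean poly(A) tail length among molecules that survive selection, when a
    molecule with tail length L is captured with probability capture_probability(L)."""
    weights = [capture_probability(length) for length in tail_lengths]
    return sum(w * length for w, length in zip(weights, tail_lengths)) / sum(weights)


def saturating_capture(length, half_max=30.0):
    """Hypothetical oligo(dT) capture model: longer tails are captured more reliably."""
    return length / (length + half_max)


tails = [10.0, 30.0, 90.0]  # made-up tail lengths (nt), one molecule each
print(sum(tails) / len(tails))                           # before selection
print(capture_weighted_mean(tails, saturating_capture))  # after selection
```

If capture efficiency also varies between runs, the size of this shift varies too. That is one way to picture the inconsistent skew the authors report.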
IF: 3.500 (Q2) BMC Genomics, 2022-Jul-22. DOI: 10.1186/s12864-022-08762-8 PMID: 35869428
Abstract:
BACKGROUND: Genome-wide RNA-sequencing technologies are increasingly critical to a wide variety of diagnostic and research applications. RNA-seq users often first enrich for mRNA, with the most popular enrichment method being poly(A) selection. In many applications it is well-known that poly(A) selection biases the view of the transcriptome by selecting for longer tailed mRNA species. RESULTS: Here, we show that poly(A) selection biases Oxford Nanopore direct RNA sequencing. As expected, poly(A) selection skews sequenced mRNAs toward longer poly(A) tail lengths. Interestingly, we identify a population of mRNAs (> 10% of genes' mRNAs) that are inconsistently captured by poly(A) selection due to highly variable poly(A) tails, and demonstrate this phenomenon in our hands and in published data. Importantly, we show poly(A) selection is dispensable for Oxford Nanopore's direct RNA-seq technique, and demonstrate successful library construction without poly(A) selection, with decreased input, and without loss of quality. CONCLUSIONS: Our work expands the utility of direct RNA-seq by validating the use of total RNA as input, and demonstrates important technical artifacts from poly(A) selection that inconsistently skew mRNA expression and poly(A) tail length measurements.
834.
颜林林 (2022-07-23 22:05):
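#paper doi:10.1101/2022.07.21.500999 bioRxiv, 2022, High-resolution de novo structure prediction from primary sequence. This preprint presents OmegaFold, a tool that predicts a protein's tertiary structure from its primary sequence alone. Mainstream methods rely on evolutionary information, using multiple sequence alignments (MSAs) to aid folding prediction. This paper argues that a protein, once translated, folds spontaneously from its primary sequence into its tertiary structure, so evolutionary information is not essential for structure prediction. The deep model relies on a set of pretrained models that help identify which amino acids in the primary sequence matter more (that is, assign them different attention). It also uses BERT-style language-model techniques to help train the folding model. The resulting method can predict the structures of orphan proteins, which have no close relatives in current structure databases to serve as references. According to the abstract, it outperforms RoseTTAFold and reaches accuracy similar to AlphaFold2 on recently released structures.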
Abstract:
Recent breakthroughs have used deep learning to exploit evolutionary information in multiple sequence alignments (MSAs) to accurately predict protein structures. However, MSAs of homologous proteins are not always available, such as with orphan proteins and fast-evolving proteins like antibodies, and a protein typically folds in a natural setting from its primary amino acid sequence into its three-dimensional structure, suggesting that evolutionary information and MSAs should not be necessary to predict a protein's folded form. Here, we introduce OmegaFold, the first computational method to successfully predict high-resolution protein structure from a single primary sequence alone. Using a new combination of a protein language model that allows us to make predictions from single sequences and a geometry-inspired transformer model trained on protein structures, OmegaFold outperforms RoseTTAFold and achieves similar prediction accuracy to AlphaFold2 on recently released structures. OmegaFold enables accurate predictions on orphan proteins that do not belong to any functionally characterized protein family and antibodies that tend to have noisy MSAs due to fast evolution. Our study fills a much-needed structure prediction gap and brings us a step closer to understanding protein folding in nature.
835.
颜林林 (2022-07-22 00:00):
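#paper doi:10.1056/NEJMe2207902 The New England Journal of Medicine, 2022, Setting the Benchmark for KRAS(G12C)-Mutated NSCLC. This editorial introduces the results of the phase 2 KRYSTAL-1 trial reported in the same issue (doi:10.1056/NEJMoa2204619). The trial evaluated adagrasib, a KRAS G12C inhibitor, and it performed well. In patients with KRAS G12C-mutated NSCLC previously treated with chemotherapy and immunotherapy, its outcome measures (ORR, PFS, OS, etc.) were very close to those of sotorasib, a previously approved KRAS G12C inhibitor. From this, the editorial infers that the two drugs may overlap substantially in mechanism. Meanwhile, their differences in metabolism and pharmacokinetics (such as blood-brain barrier penetration and in vivo half-life) suggest how the choice between the two drugs might be tailored in the future.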
836.
颜林林 (2022-07-21 00:29):
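#paper doi:10.1186/s13059-022-02726-7 Genome Biology, 2022, Integration of single-cell multi-omics data by regression analysis on unpaired observations. Because of technical limitations, most single-cell multi-omics studies cannot actually measure several omics layers on the same cell. To address this, the paper starts from the intuitive assumption that "target genes with similar expression have similar regulators" and uses regression analysis to link scRNA-seq and ATAC-seq data and infer the relationship between them. In unpaired scRNA-seq and ATAC-seq experiments (i.e., the two assays were run on different cells rather than the same cell), one modality, such as chromatin accessibility from ATAC-seq, can then be used to infer the expression of the genes it regulates. Evaluated on both simulated and real data, the method is highly accurate, and the cis-regulatory network it infers is highly consistent with eQTL mapping. This gives methodological support for making better use of the large amount of unpaired single-cell data accumulated so far.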
IF: 10.100 Q1 Genome Biology, 2022-07-19. DOI: 10.1186/s13059-022-02726-7 PMID: 35854350 PMCID: PMC9295346
Abstract:
Despite recent developments, it is hard to profile all multi-omics single-cell data modalities on the same cell. Thus, huge amounts of single-cell genomics data of unpaired observations on different cells are generated. We propose a method named UnpairReg for the regression analysis on unpaired observations to integrate single-cell multi-omics data. On real and simulated data, UnpairReg provides an accurate estimation of cell gene expression where only chromatin accessibility data is available. The cis-regulatory network inferred from UnpairReg is highly consistent with eQTL mapping. UnpairReg improves cell type identification accuracy by joint analysis of single-cell gene expression and chromatin accessibility data.
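To make the core idea concrete (predicting a gene's expression from the accessibility of its regulatory region), here is a deliberately simplified sketch. It is not UnpairReg's algorithm and does not reproduce how the paper handles unpaired cells. It assumes that cells from the two assays have already been matched into shared groups (for example, common cell-type clusters). Group-level mean accessibility and mean expression then form training pairs for an ordinary least-squares fit, which is applied to ATAC-only cells. All numbers are made up.

```python
def fit_simple_ols(x, y):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    return a, b


def predict_expression(a, b, accessibility):
    """Predict expression for cells where only accessibility was measured."""
    return [a + b * v for v in accessibility]


# Hypothetical group-level means for one gene across 4 matched clusters.
group_accessibility = [0.1, 0.4, 0.7, 1.0]   # from ATAC-seq cells
group_expression = [1.2, 2.1, 3.0, 3.9]      # from scRNA-seq cells

a, b = fit_simple_ols(group_accessibility, group_expression)
# Apply the fitted model to a cell that has ATAC-seq data only.
pred = predict_expression(a, b, [0.5])
```

Here the fit gives `b = 3` and `a = 0.9`, so an ATAC-only cell with accessibility 0.5 gets a predicted expression of about 2.4. A realistic model would use many peaks per gene, regularization, and a principled way of linking unpaired cells.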
837.
颜林林 (2022-07-20 07:49):
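#paper doi:10.1101/2022.07.17.500374 bioRxiv, 2022, Genozip Dual-Coordinate VCF format enables efficient genomic analyses and alleviates liftover limitations. This is an example of doing a small thing carefully. In genomic analysis we often face the dilemma of whether to use hg19 or hg38. Sometimes we even have to run two nearly identical analyses in parallel, one per reference genome, to avoid the information loss caused by coordinate-conversion tools such as liftOver. The paper targets this small (and perhaps not even very common) pain point. While staying compliant with the existing VCF format, it stores two sets of genomic coordinates in a single result file, so existing tools keep working and either set of coordinates can be extracted whenever needed. The idea is simple and the implementation is not difficult, but it does solve a real practical problem.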
Abstract:
We introduce Dual Coordinate VCF (DVCF), a file format that records genomic variants against two different reference genomes simultaneously and is fully compliant with the current VCF specification. As implemented in the Genozip platform, DVCF enables bioinformatics pipelines to seamlessly operate across two coordinate systems by leveraging the system most advantageous to each pipeline step, simplifying bioinformatics workflows and reducing file generation and associated data storage burden. Moreover, our benchmarking of Genozip DVCF shows that it produces more complete, less erroneous, and less biased translations across coordinate systems than two widely used alternative tools (i.e., LiftoverVcf and CrossMap).
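The key trick, carrying a second coordinate inside a still-valid VCF record, can be illustrated with a toy sketch. This is not Genozip's actual DVCF encoding. The INFO tag name `OTHER_COORD` is hypothetical, and the record and coordinates are made up. The sketch also ignores REF/ALT differences between assemblies, which a real dual-coordinate format has to handle.

```python
def add_second_coordinate(vcf_line, other_chrom, other_pos, tag="OTHER_COORD"):
    """Append the variant's position in a second reference to the INFO column (8th field).

    `tag` is a hypothetical INFO key, not the one Genozip uses.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    info = fields[7]
    entry = f"{tag}={other_chrom}:{other_pos}"
    # An empty INFO column is written as "." in VCF.
    fields[7] = entry if info in ("", ".") else info + ";" + entry
    return "\t".join(fields)


def get_coordinate(vcf_line, which="primary", tag="OTHER_COORD"):
    """Return (chrom, pos) in the primary reference, or in the second one stored in INFO."""
    fields = vcf_line.rstrip("\n").split("\t")
    if which == "primary":
        return fields[0], int(fields[1])
    for item in fields[7].split(";"):
        if item.startswith(tag + "="):
            chrom, pos = item[len(tag) + 1:].split(":")
            return chrom, int(pos)
    raise KeyError(f"{tag} not found in INFO")


# Made-up record: CHROM POS ID REF ALT QUAL FILTER INFO
record = "chr1\t1000\t.\tA\tG\t50\tPASS\tDP=20"
dual = add_second_coordinate(record, "chr1", 1100)
```

Because the extra data lives in a standard INFO field, tools that ignore unknown tags still read the primary coordinates unchanged. A spec-compliant file would also declare the tag in a `##INFO` header line.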
838.
张德祥 (2022-07-19 18:49):
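#paper https://doi.org/10.48550/arXiv.2207.04630 On the Principles of Parsimony and Self-Consistency for the Emergence of Intelligence. This paper by Yi Ma has already been covered by WeChat official accounts. Drawing on two of his earlier works, LDR data compression and deep networks for closed-loop generative modeling, Ma distills compression and closed-loop generation into two principles of intelligence: parsimony and self-consistency. The paper goes on to propose more general ideas, extends them to 3D vision and reinforcement learning, and predicts their implications for neuroscience and higher-level intelligence.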
Abstract:
Ten years into the revival of deep networks and artificial intelligence, we propose a theoretical framework that sheds light on understanding deep networks within a bigger picture of Intelligence in general. We introduce two fundamental principles, Parsimony and Self-consistency, that address two fundamental questions regarding Intelligence: what to learn and how to learn, respectively. We believe the two principles are the cornerstones for the emergence of Intelligence, artificial or natural. While these two principles have rich classical roots, we argue that they can be stated anew in entirely measurable and computable ways. More specifically, the two principles lead to an effective and efficient computational framework, compressive closed-loop transcription, that unifies and explains the evolution of modern deep networks and many artificial intelligence practices. While we mainly use modeling of visual data as an example, we believe the two principles will unify understanding of broad families of autonomous intelligent systems and provide a framework for understanding the brain.
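The abstract says parsimony can be stated in "measurable and computable ways". As we understand the rate-reduction line of work from Ma's group that underlies the LDR compression mentioned above, the measure is roughly as follows. It covers $m$ features $\mathbf{Z}\in\mathbb{R}^{d\times m}$ at distortion $\epsilon$, partitioned into classes by diagonal membership matrices $\boldsymbol{\Pi}_j$. The notation is our reconstruction and should be checked against the original papers before relying on it.

```latex
R(\mathbf{Z},\epsilon) = \tfrac{1}{2}\log\det\!\Big(\mathbf{I} + \tfrac{d}{m\epsilon^{2}}\,\mathbf{Z}\mathbf{Z}^{\top}\Big),
\qquad
R_c(\mathbf{Z},\epsilon \mid \boldsymbol{\Pi}) = \sum_{j} \tfrac{\operatorname{tr}(\boldsymbol{\Pi}_j)}{2m}\,\log\det\!\Big(\mathbf{I} + \tfrac{d}{\operatorname{tr}(\boldsymbol{\Pi}_j)\,\epsilon^{2}}\,\mathbf{Z}\boldsymbol{\Pi}_j\mathbf{Z}^{\top}\Big),
\qquad
\Delta R = R(\mathbf{Z},\epsilon) - R_c(\mathbf{Z},\epsilon \mid \boldsymbol{\Pi}).
```

Intuitively, maximizing $\Delta R$ makes the feature set as a whole spread out (large $R$) while each class stays compact (small $R_c$), which is one concrete reading of "parsimony".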
839.
颜林林 (2022-07-19 00:21):
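#paper doi:10.1002/humu.24440 Human Mutation, 2022, Multi-omics analysis reveals multiple mechanisms causing Prader-Willi like syndrome in a family with a X;15 translocation. This paper reports a family affected by Prader-Willi syndrome (PWS) and describes how the genetic cause was found and confirmed. PWS is a neurodevelopmental disorder and a textbook genetic disease, because it arises from changes in a genomically imprinted region. Imprinting means that an allele "remembers" whether it was inherited from the father or the mother, and the gene is expressed only from the copy inherited from one parent. PWS involves the 15q11.2 region and is usually caused by the absence of a functional paternal copy of this region, through either a 15q11.2 deletion or maternal uniparental disomy of chromosome 15. In the family reported here, both daughters show symptoms of the disease (obesity, intellectual disability, etc.), and their mother is a healthy carrier of a balanced translocation between chromosome 15 and the X chromosome. The study used karyotyping, FISH (fluorescence in situ hybridization), methylation-specific MLPA, short-read WGS, 10x linked-read WGS, transcriptome sequencing, and ddPCR, each addressing one step of the genetic investigation. Together they pinpointed the translocation breakpoints and explained two different disease mechanisms in the two affected daughters. One daughter has maternal uniparental disomy (UPD) of chromosome 15. The other shows loss of imprinting, meaning that SNRPN and other genes in the region are expressed from both chromosomes. For researchers in genetic disease or professionals in genetic counseling, the study is well worth learning from: it combines many techniques, and its reasoning is clear and well organized.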
IF: 3.300 Q2 Human Mutation, 2022-11. DOI: 10.1002/humu.24440 PMID: 35842787
Abstract:
Prader-Willi syndrome (PWS; MIM# 176270) is a neurodevelopmental disorder caused by the loss of expression of paternally imprinted genes within the PWS region located on 15q11.2. It is usually caused by either maternal uniparental disomy of chromosome 15 (UPD15) or 15q11.2 recurrent deletion(s). Here, we report a healthy carrier of a balanced X;15 translocation and her two daughters, both with the karyotype 45,X,der(X)t(X;15)(p22;q11.2),-15. Both daughters display symptoms consistent with haploinsufficiency of the SHOX gene and PWS. We explored the architecture of the derivative chromosomes and investigated effects on gene expression in patient-derived neural cells. First, a multiplex ligation-dependent probe amplification methylation assay was used to determine the methylation status of the PWS-region revealing maternal UPD15 in daughter 2, explaining her clinical symptoms. Next, short read whole genome sequencing and 10X genomics linked read sequencing was used to pinpoint the exact breakpoints of the translocation. Finally, we performed transcriptome sequencing on neuroepithelial stem cells from the mother and from daughter 1 and observed biallelic expression of genes in the PWS region (including SNRPN) in daughter 1. In summary, our multi-omics analysis highlights two different PWS mechanisms in one family and provide an example of how structural variation can affect imprinting through long-range interactions.
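The loss-of-imprinting finding rests on allele-specific expression. An imprinted gene such as SNRPN is normally expressed from one parental copy only, so RNA reads at a heterozygous SNP should come almost entirely from one allele. The toy sketch below classifies a site from its reference and alternative read counts. The thresholds (`min_minor_fraction`, `min_depth`) are illustrative assumptions, not values from the paper. Real analyses use statistical tests and correct for reference-mapping bias.

```python
def classify_allelic_expression(ref_reads, alt_reads, min_minor_fraction=0.1, min_depth=20):
    """Classify RNA expression at a heterozygous SNP as monoallelic or biallelic.

    The default thresholds are hypothetical and only illustrate the idea.
    """
    depth = ref_reads + alt_reads
    if depth < min_depth:
        return "insufficient"
    # Fraction of reads supporting the less-expressed allele.
    minor_fraction = min(ref_reads, alt_reads) / depth
    return "biallelic" if minor_fraction >= min_minor_fraction else "monoallelic"


# Made-up counts: an intact imprint versus a site showing loss of imprinting.
intact = classify_allelic_expression(99, 1)
loss_of_imprinting = classify_allelic_expression(48, 52)
```

With these made-up counts the first site is called monoallelic (as expected for an intact imprint) and the second biallelic. The second pattern is analogous to what the authors report for SNRPN in daughter 1.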
840.
颜林林 (2022-07-18 06:00):
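#paper doi:10.1101/2022.07.14.500036 bioRxiv, 2022, Trade-off between conservation of biological variation and batch effect removal in deep generative modeling for single-cell transcriptomics. Analysis of single-cell RNA sequencing data requires removing batch effects. This is usually done by projecting the high-dimensional data into a low-dimensional space that better reveals its structure and then correcting the data there according to batch labels. This step can easily erase biologically meaningful features, and such biological differences are exactly what single-cell sequencing sets out to study. Removing batch effects and preserving biological variation are therefore conflicting objectives, and single-cell analysis tools usually make one of them the priority, depending on their design. Building on the deep generative model scVI, the authors use a multi-objective optimization technique called Pareto multi-task learning (Pareto MTL) to evaluate and trade off metrics for both objectives jointly. According to the abstract, this yields a better Pareto front than the naive scalarization approach. They also propose measuring batch effect with an existing neural-network-based estimator, Mutual Information Neural Estimation (MINE), which helps in choosing the balance point. The method was evaluated on public datasets such as TM-MARROW and MACAQUE-RETINA, showing benefits of MINE over the commonly used Maximum Mean Discrepancy (MMD) measure.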
Abstract:
Single-cell RNA sequencing (scRNA-seq) technology has contributed significantly to diverse research areas in biology, from cancer to development. Since scRNA-seq data is high-dimensional, a common strategy is to learn low dimensional latent representations better to understand overall structure in the data. In this work, we build upon scVI, a powerful deep generative model which can learn biologically meaningful latent representations, but which has limited explicit control of batch effects. Rather than prioritizing batch effect removal over conservation of biological variation, or vice versa, our goal is to provide a bird eye view of the trade-offs between these two conflicting objectives. Specifically, using the well established concept of Pareto front from economics and engineering, we seek to learn the entire trade-off curve between conservation of biological variation and removal of batch effects. A multi-objective optimisation technique known as Pareto multi-task learning (Pareto MTL) is used to obtain the Pareto front between conservation of biological variation and batch effect removal. Our results indicate Pareto MTL can obtain a better Pareto front than the naive scalarization approach typically encountered in the literature. In addition, we propose to measure batch effect by applying a neural-network based estimator called Mutual Information Neural Estimation (MINE) and show benefits over the more standard Maximum Mean Discrepancy (MMD) measure. The Pareto front between conservation of biological variation and batch effect removal is a valuable tool for researchers in computational biology. Our results demonstrate the efficacy of applying Pareto MTL to estimate the Pareto front in conjunction with applying MINE to measure the batch effect.
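The Pareto front that the paper aims to trace can be illustrated by its definition. A candidate model is on the front if no other candidate is at least as good on both objectives and strictly better on one. The sketch below only filters the non-dominated points from a given set of candidates; it is not Pareto MTL, which trains models to land on the front. The candidate scores are hypothetical (batch-effect score, loss of biological variation) pairs, both to be minimized.

```python
def pareto_front(points):
    """Return the non-dominated points, assuming every objective is to be minimized.

    p dominates q if p is no worse than q in every objective and strictly better in at least one.
    """
    def dominates(p, q):
        return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

    # A point never dominates itself, so comparing against the full list is safe.
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (batch_effect, biological_variation_loss) scores for five trained models.
candidates = [(0.9, 0.1), (0.5, 0.3), (0.6, 0.4), (0.2, 0.7), (0.3, 0.8)]
front = pareto_front(candidates)
```

Here `(0.6, 0.4)` is dominated by `(0.5, 0.3)` and `(0.3, 0.8)` by `(0.2, 0.7)`, so the front keeps the other three models. A known limitation of the naive alternative, minimizing a weighted sum of the two objectives, is that it can only reach points on the convex part of the front. This is one motivation for multi-objective methods such as Pareto MTL.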