Papers from the journal bioRxiv.
43 shared papers found in total; this page shows entries 21-40.
21.
颜林林 (2024-03-13 05:35):
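#paper doi:10.1101/2024.02.18.580107, 2024, FECDO-Flexible and Efficient Coding for DNA Odyssey. This paper proposes a new encoding method for DNA data storage, FECDO (short for Flexible and Efficient Coding for DNA Odyssey), which aims to cut DNA synthesis cost through efficient data compression and a flexible encoding strategy, and thereby make DNA data storage more practical. The method first uses deep learning (trying both a standalone neural network with no prior knowledge and a pre-trained language model) to extract features of the data, converting the data to be stored from a one-hot encoded tensor into a sequence of marginal probabilities; this is the compression step. The probability sequence is then mapped to a sequence over the four nucleotide letters (A, C, G, T), and a hierarchical finite state machine excludes encodings unsuitable for DNA storage (such as runs of identical bases or sequences with particular secondary structures). Tested on text and image data, the method improves compression efficiency by 12%-26% over bzip2; this gain translates into a marked reduction in DNA synthesis cost, which is a key issue for DNA storage technology. The authors also synthesized the encoded result of one text sample as actual DNA (for storage), amplified the target fragment by PCR, sequenced it with Nanopore, and decoded it to recover the original data, validating the method end to end. Because the paper is currently a bioRxiv preprint (submitted version v2) that provides only the main text and main figures, without supplementary materials, a methods description, or source code, many implementation and result details remain unpublished. I am personally rather skeptical of the method's error tolerance and real-world performance, and the compression results for non-English text and images shown in the main-text figures do not look very good either; these questions await answers once the paper is formally published.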
Fajia Sun, Long Qian
Abstract:
DNA has been pursued as a compelling medium for digital data storage during the past decade. While large-scale data storage and random access have been achieved in artificial DNA, the synthesis cost keeps hindering DNA data storage from popularizing into daily life. In this study, we proposed a more efficient paradigm for digital data compressing to DNA, while excluding arbitrary sequence constraints. Both standalone neural networks and pre-trained language models were used to extract the intrinsic patterns of data, and generated probabilistic portrayal, which was then transformed into constraint-free nucleotide sequences with a hierarchical finite state machine. Utilizing these methods, a 12%-26% improvement of compression ratio was realized for various data, which directly translated to up to 26% reduction in DNA synthesis cost. Combined with the progress in DNA synthesis, our methods are expected to facilitate the realization of practical DNA data storage.
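To make the idea of excluding homopolymer runs concrete, here is a minimal Python sketch. It is not FECDO's hierarchical finite state machine, whose details the preprint text above does not give; it is a much simpler rotating code in which each base-3 digit selects one of the three nucleotides that differ from the previous one. The function names and the `start` parameter are assumptions made for illustration.

```python
BASES = "ACGT"

def bytes_to_trits(data: bytes) -> list[int]:
    """Write each byte as 6 base-3 digits (3**6 = 729 > 255)."""
    trits = []
    for byte in data:
        digits = []
        for _ in range(6):
            digits.append(byte % 3)
            byte //= 3
        trits.extend(reversed(digits))
    return trits

def trits_to_bytes(trits: list[int]) -> bytes:
    """Inverse of bytes_to_trits: every 6 digits form one byte."""
    out = bytearray()
    for i in range(0, len(trits), 6):
        value = 0
        for t in trits[i:i + 6]:
            value = value * 3 + t
        out.append(value)
    return bytes(out)

def encode(data: bytes, start: str = "A") -> str:
    """Map data to nucleotides so that no two adjacent bases are equal."""
    prev = start  # hypothetical virtual base preceding the strand
    seq = []
    for t in bytes_to_trits(data):
        choices = [b for b in BASES if b != prev]  # the 3 allowed next bases
        prev = choices[t]
        seq.append(prev)
    return "".join(seq)

def decode(seq: str, start: str = "A") -> bytes:
    """Recover the original bytes from a strand produced by encode."""
    prev = start
    trits = []
    for base in seq:
        choices = [b for b in BASES if b != prev]
        trits.append(choices.index(base))
        prev = base
    return trits_to_bytes(trits)
```

This toy code spends log2(3) ≈ 1.58 bits per base instead of 2 in order to guarantee that no base repeats; it handles only the homopolymer constraint and says nothing about secondary structure or error tolerance.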
22.
DeDe宝 (2023-10-18 10:48):
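#paper doi:https://doi.org/10.1101/2023.08.18.553829, Temporal regularities shape perceptual decisions and striatal dopamine signals, bioRxiv, 2023. Temporal regularities shape perceptual decisions and striatal dopamine signals; "temporal regularities" here refers to the transition probabilities between experimental conditions, not to a temporal distribution in the usual sense. The paper analyzes behavioral data from mice making visual perceptual decisions, summarizes the key features of the data, and builds a multi-trial partially observable Markov decision process (POMDP) reinforcement learning model to capture and explain them. The researchers also compared the behavior of 99 mice from a public dataset in a similar experiment, showing that the mice's heavier weighting of 2-back rather than 1-back choices and outcomes is not caused by this study's experimental design and may be a default strategy in perceptual experiments. Finally, the researchers found that dopamine release patterns corroborate key predictions of the reinforcement learning model.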
Matthias Fritsche, Antara Majumdar, Lauren Strickland, Samuel Liebana Garcia, Rafal Bogacz, Armin Lak
Abstract:
Perceptual decisions should depend on sensory evidence. However, such decisions are also influenced by past choices and outcomes. These choice history biases may reflect advantageous strategies to exploit temporal regularities of natural environments. However, it is unclear whether and how observers can adapt their choice history biases to different temporal regularities, to exploit the multitude of temporal correlations that exist in nature. Here, we show that mice adapt their perceptual choice history biases to different temporal regularities. This adaptation is well captured by a normative reinforcement learning algorithm with multi-trial belief states, comprising both current trial sensory and previous trial memory states. We demonstrate that striatal dopamine tracks predictions of the model and behavior, pointing towards the involvement of dopamine in forming adaptive history biases. Our results reveal the adaptive nature of perceptual choice history biases, and shed light on their underlying computational principles and neural implementation.
23.
Ricardo (2023-09-21 17:32):
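#paper https://www.biorxiv.org/content/10.1101/2023.09.15.557874v1.full SACNet: A Multiscale Diffeomorphic Convolutional Registration Network with Prior Neuroanatomical Constraints for Flexible Susceptibility Artifact Correction in Echo Planar Imaging. This is a piece of work I released recently. Because echo planar imaging (EPI) acquires images quickly, most diffusion MRI and functional MRI acquisitions use EPI. However, EPI images generally contain susceptibility artifacts (SAs), which distort the acquired images in both geometry and signal intensity. Existing artifact correction algorithms are typically developed for images from a specific acquisition sequence, and suffer from long processing times and limited correction quality. In this study, I therefore propose an artifact correction framework based on an unsupervised convolutional registration network, with the following technical innovations: 1. We establish a unified mathematical framework that, by adjusting model hyperparameters, can be applied flexibly to both multiple-phase-encoding and single-phase-encoding data; 2. By modifying the Woods-Saxon potential function, used in nuclear physics to model an infinitely deep potential well, we propose a diffeomorphism-preserving function for generating diffeomorphic deformation fields; 3. We design a prior anatomical constraint function that brings prior structural information from artifact-free T1w/T2w images into the model; 4. Finally, we design a multiscale training and inference protocol for this problem to train the network quickly and improve model convergence. Experiments on 2000 brain scans covering neonates, children, and healthy adults show that our method outperforms existing methods.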
Zilong Zeng, Jiaying Zhang, Xinyuan Liang, Lianglong Sun, Yihe Zhang, Weiwei Men, Yanpei Wang, Rui Chen, Haibo Zhang, Shuping Tan, Jia-Hong Gao, Shaozheng Qin, Qiqi Tong, Hongjian He, Sha Tao, Qi Dong, Yong He, Tengda Zhao
Abstract:
Susceptibility artifacts (SAs), which are inevitable for modern diffusion brain MR images with single-shot echo planar imaging (EPI) protocols in wide large-scale neuroimaging datasets, severely hamper the accurate detection of the human brain white matter structure. While several conventional and deep-learning based distortion correction methods have been proposed, the correction quality and model generality of these approaches are still limited. Here, we proposed the SACNet, a flexible SAs correction (SAC) framework for brain diffusion MR images of various phase-encoding EPI protocols based on an unsupervised learning-based registration convolutional neural network. This method could generate smooth diffeomorphic warps with optional neuroanatomy guidance to correct both geometric and intensity distortions of SAs. By employing near 2000 brain scans covering neonatal, child, adult and traveling participants, our SACNet consistently demonstrates state-of-the-art correction performance and effectively eliminates SAs-related multicenter effects compared with existing SAC methods. To facilitate the development of standard SAC tools for future neuroimaging studies, we also created easy-to-use command lines incorporating containerization techniques for quick user deployment.
24.
尹志 (2023-04-30 10:32):
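#paper Broadly applicable and accurate protein design by integrating structure prediction networks and diffusion generative models doi: https://doi.org/10.1101/2022.12.09.519842 This paper proposes a brand-new protein design method called RFdiffusion, which uses deep generative learning to generate entirely new protein structures. It mainly relies on a diffusion model. Given the complex geometry of protein backbones and the complex relationship between amino acid sequence and structure, protein generation has long been very challenging. The work uses the diffusion model as follows: 1. RoseTTAFold serves as the denoising network; since RoseTTA was originally the tool the Baker group used for protein design (more physics-based), this choice of denoising network is quite clever; 2. The whole noising and denoising process operates mainly on the alpha-carbon coordinates, so RFdiffusion first generates the backbone structure; 3. The full protein structure is then obtained with a backbone tracking technique, which can be understood as adding the missing bonds and atoms to the predicted alpha carbons based on geometric constraints and parameters such as bond lengths and angles; 4. Side chains are built using rotamers; a rotamer library holds precomputed conformations for each amino acid residue and lets you select the side-chain structure with the energetically optimal conformation. The whole protein generation process can therefore be seen as deep generative model + physical constraints + post-processing (precomputation). The paper also includes many experiments validating the designs. The Baker group has since used RFdiffusion in follow-up design work, including "De novo design of high-affinity protein binders to bioactive helical peptides", and open-sourced the RFdiffusion code not long ago. Many protein design researchers have started experimenting extensively with RFdiffusion-based designs and attempting wet-lab validation, so this is definitely a pioneering piece of work that deserves everyone's attention.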
Joseph L. Watson, David Juergens, Nathaniel R. Bennett, Brian L. Trippe, Jason Yim, Helen E. Eisenach, Woody Ahern, Andrew J. Borst, Robert J. Ragotte, Lukas F. Milles, Basile I. M. Wicky, Nikita Hanikel, Samuel J. Pellock, Alexis Courbet, William Sheffler, Jue Wang, Preetham Venkatesh, Isaac Sappington, Susana Vázquez Torres, Anna Lauko, Valentin De Bortoli, Emile Mathieu, Regina Barzilay, Tommi S. Jaakkola, Frank DiMaio, Minkyung Baek, David Baker
Abstract:
There has been considerable recent progress in designing new proteins using deep learning methods [1-9]. Despite this progress, a general deep learning framework for protein design that enables solution of a wide range of design challenges, including de novo binder design and design of higher order symmetric architectures, has yet to be described. Diffusion models [10,11] have had considerable success in image and language generative modeling but limited success when applied to protein modeling, likely due to the complexity of protein backbone geometry and sequence-structure relationships. Here we show that by fine tuning the RoseTTAFold structure prediction network on protein structure denoising tasks, we obtain a generative model of protein backbones that achieves outstanding performance on unconditional and topology-constrained protein monomer design, protein binder design, symmetric oligomer design, enzyme active site scaffolding, and symmetric motif scaffolding for therapeutic and metal-binding protein design. We demonstrate the power and generality of the method, called RoseTTAFold Diffusion (RFdiffusion), by experimentally characterizing the structures and functions of hundreds of new designs. In a manner analogous to networks which produce images from user-specified inputs, RFdiffusion enables the design of diverse, complex, functional proteins from simple molecular specifications.
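The comment above says the noising and denoising process acts mainly on alpha-carbon coordinates. As a rough illustration of what a noising step looks like, here is a generic DDPM-style forward process in Python. It is only a sketch: the linear beta schedule, its endpoint values, the step count, and the toy coordinates are assumptions, and the actual RFdiffusion noising scheme is not described in the text on this page.

```python
import math
import random

def alpha_bar_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal-retention factors for a linear beta schedule (assumed values)."""
    alpha_bars = []
    running = 1.0
    for step in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * step / max(num_steps - 1, 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def noise_coordinates(coords, alpha_bar, rng):
    """Sample x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * eps, with eps ~ N(0, I)."""
    signal = math.sqrt(alpha_bar)
    noise = math.sqrt(1.0 - alpha_bar)
    return [
        tuple(signal * c + noise * rng.gauss(0.0, 1.0) for c in atom)
        for atom in coords
    ]

# Toy backbone: three alpha carbons spaced 3.8 units apart along x (hypothetical data)
ca_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
alpha_bars = alpha_bar_schedule(200)
rng = random.Random(0)
slightly_noised = noise_coordinates(ca_coords, alpha_bars[0], rng)    # early step, close to x_0
heavily_noised = noise_coordinates(ca_coords, alpha_bars[-1], rng)    # late step, mostly noise
```

A denoising network (RoseTTAFold, in the paper) would be trained to undo such steps; that part is beyond what a short sketch can show.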
25.
张德祥 (2023-03-20 10:45):
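#paper doi: https://doi.org/10.1101/2022.05.17.492325 Inferring Neural Activity Before Plasticity: A Foundation for Learning Beyond Backpropagation. Surpassing GPT requires improvements at a more fundamental technical level. Backpropagation (BP) is the core of deep learning; biological algorithms are more efficient than BP and offer one route beyond it. This paper gives a good explanation, and follow-up papers include experiments and algorithms whose efficiency can already match BP while retaining further advantages. For more, see https://mp.weixin.qq.com/s/lPzGvY6oOnwzVgxDr9ePpA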
Yuhang Song, Beren Millidge, Tommaso Salvatori, Thomas Lukasiewicz, Zhenghua Xu, Rafal Bogacz
Abstract:
For both humans and machines, the essence of learning is to pinpoint which components in its information processing pipeline are responsible for an error in its output, a challenge that is known as credit assignment. How the brain solves credit assignment is a key question in neuroscience, and also of significant importance for artificial intelligence. It has long been assumed that credit assignment is best solved by backpropagation, which is also the foundation of modern machine learning. However, it has been questioned whether it is possible for the brain to implement backpropagation and learning in the brain may actually be more efficient and effective than backpropagation. Here, we set out a fundamentally different principle on credit assignment, called prospective configuration. In prospective configuration, the network first infers the pattern of neural activity that should result from learning, and then the synaptic weights are modified to consolidate the change in neural activity. We demonstrate that this distinct mechanism, in contrast to backpropagation, (1) underlies learning in a well-established family of models of cortical circuits, (2) enables learning that is more efficient and effective in many contexts faced by biological organisms, and (3) reproduces surprising patterns of neural activity and behaviour observed in diverse human and animal learning experiments. Our findings establish a new foundation for learning beyond backpropagation, for both understanding biological learning and building artificial intelligence.
26.
muton (2023-01-31 23:03):
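#paper # Yu, W., Zadbood, A., Chanales, A. J., & Davachi, L. (2022). Repetition accelerates neural markers of memory consolidation. bioRxiv, 2022-12. https://doi.org/10.1101/2022.12.14.520481; As soon as an experience ends during cognitive processing, its neural memory representation begins to be strengthened and transformed through memory replay. Using fMRI, the authors studied how memory strength, manipulated through repetition during encoding, modulates post-encoding replay in humans. The results show that repetition did not increase replay frequency in the hippocampus, but replay in cortical regions and hippocampal-cortical coordinated replay were significantly enhanced for repeated events, indicating that repetition accelerates memory consolidation. In addition, replay frequency in both hippocampus and cortex modulated behavioral success for weakly encoded information on an immediate associative recognition test, which points to an important role for post-encoding replay in helping recall events that were presented only once. Overall, the paper highlights the role of replay in enhancing memory by stabilizing weaker memories and accelerating cortical consolidation.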
Wangjing Yu, Asieh Zadbood, Avi J. H. Chanales, Lila Davachi
Abstract:
No sooner is an experience over than its neural memory representation begins to be strengthened and transformed through the process of memory replay. Using fMRI, we examined how memory strength manipulated through repetition during encoding modulates post-encoding replay in humans. Results revealed that repetition did not increase replay frequency in the hippocampus. However, replay in cortical regions and hippocampal-cortical coordinated replay were significantly enhanced for repeated events, suggesting that repetition accelerates the consolidation process. Interestingly, we found that replay frequency in both hippocampus and cortex modulated behavioral success on an immediate associative recognition test for the weakly encoded information, indicating a significant role for post-encoding replay in rescuing once-presented events. Together, our findings highlight the relationships of replay to stabilizing weak memories and accelerating cortical consolidation for strong memories.
27.
muton (2022-12-31 22:43):
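#paper doi: https://doi.org/10.1101/2022.10.03.510672 Human hippocampal ripples signal encoding of episodic memories, bioRxiv 2022. Hippocampal sharp-wave ripples are a distinctive and characteristic component found in mammalian electrophysiology. They were first discovered in studies of mice; with advances in human brain recording, intracranial recordings have made it possible to study sharp-wave ripples in humans. Previous human studies focused mostly on the relationship between ripples and memory retrieval, and few examined the role of ripples during encoding, especially of individual items. This paper fills that gap. Using the performance of 124 participants on an episodic memory task, the authors found that although subsequent memory effects in high-frequency signals could be detected in key regions such as the MTL, ripples showed no such difference. Surprisingly, however, ripples occurred more frequently during encoding of items that led to recall of temporally or semantically related items, a phenomenon known as clustering, and this effect was observed in both the encoding and retrieval phases. It may reflect a form of memory retention that helps predict and retrieve memories. The paper offers important hints about the function of ripples, as an electrophysiological component, in human episodic memory.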
John J. Sakon, David J. Halpern, Daniel R. Schonhaut, Michael J. Kahana
Abstract:
Recent human electrophysiology work has uncovered the presence of high frequency oscillatory events, termed ripples, during awake behavior. This prior work focuses on ripples in the medial temporal lobe (MTL) during memory retrieval. Few studies, however, investigate ripples during item encoding. Many studies have found neural activity during encoding that predicts later recall, termed subsequent memory effects (SMEs), but it is unclear if ripples during encoding also predict subsequent recall. Detecting ripples in 124 neurosurgical participants performing an episodic memory task, we find insignificant ripple SMEs in any MTL region, even as these regions exhibit robust high frequency activity (HFA) SMEs. Instead, hippocampal ripples increase during encoding of items leading to recall of temporally or semantically associated items, a phenomenon known as clustering. This subsequent clustering effect (SCE) arises specifically when hippocampal ripples occur during both encoding and retrieval, suggesting that ripples mediate the encoding and future reinstatement of episodic memories.
28.
Ricardo (2022-10-31 23:13):
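#paper doi:https://doi.org/10.1101/251512 Unbiased construction of a temporally consistent morphological atlas of neonatal brain development. This is neonatal brain template construction work done by a now-graduated UCL PhD student during the PhD; it has never appeared in a journal and is still posted on bioRxiv. To build an unbiased brain template, the author first finds a common space through pairwise linear registration; thanks to this global registration stage, the template construction algorithm can set aside global shape variation for the time being and focus on local deformation. Second, the author introduces a fast and unbiased registration algorithm. Finally, the author uses kernel regression to assign a weight to each subject and generate the brain template for the corresponding gestational week.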
bioRxiv, 2018. DOI: 10.1101/251512
Andreas Schuh, Antonios Makropoulos, Emma C. Robinson, Lucilio Cordero-Grande, Emer Hughes, Jana Hutter, Anthony N. Price, Maria Murgasova, Rui Pedro A. G. Teixeira, Nora Tusor, Johannes K. Steinweg, Suresh Victor, Mary A. Rutherford, Joseph V. Hajnal, A. David Edwards, Daniel Rueckert
Abstract:
Premature birth increases the risk of developing neurocognitive and neurobehavioural disorders. The mechanisms of altered brain development causing these disorders are yet unknown. Studying the morphology and function of the brain during maturation provides us not only with a better understanding of normal development, but may help us to identify causes of abnormal development and their consequences. A particular difficulty is to distinguish abnormal patterns of neurodevelopment from normal variation. The Developing Human Connectome Project (dHCP) seeks to create a detailed four-dimensional (4D) connectome of early life. This connectome may provide insights into normal as well as abnormal patterns of brain development. As part of this project, more than a thousand healthy fetal and neonatal brains will be scanned in vivo. This requires computational methods which scale well to larger data sets. We propose a novel groupwise method for the construction of a spatio-temporal model of mean morphology from cross-sectional brain scans at different gestational ages. This model scales linearly with the number of images and thus improves upon methods used to build existing public neonatal atlases, which derive correspondence between all pairs of images. By jointly estimating mean shape and longitudinal change, the atlas created with our method overcomes temporal inconsistencies, which are encountered when mean shape and intensity images are constructed separately for each time point. Using this approach, we have constructed a spatio-temporal atlas from 275 healthy neonates between 35 and 44 weeks post-menstrual age (PMA). The resulting atlas qualitatively preserves cortical details significantly better than publicly available atlases. This is moreover confirmed by a number of quantitative measures of the quality of the spatial normalisation and sharpness of the resulting template brain images.
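The kernel-regression weighting mentioned in the comment above can be sketched in a few lines of Python. This is a generic Gaussian-kernel (Nadaraya-Watson) average, not the authors' implementation: the kernel shape, the `sigma` bandwidth, and the representation of images as flat lists of already-registered voxel intensities are all assumptions made for illustration.

```python
import math

def kernel_weights(ages, target_age, sigma=1.0):
    """Normalized Gaussian weights: subjects scanned near target_age count most."""
    raw = [math.exp(-0.5 * ((age - target_age) / sigma) ** 2) for age in ages]
    total = sum(raw)
    return [r / total for r in raw]

def weighted_template(images, weights):
    """Voxel-wise weighted mean of images that are already in a common space."""
    n_voxels = len(images[0])
    return [
        sum(w * image[v] for w, image in zip(weights, images))
        for v in range(n_voxels)
    ]

# Hypothetical example: four subjects scanned at 36, 38, 40 and 42 weeks,
# each reduced to a two-voxel "image" purely for demonstration.
ages = [36.0, 38.0, 40.0, 42.0]
images = [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0], [4.0, 5.0]]
weights = kernel_weights(ages, target_age=40.0, sigma=1.0)
template_40_weeks = weighted_template(images, weights)
```

Sweeping `target_age` across the age range yields one template per week; a wider `sigma` gives smoother change over time at the cost of blurring age-specific detail.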
29.
周周复始 (2022-10-26 20:17):
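#paper doi: https://doi.org/10.1101/2021.03.04.433968, Deep Diffusion MRI Registration (DDMReg): A Deep Learning Method for Diffusion MRI Registration. This paper proposes a new deep-learning-based registration framework for dMRI data. Because dMRI data contain both the strength of water diffusion and its direction, registering dMRI must align whole-brain anatomy while also keeping fiber tract orientations consistent. Traditional registration methods either ignore orientation information or register fiber tracts specifically and cannot guarantee whole-brain alignment. The method takes as input FA images, which represent whole-brain anatomical structure, and TOM images, which represent fiber tract orientation. Using the DDMReg network architecture, a modified version of VoxelMorph, the trained model outperforms the four state-of-the-art methods it was compared with (SyN, DTI-TK, MRReg, VoxelMorph).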
Fan Zhang, William M. Wells, Lauren J. O’Donnell
Abstract:
In this paper, we present a deep learning method, DDMReg, for accurate registration between diffusion MRI (dMRI) datasets. In dMRI registration, the goal is to spatially align brain anatomical structures while ensuring that local fiber orientations remain consistent with the underlying white matter fiber tract anatomy. DDMReg is a novel method that uses joint whole-brain and tract-specific information for dMRI registration. Based on the successful VoxelMorph framework for image registration, we propose a novel registration architecture that leverages not only whole brain information but also tract-specific fiber orientation information. DDMReg is an unsupervised method for deformable registration between pairs of dMRI datasets: it does not require nonlinearly pre-registered training data or the corresponding deformation fields as ground truth. We perform comparisons with four state-of-the-art registration methods on multiple independently acquired datasets from different populations (including teenagers, young and elderly adults) and different imaging protocols and scanners. We evaluate the registration performance by assessing the ability to align anatomically corresponding brain structures and ensure fiber spatial agreement between different subjects after registration. Experimental results show that DDMReg obtains significantly improved registration performance compared to the state-of-the-art methods. Importantly, we demonstrate successful generalization of DDMReg to dMRI data from different populations with varying ages and acquired using different acquisition protocols and different scanners.
30.
颜林林 (2022-09-11 23:59):
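#paper doi:10.1101/2022.09.09.453067 bioRxiv, 2022, HexSE: Simulating evolution in overlapping reading frames. Overlapping genes are an interesting phenomenon found in viruses (plasmids), in which the same stretch of nucleic acid sequence yields different proteins because translation starts at different positions (that is, in different reading frames). Research so far has found this phenomenon in many species. This paper analyzes sequence evolutionary rates to search for such overlapping genes in the large body of sequenced genome data accumulated to date. The underlying assumption is that where overlapping genes exist, the corresponding sequence is under different evolutionary selection pressure and therefore shows a different evolutionary rate. This is a very interesting idea and research topic.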
Laura Munoz-Baena, Kaitlyn Wade, Art Poon
Abstract:
Motivation: Gene overlap occurs when two or more genes are encoded by the same nucleotides. This phenomenon is found in all taxonomic domains, but is particularly common in viruses, where it may provide a mechanism to increase the information content of compact genomes. The presence of overlapping reading frames (OvRFs) can skew estimates of selection based on the rates of non-synonymous and synonymous substitutions, since a substitution that is synonymous in one reading frame may be non-synonymous in another, and vice versa.
Results: To understand the impact of OvRFs on molecular evolution, we implemented a versatile simulation model of nucleotide sequence evolution along a phylogeny with an arbitrary distribution of reading frames. We use a custom data structure to track the substitution rates at every nucleotide site, which is determined by the stationary nucleotide frequencies, transition bias, and the distribution of selection biases (dN/dS) in the respective reading frames.
Availability and implementation: Our simulation model is implemented in the Python scripting language. All source code is released under the GNU General Public License (GPL) version 3, and is available at https://github.com/PoonLab/HexSE.
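The abstract's point that one substitution can be synonymous in one reading frame and non-synonymous in another is easy to check with the standard genetic code. The Python sketch below is not taken from HexSE; the function name and the nine-base example sequence are hypothetical and chosen only to show the effect.

```python
BASES = "TCAG"
# Standard genetic code, with codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def classify_substitution(seq, pos, new_base, frame_offset):
    """Classify a point substitution relative to the reading frame starting at frame_offset."""
    start = frame_offset + 3 * ((pos - frame_offset) // 3)  # start of the codon containing pos
    if pos < frame_offset or start + 3 > len(seq):
        raise ValueError("position is not covered by a complete codon in this frame")
    mutated = seq[:pos] + new_base + seq[pos + 1:]
    before = CODON_TABLE[seq[start:start + 3]]
    after = CODON_TABLE[mutated[start:start + 3]]
    return "synonymous" if before == after else "nonsynonymous"

seq = "GCTGCAGGT"  # hypothetical sequence read in two overlapping frames
# Frame 0: GCT -> GCC, alanine stays alanine
in_frame_0 = classify_substitution(seq, 2, "C", 0)  # "synonymous"
# Frame +1: CTG -> CCG, leucine becomes proline
in_frame_1 = classify_substitution(seq, 2, "C", 1)  # "nonsynonymous"
```

Because the same T-to-C change is silent in one frame and changes the amino acid in the other, a dN/dS estimate computed from a single frame would misjudge the selection acting on that site.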
31.
颜林林 (2022-08-26 23:18):
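#paper doi:10.1101/2022.08.24.505159 bioRxiv, 2022, A genome-wide atlas of recurrent repeat expansions in human cancer. This paper comes from Michael Snyder's team at Stanford University. By reanalyzing 2,622 cancer whole-genome sequencing datasets from ICGC and TCGA, covering 29 cancer types, they identified 160 recurrent repeat expansion (rRE) events, the vast majority of which are associated with specific cancer subtypes. The genomic regions containing these repeats are also enriched near regulatory elements of certain genes, suggesting that they may play a role in gene regulation. One of them, a GAAA repeat in an intron of the UGT2B7 gene, is observed in 34% of renal cell carcinoma samples; the team therefore enrolled 12 kidney cancer cases through the Stanford Cancer Center and ran second-generation sequencing (Illumina NovaSeq) and third-generation sequencing (PacBio) on their samples, validating the occurrence of this rRE event.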
Graham Scott Erwin, Gamze Gursoy, Rashid Al-Abri, Ashwini Suriyaprakash, Egor Dolzhenko, Kevin Zhu, Christian R Hoerner, Shannon M White, Lucia Ramirez, Ananya Vadlakonda, Alekhya Vadlakonda, Konor von Kraut, Julia Park, Charlotte M Brannon, Daniel A Sumano, Raushun A Kirtikar, Alicia A Erwin, Thomas J Metzner, Ryan K. C. Yuen, Alice C Fan, John T Leppert, Michael A Eberle, Mark Gerstein, Michael P Snyder
Abstract:
Expansion of a single repetitive DNA sequence, termed a tandem repeat (TR), is known to cause more than 50 diseases. However, repeat expansions are often not explored beyond neurological and neurodegenerative disorders. In some cancers, mutations accumulate in short tracts of TRs (STRs), a phenomenon termed microsatellite instability (MSI); however larger repeat expansions have not been systematically analyzed in cancer. Here, we identified TR expansions in 2,622 cancer genomes, spanning 29 cancer types. In 7 cancer types, we found 160 recurrent repeat expansions (rREs); most of these (155/160) were subtype specific. We found that rREs were non-uniformly distributed in the genome with an enrichment near candidate cis-regulatory elements, suggesting a role in gene regulation. One rRE located near a regulatory element in the first intron of UGT2B7 was detected in 34% of renal cell carcinoma samples and was validated by long-read DNA sequencing. Moreover, targeting cells harboring this rRE with a rationally designed, sequence-specific DNA binder led to a dose-dependent decrease in cell proliferation. Overall, our results demonstrate that rREs are an important but unexplored source of genetic variation in human cancers, and we provide a comprehensive catalog for further study.
32.
惊鸿 (2022-08-14 18:12):
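#paper doi:10.1101/2022.08.08.503198 Biallelic germline mutations in MAD1L1 induce a novel syndrome of aneuploidy with high tumor susceptibility. MAD1L1 is the gene encoding the spindle assembly checkpoint (SAC) protein MAD1; the mutations were found in a 36-year-old woman with a dozen tumors, five of them malignant. Functional studies in peripheral blood cells showed a lack of full-length protein and a deficient SAC response, resulting in about 30-40% aneuploid cells as detected by cytogenetic and single-cell (sc) DNA analysis. scRNA-seq analysis of the patient's blood cells identified mitochondrial stress accompanied by systemic inflammation, with enhanced interferon and NFkB signaling. The MAD1L1 mutations also led to specific clonal expansions of γδ T cells with chromosome 18 gains and enhanced cytotoxicity, as well as intermediate B cells with chromosome 12 gains and transcriptomic signatures characteristic of chronic lymphocytic leukemia cells. These data indicate that MAD1L1 mutations cause a new aneuploidy syndrome with systemic inflammation and unprecedented tumor susceptibility. A single gene fragment can bring changes to the whole body, some good and some bad, so gene editing is not a tile-matching game but a rigorous technology; this is the mindset a genetic engineer should have.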
Carolina Villarroya-Beltri, Ana Osorio, Raúl Torres-Ruiz, David Gómez-Sánchez, Marianna Trakala, Agustin Sánchez-Belmonte, Fátima Mercadillo, Borja Pitarch, Almudena Hernández-Núñez, Antonio Gómez-Caturla, Daniel Rueda, José Perea, Sandra Rodríguez-Perales, Marcos Malumbres, Miguel Urioste
Abstract:
Aneuploidy is a frequent feature of human tumors. Germline mutations leading to aneuploidy are very rare in humans, and their tumor-promoting properties are mostly unknown at the molecular level. We report here novel germline biallelic mutations in MAD1L1, the gene encoding the Spindle Assembly Checkpoint (SAC) protein MAD1, in a 36-year-old female with a dozen of neoplasias, including five malignant tumors. Functional studies in peripheral blood cells demonstrated lack of full-length protein and deficient SAC response, resulting in ∼30-40% of aneuploid cells as detected by cytogenetic and single-cell (sc) DNA analysis. scRNA-seq analysis of patient blood cells identified mitochondrial stress accompanied by systemic inflammation with enhanced interferon and NFkB signaling. The inference of chromosomal aberrations from scRNA-seq analysis detected inflammatory signals both in aneuploid and euploid cells, suggesting a non-cell autonomous response to aneuploidy. In addition to random aneuploidies, MAD1L1 mutations resulted in specific clonal expansions of γδ T-cells with chromosome 18 gains and enhanced cytotoxic profile, as well as intermediate B-cells with chromosome 12 gains and transcriptomic signatures characteristic of chronic lymphocytic leukemia cells. These data point to MAD1L1 mutations as the cause of a new aneuploidy syndrome with systemic inflammation and unprecedented tumor susceptibility.
33.
颜林林 (2022-08-02 23:38):
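#paper doi:10.1101/2020.02.16.951657 bioRxiv, 2022, APA-Scan: Detection and Visualization of 3'-UTR Alternative Polyadenylation with RNA-seq and 3'-end-seq Data. Eukaryotes have a mechanism called APA (alternative polyadenylation): by using different polyadenylation sites, transcripts of the same gene end up with 3'-UTRs of different lengths, which allows fine-grained regulation of gene expression (including mRNA degradation). This paper presents APA-Scan, a computational tool that works from RNA-seq data, takes full account of sequencing depth across the relevant regions, identifies APA events, annotates them, and provides graphical output, filling gaps left by earlier tools. The authors evaluated it on simulated and real RNA-seq datasets, comparing it with two widely used baselines (DaPars and APAtrap), and validated the results with qPCR experiments.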
Naima Ahmed Fahmi, Khandakar Tanvir Ahmed, Jae-Woong Chang, Heba Nassereddeen, Deliang Fan, Jeongsik Yong, Wei Zhang
Abstract:
Background: The eukaryotic genome is capable of producing multiple isoforms from a gene by alternative polyadenylation (APA) during pre-mRNA processing. APA in the 3’-untranslated region (3’-UTR) of mRNA produces transcripts with shorter or longer 3’-UTR. Often, 3’-UTR serves as a binding platform for microRNAs and RNA-binding proteins, which affect the fate of the mRNA transcript. Thus, 3’-UTR APA is known to modulate translation and provides a mean to regulate gene expression at the post-transcriptional level. Current bioinformatics pipelines have limited capability in profiling 3’-UTR APA events due to incomplete annotations and a low-resolution analyzing power: widely available bioinformatics pipelines do not reference actionable polyadenylation (cleavage) sites but simulate 3’-UTR APA only using RNA-seq read coverage, causing false positive identifications. To overcome these limitations, we developed APA-Scan, a robust program that identifies 3’-UTR APA events and visualizes the RNA-seq short-read coverage with gene annotations.
Methods: APA-Scan utilizes either predicted or experimentally validated actionable polyadenylation signals as a reference for polyadenylation sites and calculates the quantity of long and short 3’-UTR transcripts in the RNA-seq data. APA-Scan works in three major steps: (i) calculate the read coverage of the 3’-UTR regions of genes; (ii) identify the potential APA sites and evaluate the significance of the events among two biological conditions; (iii) graphical representation of user specific event with 3’-UTR annotation and read coverage on the 3’-UTR regions. APA-Scan is implemented in Python3. Source code and a comprehensive user’s manual are freely available at https://github.com/compbiolabucf/APA-Scan.
Results: APA-Scan was applied to both simulated and real RNA-seq datasets and compared with two widely used baselines DaPars and APAtrap. In simulation APA-Scan significantly improved the accuracy of 3’-UTR APA identification compared to the other baselines. The performance of APA-Scan was also validated by 3’-end-seq data and qPCR on mouse embryonic fibroblast cells. The experiments confirm that APA-Scan can detect unannotated 3’-UTR APA events and improve genome annotation.
Conclusion: APA-Scan is a comprehensive computational pipeline to detect transcriptome-wide 3’-UTR APA events. The pipeline integrates both RNA-seq and 3’-end-seq data information and can efficiently identify the significant events with a high-resolution short reads coverage plots.
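The first two steps described in the abstract (coverage over the 3'-UTR, then a significance test between two conditions) can be illustrated with a minimal sketch. This is not APA-Scan's code: the function names, the assumption that reads downstream of the site come only from the long isoform, and the choice of a chi-square statistic are all illustrative assumptions, since the abstract does not say which test the tool uses.

```python
from typing import Sequence, Tuple


def mean_coverage_around_site(coverage: Sequence[float], site: int) -> Tuple[float, float]:
    """Return mean per-base coverage upstream and downstream of a candidate site.

    `coverage` is per-base read depth along one 3'-UTR (5' to 3'); `site` is the
    0-based index of the first base after the candidate cleavage site.
    """
    if not 0 < site < len(coverage):
        raise ValueError("site must fall strictly inside the 3'-UTR")
    upstream = coverage[:site]
    downstream = coverage[site:]
    return sum(upstream) / len(upstream), sum(downstream) / len(downstream)


def short_long_levels(coverage: Sequence[float], site: int) -> Tuple[float, float]:
    """Estimate short- and long-isoform levels from the two mean coverages.

    Assumption (hypothetical model): reads downstream of the site come only from
    the long isoform, while upstream reads come from both isoforms.
    """
    upstream, downstream = mean_coverage_around_site(coverage, site)
    long_level = downstream
    short_level = max(upstream - downstream, 0.0)
    return short_level, long_level


def chi_square_2x2(a: float, b: float, c: float, d: float) -> float:
    """Pearson chi-square statistic (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denominator
```

Rows of the 2x2 table are the two conditions and columns are the short and long levels. A real analysis would more likely work from read counts with replicate-aware statistics; the sketch only shows the shape of the computation.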
34.
象棋 (2022-07-31 23:20):
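#paper doi:https://doi.org/10.1101/2021.03.16.435524, bioRxiv preprint, (2021), Decoding the Information Structure Underlying the Neural Representation of Concepts. Humans can represent semantic concepts in three ways: taxonomic (animals, tools, and so on, emphasizing category membership), sensory-motor (an apple is red, round, and sweet, emphasizing features), and distributional (firefighter and faucet, emphasizing how often words occur together). Using various corpora, the authors derived behavioral models for the three kinds of representation, then correlated these models with models of the brain signal, and found that most brain regions use a sensory-motor representation.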
Leonardo Fernandino, Jia-Qing Tong, Lisa L. Conant, Colin J. Humphries, Jeffrey R. Binder
Abstract:
The nature of the representational code underlying conceptual knowledge remains a major unsolved problem in cognitive neuroscience. We assessed the extent to which different representational systems contribute to the instantiation of lexical concepts in high-level, heteromodal cortical areas previously associated with semantic cognition. We found that lexical semantic information can be reliably decoded from a wide range of heteromodal cortical areas in frontal, parietal, and temporal cortex. In most of these areas, we found a striking advantage for experience-based representational structures (i.e., encoding information about sensory-motor, affective, and other features of phenomenal experience), with little evidence for independent taxonomic or distributional organization. These results were found independently for object and event concepts. Our findings indicate that concept representations in heteromodal cortex are based, at least in part, on experiential information. They also reveal that, in most heteromodal areas, event concepts have more heterogeneous representations (i.e., they are more easily decodable) than object concepts, and that other areas beyond the traditional “semantic hubs” contribute to semantic cognition, particularly the posterior cingulate gyrus and the precuneus.
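One common way to correlate a model of concept similarity with neural data is representational similarity analysis (RSA). The sketch below assumes that approach purely for illustration; the abstract does not spell out the procedure, and the function names, the cosine distance, and the Pearson comparison are all hypothetical choices.

```python
import math
from itertools import combinations
from typing import List, Sequence


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """1 - cosine similarity; both vectors must be non-zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def rdm_vector(patterns: Sequence[Sequence[float]]) -> List[float]:
    """Upper triangle of the dissimilarity matrix, one value per concept pair."""
    return [
        cosine_distance(patterns[i], patterns[j])
        for i, j in combinations(range(len(patterns)), 2)
    ]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; assumes neither input is constant."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def rsa_score(model_features: Sequence[Sequence[float]],
              neural_patterns: Sequence[Sequence[float]]) -> float:
    """Correlate the model's concept-pair dissimilarities with the neural ones.

    Row i of both inputs must describe the same concept.
    """
    return pearson(rdm_vector(model_features), rdm_vector(neural_patterns))
```

Under this scheme, each candidate model (taxonomic, experiential, distributional) would get its own score per brain region, and the scores would be compared.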
35.
颜林林 (2022-07-23 22:05):
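#paper doi:10.1101/2022.07.21.500999 bioRxiv, 2022, High-resolution de novo structure prediction from primary sequence. This preprint presents OmegaFold, a tool that predicts a protein's tertiary structure from the primary sequence of that single protein alone. Current mainstream methods all depend on evolutionary information, using multiple sequence alignments as an aid to predicting the folded structure. The authors argue that a protein folds spontaneously from its primary sequence into its tertiary structure once it has been translated, so this evolutionary information is not essential for structure prediction. Their deep model relies on a set of pretrained models that help identify which amino acids in the primary sequence matter most (that is, assign them different attention weights), and uses BERT-based language-model techniques to help train the folding model. The resulting method handles structure prediction for orphan proteins (those with no similar proteins available for reference in current structure databases), outperforms RoseTTAFold, and reaches accuracy similar to AlphaFold2.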
Ruidong Wu, Fan Ding, Rui Wang, Rui Shen, Xiwen Zhang, Shitong Luo, Chenpeng Su, Zuofan Wu, Qi Xie, Bonnie Berger, Jianzhu Ma, Jian Peng
Abstract:
Recent breakthroughs have used deep learning to exploit evolutionary information in multiple sequence alignments (MSAs) to accurately predict protein structures. However, MSAs of homologous proteins are not always available, such as with orphan proteins and fast-evolving proteins like antibodies, and a protein typically folds in a natural setting from its primary amino acid sequence into its three-dimensional structure, suggesting that evolutionary information and MSAs should not be necessary to predict a protein's folded form. Here, we introduce OmegaFold, the first computational method to successfully predict high-resolution protein structure from a single primary sequence alone. Using a new combination of a protein language model that allows us to make predictions from single sequences and a geometry-inspired transformer model trained on protein structures, OmegaFold outperforms RoseTTAFold and achieves similar prediction accuracy to AlphaFold2 on recently released structures. OmegaFold enables accurate predictions on orphan proteins that do not belong to any functionally characterized protein family and antibodies that tend to have noisy MSAs due to fast evolution. Our study fills a much-needed structure prediction gap and brings us a step closer to understanding protein folding in nature.
36.
颜林林 (2022-07-20 07:49):
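#paper doi:10.1101/2022.07.17.500374 bioRxiv, 2022, Genozip Dual-Coordinate VCF format enables efficient genomic analyses and alleviates liftover limitations. This is an example of "doing a small thing seriously". In genome analysis we often agonize over whether to use hg19 or hg38, and sometimes have to run two nearly identical analyses in parallel against the two reference genomes to avoid the information loss caused by coordinate-conversion tools such as liftOver. This paper addresses that small (and not even especially common) pain point: while staying compatible with the existing VCF format, it lets a single result file carry two sets of genomic coordinates, which does not interfere with existing tools and allows either coordinate set to be extracted at any time. The idea is simple and the implementation is not hard, but it does solve a real practical problem.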
Divon Mordechai Lan, Gludhug Purnomo, Raymond Tobler, Yassine Souilmi, Bastien Llamas
Abstract:
We introduce Dual Coordinate VCF (DVCF), a file format that records genomic variants against two different reference genomes simultaneously and is fully compliant with the current VCF specification. As implemented in the Genozip platform, DVCF enables bioinformatics pipelines to seamlessly operate across two coordinate systems by leveraging the system most advantageous to each pipeline step, simplifying bioinformatics workflows and reducing file generation and associated data storage burden. Moreover, our benchmarking of Genozip DVCF shows that it produces more complete, less erroneous, and less biased translations across coordinate systems than two widely used alternative tools (i.e., LiftoverVcf and CrossMap).
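The idea of one record carrying two coordinate sets can be sketched as a small data structure. This is only a conceptual illustration: the class names, field names, and coordinates below are hypothetical and do not reflect how Genozip actually encodes DVCF. The sketch also ignores cases where the REF allele differs between the two references.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Locus:
    chrom: str
    pos: int  # 1-based position, as in VCF


@dataclass(frozen=True)
class DualCoordVariant:
    ref: str
    alt: str
    primary: Locus              # coordinates on the first reference
    secondary: Optional[Locus]  # None when the variant has no counterpart


def render(variant: DualCoordVariant, system: str) -> Optional[str]:
    """Return a tab-separated CHROM/POS/ID/REF/ALT line in the chosen system.

    `system` is "primary" or "secondary"; the result is None when the variant
    cannot be expressed in that coordinate system.
    """
    if system not in ("primary", "secondary"):
        raise ValueError("system must be 'primary' or 'secondary'")
    locus = variant.primary if system == "primary" else variant.secondary
    if locus is None:
        return None
    return "\t".join([locus.chrom, str(locus.pos), ".", variant.ref, variant.alt])


# Made-up coordinates, for illustration only.
example = DualCoordVariant("A", "G", Locus("chr1", 1000), Locus("chr1", 1200))
```

Keeping both loci in one record means each pipeline step can ask for whichever system suits it, and variants that fail to map are kept with an explicit `None` instead of being silently dropped.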
37.
颜林林 (2022-07-18 06:00):
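#paper doi:10.1101/2022.07.14.500036 bioRxiv, 2022, Trade-off between conservation of biological variation and batch effect removal in deep generative modeling for single-cell transcriptomics. Single-cell transcriptome data analysis requires removing batch effects. This is usually done by reducing the originally high-dimensional data to a low-dimensional space that reflects the structure of the data more readily, and correcting the data there using batch labels. The process can easily damage biologically meaningful features of the data, and those biological differences are exactly what single-cell sequencing sets out to study. Removing batch effects and preserving biological variation are conflicting goals, and single-cell analysis tools usually pick one of the two as the priority to optimize, depending on their own design principles. In this paper the authors introduce a multi-objective optimization technique called Pareto multi-task learning (Pareto MTL) to jointly evaluate and weigh multiple metrics related to both goals and obtain a better overall result. Along the way they also propose using a neural-network-based measure called Mutual Information Neural Estimation (MINE) to help choose the balance point. The paper evaluates the method on public datasets including TM-MARROW and MACAQUE-RETINA and shows that MINE does perform better than the commonly used MMD.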
Hui Li, Davis J. McCarthy, Heejung Shim, Susan Wei
Abstract:
Single-cell RNA sequencing (scRNA-seq) technology has contributed significantly to diverse research areas in biology, from cancer to development. Since scRNA-seq data is high-dimensional, a common strategy is to learn low dimensional latent representations better to understand overall structure in the data. In this work, we build upon scVI, a powerful deep generative model which can learn biologically meaningful latent representations, but which has limited explicit control of batch effects. Rather than prioritizing batch effect removal over conservation of biological variation, or vice versa, our goal is to provide a bird eye view of the trade-offs between these two conflicting objectives. Specifically, using the well established concept of Pareto front from economics and engineering, we seek to learn the entire trade-off curve between conservation of biological variation and removal of batch effects. A multi-objective optimisation technique known as Pareto multi-task learning (Pareto MTL) is used to obtain the Pareto front between conservation of biological variation and batch effect removal. Our results indicate Pareto MTL can obtain a better Pareto front than the naive scalarization approach typically encountered in the literature. In addition, we propose to measure batch effect by applying a neural-network based estimator called Mutual Information Neural Estimation (MINE) and show benefits over the more standard Maximum Mean Discrepancy (MMD) measure. The Pareto front between conservation of biological variation and batch effect removal is a valuable tool for researchers in computational biology. Our results demonstrate the efficacy of applying Pareto MTL to estimate the Pareto front in conjunction with applying MINE to measure the batch effect.
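To make the notion of a Pareto front concrete, here is a minimal sketch that filters a set of candidate models down to the non-dominated ones. It is not Pareto MTL itself, which trains models to trace the front. It assumes each candidate has already been scored on two hypothetical metrics where higher is better: conservation of biological variation and batch-effect removal.

```python
from typing import List, Sequence, Tuple

Score = Tuple[float, float]  # (bio_conservation, batch_removal), higher is better


def dominates(q: Score, p: Score) -> bool:
    """True if q is at least as good as p on both metrics and better on one."""
    return q[0] >= p[0] and q[1] >= p[1] and (q[0] > p[0] or q[1] > p[1])


def pareto_front(points: Sequence[Score]) -> List[Score]:
    """Return the non-dominated points, sorted by the first metric."""
    front = [
        p for i, p in enumerate(points)
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i)
    ]
    return sorted(front)
```

Every point on the returned front is a defensible trade-off: improving one metric from there necessarily costs something on the other, which is why the abstract argues for reporting the whole curve and not a single operating point.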
38.
颜林林 (2022-07-11 00:41):
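#paper doi:10.1101/2022.07.09.499321 bioRxiv, 2022, A Draft Human Pangenome Reference. This looks like another landmark paper, released ahead of publication on bioRxiv. More than thirty top institutions collaborated, and the author list is still long even after being condensed with "Human Pangenome Reference Consortium". It includes many familiar names that have appeared again and again in major genomics papers over the years, among them the renowned Heng Li, who is one of the corresponding authors. The main text runs to 97 pages (not counting another 39 pages of supplementary material), which also reflects the scale of the work. As is well known, the human reference genome we have been using actually derives from the first seven or eight individuals, and it is hard to believe their genomes are representative enough of the whole human gene pool. So over the years, as large amounts of genomic data accumulated, the reference genome has been updated repeatedly, patch after patch. The "pangenome reference" proposed here can be seen as another major improvement and new release, possibly even the key improvement that comes close to settling the matter once and for all. It integrates as many as 47 individual genomes, each with phased, diploid assemblies. Drawing on earlier human population genomics studies such as HapMap and the 1000 Genomes Project, the 47 individuals were chosen to be genetically diverse; the assemblies cover more than 99% of the expected sequence and are more than 99% accurate at the structural and base-pair levels. The lengthy text documents the full construction of the new reference in detail, down to exact command lines and parameters, and is well worth careful study.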
Wen-Wei Liao, Mobin Asri, Jana Ebler, Daniel Doerr, Marina Haukness, Glenn Hickey, Shuangjia Lu, Julian K. Lucas, Jean Monlong, Haley J. Abel, Silvia Buonaiuto, Xian H. Chang, Haoyu Cheng, Justin Chu, Vincenza Colonna, Jordan M. Eizenga, Xiaowen Feng, Christian Fischer, Robert S. Fulton, Shilpa Garg, Cristian Groza, Andrea Guarracino, William T. Harvey, Simon Heumos, Kerstin Howe, Miten Jain, Tsung-Yu Lu, Charles Markello, Fergal J. Martin, Matthew W. Mitchell, Katherine M. Munson, Moses Njagi Mwaniki, Adam M. Novak, Hugh E. Olsen, Trevor Pesout, David Porubsky, Pjotr Prins, Jonas A. Sibbesen, Chad Tomlinson, Flavia Villani, Mitchell R. Vollger, Human Pangenome Reference Consortium, Guillaume Bourque, Mark J. P. Chaisson, Paul Flicek, Adam M. Phillippy, Justin M. Zook, Evan E. Eichler, David Haussler, Erich D. Jarvis, Karen H. Miga, Ting Wang, Erik Garrison, Tobias Marschall, Ira M. Hall, Heng Li, Benedict Paten
Abstract:
The Human Pangenome Reference Consortium (HPRC) presents a first draft human pangenome reference. The pangenome contains 47 phased, diploid assemblies from a cohort of genetically diverse individuals. These assemblies cover more than 99% of the expected sequence and are more than 99% accurate at the structural and base-pair levels. Based on alignments of the assemblies, we generated a draft pangenome that captures known variants and haplotypes, reveals novel alleles at structurally complex loci, and adds 119 million base pairs of euchromatic polymorphic sequence and 1,529 gene duplications relative to the existing reference, GRCh38. Roughly 90 million of the additional base pairs derive from structural variation. Using our draft pangenome to analyze short-read data reduces errors when discovering small variants by 34% and boosts the detected structural variants per haplotype by 104% compared to GRCh38-based workflows, and by 34% compared to using previous diversity sets of genome assemblies.
39.
颜林林 (2022-07-01 07:57):
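#paper doi:10.1101/2022.06.27.497710 bioRxiv, 2022, PaliDIS: A tool for fast discovery of novel insertion sequences. This is a paper about a bioinformatics tool, with the corresponding author from the Wellcome Sanger Institute. The tool searches metagenomic data for sequences that share identical repeat segments, aligns them to assembled microbial genomes, selects repeats that are linked on the same assembled sequence and are reverse complements of each other, and applies a series of quality-control filters, thereby identifying mobile elements flanked by inverted repeats in microbial genomes, to support research on antimicrobial resistance genes and their spread between bacterial species. Similar pipelines are not rare in human genome analysis, and they are mostly implemented directly from the genomic event and its sequence signature, so the method itself is not particularly novel. Still, applying it to a specific dataset in a specific setting (here the data come from the HMP, the Human Microbiome Project) and giving a statistical description and analysis of the results (for these mobile elements) is a workable and common research pattern.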
Victoria R Carr, Solon P. Pissis, Peter Mullany, Saeed Shoaie, David Gomez-Cabrero, David L. Moyes
Abstract:
The diversity of microbial insertion sequences, crucial mobile genetic elements in generating diversity in microbial genomes, needs to be better represented in current microbial databases. Identification of these sequences in microbiome communities presents some significant problems that have led to their underrepresentation. Here, we present a software tool called PaliDIS that recognises insertion sequences in metagenomic sequence data rapidly by identifying inverted terminal repeat regions from mixed microbial community genomes. Applying this software to 266 human metagenomes identifies 11,681 unique insertion sequences. Querying this catalogue against a large database of isolate genomes reveals evidence of horizontal gene transfer events of clinically relevant antimicrobial resistance genes between classes of bacteria. We will continue to apply this tool more widely, building the Insertion Sequence Catalogue, a valuable resource for researchers wishing to query their microbial genomes for insertion sequences.
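The signature the abstract relies on, inverted terminal repeats, can be shown with a minimal sketch: a candidate element whose right end is the reverse complement of its left end. This is not PaliDIS's algorithm, which works on metagenomic reads and assemblies. The function names and the exact-match rule are assumptions; real tools typically tolerate mismatches.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase or lowercase A/C/G/T string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def terminal_inverted_repeat_length(seq: str, min_len: int = 4) -> int:
    """Length of the longest prefix whose reverse complement equals the suffix.

    Only exact matches of at least `min_len` bases count, and the two repeat
    arms may not overlap; returns 0 when no such repeat exists.
    """
    seq = seq.upper()
    best = 0
    for k in range(min_len, len(seq) // 2 + 1):
        if reverse_complement(seq[:k]) == seq[-k:]:
            best = k
    return best
```

For example, a toy element built as `"GGCTAC" + "AAAAAA" + "GTAGCC"` has a six-base inverted terminal repeat, because `GTAGCC` is the reverse complement of `GGCTAC`.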
40.
颜林林 (2022-06-28 07:39):
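#paper doi:10.1101/2022.06.22.497216 bioRxiv, 2022, Intratumoral mregDC and CXCL13 T helper niches enable local differentiation of CD8 T cells following PD-1 blockade. This paper comes from the Icahn School of Medicine at Mount Sinai. Its patient cohort is drawn from a multicenter phase II clinical trial of neoadjuvant, pre-surgical anti-PD-1 immunotherapy (cemiplimab) in non-small cell lung cancer (NSCLC), hepatocellular carcinoma (HCC), and head and neck squamous cell carcinoma (HNSCC) (NCT03916627; the trial is still ongoing, began in 2019, and is expected to finish in 2024). The paper covers only the HCC patients: on tissue sampled at surgery after neoadjuvant treatment, the authors ran TCR sequencing, whole-exome sequencing, single-cell RNA sequencing, multiplex immunohistochemistry, and other assays to look for specific cell populations associated with the efficacy of neoadjuvant therapy. Immunohistochemistry and immunofluorescence confirmed that, among patients whose tumors were indeed rich in infiltrating T cells, some still did not respond to the PD-1 drug. Comparing the cell-population composition of responders and non-responders revealed a combination of populations, mature regulatory dendritic cells (mregDC, LAMP3+) and CXCL13+ CD4+ T helper cells, which join with PD-1-high CD8+ T cell progenitors to form a triad and drive the progenitors to become PD-1-high GZMK+ effector T cells. Without these two cell types, the progenitors become exhausted CD8+ T cells instead. This leads to different outcomes of the neoadjuvant therapy. The study also provides new evidence toward uncovering the mechanisms of immunotherapy.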
Assaf Magen, Pauline Hamon, Nathalie Fiaschi, Leanna Troncoso, Etienne Humblin, Darwin D'souza, Travis Dawson, Matthew D. Park, Joel Kim, Steven Hamel, Mark Buckup, Christie Chang, Alexandra Tabachnikova, Hara Schwartz, Nausicaa Malissen, Yonit Lavin, Alessandra Soares-Schanoski, Bruno Giotti, Samarth Hegde, Raphael Mattiuz, Clotilde Hennequin, Jessica Le Berichel, Zhen Zhao, Stephen Ward, Isabel Fiel, Colles Price, Nicolas Fernandez, Jiang He, Baijun Kou, Michael Dobosz, Lianjie Li, Christina Adler, Min Ni, Yi Wei, Wei Wang, Namita T. Gupta, Kunal Kundu, Kamil Cygan, Raquel P. Deering, Alex Tsankov, Seunghee Kim-Schulze, Sacha Gnjatic, Ephraim Kenigsberg, Myron Schwartz, Thomas U. Marron, Gavin Thurston, Alice O. Kamphorst, Miriam Merad
Abstract:
Here, we leveraged a large neoadjuvant PD-1 blockade trial in patients with hepatocellular carcinoma (HCC) to search for correlates of response to immune checkpoint blockade (ICB) within T cell-rich tumors. We show that ICB response correlated with the clonal expansion of intratumoral CXCL13+ CH25H+ IL-21+ PD-1+ CD4 T helper cells (CXCL13+ Th) and Granzyme K+ PD-1+ effector-like CD8 T cells, whereas terminally exhausted CD39hi TOXhi PD-1hi CD8 T cells dominated in non-responders. Strikingly, most T cell receptor (TCR) clones that expanded post-treatment were found in pre-treatment biopsies. Notably, PD-1+ TCF-1+ progenitor-like CD8 T cells were present in tumors of responders and non-responders and shared clones mainly with effector-like cells in responders or terminally differentiated cells in non-responders, suggesting that local CD8 T cell differentiation occurs upon ICB. We found that these progenitor CD8 T cells interact with CXCL13+ Th cells within cellular triads around dendritic cells enriched in maturation and regulatory molecules, or "mregDC". Receptor-ligand analysis revealed unique interactions within these triads that may promote the differentiation of progenitor CD8 T cells into effector-like cells upon ICB. These results suggest that discrete intratumoral niches that include mregDC and CXCL13+ Th cells control the differentiation of tumor-specific progenitor CD8 T cell clones in patients treated with ICB.