Literature Collection and Sharing Platform

21.

尹志 (2023-04-30 10:32):

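#paper Broadly applicable and accurate protein design by integrating structure prediction networks and diffusion generative models doi: https://doi.org/10.1101/2022.12.09.519842 This paper proposes a new protein design method, RFdiffusion, which uses deep generative learning to produce entirely new protein structures. The core technique is a diffusion model. Protein generation has long been a hard task because of the complex geometry of protein backbones and the intricate relationship between amino acid sequence and structure. The paper applies the diffusion model as follows: 1. It uses RoseTTAFold as the denoising network. Since the Rosetta family was already the Baker lab's toolkit for protein design (historically more physics-based), this choice of denoising network is a clever one. 2. The noising and denoising process operates mainly on the coordinates of the alpha carbons, so RFdiffusion generates the backbone structure first. 3. The full protein structure is then obtained with a backbone tracking technique, which can be understood as adding the missing bonds and atoms to the predicted alpha carbons based on geometric constraints such as bond lengths and angles. 4. Side chains are placed using rotamers: a rotamer library is precomputed for each amino acid residue and lets you pick the side-chain conformation with the most favorable energy. The whole generation process can therefore be viewed as a deep generative model plus physical constraints plus post-processing (precomputation). The paper also reports extensive experiments validating the designs. The Baker lab has since used RFdiffusion in follow-up design work, including "De novo design of high-affinity protein binders to bioactive helical peptides", and recently open-sourced the RFdiffusion code. Many protein design researchers have started experimenting heavily with RFdiffusion-based designs and validating them in wet-lab experiments. This is a pioneering piece of work that deserves everyone's attention.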
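
The noising step in point 2 can be illustrated with a minimal sketch of a standard DDPM-style forward process applied to alpha-carbon coordinates. This is not RFdiffusion's code: the linear beta schedule, the function names, and the default values are assumptions made for illustration, and any treatment of residue orientations is omitted.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Hypothetical linear noise schedule (values are illustrative defaults)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alpha_bars(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


def noise_coordinates(coords, alpha_bar, rng):
    """Sample x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * eps, eps ~ N(0, I)."""
    signal = math.sqrt(alpha_bar)
    noise = math.sqrt(1.0 - alpha_bar)
    return [
        tuple(signal * c + noise * rng.gauss(0.0, 1.0) for c in xyz)
        for xyz in coords
    ]


# Two alpha carbons roughly 3.8 angstroms apart (toy input).
ca_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
alpha_bars = cumulative_alpha_bars(linear_beta_schedule(100))
noisy = noise_coordinates(ca_coords, alpha_bars[-1], random.Random(0))
```

A denoising network would be trained to undo this corruption step by step; in the paper that role is played by RoseTTAFold.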

bioRxiv, 2022. DOI: 10.1101/2022.12.09.519842

Broadly applicable and accurate protein design by integrating structure prediction networks and diffusion generative models

Joseph L. Watson, David Juergens, Nathaniel R. Bennett, Brian L. Trippe, Jason Yim, Helen E. Eisenach, Woody Ahern, Andrew J. Borst, Robert J. Ragotte, Lukas F. Milles, ...

Abstract:

There has been considerable recent progress in designing new proteins using deep learning methods [1–9]. Despite this progress, a general deep learning framework for protein design that enables solution of a …

22.

张德祥 (2023-03-20 10:45):

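#paper doi: https://doi.org/10.1101/2022.05.17.492325 Inferring Neural Activity Before Plasticity: A Foundation for Learning Beyond Backpropagation. Going beyond GPT requires improvements at a more fundamental technical level. Backpropagation (BP) is the core of deep learning; biological learning algorithms are more efficient than BP and offer one route to surpassing it. This paper explains the idea well, and follow-up papers provide experiments and algorithms whose efficiency already matches BP while keeping additional advantages. For more, see https://mp.weixin.qq.com/s/lPzGvY6oOnwzVgxDr9ePpA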

bioRxiv, 2022. DOI: 10.1101/2022.05.17.492325

Inferring Neural Activity Before Plasticity: A Foundation for Learning Beyond Backpropagation

Yuhang Song, Beren Millidge, Tommaso Salvatori, Thomas Lukasiewicz, Zhenghua Xu, Rafal Bogacz

Abstract:

For both humans and machines, the essence of learning is to pinpoint which components in its information processing pipeline are responsible for an error in its output, a challenge …

23.

muton (2023-01-31 23:03):

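#paper Yu, W., Zadbood, A., Chanales, A. J., & Davachi, L. (2022). Repetition accelerates neural markers of memory consolidation. bioRxiv, 2022-12. https://doi.org/10.1101/2022.12.14.520481 As soon as an experience ends, its neural memory representation begins to be strengthened and transformed through memory replay. Using fMRI, the authors examined how memory strength, manipulated through repetition during encoding, modulates post-encoding replay in humans. Repetition did not increase replay frequency in the hippocampus, but replay in cortical regions and coordinated cortical-hippocampal replay were significantly enhanced for repeated events, indicating that repetition accelerates memory consolidation. In addition, replay frequency in the hippocampus and cortex modulated behavioral success for weakly encoded information in an immediate associative recognition test, which points to an important role for post-encoding replay in helping recall previously experienced events. Overall, the paper highlights the role of replay in stabilizing weaker memories and in accelerating cortical consolidation to strengthen memory.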

bioRxiv, 2022. DOI: 10.1101/2022.12.14.520481

Repetition accelerates neural markers of memory consolidation

Wangjing Yu, Asieh Zadbood, Avi J. H. Chanales, Lila Davachi

Abstract:

No sooner is an experience over than its neural memory representation begins to be strengthened and transformed through the process of memory replay. Using fMRI, we examined how memory strength …

24.

muton (2022-12-31 22:43):

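#paper doi: https://doi.org/10.1101/2022.10.03.510672 Human hippocampal ripples signal encoding of episodic memories, bioRxiv 2022. Hippocampal sharp-wave ripples are a distinctive, characteristic component of mammalian electrophysiology. They were first discovered in mouse studies, and with advances in human recording, intracranial recordings have made it possible to study sharp-wave ripples in humans. Previous human studies focused mainly on the relationship between ripples and memory retrieval and rarely examined the role of ripples in encoding information, especially single items. This paper fills that gap. Using episodic memory task data from 124 participants, the authors found that although a subsequent memory effect in high-frequency signals was present in key regions such as the MTL, ripples themselves showed no such difference. Surprisingly, however, ripples occurred more often when the remembered item was close to other items in encoding time or in meaning, a so-called clustering effect, and this held during both encoding and retrieval. The phenomenon may reflect a form of memory retention that helps predict and retrieve memories. The paper offers important hints about the function of ripples in human episodic memory.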

bioRxiv, 2022. DOI: 10.1101/2022.10.03.510672

Human hippocampal ripples signal encoding of episodic memories

John J. Sakon, David J. Halpern, Daniel R. Schonhaut, Michael J. Kahana

Abstract:

Recent human electrophysiology work has uncovered the presence of high frequency oscillatory events, termed ripples, during awake behavior. This prior work focuses on ripples in the medial temporal lobe (MTL) …

25.

Ricardo (2022-10-31 23:13):

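#paper doi:https://doi.org/10.1101/251512 Unbiased construction of a temporally consistent morphological atlas of neonatal brain development. This is neonatal brain template work done by a UCL PhD student, now graduated, during the PhD; it has never appeared in a journal and is still posted on bioRxiv. To build an unbiased brain template, the author first finds a common space through pairwise linear registration. With global alignment handled at this stage, the template construction algorithm can set aside global shape variation and focus on local deformation. Second, the author introduces a fast and unbiased registration algorithm. Finally, the author uses kernel regression to assign a weight to each subject, which is then used to generate the brain template for the corresponding gestational age.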
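
The final weighting step can be sketched as follows. This is an illustrative example only: the Gaussian kernel, the bandwidth `sigma`, and both function names are assumptions, not details taken from the paper, and the images are assumed to be already registered and flattened to equal-length lists.

```python
import math


def kernel_weights(ages, target_age, sigma=1.0):
    """Normalized Gaussian kernel weights for subjects scanned at `ages` (weeks).

    Subjects whose age is close to `target_age` contribute more to the template.
    """
    raw = [math.exp(-0.5 * ((age - target_age) / sigma) ** 2) for age in ages]
    total = sum(raw)
    return [w / total for w in raw]


def weighted_average(images, weights):
    """Voxel-wise weighted mean of equally sized, already-registered images."""
    n_voxels = len(images[0])
    return [
        sum(w * img[i] for img, w in zip(images, weights))
        for i in range(n_voxels)
    ]


# Toy example: four subjects, template requested at 40 weeks.
ages = [36.0, 38.0, 40.0, 42.0]
images = [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0], [4.0, 5.0]]
template_40w = weighted_average(images, kernel_weights(ages, 40.0, sigma=1.0))
```

Sliding `target_age` across the age range yields one template per time point, which is what makes the resulting atlas temporally smooth.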

bioRxiv, 2018. DOI: 10.1101/251512

Unbiased construction of a temporally consistent morphological atlas of neonatal brain development

Abstract:

Premature birth increases the risk of developing neurocognitive and neurobehavioural disorders. The mechanisms of altered brain development causing these disorders are yet unknown. Studying the morphology and function of the …

26.

周周复始 (2022-10-26 20:17):

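#paper doi: https://doi.org/10.1101/2021.03.04.433968, Deep Diffusion MRI Registration (DDMReg): A Deep Learning Method for Diffusion MRI Registration. This paper proposes a new deep learning registration framework for dMRI data. Because dMRI data carry both the strength and the direction of water diffusion, registering dMRI must align whole-brain anatomy and also keep fiber tract orientations consistent. Traditional registration methods either ignore orientation information or register fiber tracts specifically without guaranteeing whole-brain alignment. The proposed method takes as input FA images, which represent whole-brain anatomical structure, and TOM images, which represent fiber tract orientation. Using the DDMReg network architecture, a modified version of VoxelMorph, the trained model outperformed four state-of-the-art methods (SyN, DTI-TK, MRReg, VoxelMorph).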

bioRxiv, 2021. DOI: 10.1101/2021.03.04.433968

Deep Diffusion MRI Registration (DDMReg): A Deep Learning Method for Diffusion MRI Registration

Fan Zhang, William M. Wells, Lauren J. O’Donnell

Abstract:

In this paper, we present a deep learning method, DDMReg, for accurate registration between diffusion MRI (dMRI) datasets. In dMRI registration, the goal is to spatially align brain anatomical structures …

27.

颜林林 (2022-09-11 23:59):

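#paper doi:10.1101/2022.09.09.453067 bioRxiv, 2022, HexSE: Simulating evolution in overlapping reading frames. Overlapping genes are an interesting phenomenon found in viruses (and plasmids): the same stretch of nucleic acid yields different proteins because translation starts at different positions, that is, in different reading frames. Research so far shows that the phenomenon occurs in many species. This paper analyzes sequence evolutionary rates to look for such overlapping genes in the large body of genomes sequenced to date. Its basic assumption is that where genes overlap, the selective pressure on the sequence differs, which shows up as different evolutionary rates. This is an interesting idea and research topic.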
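
To illustrate what "different reading frames" means, here is a small sketch that translates one nucleotide sequence in each of its three forward frames using the standard genetic code. It is only an illustration of the concept and is unrelated to the HexSE implementation; the function name and the toy sequence are made up.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(sequence, frame=0):
    """Translate a DNA sequence starting at offset `frame` (0, 1, or 2).

    Trailing bases that do not form a complete codon are ignored;
    stop codons are shown as '*'.
    """
    seq = sequence.upper()[frame:]
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


# The same nucleotides read in three frames give three different peptides.
toy_sequence = "ATGGCCTGA"
peptides = [translate(toy_sequence, frame) for frame in range(3)]
```

Because a single point mutation changes codons in every frame that covers it, a site inside an overlap is constrained by more than one protein at once, which is the intuition behind the differing evolutionary rates mentioned above.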

bioRxiv, 2022. DOI: 10.1101/2022.09.09.453067

HexSE: Simulating evolution in overlapping reading frames

Laura Munoz-Baena, Kaitlyn Wade, Art Poon

Abstract:

Motivation: Gene overlap occurs when two or more genes are encoded by the same nucleotides. This phenomenon is found in all taxonomic domains, but is particularly common in viruses, where …

28.

颜林林 (2022-08-26 23:18):

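#paper doi:10.1101/2022.08.24.505159 bioRxiv, 2022, A genome-wide atlas of recurrent repeat expansions in human cancer. This paper comes from Michael Snyder's group at Stanford University. By reanalyzing 2,622 cancer whole-genome sequencing datasets from ICGC and TCGA covering 29 cancer types, the authors identified 160 recurrent repeat expansion (rRE) events, the vast majority of which are associated with specific cancer subtypes. The genomic regions containing these repeats are also enriched near regulatory elements of certain genes, suggesting a possible role in gene regulation. One GAAA repeat expansion lies in an intron of UGT2B7 and is observed in 34% of renal cell carcinoma samples. The authors therefore enrolled 12 kidney cancer cases through the Stanford Cancer Center and validated the rRE event with second-generation (Illumina NovaSeq) and third-generation (PacBio) sequencing of the samples.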

bioRxiv, 2022. DOI: 10.1101/2022.08.24.505159

A genome-wide atlas of recurrent repeat expansions in human cancer

Graham Scott Erwin, Gamze Gursoy, Rashid Al-Abri, Ashwini Suriyaprakash, Egor Dolzhenko, Kevin Zhu, Christian R Hoerner, Shannon M White, Lucia Ramirez, Ananya Vadlakonda, ...

Abstract:

Expansion of a single repetitive DNA sequence, termed a tandem repeat (TR), is known to cause more than 50 diseases. However, repeat expansions are often not explored beyond neurological and …

29.

惊鸿 (2022-08-14 18:12):

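#paper doi:10.1101/2022.08.08.503198 Biallelic germline mutations in MAD1L1 induce a novel syndrome of aneuploidy with high tumor susceptibility. MAD1L1 is the gene encoding the spindle assembly checkpoint (SAC) protein MAD1. The paper reports biallelic germline mutations in this gene in a 36-year-old woman with a dozen tumors, five of them malignant. Functional studies in peripheral blood cells showed a lack of full-length protein and a deficient SAC response, resulting in roughly 30-40% aneuploid cells as detected by cytogenetic and single-cell (sc) DNA analysis. scRNA-seq of the patient's blood cells identified mitochondrial stress accompanied by systemic inflammation, with enhanced interferon and NFkB signaling. The MAD1L1 mutations also led to a specific clonal expansion of γδ T cells with a gain of chromosome 18 and enhanced cytotoxicity, as well as intermediate B cells with a chromosome 12 gain and a transcriptomic signature characteristic of chronic lymphocytic leukemia cells. These data indicate that MAD1L1 mutations cause a new aneuploidy syndrome with systemic inflammation and unprecedented tumor susceptibility. A single gene can bring changes across the whole body, some good and some bad, so gene editing is not a match-three game; it is a rigorous technology, and that is the mindset a genetic engineer should have.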

bioRxiv, 2022. DOI: 10.1101/2022.08.08.503198

Biallelic germline mutations in MAD1L1 induce a novel syndrome of aneuploidy with high tumor susceptibility

Abstract:

Aneuploidy is a frequent feature of human tumors. Germline mutations leading to aneuploidy are very rare in humans, and their tumor-promoting properties are mostly unknown at the molecular level. We …

30.

颜林林 (2022-08-02 23:38):

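#paper doi:10.1101/2020.02.16.951657 bioRxiv, 2022, APA-Scan: Detection and Visualization of 3'-UTR Alternative Polyadenylation with RNA-seq and 3'-end-seq Data. Eukaryotes have a mechanism called APA (alternative polyadenylation): by using different polyadenylation (cleavage) sites during pre-mRNA processing, a gene produces transcripts with shorter or longer 3'-UTRs, which allows fine-grained regulation of gene expression (including degradation). This paper develops a computational tool, APA-Scan, that works from RNA-seq data, takes full account of the read coverage in the relevant regions, identifies APA events, annotates them, and provides graphical output, filling gaps left by earlier tools. The paper evaluates the tool on simulated and real RNA-seq data against two widely used baselines (DaPars and APAtrap) and validates it with 3'-end-seq data and qPCR experiments.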

bioRxiv, 2020. DOI: 10.1101/2020.02.16.951657

APA-Scan: Detection and Visualization of 3’-UTR Alternative Polyadenylation with RNA-seq and 3’-end-seq Data

Naima Ahmed Fahmi, Khandakar Tanvir Ahmed, Jae-Woong Chang, Heba Nassereddeen, Deliang Fan, Jeongsik Yong, Wei Zhang

Abstract:

Background: The eukaryotic genome is capable of producing multiple isoforms from a gene by alternative polyadenylation (APA) during pre-mRNA processing. APA in the 3’-untranslated region (3’-UTR) of mRNA produces transcripts with shorter or longer 3’-UTR. Often, 3’-UTR serves as a binding platform for microRNAs and RNA-binding proteins, which affect the fate of the mRNA transcript. Thus, 3’-UTR APA is known to modulate translation and provides a means to regulate gene expression at the post-transcriptional level. Current bioinformatics pipelines have limited capability in profiling 3’-UTR APA events due to incomplete annotations and a low-resolution analyzing power: widely available bioinformatics pipelines do not reference actionable polyadenylation (cleavage) sites but simulate 3’-UTR APA only using RNA-seq read coverage, causing false positive identifications. To overcome these limitations, we developed APA-Scan, a robust program that identifies 3’-UTR APA events and visualizes the RNA-seq short-read coverage with gene annotations.

Methods: APA-Scan utilizes either predicted or experimentally validated actionable polyadenylation signals as a reference for polyadenylation sites and calculates the quantity of long and short 3’-UTR transcripts in the RNA-seq data. APA-Scan works in three major steps: (i) calculate the read coverage of the 3’-UTR regions of genes; (ii) identify the potential APA sites and evaluate the significance of the events among two biological conditions; (iii) graphical representation of user specific event with 3’-UTR annotation and read coverage on the 3’-UTR regions. APA-Scan is implemented in Python3. Source code and a comprehensive user’s manual are freely available at https://github.com/compbiolabucf/APA-Scan.

Results: APA-Scan was applied to both simulated and real RNA-seq datasets and compared with two widely used baselines, DaPars and APAtrap. In simulation, APA-Scan significantly improved the accuracy of 3’-UTR APA identification compared to the other baselines. The performance of APA-Scan was also validated by 3’-end-seq data and qPCR on mouse embryonic fibroblast cells. The experiments confirm that APA-Scan can detect unannotated 3’-UTR APA events and improve genome annotation.

Conclusion: APA-Scan is a comprehensive computational pipeline to detect transcriptome-wide 3’-UTR APA events. The pipeline integrates both RNA-seq and 3’-end-seq data information and can efficiently identify the significant events with high-resolution short-read coverage plots.

31.

象棋 (2022-07-31 23:20):

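#paper doi:https://doi.org/10.1101/2021.03.16.435524, bioRxiv preprint, (2021), Decoding the Information Structure Underlying the Neural Representation of Concepts. Humans can represent semantic concepts in three ways: taxonomic (animals, tools, and so on, emphasizing category), sensory-motor (an apple is red, round, and sweet, emphasizing features), and distributional (firefighter and water hose, emphasizing how often words occur together). The authors used various corpora to build a behavioral model for each of the three representation types, then correlated these models with models derived from brain signals, and found that the representation in most brain regions is sensory-motor.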

bioRxiv, 2021. DOI: 10.1101/2021.03.16.435524

Decoding the Information Structure Underlying the Neural Representation of Concepts

Leonardo Fernandino, Jia-Qing Tong, Lisa L. Conant, Colin J. Humphries, Jeffrey R. Binder

Abstract:

The nature of the representational code underlying conceptual knowledge remains a major unsolved problem in cognitive neuroscience. We assessed the extent to which different representational systems contribute to the instantiation …

32.

颜林林 (2022-07-23 22:05):

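#paper doi:10.1101/2022.07.21.500999 bioRxiv, 2022, High-resolution de novo structure prediction from primary sequence. This preprint presents a tool, OmegaFold, that predicts tertiary structure from the primary sequence of a single protein. Current mainstream methods all depend on evolutionary information, using multiple sequence alignments as an aid to predict how a protein folds. This paper argues that a protein folds spontaneously from its primary sequence into its tertiary structure once it has been translated, so evolutionary information is not strictly necessary for structure prediction. The deep model relies on a set of pretrained models that help identify which amino acids in the primary sequence matter most (that is, it assigns them different attention), and it uses BERT-style language modeling to help train the folding model. The resulting method handles structure prediction for orphan proteins (those with no similar protein available for reference in current structure databases), and the commenter notes a marked accuracy improvement over tools such as AlphaFold.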

bioRxiv, 2022. DOI: 10.1101/2022.07.21.500999

High-resolution de novo structure prediction from primary sequence

Ruidong Wu, Fan Ding, Rui Wang, Rui Shen, Xiwen Zhang, Shitong Luo, Chenpeng Su, Zuofan Wu, Qi Xie, Bonnie Berger, ...

Abstract:

Recent breakthroughs have used deep learning to exploit evolutionary information in multiple sequence alignments (MSAs) to accurately predict protein structures. However, MSAs of homologous proteins are not always available, such …

33.

颜林林 (2022-07-20 07:49):

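#paper doi:10.1101/2022.07.17.500374 bioRxiv, 2022, Genozip Dual-Coordinate VCF format enables efficient genomic analyses and alleviates liftover limitations. This is an example of "doing a small thing seriously". In genome analysis we often agonize over whether to use hg19 or hg38, and sometimes end up running nearly the same analysis twice in parallel against both reference genomes to avoid the information loss caused by coordinate conversion tools such as liftOver. This paper targets that small (and not even very common) pain point: while staying compatible with the existing VCF format, it stores two sets of genome coordinates in the same result file. Existing tools keep working, and coordinates for either genome can be extracted at any time. The idea is simple and the implementation is not hard, yet it does solve a real practical problem.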

bioRxiv, 2022. DOI: 10.1101/2022.07.17.500374

Genozip Dual-Coordinate VCF format enables efficient genomic analyses and alleviates liftover limitations

Divon Mordechai Lan, Gludhug Purnomo, Raymond Tobler, Yassine Souilmi, Bastien Llamas

Abstract:

We introduce Dual Coordinate VCF (DVCF), a file format that records genomic variants against two different reference genomes simultaneously and is fully compliant with the current VCF specification. As implemented …

34.

颜林林 (2022-07-18 06:00):

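#paper doi:10.1101/2022.07.14.500036 bioRxiv, 2022, Trade-off between conservation of biological variation and batch effect removal in deep generative modeling for single-cell transcriptomics. Single-cell transcriptome analysis needs to remove batch effects. This is usually done by reducing the originally high-dimensional data to a low-dimensional space that better reflects the structure of the data, and then correcting the data there using batch information. The process can easily damage biologically meaningful features, and those biological differences are exactly what single-cell sequencing sets out to study. Removing batch effects and preserving biologically relevant variation are two conflicting goals, and single-cell analysis tools usually pick one of them as the priority to optimize, depending on their own strategy. In this paper the authors introduce a multi-objective optimization technique called Pareto multi-task learning (Pareto MTL), which jointly evaluates and trades off multiple metrics related to both goals to obtain a better overall result. They also introduce a neural-network-based measure, Mutual Information Neural Estimation (MINE), to help choose the balance point. The method is evaluated on public datasets including TM-MARROW and MACAQUE-RETINA, and the results show that MINE does outperform the commonly used MMD approach.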

bioRxiv, 2022. DOI: 10.1101/2022.07.14.500036

Trade-off between conservation of biological variation and batch effect removal in deep generative modeling for single-cell transcriptomics

Hui Li, Davis J. McCarthy, Heejung Shim, Susan Wei

Abstract:

Single-cell RNA sequencing (scRNA-seq) technology has contributed significantly to diverse research areas in biology, from cancer to development. Since scRNA-seq data is high-dimensional, a common strategy is to learn low …

35.

颜林林 (2022-07-11 00:41):

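#paper doi:10.1101/2022.07.09.499321 bioRxiv, 2022, A Draft Human Pangenome Reference. This looks like another landmark paper released early on bioRxiv. More than thirty top institutions collaborated, and even after being condensed under the name "Human Pangenome Reference Consortium" the author list is still long. It includes many familiar names that have appeared again and again in major genomics papers over the years, among them Heng Li, who is one of the corresponding authors. The main text runs to 97 pages (not counting another 39 pages of supplementary material), which reflects the scale of the work. As is well known, the human reference genome we have been using came from only the first seven or eight individuals, and it is hard to believe their genomes are sufficiently representative of the whole human gene pool. As large amounts of genomic data have accumulated over the years, the reference genome has therefore been updated repeatedly, patch after patch. The "pangenome reference" proposed in this paper can be seen as another major improvement and release, perhaps even the key improvement that comes close to settling the matter once and for all. It integrates as many as 47 individual genomes, each with phased, diploid assemblies. Building on earlier human population genomics efforts such as HapMap and the 1000 Genomes Project, the authors established that these 47 individuals are genetically diverse enough to cover more than 99% of the expected sequence, with accuracy above 99% at both the structural and base-pair levels. The very long text describes the full construction of this new reference in detail, down to the exact command lines and parameters, and is well worth careful study.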

bioRxiv, 2022. DOI: 10.1101/2022.07.09.499321

A Draft Human Pangenome Reference

Abstract:

The Human Pangenome Reference Consortium (HPRC) presents a first draft human pangenome reference. The pangenome contains 47 phased, diploid assemblies from a cohort of genetically diverse individuals. These assemblies cover …

36.

颜林林 (2022-07-01 07:57):

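#paper doi:10.1101/2022.06.27.497710 bioRxiv, 2022, PaliDIS: A tool for fast discovery of novel insertion sequences. This is a bioinformatics tool paper whose corresponding author is at the Wellcome Sanger Institute. The tool searches metagenomic data for sequences that share the same repeated fragments, aligns them to assembled microbial genomes, selects repeated fragments that are linked on the same assembled sequence and are reverse complements of each other, and applies a series of quality-control filters. In this way it identifies inverted-repeat mobile elements in microbial genomes, which helps in studying resistance genes and their spread between bacterial species. Similar pipelines are not rare in human genome analysis, and they are generally implemented directly from the genomic event and its sequence features, so the method itself is not especially novel. Applying it to a specific dataset in a specific setting (here, data from the Human Microbiome Project, HMP) and then describing and analyzing the results statistically (with respect to these mobile elements) is nonetheless a workable and common research pattern.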

bioRxiv, 2022. DOI: 10.1101/2022.06.27.497710

PaliDIS: A tool for fast discovery of novel insertion sequences

Victoria R Carr, Solon P. Pissis, Peter Mullany, Saeed Shoaie, David Gomez-Cabrero, David L. Moyes

Abstract:

The diversity of microbial insertion sequences, crucial mobile genetic elements in generating diversity in microbial genomes, needs to be better represented in current microbial databases. Identification of these sequences in …

37.

颜林林 (2022-06-28 07:39):

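#paper doi:10.1101/2022.06.22.497216 bioRxiv, 2022, Intratumoral mregDC and CXCL13 T helper niches enable local differentiation of CD8 T cells following PD-1 blockade. This paper comes from the Icahn School of Medicine at Mount Sinai. Its patient cohort is drawn from a multicenter phase II clinical trial of preoperative neoadjuvant treatment with an anti-PD-1 immunotherapy drug (cemiplimab) in non-small cell lung cancer (NSCLC), hepatocellular carcinoma (HCC), and head and neck squamous cell carcinoma (HNSCC) (NCT03916627; the trial is still ongoing, started in 2019, and is expected to finish in 2024). The paper looks only at the hepatocellular carcinoma patients. Using tissue sampled at surgery after neoadjuvant treatment, the authors performed TCR sequencing, whole-exome sequencing, single-cell transcriptome sequencing, multiplex immunohistochemistry, and other experiments to find specific cell populations associated with response to neoadjuvant treatment. Immunohistochemistry and immunofluorescence confirmed that even among patients whose tumors were rich in infiltrating T cells, some did not respond to the PD-1 drug. Comparing cell population composition between responders and non-responders revealed a combination of cell populations, mature regulatory dendritic cells (mregDC, LAMP3+) and CXCL13+ CD4+ helper T cells, which associate with PD-1-high CD8+ T cell precursors to form a triad and drive the precursors to become PD-1-high GZMK+ effector T cells. Without these two cell types, the precursors instead become exhausted CD8+ T cells. This accounts for the different outcomes of the neoadjuvant treatment. The study also provides new evidence for further uncovering the mechanisms of immunotherapy.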

bioRxiv, 2022. DOI: 10.1101/2022.06.22.497216

Intratumoral mregDC and CXCL13 T helper niches enable local differentiation of CD8 T cells following PD-1 blockade

Assaf Magen, Pauline Hamon, Nathalie Fiaschi, Leanna Troncoso, Etienne Humblin, Darwin D'souza, Travis Dawson, Matthew D. Park, Joel Kim, Steven Hamel, ...

Abstract:

Here, we leveraged a large neoadjuvant PD-1 blockade trial in patients with hepatocellular carcinoma (HCC) to search for correlates of response to immune checkpoint blockade (ICB) within T cell-rich tumors. …

38.

颜林林 (2022-06-17 22:10):

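#paper doi:10.1101/2022.06.12.495839 bioRxiv, 2022, Accurate Estimation of Molecular Counts from Amplicon Sequence Data with Unique Molecular Identifiers. High-throughput sequencing data are full of errors introduced by PCR amplification and by sequencing itself. To address this, people usually add unique molecular identifiers (UMIs): a short random sequence marks which reads come from the same original template molecule and which do not. Many tools handle UMIs crudely by simply merging all reads that share a UMI, but because the UMI sequence itself can mutate, the reconstruction of the original template molecules in the sample can go wrong. The problem is especially pronounced in amplicon sequencing (amplicon-seq). This paper builds a one-step hidden Markov model (HMM) to handle PCR and sequencing errors and implements an EM algorithm in C to estimate the true number of original template molecules from UMI sequencing data. In evaluations on both simulated and real data, the tool developed here (DAUMI) identifies UMI collisions effectively and outperforms earlier comparable tools.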
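
The failure mode described above, where a mutated UMI distorts the molecule count, can be shown with a toy comparison. This sketch is not DAUMI and does not implement its HMM or EM algorithm: it contrasts exact-match UMI counting with a simple, commonly used heuristic that absorbs low-abundance UMIs within Hamming distance 1 of a more abundant one. The function names and the distance threshold are assumptions for illustration.

```python
from collections import Counter


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def naive_molecule_count(umis):
    """Count every distinct UMI as a separate original molecule."""
    return len(set(umis))


def error_aware_molecule_count(umis, max_dist=1):
    """Greedy heuristic: visit UMIs from most to least abundant and treat a UMI
    as an error copy if it lies within `max_dist` of an already accepted UMI."""
    counts = Counter(umis)
    ordered = sorted(counts, key=lambda u: (-counts[u], u))
    accepted = []
    for umi in ordered:
        is_error_copy = any(
            len(parent) == len(umi) and hamming(parent, umi) <= max_dist
            for parent in accepted
        )
        if not is_error_copy:
            accepted.append(umi)
    return len(accepted)


# Five reads of one molecule, one of them read with a UMI error, plus a second molecule.
observed = ["AAAA"] * 5 + ["AAAT"] + ["CCCC"] * 3
naive = naive_molecule_count(observed)
corrected = error_aware_molecule_count(observed)
```

A heuristic like this can also wrongly merge two genuine molecules whose UMIs happen to be similar, which is one reason a probabilistic model of the error process is attractive.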

bioRxiv, 2022. DOI: 10.1101/2022.06.12.495839

Accurate Estimation of Molecular Counts from Amplicon Sequence Data with Unique Molecular Identifiers

Xiyu Peng, Karin Dorman

Abstract:

Motivation: Amplicon sequencing is widely applied to explore heterogeneity and rare variants in genetic populations. Resolving true biological variants and quantifying their abundance is crucial for downstream analyses, but measured …

39.

颜林林 (2022-06-01 07:41):

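#paper doi:10.1101/2022.05.29.493900 bioRxiv 2022, Cost-efficient whole genome-sequencing using novel mostly natural sequencing-by-synthesis chemistry and open fluidics platform. This is new work from Ultima Genomics, described by the commenter as a startup out of MIT, and it rethinks the design principles of current sequencing-by-synthesis methods. Designing the fluidics and optics around a large circular wafer makes the reagents and consumables cheaper. In contrast to Illumina sequencing, which adds bases with reversible terminators in each cycle, this method uses non-terminating nucleotides so that base incorporation is faster, and pairs this with a CNN algorithm for accurate base calling. In practice the method delivers high-throughput sequencing in under 20 hours, with reads of about 300 bp and high quality (Q30 > 85%), at a cost below 1 US dollar per Gb of data. The paper also benchmarks the method on GIAB and 1000 Genomes Project samples to verify the accuracy of the sequencing results. Many of us work with high-throughput sequencing every day and have long taken the Illumina sequencing principle for granted, accepting its monopoly and treating it as the ceiling, rarely asking what could still be improved. This paper is a chance to broaden that view.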

bioRxiv, 2022. DOI: 10.1101/2022.05.29.493900

Cost-efficient whole genome-sequencing using novel mostly natural sequencing-by-synthesis chemistry and open fluidics platform

Gilad Almogy, Mark Pratt, Florian Oberstrass, Linda Lee, Dan Mazur, Nate Beckett, Omer Barad, Ilya Soifer, Eddie Perelman, Yoav Etzioni, Martin Sosa, April Jung, Tyson Clark, Gila Lithwick-Yanai, Sarah Pollock, Gil Hornung, Maya Levy, Matthew Coole, Tom Howd, Megan Shand, Yossi Farjoun, James Emery, Giles Hall, Samuel K Lee, Takuto Sato, Ricky Magner, Sophie Low, Andrew Bernier, Bharathi Gandi, Jack Stohlman, Corey Nolet, Siobhan Donovan, Brendan Blumenstiel, Michelle Cipicchio, Sheila Dodge, Eric Banks, Niall Lennon, Stacey Gabriel, Doron Lipson

Abstract:

We introduce a massively parallel novel sequencing platform that combines an open flow cell design on a circular wafer with a large surface area and mostly natural nucleotides that allow …

40.

颜林林 (2022-03-06 20:48):

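#paper doi:10.1101/2021.07.19.452956, bioRxiv, 2022, The Tabula Sapiens: a multiple organ single cell transcriptomic atlas of humans. This preprint presents a major resource for single-cell transcriptomics. It covers 24 tissues and organs from 15 donors (who generally died of stroke, trauma, or anoxia; see https://tabula-sapiens-portal.ds.czbiohub.org/whereisthedata), from which nearly 500,000 single cells were isolated and profiled with 10x and/or SmartSeq2 single-cell RNA sequencing. The analysis yields tissue-specific expression data for more than 400 cell types and provides reference atlases of T cell clone distribution across tissues, tissue-specific B cell mutation rates, cell cycle states and the distribution of cell types across tissues and organs, and cell-type-specific RNA splicing across tissues within an individual. Pathology sections and H&E staining of the samples were also used to relate the transcriptomic data to macroscopic, clinically relevant information such as spatial heterogeneity across tissue types and estimates of relative cell abundance. The project is run by the Tabula Sapiens Consortium. Its data (raw sequencing data and analysis results) are hosted on AWS, FigShare, CellXGene, and other platforms and are open for use worldwide (although publishing atlas-scale or tissue-scale analyses is not permitted without the consent of the consortium and its partners). Details are available on the project website (https://tabula-sapiens-portal.ds.czbiohub.org/), which also provides a pipeline that helps users annotate and interpret their own data using these results. Two points are worth mentioning. First, the consortium and the project are mainly funded by the Chan Zuckerberg Initiative, founded by Facebook founder Mark Zuckerberg and his wife Priscilla Chan (who trained in biology); bioRxiv and medRxiv were also established and are kept running with support from this foundation. Second, the corresponding author, Stephen R Quake, is a towering figure in biotechnology. He was probably one of the first well-known people to contribute his own genome to validate a high-throughput sequencing technology; see the 2009 NBT paper (doi:10.1038/nbt.1561), whose subject P0 (most likely Quake himself) was sequenced with the single-molecule high-throughput technology of the now-defunct Helicos Biosciences (arguably a third-generation method; bear in mind that second-generation sequencing itself only took off around 2008), producing that technology's first human whole-genome data. Quake's contributions and career are not covered here; interested readers can look them up.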

bioRxiv, 2022. DOI: 10.1101/2021.07.19.452956

The Tabula Sapiens: a multiple organ single cell transcriptomic atlas of humans

Abstract: No abstract available.