Papers from the journal bioRxiv.
31 shared papers found in total; this page shows papers 21-31.
21.
颜林林 (2022-08-02 23:38):
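#paper doi:10.1101/2020.02.16.951657 bioRxiv, 2022, APA-Scan: Detection and Visualization of 3'-UTR Alternative Polyadenylation with RNA-seq and 3'-end-seq Data. Eukaryotes have a mechanism called APA (alternative polyadenylation): during pre-mRNA processing, different polyadenylation (cleavage) sites can be used, so transcripts of the same gene carry 3'-UTRs of different lengths, which allows fine-grained regulation of gene expression (including mRNA degradation). This paper develops a computational tool, APA-Scan, that works from RNA-seq data, takes the sequencing depth of the relevant regions fully into account, identifies APA events, annotates them, and provides graphical visualization, filling gaps left by earlier tools. The authors evaluate it on simulated and real RNA-seq data against two baseline tools (DaPars and APAtrap), and validate it with 3'-end-seq data and qPCR experiments.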
Abstract:
Background: The eukaryotic genome is capable of producing multiple isoforms from a gene by alternative polyadenylation (APA) during pre-mRNA processing. APA in the 3’-untranslated region (3’-UTR) of mRNA produces transcripts with shorter or longer 3’-UTR. Often, 3’-UTR serves as a binding platform for microRNAs and RNA-binding proteins, which affect the fate of the mRNA transcript. Thus, 3’-UTR APA is known to modulate translation and provides a mean to regulate gene expression at the post-transcriptional level. Current bioinformatics pipelines have limited capability in profiling 3’-UTR APA events due to incomplete annotations and a low-resolution analyzing power: widely available bioinformatics pipelines do not reference actionable polyadenylation (cleavage) sites but simulate 3’-UTR APA only using RNA-seq read coverage, causing false positive identifications. To overcome these limitations, we developed APA-Scan, a robust program that identifies 3’-UTR APA events and visualizes the RNA-seq short-read coverage with gene annotations. Methods: APA-Scan utilizes either predicted or experimentally validated actionable polyadenylation signals as a reference for polyadenylation sites and calculates the quantity of long and short 3’-UTR transcripts in the RNA-seq data. APA-Scan works in three major steps: (i) calculate the read coverage of the 3’-UTR regions of genes; (ii) identify the potential APA sites and evaluate the significance of the events among two biological conditions; (iii) graphical representation of user specific event with 3’-UTR annotation and read coverage on the 3’-UTR regions. APA-Scan is implemented in Python3. Source code and a comprehensive user’s manual are freely available at https://github.com/compbiolabucf/APA-Scan. Result: APA-Scan was applied to both simulated and real RNA-seq datasets and compared with two widely used baselines DaPars and APAtrap. In simulation APA-Scan significantly improved the accuracy of 3’-UTR APA identification compared to the other baselines. The performance of APA-Scan was also validated by 3’-end-seq data and qPCR on mouse embryonic fibroblast cells. The experiments confirm that APA-Scan can detect unannotated 3’-UTR APA events and improve genome annotation. Conclusion: APA-Scan is a comprehensive computational pipeline to detect transcriptome-wide 3’-UTR APA events. The pipeline integrates both RNA-seq and 3’-end-seq data information and can efficiently identify the significant events with a high-resolution short reads coverage plots.
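Steps (i) and (ii) of the abstract can be illustrated with a small sketch. It is not APA-Scan's code: the function names are hypothetical, and the 2x2 chi-square statistic is an assumed stand-in for whatever significance test the tool actually applies. Coverage upstream of a candidate cleavage site is shared by short and long isoforms, while coverage downstream comes from the long isoform only.

```python
def mean_coverage(cov, start, end):
    """Mean per-base read depth over the half-open interval [start, end)."""
    window = cov[start:end]
    return sum(window) / len(window) if window else 0.0


def utr_usage(cov, site):
    """Split a 3'-UTR coverage vector at a candidate cleavage site.

    Returns (upstream_mean, downstream_mean). Upstream depth reflects short
    plus long isoforms; downstream depth reflects the long isoform only.
    """
    return mean_coverage(cov, 0, site), mean_coverage(cov, site, len(cov))


def chi_square_2x2(a, b, c, d):
    """Chi-square statistic for the table [[a, b], [c, d]] (assumed test)."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom
```

With one row per condition (upstream mean, downstream mean), a large statistic suggests that the share of long 3'-UTR transcripts differs between the two conditions.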
22.
象棋 (2022-07-31 23:20):
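#paper doi:https://doi.org/10.1101/2021.03.16.435524, bioRxiv preprint, (2021), Decoding the Information Structure Underlying the Neural Representation of Concepts. There are three proposed ways in which humans represent semantic concepts: taxonomic (animals, tools, and so on, emphasizing category membership), sensory-motor (an apple is red, round, and sweet, emphasizing experienced features), and distributional (firefighter and fire hose, emphasizing how often words co-occur). The authors derived behavioral models for the three representation schemes from various corpora, then correlated these models with models built from brain signals, and found that the representation in most brain regions is sensory-motor.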
Abstract:
The nature of the representational code underlying conceptual knowledge remains a major unsolved problem in cognitive neuroscience. We assessed the extent to which different representational systems contribute to the instantiation of lexical concepts in high-level, heteromodal cortical areas previously associated with semantic cognition. We found that lexical semantic information can be reliably decoded from a wide range of heteromodal cortical areas in frontal, parietal, and temporal cortex. In most of these areas, we found a striking advantage for experience-based representational structures (i.e., encoding information about sensory-motor, affective, and other features of phenomenal experience), with little evidence for independent taxonomic or distributional organization. These results were found independently for object and event concepts. Our findings indicate that concept representations in heteromodal cortex are based, at least in part, on experiential information. They also reveal that, in most heteromodal areas, event concepts have more heterogeneous representations (i.e., they are more easily decodable) than object concepts, and that other areas beyond the traditional “semantic hubs” contribute to semantic cognition, particularly the posterior cingulate gyrus and the precuneus.
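Correlating a model with brain signals is often done by comparing pairwise dissimilarities between concepts. The sketch below assumes that setup: each input is a flat list of dissimilarities for the same concept pairs, one from a candidate model and one from neural data, compared by Spearman rank correlation. The function names are illustrative, ties are not handled, and the paper's actual analysis may differ.

```python
def ranks(values):
    """Rank positions (0-based) of each value; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order):
        result[index] = rank
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(model_dissim, neural_dissim):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(model_dissim), ranks(neural_dissim))
```

Running `spearman` once per candidate model (taxonomic, experiential, distributional) against the same neural dissimilarities gives one score per model that can then be compared region by region.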
23.
颜林林 (2022-07-23 22:05):
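#paper doi:10.1101/2022.07.21.500999 bioRxiv, 2022, High-resolution de novo structure prediction from primary sequence. This preprint presents a tool, OmegaFold, that predicts a protein's tertiary structure from the primary sequence of that single protein alone. Current mainstream methods all rely on evolutionary information, using multiple sequence alignments as an aid when predicting the folded structure. The authors argue that a protein folds from its primary sequence into its tertiary structure once it has been translated, so evolutionary information should not be necessary for structure prediction. Their deep model relies on a pretrained protein language model, built with BERT-style language-modeling techniques, that helps identify which amino acids in the primary sequence matter more (that is, assigns them different attention weights), and combines it with a transformer trained on protein structures. The resulting method handles structure prediction for orphan proteins (proteins for which current databases hold no close relatives to use as reference); in accuracy it outperforms RoseTTAFold and is comparable to AlphaFold2.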
Abstract:
Recent breakthroughs have used deep learning to exploit evolutionary information in multiple sequence alignments (MSAs) to accurately predict protein structures. However, MSAs of homologous proteins are not always available, such as with orphan proteins and fast-evolving proteins like antibodies, and a protein typically folds in a natural setting from its primary amino acid sequence into its three-dimensional structure, suggesting that evolutionary information and MSAs should not be necessary to predict a protein's folded form. Here, we introduce OmegaFold, the first computational method to successfully predict high-resolution protein structure from a single primary sequence alone. Using a new combination of a protein language model that allows us to make predictions from single sequences and a geometry-inspired transformer model trained on protein structures, OmegaFold outperforms RoseTTAFold and achieves similar prediction accuracy to AlphaFold2 on recently released structures. OmegaFold enables accurate predictions on orphan proteins that do not belong to any functionally characterized protein family and antibodies that tend to have noisy MSAs due to fast evolution. Our study fills a much-needed structure prediction gap and brings us a step closer to understanding protein folding in nature.
24.
颜林林 (2022-07-20 07:49):
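#paper doi:10.1101/2022.07.17.500374 bioRxiv, 2022, Genozip Dual-Coordinate VCF format enables efficient genomic analyses and alleviates liftover limitations. This is an example of "doing a small thing seriously". In genome analysis we often agonize over "hg19 or hg38?", and sometimes have to run two nearly identical analyses in parallel, one per reference genome, to avoid the information loss caused by coordinate-conversion tools such as liftOver. This paper addresses that small (and not even very common) pain point: while remaining compatible with the existing VCF format, it lets a single result file carry two sets of genome coordinates, which does not interfere with existing tools and allows either coordinate set to be extracted whenever needed. The idea is simple and not hard to implement, yet it genuinely solves some practical problems.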
Abstract:
We introduce Dual Coordinate VCF (DVCF), a file format that records genomic variants against two different reference genomes simultaneously and is fully compliant with the current VCF specification. As implemented in the Genozip platform, DVCF enables bioinformatics pipelines to seamlessly operate across two coordinate systems by leveraging the system most advantageous to each pipeline step, simplifying bioinformatics workflows and reducing file generation and associated data storage burden. Moreover, our benchmarking of Genozip DVCF shows that it produces more complete, less erroneous, and less biased translations across coordinate systems than two widely used alternative tools (i.e., LiftoverVcf and CrossMap).
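The idea of one record that can be read in either coordinate system can be sketched with a small data structure. This is an illustration only: `Locus`, `DualCoordVariant`, and `view` are hypothetical names, the positions in any usage are made up, and nothing here reflects the actual DVCF tags or the Genozip implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Locus:
    """One variant description against one reference genome."""
    chrom: str
    pos: int
    ref: str
    alt: str


@dataclass
class DualCoordVariant:
    """A variant stored once, with a locus per reference genome (hypothetical)."""
    coords: dict  # reference name (e.g. "hg19") -> Locus

    def view(self, reference):
        """Return (chrom, pos, ref, alt) in the requested coordinate system."""
        locus = self.coords[reference]
        return (locus.chrom, locus.pos, locus.ref, locus.alt)
```

Keeping both loci on the same record means a pipeline step can ask for whichever system it needs without a separate conversion pass, which is the workflow benefit the abstract describes.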
25.
颜林林 (2022-07-18 06:00):
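#paper doi:10.1101/2022.07.14.500036 bioRxiv, 2022, Trade-off between conservation of biological variation and batch effect removal in deep generative modeling for single-cell transcriptomics. In single-cell transcriptome analysis, batch effects need to be removed. This is usually done by reducing the high-dimensional data to a low-dimensional space that better reflects the structure of the data, and correcting the data there using batch labels. The process can easily damage biologically meaningful features, and those biological differences are exactly what single-cell sequencing is meant to study. Removing batch effects and preserving biological variation are conflicting goals, and single-cell analysis tools usually pick one of them as the priority objective to optimize, depending on their own strategy. In this paper, the authors build on scVI and introduce a multi-objective optimization technique called Pareto multi-task learning (Pareto MTL) to weigh metrics for both goals together, learning the whole trade-off curve (the Pareto front) rather than a single compromise. They also propose measuring the batch effect with a neural-network-based estimator, Mutual Information Neural Estimation (MINE). The method is evaluated on public datasets such as TM-MARROW and MACAQUE-RETINA, and the results show benefits of MINE over the more commonly used Maximum Mean Discrepancy (MMD) measure.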
Abstract:
Single-cell RNA sequencing (scRNA-seq) technology has contributed significantly to diverse research areas in biology, from cancer to development. Since scRNA-seq data is high-dimensional, a common strategy is to learn low dimensional latent representations better to understand overall structure in the data. In this work, we build upon scVI, a powerful deep generative model which can learn biologically meaningful latent representations, but which has limited explicit control of batch effects. Rather than prioritizing batch effect removal over conservation of biological variation, or vice versa, our goal is to provide a bird eye view of the trade-offs between these two conflicting objectives. Specifically, using the well established concept of Pareto front from economics and engineering, we seek to learn the entire trade-off curve between conservation of biological variation and removal of batch effects. A multi-objective optimisation technique known as Pareto multi-task learning (Pareto MTL) is used to obtain the Pareto front between conservation of biological variation and batch effect removal. Our results indicate Pareto MTL can obtain a better Pareto front than the naive scalarization approach typically encountered in the literature. In addition, we propose to measure batch effect by applying a neural-network based estimator called Mutual Information Neural Estimation (MINE) and show benefits over the more standard Maximum Mean Discrepancy (MMD) measure. The Pareto front between conservation of biological variation and batch effect removal is a valuable tool for researchers in computational biology. Our results demonstrate the efficacy of applying Pareto MTL to estimate the Pareto front in conjunction with applying MINE to measure the batch effect.
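To make the notion of a Pareto front concrete, the sketch below filters a set of candidate models down to the non-dominated ones. It is not Pareto MTL, which is a training procedure; it only shows what the resulting front means. Each point is an assumed pair (biological conservation score, batch removal score) where higher is better for both.

```python
def dominates(q, p):
    """True if q is at least as good as p on both scores and better on one."""
    return q[0] >= p[0] and q[1] >= p[1] and (q[0] > p[0] or q[1] > p[1])


def pareto_front(points):
    """Keep the points that no other point dominates, in input order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

A point on the front cannot be improved on one objective without giving something up on the other, which is why the front, rather than any single model, describes the trade-off.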
26.
颜林林 (2022-07-11 00:41):
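#paper doi:10.1101/2022.07.09.499321 bioRxiv, 2022, A Draft Human Pangenome Reference. This is surely another landmark paper, posted ahead of publication on bioRxiv. More than thirty top institutions collaborated; even after being condensed under "Human Pangenome Reference Consortium", the author list is still long and includes many familiar names that have appeared repeatedly in major genomics papers over the years, among them the renowned Heng Li, who is one of the corresponding authors. The main text runs to 97 pages (not counting another 39 pages of supplementary material), which reflects the scale of the work. As is well known, the human reference genome we have been using actually derives from seven or eight early donors, and it is hard to believe that their genomes are representative enough of the whole human gene pool. Over the years, as genome data accumulated, the reference has been revised repeatedly, patch after patch. The "pangenome reference" proposed here can be seen as another major improvement and new release, possibly even a key step toward a once-and-for-all solution. It integrates 47 individual genomes, all of them phased, diploid assemblies. Drawing on earlier population-genomics efforts such as HapMap and the 1000 Genomes Project, the individuals were chosen to be genetically diverse; the assemblies cover more than 99% of the expected sequence and are more than 99% accurate at the structural and base-pair levels. The lengthy paper documents the full construction of the new reference in detail, down to exact command lines and parameters, and is well worth careful study.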
Abstract:
The Human Pangenome Reference Consortium (HPRC) presents a first draft human pangenome reference. The pangenome contains 47 phased, diploid assemblies from a cohort of genetically diverse individuals. These assemblies cover more than 99% of the expected sequence and are more than 99% accurate at the structural and base-pair levels. Based on alignments of the assemblies, we generated a draft pangenome that captures known variants and haplotypes, reveals novel alleles at structurally complex loci, and adds 119 million base pairs of euchromatic polymorphic sequence and 1,529 gene duplications relative to the existing reference, GRCh38. Roughly 90 million of the additional base pairs derive from structural variation. Using our draft pangenome to analyze short-read data reduces errors when discovering small variants by 34% and boosts the detected structural variants per haplotype by 104% compared to GRCh38-based workflows, and by 34% compared to using previous diversity sets of genome assemblies.
27.
颜林林 (2022-07-01 07:57):
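#paper doi:10.1101/2022.06.27.497710 bioRxiv, 2022, PaliDIS: A tool for fast discovery of novel insertion sequences. This is a paper about a bioinformatics tool; the corresponding author is from the Wellcome Sanger Institute. The tool searches metagenomic data for sequences that share identical repeat segments, maps them to assembled microbial genomes, keeps repeat segments that are linked on the same assembled sequence and are reverse complements of each other, and applies a series of quality-control filters. In this way it identifies mobile elements flanked by inverted terminal repeats (insertion sequences) in microbial genomes, which supports research on antimicrobial resistance genes and their spread between bacterial species. Similar pipelines are not rare in human genome analysis and are mostly implemented directly from the genomic event and its sequence signature, so the method itself is not especially innovative. Still, applying it to a specific dataset in a specific setting (here the data come from HMP, the Human Microbiome Project) and statistically describing and analyzing the resulting mobile elements is a feasible and common research pattern.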
Abstract:
The diversity of microbial insertion sequences, crucial mobile genetic elements in generating diversity in microbial genomes, needs to be better represented in current microbial databases. Identification of these sequences in microbiome communities presents some significant problems that have led to their underrepresentation. Here, we present a software tool called PaliDIS that recognises insertion sequences in metagenomic sequence data rapidly by identifying inverted terminal repeat regions from mixed microbial community genomes. Applying this software to 266 human metagenomes identifies 11,681 unique insertion sequences. Querying this catalogue against a large database of isolate genomes reveals evidence of horizontal gene transfer events of clinically relevant antimicrobial resistance genes between classes of bacteria. We will continue to apply this tool more widely, building the Insertion Sequence Catalogue, a valuable resource for researchers wishing to query their microbial genomes for insertion sequences.
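The sequence signature behind "inverted terminal repeats" is that one end of the element is the reverse complement of the other. A minimal check of that property is sketched below; the function names are illustrative, only exact matches over the alphabet ACGT are handled, and PaliDIS itself works on reads and assemblies with further filtering that is not modeled here.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string over ACGT."""
    return seq.translate(_COMPLEMENT)[::-1]


def has_inverted_terminal_repeats(seq, k):
    """True if the first k bases equal the reverse complement of the last k."""
    if k <= 0 or len(seq) < 2 * k:
        return False
    return seq[:k] == reverse_complement(seq[-k:])
```

For example, a candidate that starts with `GGCAT` and ends with `ATGCC` passes the check with `k = 5`, because `ATGCC` read on the opposite strand is `GGCAT`.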
28.
颜林林 (2022-06-28 07:39):
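#paper doi:10.1101/2022.06.22.497216 bioRxiv, 2022, Intratumoral mregDC and CXCL13 T helper niches enable local differentiation of CD8 T cells following PD-1 blockade. This paper is from the Icahn School of Medicine at Mount Sinai. Its patient cohort comes from a multicenter phase II clinical trial of neoadjuvant (pre-surgery) anti-PD-1 immunotherapy (cemiplimab) in non-small cell lung cancer (NSCLC), hepatocellular carcinoma (HCC), and head and neck squamous cell carcinoma (HNSCC) (NCT03916627; the trial started in 2019, is still ongoing, and is expected to finish in 2024). The paper covers only the HCC patients: tissue sampled at surgery after neoadjuvant treatment was analyzed by TCR sequencing, whole-exome sequencing, single-cell RNA sequencing, and multiplex immunohistochemistry to look for specific cell populations associated with treatment response. Immunohistochemistry and immunofluorescence confirmed that even among patients whose tumors are rich in infiltrating T cells, some still do not respond to PD-1 blockade. Comparing cell-population composition between responders and non-responders revealed a combination of two populations: dendritic cells enriched in maturation and regulatory molecules (mregDC, LAMP3+) and CXCL13+ CD4+ T helper cells. Together with PD-1-high progenitor CD8+ T cells they form triads that appear to drive the progenitors to differentiate into PD-1-high GZMK+ effector-like T cells. Without these two cell types, the progenitors instead become exhausted CD8+ T cells. This accounts for the different outcomes of the neoadjuvant therapy. The study also provides new evidence on the mechanisms of immunotherapy.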
Abstract:
Here, we leveraged a large neoadjuvant PD-1 blockade trial in patients with hepatocellular carcinoma (HCC) to search for correlates of response to immune checkpoint blockade (ICB) within T cell-rich tumors. We show that ICB response correlated with the clonal expansion of intratumoral CXCL13+ CH25H+ IL-21+ PD-1+ CD4 T helper cells (CXCL13+ Th) and Granzyme K+ PD-1+ effector-like CD8 T cells, whereas terminally exhausted CD39hi TOXhi PD-1hi CD8 T cells dominated in non-responders. Strikingly, most T cell receptor (TCR) clones that expanded post-treatment were found in pre-treatment biopsies. Notably, PD-1+ TCF-1+ progenitor-like CD8 T cells were present in tumors of responders and non-responders and shared clones mainly with effector-like cells in responders or terminally differentiated cells in non-responders, suggesting that local CD8 T cell differentiation occurs upon ICB. We found that these progenitor CD8 T cells interact with CXCL13+ Th cells within cellular triads around dendritic cells enriched in maturation and regulatory molecules, or "mregDC". Receptor-ligand analysis revealed unique interactions within these triads that may promote the differentiation of progenitor CD8 T cells into effector-like cells upon ICB. These results suggest that discrete intratumoral niches that include mregDC and CXCL13+ Th cells control the differentiation of tumor-specific progenitor CD8 T cell clones in patients treated with ICB.
29.
颜林林 (2022-06-17 22:10):
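#paper doi:10.1101/2022.06.12.495839 bioRxiv, 2022, Accurate Estimation of Molecular Counts from Amplicon Sequence Data with Unique Molecular Identifiers. High-throughput sequencing data are full of errors introduced by PCR amplification and sequencing. To address this, unique molecular identifiers (UMIs) are commonly used: a random sequence tags each molecule, so that reads from the same original template molecule can be told apart from those that are not. Many tools handle UMIs crudely by simply merging all sequences that share a UMI. But UMI sequences themselves pick up errors, and distinct molecules can receive the same UMI, so reconstructing the original template molecules in a sample can go wrong. This is especially evident in amplicon sequencing (amplicon-seq). The paper builds a one-step hidden Markov model (one step HMM) to handle PCR and sequencing errors, and implements an EM algorithm in C to estimate the true number of original template molecules from UMI sequencing data. Evaluated on simulated and real data against earlier comparable tools, the tool developed here (DAUMI) effectively recognizes UMI collisions and performs better.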
Abstract:
Motivation: Amplicon sequencing is widely applied to explore heterogeneity and rare variants in genetic populations. Resolving true biological variants and quantifying their abundance is crucial for downstream analyses, but measured abundances are distorted by stochasticity and bias in amplification, plus errors during Polymerase Chain Reaction (PCR) and sequencing. One solution attaches Unique Molecular Identifiers (UMIs) to sample sequences before amplification eliminating amplification bias by clustering reads on UMI and counting clusters to quantify abundance. While modern methods improve over naive clustering by UMI identity, most do not account for UMI reuse, or collision, and they do not adequately model PCR and sequencing errors in the UMIs and sample sequences. Results: We introduce Deduplication and accurate Abundance estimation with UMIs (DAUMI), a probabilistic framework to detect true biological sequences and accurately estimate their deduplicated abundance from amplicon sequence data. DAUMI recognizes UMI collision, even on highly similar sequences, and detects and corrects most PCR and sequencing errors in the UMI and sampled sequences. We demonstrate DAUMI performs better on simulated and real data compared to other UMI-aware clustering methods. Availability: Source code is available at https://github.com/xiyupeng/AmpliCI-UMI.
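The "naive clustering by UMI identity" that the abstract contrasts with can be sketched in a few lines, along with why it miscounts. This is not DAUMI: the function names are illustrative, reads are assumed to be (UMI, sequence) pairs, and no error model is included.

```python
from collections import defaultdict


def naive_molecule_count(reads):
    """Naive estimate: one molecule per distinct UMI, whatever it is attached to."""
    return len({umi for umi, _ in reads})


def umis_with_multiple_sequences(reads):
    """Map each UMI seen with more than one distinct sequence to those sequences."""
    seqs_by_umi = defaultdict(set)
    for umi, seq in reads:
        seqs_by_umi[umi].add(seq)
    return {umi: seqs for umi, seqs in seqs_by_umi.items() if len(seqs) > 1}
```

A UMI flagged by the second function is ambiguous: it may be a collision (two molecules that drew the same UMI, which the naive count merges into one) or a single molecule whose copies carry PCR or sequencing errors. Telling these apart is what requires a probabilistic model.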
30.
颜林林 (2022-06-01 07:41):
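#paper doi:10.1101/2022.05.29.493900 bioRxiv 2022, Cost-efficient whole genome-sequencing using novel mostly natural sequencing-by-synthesis chemistry and open fluidics platform. This is new work from the startup Ultima Genomics, which innovates on the design principles of current sequencing-by-synthesis methods. Building the fluidics and optics around a large circular wafer makes the reagents and consumables cheaper. Whereas Illumina sequencing adds bases using reversible terminators in each cycle, this platform uses mostly natural nucleotides without reversible terminators, which speeds up base incorporation, and pairs this with a CNN-based algorithm for accurate base calling. In practice the method delivers high-throughput sequencing with runs under 20 hours, read lengths of about 300 bp, and high quality (Q30 > 85%), at a cost of about $1 per Gb. The paper also benchmarks on GIAB and 1000 Genomes samples to verify the accuracy of the results. Many of us work with high-throughput sequencing every day and have long taken the Illumina chemistry for granted, accepting its monopoly and ceiling status by default and seldom asking what could still be improved. This paper is a chance to broaden one's view on that.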
Abstract:
We introduce a massively parallel novel sequencing platform that combines an open flow cell design on a circular wafer with a large surface area and mostly natural nucleotides that allow optical end-point detection without reversible terminators. This platform enables sequencing billions of reads with longer read length (~300bp) and fast runs times (<20hrs) with high base accuracy (Q30 > 85%), at a low cost of $1/Gb. We establish system performance by whole-genome sequencing of the Genome-In-A-Bottle reference samples HG001-7, demonstrating high accuracy for SNPs (99.6%) and Indels in homopolymers up to length 10 (96.4%) across the vast majority (>98%) of the defined high-confidence regions of these samples. We demonstrate scalability of the whole-genome sequencing workflow by sequencing an additional 224 selected samples from the 1000 Genomes project achieving high concordance with reference data.
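Two of the headline numbers are easy to unpack with a little arithmetic. The Phred scale defines a quality score as Q = -10 * log10(p), so Q30 corresponds to an error probability of 1 in 1,000 per base. The cost helper is a rough sketch: the 3.1 Gb genome size and 30x depth used in the usage note are illustrative assumptions, not figures from the paper.

```python
import math


def phred_to_error(q):
    """Per-base error probability for a Phred quality score."""
    return 10 ** (-q / 10)


def error_to_phred(p):
    """Phred quality score for a per-base error probability (p > 0)."""
    return -10 * math.log10(p)


def sequencing_cost_usd(genome_gb, depth, usd_per_gb):
    """Raw data cost for one genome: gigabases needed times price per Gb."""
    return genome_gb * depth * usd_per_gb
```

Under those assumptions, `sequencing_cost_usd(3.1, 30, 1.0)` comes to roughly 93 US dollars of raw sequencing per human genome, before library preparation and analysis.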
31.
颜林林 (2022-03-06 20:48):
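#paper doi:10.1101/2021.07.19.452956, bioRxiv, 2022, The Tabula Sapiens: a multiple organ single cell transcriptomic atlas of humans. This preprint introduces a major resource for single-cell transcriptomics. It covers 24 tissues and organs from 15 donors (who generally died of stroke, trauma, or anoxia; see https://tabula-sapiens-portal.ds.czbiohub.org/whereisthedata), from which nearly 500,000 single cells were isolated and profiled with 10x and/or SmartSeq2 single-cell RNA sequencing. The analysis yields tissue-specific expression data for more than 400 cell types and provides reference atlas information such as the distribution of T cell clones across tissues, tissue-specific B cell mutation rates, cell-cycle states and the distribution of cell types across tissues and organs, and cell type-specific RNA splicing across tissues within an individual. Through pathology sections and H&E staining, the transcriptomic data are also related to macroscopic, clinically relevant information such as the spatial heterogeneity of tissue types and estimates of relative cell abundance. The project is run by the Tabula Sapiens Consortium; its data (raw sequencing data and analysis results) are hosted on AWS, FigShare, CellXGene, and other platforms and are open to the world (although atlas-scale or tissue-scale analyses may not be published without the consent of the consortium and its partners). Details are on the project website (https://tabula-sapiens-portal.ds.czbiohub.org/), which also offers a pipeline to help users annotate and interpret their own data using these results. Two points are worth mentioning. First, the consortium and the project are mainly supported by the Chan Zuckerberg Initiative, founded by Facebook founder Mark Zuckerberg and his wife Priscilla Chan (who has a biology background); the initiative also provides funding support for bioRxiv and medRxiv. Second, the corresponding author, Stephen R Quake, is a towering figure in biotechnology and probably one of the first well-known people to contribute his own genome to validate a high-throughput sequencing technology. See the 2009 NBT paper (doi:10.1038/nbt.1561), whose subject P0 (very likely Quake himself) was sequenced with the single-molecule technology of the now-defunct Helicos Biosciences (arguably third-generation sequencing; bear in mind that second-generation sequencing only took off around 2008), producing the earliest human whole-genome data from that technology. Quake's contributions are not elaborated here; interested readers can search for more.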