白鸟 (2022-12-31 23:22):
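#paper https://doi.org/10.1016/j.csbj.2020.06.012 Computational and Structural Biotechnology Journal 2020. Single-cell ATAC sequencing analysis: From data preprocessing to hypothesis generation.
Focus: This is a review of single-cell ATAC-seq analysis. It works systematically through the process from data preprocessing to hypothesis generation, with detailed methodological explanations and benchmarking, and offers useful guidance on analysis methods using appropriate software tools and databases.
Background: Most genetic variants associated with complex human traits lie in non-coding regions of the genome. Studies of the biological mechanisms linking genotype to phenotype therefore mostly involve the epigenetic regulation of gene expression. Genome-wide maps of open chromatin regions can facilitate functional analysis of cis- and trans-regulatory elements by linking them to trait-associated sequence variants. ATAC-seq (Assay for Transposase-Accessible Chromatin with sequencing) is considered the most accessible and cost-effective strategy for genome-wide profiling of chromatin accessibility.
Limitations: Single-cell ATAC-seq (scATAC-seq) has also been developed to study cell type-specific differences in chromatin accessibility in tissue samples containing heterogeneous cell populations. However, because scATAC-seq data are inherently noisy and sparse, it is difficult to extract biological signals accurately and to devise effective biological hypotheses. New methods and software tools have been developed over the past few years to overcome these limitations, yet there is still no consensus on a best-practice, standard pipeline for scATAC-seq data analysis.
Outline:
1. The scATAC-seq analysis workflow: data preprocessing, starting with preprocessing of sequencing reads -> filtering out low-quality cells and doublets -> generating the cell-by-feature matrix -> batch correction and data integration across samples -> data transformation, including normalization -> dimensionality reduction, visualization, and clustering. These steps closely resemble those of scRNA-seq but have their own particularities.
2. Downstream analysis for hypothesis generation from scATAC-seq: cell type annotation, chromatin accessibility dynamics, and analyses based on TF motifs, genes, enhancers, and gene- and disease-associated genetic variants, all of which support hypothesis generation. The aim is to elucidate the network between cis-regulatory elements (e.g., promoters and enhancers) and trans-regulatory elements (e.g., transcription factors (TFs)). scATAC-seq data can also be used to analyze gene activity and the accessibility of genetic variants.
3. Multimodal analysis: scATAC-seq can be combined with single-cell RNA sequencing (scRNA-seq) data and other omics data for multi-omics studies. Such integrated multimodal analysis helps identify key regulators involved in disease progression, which are often potential therapeutic targets and diagnostic biomarkers.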
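To make the middle of the workflow above more concrete (cell filtering -> cell-by-feature matrix -> normalization -> dimensionality reduction), here is a minimal sketch on simulated data. It assumes TF-IDF followed by a truncated SVD (often called LSI), which is one commonly used choice for scATAC-seq and not something this note prescribes. The function names, the `min_peaks` threshold, and the toy matrix are all hypothetical, chosen only for illustration.

```python
import numpy as np


def filter_cells(counts, min_peaks):
    """Keep cells (rows) with at least `min_peaks` accessible peaks."""
    keep = (counts > 0).sum(axis=1) >= min_peaks
    return counts[keep], keep


def tfidf(counts):
    """TF-IDF transform of a cells x peaks matrix (binarized first)."""
    binary = (counts > 0).astype(float)
    # Term frequency: scale each cell by its number of accessible peaks.
    cell_totals = binary.sum(axis=1, keepdims=True)
    tf = binary / np.maximum(cell_totals, 1.0)
    # Inverse document frequency: down-weight peaks open in many cells.
    n_cells = binary.shape[0]
    peak_totals = binary.sum(axis=0)
    idf = np.log1p(n_cells / np.maximum(peak_totals, 1.0))
    return tf * idf


def lsi(matrix, n_components):
    """Low-dimensional cell embedding from the leading singular vectors."""
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


# Toy cells x peaks matrix: sparse and binary, like real scATAC-seq data.
rng = np.random.default_rng(0)
counts = (rng.random((60, 200)) < 0.05).astype(int)
counts[:5] = 0  # simulate empty / low-quality barcodes

filtered, keep = filter_cells(counts, min_peaks=3)
embedding = lsi(tfidf(filtered), n_components=10)
print(filtered.shape, embedding.shape)
```

The resulting `embedding` is what would normally be passed on to visualization and clustering. In practice the first LSI component often tracks sequencing depth rather than biology, so it is worth checking before clustering on it.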
Single-cell ATAC sequencing analysis: From data preprocessing to hypothesis generation
Abstract:
Most genetic variations associated with human complex traits are located in non-coding genomic regions. Therefore, understanding the genotype-to-phenotype axis requires a comprehensive catalog of functional non-coding genomic elements, most of which are involved in epigenetic regulation of gene expression. Genome-wide maps of open chromatin regions can facilitate functional analysis of cis- and trans-regulatory elements via their connections with trait-associated sequence variants. Currently, Assay for Transposase Accessible Chromatin with high-throughput sequencing (ATAC-seq) is considered the most accessible and cost-effective strategy for genome-wide profiling of chromatin accessibility. Single-cell ATAC-seq (scATAC-seq) technology has also been developed to study cell type-specific chromatin accessibility in tissue samples containing a heterogeneous cellular population. However, due to the intrinsic nature of scATAC-seq data, which are highly noisy and sparse, accurate extraction of biological signals and devising effective biological hypothesis are difficult. To overcome such limitations in scATAC-seq data analysis, new methods and software tools have been developed over the past few years. Nevertheless, there is no consensus for the best practice of scATAC-seq data analysis yet. In this review, we discuss scATAC-seq technology and data analysis methods, ranging from preprocessing to downstream analysis, along with an up-to-date list of published studies that involved the application of this method. We expect this review will provide a guideline for successful data generation and analysis methods using appropriate software tools and databases for the study of chromatin accessibility at single-cell resolution.