Vincent
(2023-01-31 14:45):
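#paper doi:https://doi.org/10.1186/s13059-021-02388-x Gene set enrichment analysis for genome-wide DNA methylation data. Genome Biology 2021. Methylation arrays cost less than WGBS and are widely used to measure DNA methylation. Previous work has focused mainly on array data processing and differential methylation analysis, with less attention paid to gene set enrichment analysis. This paper proposes gene set enrichment methods that build on differential methylation results: GOmeth (for probe-level differential analysis) and GOregion (for region-level differential analysis). Specifically, CpG sites are not uniformly distributed across the genome, and the number of CpG sites near each gene varies. As a result, when genes are picked for enrichment analysis because they sit next to differentially methylated CpGs, genes with more CpGs are more likely to be picked, which biases the enrichment analysis. In addition, a single CpG site can lie near several genes (roughly 8% of all CpGs), so the differentially methylated genes are not obtained independently, which also biases gene set enrichment analysis. The paper's approach adjusts the CpG weights and the statistical distribution used in the enrichment test, uses data simulation and resampling to examine how these two biases affect gene set enrichment analysis, and shows that the proposed methods control the false discovery rate (FDR) well while giving more biologically meaningful pathway results.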
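To make the probe-number bias described above concrete, here is a minimal Python sketch. It is not GOmeth or any part of missMethyl, only a toy null simulation under assumed numbers: the gene count, the range of CpGs per gene, and the 5% "significant" fraction are all hypothetical. No CpG is truly differentially methylated, yet genes with many CpGs get flagged far more often than genes with few.

```python
import random


def simulate_selection_bias(n_genes=200, n_trials=200, seed=0):
    """Toy null simulation: 'significant' CpGs are drawn uniformly at random,
    and a gene counts as hit if at least one of its CpGs is drawn."""
    rng = random.Random(seed)
    # Hypothetical annotation: each gene has between 1 and 50 CpGs nearby.
    cpgs_per_gene = [rng.randint(1, 50) for _ in range(n_genes)]
    # Flat list where position = CpG index and value = the gene it maps to.
    cpg_to_gene = [g for g, n in enumerate(cpgs_per_gene) for _ in range(n)]
    # Assumed: about 5% of all CpGs are called significant in each trial.
    n_sig = len(cpg_to_gene) // 20
    hits = [0] * n_genes
    for _ in range(n_trials):
        sig_cpgs = rng.sample(range(len(cpg_to_gene)), n_sig)
        # A gene is selected once per trial, however many of its CpGs are hit.
        for g in {cpg_to_gene[c] for c in sig_cpgs}:
            hits[g] += 1
    hit_rate = [h / n_trials for h in hits]
    return cpgs_per_gene, hit_rate


def mean(values):
    return sum(values) / len(values)


cpgs_per_gene, hit_rate = simulate_selection_bias()
few = [r for n, r in zip(cpgs_per_gene, hit_rate) if n <= 10]
many = [r for n, r in zip(cpgs_per_gene, hit_rate) if n >= 40]
print(f"mean selection rate, genes with <=10 CpGs: {mean(few):.2f}")
print(f"mean selection rate, genes with >=40 CpGs: {mean(many):.2f}")
```

Because the draw is random with respect to biology, any gap between the two printed rates comes purely from CpG counts. That gap is the kind of bias an enrichment test has to account for before gene sets rich in CpG-dense genes can be interpreted.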
Gene set enrichment analysis for genome-wide DNA methylation data
Abstract:
DNA methylation is one of the most commonly studied epigenetic marks, due to its role in disease and development. Illumina methylation arrays have been extensively used to measure methylation across the human genome. Methylation array analysis has primarily focused on preprocessing, normalization, and identification of differentially methylated CpGs and regions. GOmeth and GOregion are new methods for performing unbiased gene set testing following differential methylation analysis. Benchmarking analyses demonstrate GOmeth outperforms other approaches, and GOregion is the first method for gene set testing of differentially methylated regions. Both methods are publicly available in the missMethyl Bioconductor R package.