Papers from the journal BMC Bioinformatics.
2 shared papers found so far.
1.
白鸟
(2025-05-27 15:06):
#paper doi:10.1186/s12859-023-05603-7, A score-based method of immune status evaluation for healthy individuals with complete blood cell counts.
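The article presents an immune status scoring model based on complete blood cell counts (CBC) from 16,715 healthy individuals.
The main steps are:
1. Data collection: 15 immune-related CBC indices from 16,715 healthy individuals;
2. Data quality control: exclude records flagged for bacterial infection or by inflammation indicators;
3. Data normalization: three-platform normalization, i.e. log_norm normalization;
4. Immune status clustering: a Gaussian mixture model optimized with expectation maximization (EM-GMM) clusters individuals into three immune status groups: good, medium, and poor;
5. Relating CBC indices to immune status: RF, LightGBM and XGBoost are used to estimate the correlation (weight) between each CBC index and immune status; the weight reflects how strongly that index is associated with a person's immune status;
6. Immunity score calculation: a weighted sum model, scores = a1*WBC + a2*NEUT + ... + a15*BLR;
7. Immune status evaluation: an immune status curve (age-score curve) fitted with a third-order polynomial regression model;
immunity score > fitted value for the individual's age: immune-healthy;
immunity score < fitted value for the individual's age: suboptimal immune status or sub-health;
Significance: early warning of abnormal immune status in healthy people;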
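Steps 6 and 7 above (weighted-sum score, cubic age-score curve, comparison against the fitted value) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the paper's implementation: the data are synthetic, three indices stand in for the fifteen, the weights are made-up placeholders for the values the paper derives from RF, LightGBM and XGBoost, and the function names are hypothetical.

```python
import numpy as np


def immunity_scores(cbc, weights):
    """Weighted-sum score per individual: a1*x1 + a2*x2 + ... + ak*xk."""
    cbc = np.asarray(cbc, dtype=float)          # shape (n_individuals, k)
    weights = np.asarray(weights, dtype=float)  # shape (k,)
    return cbc @ weights


def fit_age_curve(ages, scores, degree=3):
    """Least-squares polynomial reference curve score = f(age)."""
    return np.poly1d(np.polyfit(ages, scores, degree))


def evaluate(age, score, curve):
    """Compare one individual's score with the fitted value at their age.

    A score exactly equal to the fitted value is treated as "suboptimal";
    that tie-breaking choice is arbitrary and not taken from the paper.
    """
    return "healthy" if score > curve(age) else "suboptimal"


# Synthetic stand-in data (assumption): 3 normalized indices instead of 15,
# constructed so that the score tends to fall with age.
rng = np.random.default_rng(0)
ages = rng.uniform(20, 80, size=500)
cbc = rng.normal(size=(500, 3)) - 0.02 * ages[:, None]
weights = np.array([0.5, 0.3, 0.2])  # hypothetical weights, not from the paper

scores = immunity_scores(cbc, weights)
curve = fit_age_curve(ages, scores)
print(evaluate(age=45, score=scores.mean(), curve=curve))
```

With real data, `cbc` would hold the normalized CBC indices after quality control, and `weights` would come from the feature-importance step.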
BMC Bioinformatics,
2023-12-11.
DOI: 10.1186/s12859-023-05603-7
Abstract:
Abstract Background With the COVID-19 outbreak, an increasing number of individuals are concerned about their health, particularly their immune status. However, as of now, there is no available algorithm that effectively assesses the immune status of normal, healthy individuals. In response to this, a new score-based method is proposed that utilizes complete blood cell counts (CBC) to provide early warning of disease risks, such as COVID-19. Methods First, data on immune-related CBC measurements from 16,715 healthy individuals were collected. Then, a three-platform model was developed to normalize the data, and a Gaussian mixture model was optimized with expectation maximization (EM-GMM) to cluster the immune status of healthy individuals. Based on the results, Random Forest (RF), Light Gradient Boosting Machine (LightGBM) and Extreme Gradient Boosting (XGBoost) were used to determine the correlation of each CBC index with the immune status. Consequently, a weighted sum model was constructed to calculate a continuous immunity score, enabling the evaluation of immune status. Results The results demonstrated a significant negative correlation between the immunity score and the age of healthy individuals, thereby validating the effectiveness of the proposed method. In addition, a nonlinear polynomial regression model was developed to depict this trend. By comparing an individual’s immune status with the reference value corresponding to their age, their immune status can be evaluated. Conclusion In summary, this study has established a novel model for evaluating the immune status of healthy individuals, providing a good approach for early detection of abnormal immune status in healthy individuals. It is helpful in early warning of the risk of infectious diseases and of significant importance.
2.
前进
(2024-09-30 16:31):
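#paper DOI 10.1186/1471-2105-12-451 Frazer Meacham, Dario Boffelli, Joseph , Identification and correction of systematic error in high-throughput sequence data. This paper studies systematic error in high-throughput sequencing data. Systematic errors are errors that accumulate in a statistically unlikely way in the sequencing reads at specific genome (or transcriptome) locations. The authors characterize and describe systematic errors using overlapping paired reads from high-coverage data, and find that such errors occur about once per 1000 base pairs and are highly replicable across experiments. They identify sequence motifs that are frequent at systematic error sites and design a classifier that distinguishes heterozygous sites from systematic errors. The classifier can handle data from experiments in which allele frequencies at heterozygous sites are not necessarily 0.5, and it can be used with single-end datasets. The paper concludes that systematic errors can easily be mistaken for heterozygous sites in individuals, or for SNPs in population analyses. Building on this characterization, the authors developed a program called SysCall to identify and correct such errors, and they conclude that correcting systematic errors is important to consider when designing and interpreting high-throughput sequencing experiments.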
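To make "statistically unlikely accumulation of errors" concrete, here is a minimal sketch that asks how probable a pile-up of mismatches at one site would be if base-call errors were independent. It is not SysCall's classifier: the binomial model, the 1% per-base error rate, the `alpha` threshold, and the function names are all assumptions chosen for illustration.

```python
from math import comb


def binom_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def is_unlikely_error_pileup(mismatches, coverage, error_rate=0.01, alpha=1e-6):
    """True if this many mismatches at one site is improbable under
    independent base-call errors at the assumed per-base error rate."""
    return binom_tail(mismatches, coverage, error_rate) < alpha


# 20 mismatching reads out of 100 is far more than independent errors explain;
# a single mismatch out of 100 is not.
print(is_unlikely_error_pileup(20, 100))
print(is_unlikely_error_pileup(1, 100))
```

A site flagged this way could be either a true heterozygous site or a systematic error; telling the two apart is the job of the classifier the paper describes, which this sketch does not attempt.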
BMC Bioinformatics,
2011-12.
DOI: 10.1186/1471-2105-12-451
Abstract:
Abstract Background A feature common to all DNA sequencing technologies is the presence of base-call errors in the sequenced reads. The implications of such errors are application specific, ranging from minor informatics nuisances to major problems affecting biological inferences. Recently developed "next-gen" sequencing technologies have greatly reduced the cost of sequencing, but have been shown to be more error prone than previous technologies. Both position specific (depending on the location in the read) and sequence specific (depending on the sequence in the read) errors have been identified in Illumina and Life Technology sequencing platforms. We describe a new type of systematic error that manifests as statistically unlikely accumulations of errors at specific genome (or transcriptome) locations. Results We characterize and describe systematic errors using overlapping paired reads from high-coverage data. We show that such errors occur in approximately 1 in 1000 base pairs, and that they are highly replicable across experiments. We identify motifs that are frequent at systematic error sites, and describe a classifier that distinguishes heterozygous sites from systematic error. Our classifier is designed to accommodate data from experiments in which the allele frequencies at heterozygous sites are not necessarily 0.5 (such as in the case of RNA-Seq), and can be used with single-end datasets. Conclusions Systematic errors can easily be mistaken for heterozygous sites in individuals, or for SNPs in population analyses. Systematic errors are particularly problematic in low coverage experiments, or in estimates of allele-specific expression from RNA-Seq data. Our characterization of systematic error has allowed us to develop a program, called SysCall, for identifying and correcting such errors. We conclude that correction of systematic errors is important to consider in the design and interpretation of high-throughput sequencing experiments.