颜林林
(2023-12-27 12:42):
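#paper doi:10.1101/2023.10.04.560604. bioRxiv, 2023, Federated Learning for multi-omics: a performance evaluation in Parkinson's disease. This paper draws on datasets from two Parkinson's disease studies (PPMI and PDBP). Each enrolled several hundred patients and healthy controls, and each performed both WGS and RNA-seq to obtain multi-omics features. PPMI was split into K folds: one fold was held out, the remaining K-1 folds were used for model training, and the trained model was then tested and evaluated on both the held-out PPMI fold and PDBP. Models were built both with centralized machine learning and with federated learning, in which the data were partitioned across multiple nodes (sites), using several different federated learning strategies. The results show that although factors such as how samples are dispersed across sites and the choice of federated learning strategy affect final performance, the best federated learning result is comparable to centralized training. The paper also evaluates training time for federated learning, which is at least an order of magnitude higher than for the centralized approach. Even so, because federated learning avoids sharing and transferring large-scale data between sites, it retains an advantage for integrating broader data and improving model performance. Overall, the paper offers an in-depth analysis of federated learning applied to multi-omics, and to Parkinson's disease prediction in particular, showing both its potential and its challenges as a collaborative tool for handling large-scale heterogeneous data.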
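The evaluation design described in this note (hold out one fold, train centrally or across simulated sites, compare on the held-out fold) can be sketched in a few lines. This is only an illustrative sketch under stated assumptions, not the paper's implementation: the data are synthetic, the model is a plain logistic regression, weighted federated averaging (FedAvg) stands in for whichever strategies the authors used, and accuracy replaces the AUC-PR metric for brevity. All function and variable names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def local_train(w, X, y, lr=0.1, epochs=5):
    """Run a few full-batch gradient steps of logistic regression on one site's data."""
    w = w.copy()
    for _ in range(epochs):
        grad = X.T @ (sigmoid(X @ w) - y) / len(y)
        w -= lr * grad
    return w

def fed_avg(sites, n_features, rounds=50, lr=0.1, epochs=5):
    """Each round: every site trains locally, then the server averages
    the site weights, weighted by each site's sample count."""
    w = np.zeros(n_features)
    total = sum(len(y) for _, y in sites)
    for _ in range(rounds):
        local = [local_train(w, X, y, lr, epochs) for X, y in sites]
        w = sum(len(y) / total * w_k for (_, y), w_k in zip(sites, local))
    return w

def accuracy(w, X, y):
    return float(np.mean((sigmoid(X @ w) >= 0.5) == y))

# Synthetic stand-in for a single cohort (assumption: NOT real PPMI/PDBP data).
rng = np.random.default_rng(0)
n, d = 600, 5
true_w = np.array([2.0, -1.5, 1.0, 0.0, 0.5])
X = rng.normal(size=(n, d))
y = (sigmoid(X @ true_w) > rng.uniform(size=n)).astype(float)

# K-fold split: hold out one fold for testing, train on the remaining K-1.
K = 5
folds = np.array_split(rng.permutation(n), K)
test_idx, train_idx = folds[0], np.concatenate(folds[1:])
X_train, y_train = X[train_idx], y[train_idx]
X_test, y_test = X[test_idx], y[test_idx]

# Centralized baseline: a "federation" with one site that holds all training data.
w_central = fed_avg([(X_train, y_train)], d)

# Federated run: the same training data dispersed unevenly over three sites.
site_idx = np.split(np.arange(len(y_train)), [60, 200])
sites = [(X_train[i], y_train[i]) for i in site_idx]
w_fed = fed_avg(sites, d)

acc_central = accuracy(w_central, X_test, y_test)
acc_fed = accuracy(w_fed, X_test, y_test)
print(f"centralized: {acc_central:.3f}  federated: {acc_fed:.3f}")
```

Changing the split points in `site_idx` (or the number of sites) is a simple way to explore how sample dispersion across sites affects the federated result relative to the centralized baseline.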
bioRxiv : the preprint server for biology,
2024-Feb-12.
DOI: 10.1101/2023.10.04.560604
PMID: 37986893
Federated Learning for multi-omics: a performance evaluation in Parkinson's disease
Abstract:
While machine learning (ML) research has recently grown more in popularity, its application in the omics domain is constrained by access to sufficiently large, high-quality datasets needed to train ML models. Federated Learning (FL) represents an opportunity to enable collaborative curation of such datasets among participating institutions. We compare the simulated performance of several models trained using FL against classically trained ML models on the task of multi-omics Parkinson's Disease prediction. We find that FL model performance tracks centrally trained ML models, where the most performant FL model achieves an AUC-PR of 0.876 ± 0.009, 0.014 ± 0.003 less than its centrally trained variation. We also determine that the dispersion of samples within a federation plays a meaningful role in model performance. Our study implements several open source FL frameworks and aims to highlight some of the challenges and opportunities when applying these collaborative methods in multi-omics studies.