Vincent (2022-08-31 13:52):
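#paper https://doi.org/10.1038/s41580-021-00407-0, Nat Rev Mol Cell Biol, 2021, A guide to machine learning for biologists. This review paper gives an accessible introduction to the main families of machine learning algorithms and their applications in biology. It opens by laying out key ML concepts (e.g. the taxonomy of machine learning algorithms, overfitting/underfitting, the bias-variance tradeoff). It then covers traditional machine learning algorithms (PCA, k-means, SVM, ridge regression, random forest, etc.) and deep-learning-based algorithms (CNN, RNN, transformer, autoencoder, etc.), describing the strengths and weaknesses of each and discussing best practices for applying machine learning to biological data. The paper closes with the challenges machine learning faces in biology, such as data availability, data leakage, model interpretability, and privacy protection. Worth a read if you're interested; it's a very good reference.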
A guide to machine learning for biologists
Abstract:
The expanding scale and inherent complexity of biological data have encouraged a growing use of machine learning in biology to build informative and predictive models of the underlying biological processes. All machine learning techniques fit models to data; however, the specific methods are quite varied and can at first glance seem bewildering. In this Review, we aim to provide readers with a gentle introduction to a few key machine learning techniques, including the most recently developed and widely used techniques involving deep neural networks. We describe how different techniques may be suited to specific types of biological data, and also discuss some best practices and points to consider when one is embarking on experiments involving machine learning. Some emerging directions in machine learning methodology are also discussed.
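For anyone who wants to see the overfitting/underfitting idea from the summary in action, here is a minimal sketch. It is not taken from the paper: the toy signal (y = x^2 plus noise), the k-nearest-neighbour regressor, and the helper names `knn_predict`, `mse`, and `make_data` are all assumptions chosen for illustration, using only the Python standard library.

```python
import random

def knn_predict(train, x, k):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k

def mse(train, data, k):
    """Mean squared error of the k-NN model (fit on train) evaluated on data."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in data) / len(data)

def make_data(n, rng):
    # Hypothetical toy signal: y = x^2 plus Gaussian noise.
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    return [(x, x * x + rng.gauss(0, 0.1)) for x in xs]

rng = random.Random(0)          # fixed seed so runs are repeatable
train = make_data(40, rng)
test = make_data(200, rng)      # held-out data, never used for fitting

# k=1 memorises the training set; k=40 averages over every training point.
results = {k: (mse(train, train, k), mse(train, test, k)) for k in (1, 5, 40)}
for k, (train_err, test_err) in results.items():
    print(f"k={k:2d}  train MSE={train_err:.4f}  test MSE={test_err:.4f}")
```

With k=1 each training point is its own nearest neighbour, so the training error is exactly zero even though the noise has been memorised (the overfitting end). With k=40 the model predicts the global mean everywhere and ignores the shape of the signal (the underfitting end). Comparing the held-out test error across values of k is one simple way to choose something in between.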