Evaluating the factors influencing accuracy, interpretability, and reproducibility in the use of machine learning classifiers in biology to enable standardization
Abstract The complexity and variability of biological data has promoted the increased use of machine learning methods to understand processes and predict outcomes. These same features complicate reliable, reproducible, interpretable, and responsible use of such methods, resulting in questionable rel...
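| Main Authors: | Kaitlyn M. Martinez, Kristen Wilding, Trent R. Llewellyn, Daniel E. Jacobsen, Makaela M. Montoya, Jessica Z. Kubicek-Sutherland, Sweta Batni, Carrie Manore, Harshini Mukundan |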
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Nature Portfolio, 2025-05-01 |
| Series: | Scientific Reports |
| Subjects: | Machine learning; Biological data; Standardization; Lipopolysaccharide |
| Online Access: | https://doi.org/10.1038/s41598-025-00245-6 |
| _version_ | 1850132778715185152 |
|---|---|
| author | Kaitlyn M. Martinez; Kristen Wilding; Trent R. Llewellyn; Daniel E. Jacobsen; Makaela M. Montoya; Jessica Z. Kubicek-Sutherland; Sweta Batni; Carrie Manore; Harshini Mukundan |
| author_sort | Kaitlyn M. Martinez |
| collection | DOAJ |
| description | Abstract The complexity and variability of biological data has promoted the increased use of machine learning methods to understand processes and predict outcomes. These same features complicate reliable, reproducible, interpretable, and responsible use of such methods, resulting in questionable relevance of the derived outcomes. Here we systematically explore challenges associated with applying machine learning to predict and understand biological processes using a well-characterized in vitro experimental system. We evaluated factors that vary while applying machine learning classifiers: (1) type of biochemical signature (transcripts vs. proteins), (2) data curation methods (pre- and post-processing), and (3) choice of machine learning classifier. Using accuracy, generalizability, interpretability, and reproducibility as metrics, we found that the above factors significantly modulate outcomes even within a simple model system. Our results caution against the unregulated use of machine learning methods in the biological sciences, and strongly advocate the need for data standards and validation tool-kits for such studies. |
| format | Article |
| id | doaj-art-da2ab013b04e4669abe50265fb0d76ae |
| institution | OA Journals |
| issn | 2045-2322 |
| language | English |
| publishDate | 2025-05-01 |
| publisher | Nature Portfolio |
| record_format | Article |
| series | Scientific Reports |
| spelling | doaj-art-da2ab013b04e4669abe50265fb0d76ae; 2025-08-20T02:32:07Z; eng; Nature Portfolio; Scientific Reports; 2045-2322; 2025-05-01; 151112; 10.1038/s41598-025-00245-6; Evaluating the factors influencing accuracy, interpretability, and reproducibility in the use of machine learning classifiers in biology to enable standardization; Kaitlyn M. Martinez (A-1 Information Systems and Modeling, Los Alamos National Laboratory); Kristen Wilding (T-6 Theoretical Biology and Biophysics, Los Alamos National Laboratory); Trent R. Llewellyn (C-PCS Physical Chemistry and Applied Spectroscopy, Los Alamos National Laboratory); Daniel E. Jacobsen (C-PCS Physical Chemistry and Applied Spectroscopy, Los Alamos National Laboratory); Makaela M. Montoya (C-PCS Physical Chemistry and Applied Spectroscopy, Los Alamos National Laboratory); Jessica Z. Kubicek-Sutherland (C-PCS Physical Chemistry and Applied Spectroscopy, Los Alamos National Laboratory); Sweta Batni (Defense Threat Reduction Agency); Carrie Manore (T-6 Theoretical Biology and Biophysics, Los Alamos National Laboratory); Harshini Mukundan (C-PCS Physical Chemistry and Applied Spectroscopy, Los Alamos National Laboratory); Abstract The complexity and variability of biological data has promoted the increased use of machine learning methods to understand processes and predict outcomes. These same features complicate reliable, reproducible, interpretable, and responsible use of such methods, resulting in questionable relevance of the derived outcomes. Here we systematically explore challenges associated with applying machine learning to predict and understand biological processes using a well-characterized in vitro experimental system. We evaluated factors that vary while applying machine learning classifiers: (1) type of biochemical signature (transcripts vs. proteins), (2) data curation methods (pre- and post-processing), and (3) choice of machine learning classifier. Using accuracy, generalizability, interpretability, and reproducibility as metrics, we found that the above factors significantly modulate outcomes even within a simple model system. Our results caution against the unregulated use of machine learning methods in the biological sciences, and strongly advocate the need for data standards and validation tool-kits for such studies.; https://doi.org/10.1038/s41598-025-00245-6; Machine learning; Biological data; Standardization; Lipopolysaccharide |
| title | Evaluating the factors influencing accuracy, interpretability, and reproducibility in the use of machine learning classifiers in biology to enable standardization |
| topic | Machine learning; Biological data; Standardization; Lipopolysaccharide |
| url | https://doi.org/10.1038/s41598-025-00245-6 |