Modeling Tick Populations: An Ecological Test Case for Gradient Boosted Trees
Main Authors: | Manley, William; Tran, Tam; Prusinski, Melissa; Brisson, Dustin |
---|---|
Format: | Article |
Language: | English |
Published: | Peer Community In, 2023-12-01 |
Series: | Peer Community Journal |
Subjects: | Ticks; Lyme Disease; Ecology; Statistical Ecology; Species Distribution Modeling; Machine Learning |
Online Access: | https://peercommunityjournal.org/articles/10.24072/pcjournal.353/ |
_version_ | 1825206417632526336 |
---|---|
author | Manley, William Tran, Tam Prusinski, Melissa Brisson, Dustin |
collection | DOAJ |
description | General linear models have been the foundational statistical framework used to discover the ecological processes that explain the distribution and abundance of natural populations. Analyses of the rapidly expanding cache of environmental and ecological data, however, require advanced statistical methods to contend with complexities inherent to extremely large natural data sets. Modern machine learning frameworks such as gradient boosted trees efficiently identify complex ecological relationships in massive data sets, which are expected to result in accurate predictions of the distribution and abundance of organisms in nature. However, rigorous assessments of the theoretical advantages of these methodologies on natural data sets are rare. Here we compare the abilities of gradient boosted and linear models to identify environmental features that explain observed variations in the distribution and abundance of blacklegged tick (Ixodes scapularis) populations in a data set collected across New York State over a ten-year period. The gradient boosted and linear models use similar environmental features to explain tick demography, although the gradient boosted models found non-linear relationships and interactions that are difficult to anticipate and often impractical to identify with a linear modeling framework. Further, the gradient boosted models predicted the distribution and abundance of ticks in years and areas beyond the training data with much greater accuracy than their linear model counterparts. The flexible gradient boosting framework also permitted additional model types that provide practical advantages for tick surveillance and public health. The results highlight the potential of gradient boosted models to discover novel ecological phenomena affecting pathogen demography and to serve as a powerful public health tool for mitigating disease risks.
|
format | Article |
id | doaj-art-98a6190e59a54d1a934dbedc8cddfba3 |
institution | Kabale University |
issn | 2804-3871 |
language | English |
publishDate | 2023-12-01 |
publisher | Peer Community In |
record_format | Article |
series | Peer Community Journal |
affiliation | Manley, William (ORCID https://orcid.org/0009-0004-6436-7845): Department of Biology, University of Pennsylvania, Philadelphia, Pennsylvania, USA; Tran, Tam (ORCID https://orcid.org/0000-0002-7750-3592): Department of Biology, University of Pennsylvania, Philadelphia, Pennsylvania, USA; Prusinski, Melissa (ORCID https://orcid.org/0000-0001-6538-623X): New York State Department of Health, Albany, New York, USA; Brisson, Dustin (ORCID https://orcid.org/0000-0002-9493-7579): Department of Biology, University of Pennsylvania, Philadelphia, Pennsylvania, USA |
volume | 3 |
doi | 10.24072/pcjournal.353 |
title | Modeling Tick Populations: An Ecological Test Case for Gradient Boosted Trees |
topic | Ticks; Lyme Disease; Ecology; Statistical Ecology; Species Distribution Modeling; Machine Learning |
url | https://peercommunityjournal.org/articles/10.24072/pcjournal.353/ |