To bin or not to bin: why parasite abundance data should not be lumped into categories for statistical analysis
The impact of macroparasites on their hosts is proportional to the number of parasites per host, or parasite abundance. Abundance values are count data, i.e. integers ranging from 0 to some maximum number, depending on the host–parasite system. When using parasite abundance as a predictor in statist...
| Main Author: | Robert Poulin |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Cambridge University Press |
| Series: | Parasitology |
| Subjects: | aggregation; continuous data; correlation; count data; inference; prevalence |
| Online Access: | https://www.cambridge.org/core/product/identifier/S003118202500040X/type/journal_article |
| _version_ | 1849394608709042176 |
|---|---|
| author | Robert Poulin |
| collection | DOAJ |
| description | The impact of macroparasites on their hosts is proportional to the number of parasites per host, or parasite abundance. Abundance values are count data, i.e. integers ranging from 0 to some maximum number, depending on the host–parasite system. When using parasite abundance as a predictor in statistical analysis, a common approach is to bin values, i.e. group hosts into infection categories based on abundance, and test for differences in some response variable (e.g. a host trait) among these categories. There are well-documented pitfalls associated with this approach. Here, I use a literature review to show that binning abundance values for analysis has been used in one-third of studies published in parasitological journals over the past 15 years, and half of the studies in ecological and behavioural journals, often without any justification. Binning abundance data into arbitrary categories has been much more common among studies using experimental infections than among those using naturally infected hosts. I then use simulated data to demonstrate that true and significant relationships between parasite abundance and host traits can be missed when abundance values are binned for analysis, and vice versa that when there is no underlying relationship between abundance and host traits, analysis of binned data can create a spurious one. This holds regardless of the prevalence of infection or the level of parasite aggregation in a host sample. These findings argue strongly for the practice of binning abundance data as a predictor variable to be abandoned in favour of more appropriate analytical approaches. |
| format | Article |
| id | doaj-art-326b57f7cdcd4ef4b688504b719748e5 |
| institution | Kabale University |
| issn | 0031-1820 1469-8161 |
| language | English |
| publisher | Cambridge University Press |
| record_format | Article |
| series | Parasitology |
| spelling | doaj-art-326b57f7cdcd4ef4b688504b719748e5; 2025-08-20T03:39:57Z; eng; Cambridge University Press; Parasitology; 0031-1820; 1469-8161; 18; 10.1017/S003118202500040X; To bin or not to bin: why parasite abundance data should not be lumped into categories for statistical analysis; Robert Poulin (https://orcid.org/0000-0003-1390-1206), Department of Zoology, University of Otago, Dunedin, New Zealand; https://www.cambridge.org/core/product/identifier/S003118202500040X/type/journal_article; aggregation, continuous data, correlation, count data, inference, prevalence |
| title | To bin or not to bin: why parasite abundance data should not be lumped into categories for statistical analysis |
| topic | aggregation continuous data correlation count data inference prevalence |
| url | https://www.cambridge.org/core/product/identifier/S003118202500040X/type/journal_article |
| work_keys_str_mv | AT robertpoulin tobinornottobinwhyparasiteabundancedatashouldnotbelumpedintocategoriesforstatisticalanalysis |
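
The abstract's point that binning can hide a real relationship between abundance and a host trait can be illustrated with a small simulation. The sketch below is not the author's simulation: the negative binomial parameters, the linear effect on the host trait, the three cut-offs and all function names are assumptions chosen only for illustration. It compares the share of trait variance explained by raw abundance (squared Pearson correlation) with the share explained by membership in arbitrary infection categories (eta squared from a one-way ANOVA).

```python
import math
import random


def sample_poisson(rng, lam):
    """Draw a Poisson count with Knuth's multiplication method (fine for modest rates)."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_abundance(rng, mean, k):
    """Aggregated parasite count: negative binomial drawn as a gamma-Poisson mixture.

    k is the aggregation parameter (smaller k means stronger aggregation).
    """
    rate = rng.gammavariate(k, mean / k)  # gamma with shape k and mean `mean`
    return sample_poisson(rng, rate)


def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def eta_squared(groups, y):
    """Share of variance in y explained by group membership (one-way ANOVA effect size)."""
    grand = sum(y) / len(y)
    total = sum((v - grand) ** 2 for v in y)
    by_group = {}
    for g, v in zip(groups, y):
        by_group.setdefault(g, []).append(v)
    between = sum(len(vs) * (sum(vs) / len(vs) - grand) ** 2 for vs in by_group.values())
    return between / total


def bin_abundance(count):
    """Arbitrary cut-offs, assumed here purely for illustration."""
    if count == 0:
        return "uninfected"
    return "low" if count <= 10 else "high"


rng = random.Random(1)  # fixed seed so the run is repeatable
abundance = [sample_abundance(rng, mean=10.0, k=0.5) for _ in range(500)]
# Assumed true relationship: the host trait declines linearly with abundance, plus noise.
trait = [50.0 - 0.5 * a + rng.gauss(0.0, 5.0) for a in abundance]
categories = [bin_abundance(a) for a in abundance]

r2_raw = r_squared(abundance, trait)
eta2_binned = eta_squared(categories, trait)
print(f"variance explained by raw abundance: {r2_raw:.2f}")
print(f"variance explained by three bins:    {eta2_binned:.2f}")
```

Under these assumed settings, much of the variation in abundance sits inside the "high" category, so the binned analysis has less signal to work with than an analysis of the raw counts. Changing the cut-offs, the prevalence (through `mean` and `k`) or the noise level is a quick way to see how sensitive the binned result is to arbitrary choices.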