To bin or not to bin: why parasite abundance data should not be lumped into categories for statistical analysis

Bibliographic Details
Main Author: Robert Poulin
Format: Article
Language: English
Published: Cambridge University Press
Series: Parasitology
Subjects: aggregation, continuous data, correlation, count data, inference, prevalence
Online Access: https://www.cambridge.org/core/product/identifier/S003118202500040X/type/journal_article
Collection: DOAJ
Description:
The impact of macroparasites on their hosts is proportional to the number of parasites per host, or parasite abundance. Abundance values are count data, i.e. integers ranging from 0 to some maximum number, depending on the host–parasite system. When using parasite abundance as a predictor in statistical analysis, a common approach is to bin values, i.e. group hosts into infection categories based on abundance, and test for differences in some response variable (e.g. a host trait) among these categories. There are well-documented pitfalls associated with this approach. Here, I use a literature review to show that binning abundance values for analysis has been used in one-third of studies published in parasitological journals over the past 15 years, and half of the studies in ecological and behavioural journals, often without any justification. Binning abundance data into arbitrary categories has been much more common among studies using experimental infections than among those using naturally infected hosts. I then use simulated data to demonstrate that true and significant relationships between parasite abundance and host traits can be missed when abundance values are binned for analysis, and vice versa that when there is no underlying relationship between abundance and host traits, analysis of binned data can create a spurious one. This holds regardless of the prevalence of infection or the level of parasite aggregation in a host sample. These findings argue strongly for the practice of binning abundance data as a predictor variable to be abandoned in favour of more appropriate analytical approaches.
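
The contrast described in the abstract (abundance analysed as a continuous predictor versus hosts lumped into infection categories) can be explored with a small simulation. What follows is a minimal Python sketch under assumptions of our own, not the article's simulation code: abundances are drawn from a negative binomial distribution (a gamma-Poisson mixture, where a smaller k means stronger aggregation), the host trait changes linearly with abundance plus Gaussian noise, and the categories 0, 1-5, 6-20 and >20 parasites are hypothetical cut-points. The continuous analysis here is a Pearson correlation and the binned one a one-way ANOVA F statistic, both with permutation p-values so only the standard library is needed; the correlation is a stand-in, not necessarily the approach the author recommends. All function names are illustrative.

```python
"""Hypothetical sketch: parasite abundance as a continuous vs. a binned predictor.

Assumptions (illustrative only, not taken from the article): abundances are
negative binomial (gamma-Poisson mixture, smaller k = more aggregation), the
host trait changes linearly with abundance plus Gaussian noise, and the bins
(0, 1-5, 6-20, >20 parasites) are arbitrary cut-points chosen for the demo.
"""
import math
import random
import statistics


def rpois(lam, rng):
    """Poisson draw by Knuth's multiplication method (fine for moderate lam)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def neg_binom(mean, k, rng):
    """Negative binomial draw: Poisson with a gamma-distributed rate (mean, k)."""
    return rpois(rng.gammavariate(k, mean / k), rng)


def simulate(n_hosts=100, mean=10.0, k=0.5, slope=-0.05, noise=1.0, seed=1):
    """Return (abundances, trait values); slope=0 means no true relationship."""
    rng = random.Random(seed)
    abund = [neg_binom(mean, k, rng) for _ in range(n_hosts)]
    trait = [10.0 + slope * a + rng.gauss(0.0, noise) for a in abund]
    return abund, trait


def bin_abundance(x, cuts=(0, 5, 20)):
    """Category index: 0 = uninfected, 1 = 1-5, 2 = 6-20, 3 = more than 20."""
    for i, upper in enumerate(cuts):
        if x <= upper:
            return i
    return len(cuts)


def abs_pearson(x, y):
    """Absolute Pearson correlation: the continuous-predictor test statistic."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return abs(sxy) / math.sqrt(sxx * syy)


def f_statistic(groups):
    """One-way ANOVA F statistic across non-empty groups of trait values."""
    groups = [g for g in groups if g]
    means = [statistics.fmean(g) for g in groups]
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    return (ss_between / (len(groups) - 1)) / (ss_within / (n - len(groups)))


def binned_f(abund, trait):
    """Binned-predictor statistic: F among hypothetical infection categories."""
    groups = {}
    for a, t in zip(abund, trait):
        groups.setdefault(bin_abundance(a), []).append(t)
    return f_statistic(list(groups.values()))


def perm_pvalue(stat, x, y, n_perm, rng):
    """Permutation p-value: shuffle y, count statistics at least as large."""
    observed = stat(x, y)
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if stat(x, y_perm) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    rng = random.Random(42)
    for slope in (-0.05, 0.0):  # a true effect, then no underlying effect
        abund, trait = simulate(slope=slope, seed=7)
        p_cont = perm_pvalue(abs_pearson, abund, trait, 499, rng)
        p_bin = perm_pvalue(binned_f, abund, trait, 499, rng)
        print(f"slope={slope}: continuous p={p_cont:.3f}, binned p={p_bin:.3f}")
```

With `slope=0.0` there is no true relationship, so a small binned p-value in that scenario is a false positive. A single run shows only one outcome; repeating each scenario over many seeds, and over different values of `mean` and `k` (which together set prevalence and aggregation), would be needed to estimate how often each analysis misses or invents an effect.
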
ISSN: 0031-1820, 1469-8161
DOI: 10.1017/S003118202500040X
Affiliation: Department of Zoology, University of Otago, Dunedin, New Zealand (Robert Poulin, ORCID https://orcid.org/0000-0003-1390-1206)