A method to simulate multivariate outliers with known Mahalanobis distances for normal and non-normal data

Bibliographic Details
Main Author: Oscar L. Olvera Astivia
Format: Article
Language: English
Published: Elsevier, 2024-12-01
Series: Methods in Psychology
Online Access:http://www.sciencedirect.com/science/article/pii/S2590260124000237
Affiliation: College of Education, University of Washington, 2012 Skagit Ln, Seattle, WA 98105, United States
Volume: 11 (article 100157)
ISSN: 2590-2601
DOI: 10.1016/j.metip.2024.100157
Keywords: Outlier; Multivariate; Mahalanobis distance; Skewness; Kurtosis
Collection: DOAJ

Abstract
Monte Carlo simulations and theoretical analyses have repeatedly demonstrated the impact of outliers on statistical analysis. Most simulation studies generate outliers using one of two general approaches: by multiplying an arbitrary point by a constant or through a finite mixture. The latter can be extended to multivariate settings by defining the Mahalanobis distance between the centroids of two clusters of points. Nevertheless, when researchers aim to simulate individual data points with population-level Mahalanobis distances, the number of available procedures is very limited. This article generalizes one of the few existing methods to simulate an arbitrary number of outliers in an arbitrary number of dimensions, for both multivariate normal and non-normal data. A small simulation demonstration showcases how this methodology enables new simulation designs that were either unpopular or not possible due to the lack of a data-generating algorithm. A discussion of potential implications highlights the importance of considering multivariate outliers in simulation settings.
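
The abstract mentions two ideas: separating the centroids of two clusters by a chosen Mahalanobis distance, and placing individual points at a chosen population Mahalanobis distance. The sketch below illustrates only the generic geometry behind both ideas; it is not the article's algorithm and does not address the non-normal case the article treats. It assumes a known mean vector mu and a positive-definite covariance matrix sigma with Cholesky factor L (sigma = L L^T). Any point x = mu + d L u with ||u|| = 1 then satisfies (x - mu)^T sigma^{-1} (x - mu) = d^2 u^T u = d^2. The function names (points_at_mahalanobis_distance, mahalanobis, two_cluster_mixture) are hypothetical, and NumPy is an assumed dependency.

```python
import numpy as np


def points_at_mahalanobis_distance(mu, sigma, d, n, rng):
    """Hypothetical helper: return n points, each at population Mahalanobis
    distance exactly d from mu under covariance sigma.

    Generic construction, not the article's algorithm: draw random unit
    directions u and map them through the Cholesky factor L of sigma, so that
    x = mu + d * L u and (x - mu)^T sigma^{-1} (x - mu) = d**2 * u^T u = d**2.
    """
    mu = np.asarray(mu, dtype=float)
    L = np.linalg.cholesky(sigma)                  # sigma = L @ L.T; sigma must be positive definite
    u = rng.standard_normal((n, mu.shape[0]))      # random directions in whitened space
    u /= np.linalg.norm(u, axis=1, keepdims=True)  # rescale each row to unit length
    return mu + d * (u @ L.T)                      # row i equals mu + d * L @ u[i]


def mahalanobis(x, mu, sigma):
    """Population Mahalanobis distance of x from mu under covariance sigma."""
    diff = np.asarray(x, dtype=float) - np.asarray(mu, dtype=float)
    return float(np.sqrt(diff @ np.linalg.solve(sigma, diff)))


def two_cluster_mixture(mu, sigma, d, n_main, n_out, rng):
    """Hypothetical finite-mixture sketch: a main normal cluster centred at mu
    plus a smaller normal cluster whose centroid lies at Mahalanobis distance d
    from mu. Both clusters share the covariance matrix sigma."""
    shifted_mu = points_at_mahalanobis_distance(mu, sigma, d, 1, rng)[0]
    main = rng.multivariate_normal(mu, sigma, size=n_main)
    outliers = rng.multivariate_normal(shifted_mu, sigma, size=n_out)
    return np.vstack([main, outliers]), shifted_mu


if __name__ == "__main__":
    rng = np.random.default_rng(2024)
    mu = np.zeros(3)
    sigma = np.array([[1.0, 0.5, 0.3],
                      [0.5, 1.0, 0.4],
                      [0.3, 0.4, 1.0]])
    outliers = points_at_mahalanobis_distance(mu, sigma, d=4.0, n=5, rng=rng)
    print([round(mahalanobis(x, mu, sigma), 6) for x in outliers])  # [4.0, 4.0, 4.0, 4.0, 4.0]
```

Because each u is a normalized standard normal vector, the points fall in random directions in the whitened space; fixing u instead would place every outlier along one chosen direction. For comparison, points drawn from N(mu, sigma) itself have squared Mahalanobis distances that follow a chi-square distribution with p degrees of freedom, where p is the number of variables.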