mlr3spatiotempcv: Spatiotemporal Resampling Methods for Machine Learning in R

Spatial and spatiotemporal machine-learning models require a suitable framework for their model assessment, model selection, and hyperparameter tuning, in order to avoid error estimation bias and over-fitting. This contribution provides an overview of the state-of-the-art in spatial and spatiotempo...

Bibliographic Details
Main Authors: Patrick Schratz, Marc Becker, Michel Lang, Alexander Brenning
Format: Article
Language: English
Published: Foundation for Open Access Statistics 2024-11-01
Series: Journal of Statistical Software
Online Access: https://www.jstatsoft.org/index.php/jss/article/view/4778
author Patrick Schratz
Marc Becker
Michel Lang
Alexander Brenning
collection DOAJ
description Spatial and spatiotemporal machine-learning models require a suitable framework for model assessment, model selection, and hyperparameter tuning in order to avoid biased error estimates and over-fitting. This contribution provides an overview of the state of the art in spatial and spatiotemporal cross-validation techniques and their implementations in R, and introduces the R package mlr3spatiotempcv as an extension package of the machine-learning framework mlr3. Currently, various R packages implement different spatiotemporal partitioning strategies: blockCV, CAST, skmeans, and sperrorest. The goal of mlr3spatiotempcv is to gather the available spatiotemporal resampling methods in R and make them available to users through a simple, common interface. This is made possible by integrating the package directly into the mlr3 machine-learning framework, which already supports generic non-spatiotemporal resampling methods such as random partitioning. One advantage is the use of a consistent nomenclature within an overarching machine-learning toolkit instead of varying package-specific syntax, which makes it easier for users to choose from a variety of spatiotemporal resampling methods. The package avoids recommending which method to use in practice, as this decision depends on the predictive task at hand, the autocorrelation within the data, and the spatial structure of the sampling design or the geographic objects being studied.
format Article
id doaj-art-8ff553d4a8824ab09a09e603eef41fad
institution DOAJ
issn 1548-7660
language English
publishDate 2024-11-01
publisher Foundation for Open Access Statistics
record_format Article
series Journal of Statistical Software
spelling doaj-art-8ff553d4a8824ab09a09e603eef41fad 2025-08-20T02:57:30Z eng
Foundation for Open Access Statistics; Journal of Statistical Software; ISSN 1548-7660; 2024-11-01; volume 111
DOI 10.18637/jss.v111.i07
mlr3spatiotempcv: Spatiotemporal Resampling Methods for Machine Learning in R
Patrick Schratz (Friedrich-Schiller-University Jena)
Marc Becker (Ludwig-Maximilians-Universität München)
Michel Lang (TU Dortmund University)
Alexander Brenning (Friedrich Schiller University Jena)
https://www.jstatsoft.org/index.php/jss/article/view/4778
title mlr3spatiotempcv: Spatiotemporal Resampling Methods for Machine Learning in R
url https://www.jstatsoft.org/index.php/jss/article/view/4778