Fast, Robust, and Learned Distribution-Based Sorting

Despite a long history of research and achievements in designing sorting routines, new hardware features and application requirements pose advanced challenges that computer scientists are recently tackling through novel data-aware (learned) algorithms. This paper aims to better understand the streng...

Full description

Saved in:

Bibliographic Details
Main Authors:	Paolo Ferragina, Mattia Odorisio
Format:	Article
Language:	English
Published:	IEEE 2025-01-01
Series:	IEEE Access
Subjects:	Sorting learned sort learned indexes learned-based data structures
Online Access:	https://ieeexplore.ieee.org/document/10918649/
Tags:	Add Tag No Tags, Be the first to tag this record!

_version_	1850023106680193024
author	Paolo Ferragina Mattia Odorisio
author_facet	Paolo Ferragina Mattia Odorisio
author_sort	Paolo Ferragina
collection	DOAJ
description	Despite a long history of research and achievements in designing sorting routines, new hardware features and application requirements pose advanced challenges that computer scientists are recently tackling through novel data-aware (learned) algorithms. This paper aims to better understand the strengths and limitations of learned sorters for numerical data, by addressing two main research and engineering questions: (Q1) Which is the best trade-off, in sorting items, between the efficacy of a “learned” model in approximating the distribution of the input data and its time-space complexity? and (Q2) Which algorithmic structure for distribution-based sorting is suited to leverage a learned model in order to achieve state-of-the-art performance? To answer Q1, we implement a variant of the best-known learned model (i.e., the two-layers RMI) and, also, design a fully new learned model that turns out to be space-time efficient and efficacious in approximating adversarial input distributions. We experimentally test them over 11 datasets of 200M and 800M 64-bit floating-point items, thus offering a comprehensive answer to Q1. We then address Q2 by plugging the best-learned models from above into a distribution-based sorting scheme that leads to three new sorters whose performance is tested over 33 datasets (real and synthetic) of sizes up to 800M items. Our experimental results will show that our sorters perform better than 6 (classic and learned) best-known sorters on 29 out of those 33 datasets, thus achieving new state-of-the-art tradeoffs.
format	Article
id	doaj-art-1f30e175a781437a928fa88685bf0bb4
institution	DOAJ
issn	2169-3536
language	English
publishDate	2025-01-01
publisher	IEEE
record_format	Article
series	IEEE Access
spelling	doaj-art-1f30e175a781437a928fa88685bf0bb42025-08-20T03:01:28ZengIEEEIEEE Access2169-35362025-01-0113451984521410.1109/ACCESS.2025.354962610918649Fast, Robust, and Learned Distribution-Based SortingPaolo Ferragina0https://orcid.org/0000-0003-1353-360XMattia Odorisio1https://orcid.org/0009-0001-5320-8513Department of L’EMbeDS, Sant’Anna School of Advanced Studies, Pisa, ItalyDepartment of Computer Science, University of Pisa, Pisa, ItalyDespite a long history of research and achievements in designing sorting routines, new hardware features and application requirements pose advanced challenges that computer scientists are recently tackling through novel data-aware (learned) algorithms. This paper aims to better understand the strengths and limitations of learned sorters for numerical data, by addressing two main research and engineering questions: (Q1) Which is the best trade-off, in sorting items, between the efficacy of a “learned” model in approximating the distribution of the input data and its time-space complexity? and (Q2) Which algorithmic structure for distribution-based sorting is suited to leverage a learned model in order to achieve state-of-the-art performance? To answer Q1, we implement a variant of the best-known learned model (i.e., the two-layers RMI) and, also, design a fully new learned model that turns out to be space-time efficient and efficacious in approximating adversarial input distributions. We experimentally test them over 11 datasets of 200M and 800M 64-bit floating-point items, thus offering a comprehensive answer to Q1. We then address Q2 by plugging the best-learned models from above into a distribution-based sorting scheme that leads to three new sorters whose performance is tested over 33 datasets (real and synthetic) of sizes up to 800M items. Our experimental results will show that our sorters perform better than 6 (classic and learned) best-known sorters on 29 out of those 33 datasets, thus achieving new state-of-the-art tradeoffs.https://ieeexplore.ieee.org/document/10918649/Sortinglearned sortlearned indexeslearned-based data structures
spellingShingle	Paolo Ferragina Mattia Odorisio Fast, Robust, and Learned Distribution-Based Sorting IEEE Access Sorting learned sort learned indexes learned-based data structures
title	Fast, Robust, and Learned Distribution-Based Sorting
title_full	Fast, Robust, and Learned Distribution-Based Sorting
title_fullStr	Fast, Robust, and Learned Distribution-Based Sorting
title_full_unstemmed	Fast, Robust, and Learned Distribution-Based Sorting
title_short	Fast, Robust, and Learned Distribution-Based Sorting
title_sort	fast robust and learned distribution based sorting
topic	Sorting learned sort learned indexes learned-based data structures
url	https://ieeexplore.ieee.org/document/10918649/
work_keys_str_mv	AT paoloferragina fastrobustandlearneddistributionbasedsorting AT mattiaodorisio fastrobustandlearneddistributionbasedsorting

Fast, Robust, and Learned Distribution-Based Sorting

Similar Items