The art of misclassification: too many classes, not enough points

Abstract: Classification is a ubiquitous and fundamental problem in artificial intelligence and machine learning, with extensive efforts dedicated to developing more powerful classifiers and larger datasets. However, the classification task is ultimately constrained by the intrinsic properties of dat...

Bibliographic Details
Main Authors: Mario Franco, Gerardo Febres, Nelson Fernández, Carlos Gershenson
Format: Article
Language: English
Published: SpringerOpen 2025-07-01
Series: EPJ Data Science
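Subjects: Classification limits; Data limits; Entropy-based measure; Machine learning; Artificial intelligence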
Online Access: https://doi.org/10.1140/epjds/s13688-025-00565-7
author Mario Franco
Gerardo Febres
Nelson Fernández
Carlos Gershenson
collection DOAJ
description Abstract: Classification is a ubiquitous and fundamental problem in artificial intelligence and machine learning, with extensive efforts dedicated to developing more powerful classifiers and larger datasets. However, the classification task is ultimately constrained by the intrinsic properties of datasets, independently of computational power or model complexity. In this work, we introduce a formal entropy-based measure of classifiability, which quantifies the inherent difficulty of a classification problem by assessing the uncertainty in class assignments given feature representations. This measure captures the degree of class overlap and aligns with human intuition, serving as an upper bound on classification performance for classification problems. Our results establish a theoretical limit beyond which no classifier can improve the classification accuracy, regardless of the architecture or amount of data, in a given problem. Our approach provides a principled framework for understanding when classification is inherently fallible and fundamentally ambiguous.
format Article
id doaj-art-3d6d9ac9bb3b407cbbe43c53337f8061
institution Kabale University
issn 2193-1127
language English
publishDate 2025-07-01
publisher SpringerOpen
record_format Article
series EPJ Data Science
spelling doaj-art-3d6d9ac9bb3b407cbbe43c53337f8061 | 2025-08-20T03:46:04Z | eng | SpringerOpen | EPJ Data Science | 2193-1127 | 2025-07-01 | vol. 14, no. 1, pp. 1-22 | 10.1140/epjds/s13688-025-00565-7
affiliation Mario Franco, Gerardo Febres, Nelson Fernández, Carlos Gershenson: School of Systems Science and Industrial Engineering, Binghamton University
title The art of misclassification: too many classes, not enough points
topic Classification limits
Data limits
Entropy-based measure
Machine learning
Artificial intelligence
url https://doi.org/10.1140/epjds/s13688-025-00565-7