The art of misclassification: too many classes, not enough points

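Abstract: Classification is a ubiquitous and fundamental problem in artificial intelligence and machine learning, with extensive efforts dedicated to developing more powerful classifiers and larger datasets. However, the classification task is ultimately constrained by the intrinsic properties of datasets, independently of computational power or model complexity. In this work, we introduce a formal entropy-based measure of classifiability, which quantifies the inherent difficulty of a classification problem by assessing the uncertainty in class assignments given feature representations. This measure captures the degree of class overlap and aligns with human intuition, serving as an upper bound on classification performance for classification problems. Our results establish a theoretical limit beyond which no classifier can improve the classification accuracy, regardless of the architecture or amount of data, in a given problem. Our approach provides a principled framework for understanding when classification is inherently fallible and fundamentally ambiguous.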

Bibliographic Details
Main Authors:	Mario Franco, Gerardo Febres, Nelson Fernández, Carlos Gershenson
Format:	Article
Language:	English
Published:	SpringerOpen 2025-07-01
Series:	EPJ Data Science
Subjects:	Classification limits; Data limits; Entropy-based measure; Machine learning; Artificial intelligence
Online Access:	https://doi.org/10.1140/epjds/s13688-025-00565-7

author	Mario Franco, Gerardo Febres, Nelson Fernández, Carlos Gershenson
collection	DOAJ
description	Abstract Classification is a ubiquitous and fundamental problem in artificial intelligence and machine learning, with extensive efforts dedicated to developing more powerful classifiers and larger datasets. However, the classification task is ultimately constrained by the intrinsic properties of datasets, independently of computational power or model complexity. In this work, we introduce a formal entropy-based measure of classifiability, which quantifies the inherent difficulty of a classification problem by assessing the uncertainty in class assignments given feature representations. This measure captures the degree of class overlap and aligns with human intuition, serving as an upper bound on classification performance for classification problems. Our results establish a theoretical limit beyond which no classifier can improve the classification accuracy, regardless of the architecture or amount of data, in a given problem. Our approach provides a principled framework for understanding when classification is inherently fallible and fundamentally ambiguous.
format	Article
id	doaj-art-3d6d9ac9bb3b407cbbe43c53337f8061
institution	Kabale University
issn	2193-1127
language	English
publishDate	2025-07-01
publisher	SpringerOpen
record_format	Article
series	EPJ Data Science
spelling	doaj-art-3d6d9ac9bb3b407cbbe43c53337f8061; 2025-08-20T03:46:04Z; eng; SpringerOpen; EPJ Data Science; 2193-1127; 2025-07-01; 141122; 10.1140/epjds/s13688-025-00565-7; The art of misclassification: too many classes, not enough points; Mario Franco (0), Gerardo Febres (1), Nelson Fernández (2), Carlos Gershenson (3); School of Systems Science and Industrial Engineering, Binghamton University (all four authors); https://doi.org/10.1140/epjds/s13688-025-00565-7; Classification limits; Data limits; Entropy-based measure; Machine learning; Artificial intelligence
title	The art of misclassification: too many classes, not enough points
topic	Classification limits; Data limits; Entropy-based measure; Machine learning; Artificial intelligence
url	https://doi.org/10.1140/epjds/s13688-025-00565-7


Similar Items