Statistical Validity of Neural-Net Benchmarks
Claims of better, faster, or more efficient neural-net designs often hinge on low single-digit percentage improvements (or less) in accuracy or speed over competing designs. Benchmark comparisons have been based on a number of different metrics, such as recall, the best of five runs, the median of five runs, Top-1, Top-5, BLEU, ROC, RMS, etc. These metrics implicitly assume comparable metric distributions. Conspicuous by their absence are measures of the statistical validity of these benchmark comparisons. This study examined neural-net benchmark metric distributions and found researcher degrees of freedom that may affect comparison validity. A benchmark essay is developed and proposed for benchmarking and comparing reasonably expected neural-net performance metrics while minimizing researcher degrees of freedom. The essay includes an estimate of the effects and interactions of hyper-parameter settings on a neural-net's benchmark metrics as a measure of its optimization complexity.
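The abstract's point about single-run comparisons can be made concrete. The sketch below is purely illustrative and not from the paper: the model names, accuracy figures, and the bootstrap-interval procedure are all assumptions (the paper itself proposes Bayesian credible intervals); it only shows how a "best of five runs" verdict can disagree with a distribution-aware one.

```python
# Illustrative sketch (not from the paper): two hypothetical models whose
# "best of five" comparison and a distribution-aware comparison disagree.
# All accuracy figures and the bootstrap procedure are assumptions.
import random

random.seed(0)

# Five benchmark runs per model (hypothetical top-1 accuracies).
model_a = [0.912, 0.905, 0.897, 0.908, 0.901]
model_b = [0.915, 0.884, 0.879, 0.890, 0.886]

# "Best of five" declares B the winner on a single lucky run...
print(max(model_a), max(model_b))  # 0.912 vs 0.915

# ...but an interval estimate on the mean difference tells another story:
# model A's runs are consistently higher, B's peak is an outlier.
def bootstrap_diff(a, b, n=10_000):
    """95% bootstrap percentile interval for mean(a) - mean(b)."""
    diffs = []
    for _ in range(n):
        ra = [random.choice(a) for _ in a]  # resample runs with replacement
        rb = [random.choice(b) for _ in b]
        diffs.append(sum(ra) / len(ra) - sum(rb) / len(rb))
    diffs.sort()
    return diffs[int(0.025 * n)], diffs[int(0.975 * n)]

lo, hi = bootstrap_diff(model_a, model_b)
print(f"95% bootstrap interval for mean(A) - mean(B): [{lo:.3f}, {hi:.3f}]")
```

Under these made-up numbers, B "wins" the best-of-five comparison while the interval on the mean difference favors A, which is exactly the kind of researcher degree of freedom the study examines.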
| Main Authors: | Alain Hadges, Srikar Bellur |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Open Journal of the Computer Society |
| Subjects: | Bayesian credible interval; benchmark essay; comparison; factorial experiment; hyper-parameters; machine learning |
| Online Access: | https://ieeexplore.ieee.org/document/10816528/ |
| _version_ | 1850271730490146816 |
|---|---|
| author | Alain Hadges; Srikar Bellur |
| author_facet | Alain Hadges; Srikar Bellur |
| author_sort | Alain Hadges |
| collection | DOAJ |
| description | Claims of better, faster, or more efficient neural-net designs often hinge on low single-digit percentage improvements (or less) in accuracy or speed over competing designs. Benchmark comparisons have been based on a number of different metrics, such as recall, the best of five runs, the median of five runs, Top-1, Top-5, BLEU, ROC, RMS, etc. These metrics implicitly assume comparable metric distributions. Conspicuous by their absence are measures of the statistical validity of these benchmark comparisons. This study examined neural-net benchmark metric distributions and found researcher degrees of freedom that may affect comparison validity. A benchmark essay is developed and proposed for benchmarking and comparing reasonably expected neural-net performance metrics while minimizing researcher degrees of freedom. The essay includes an estimate of the effects and interactions of hyper-parameter settings on a neural-net's benchmark metrics as a measure of its optimization complexity. |
| format | Article |
| id | doaj-art-ea33a80f195c40cfa155b8e148d6ba9d |
| institution | OA Journals |
| issn | 2644-1268 |
| language | English |
| publishDate | 2025-01-01 |
| publisher | IEEE |
| record_format | Article |
| series | IEEE Open Journal of the Computer Society |
| spelling | doaj-art-ea33a80f195c40cfa155b8e148d6ba9d (indexed 2025-08-20T01:52:07Z); eng; IEEE; IEEE Open Journal of the Computer Society; ISSN 2644-1268; 2025-01-01; vol. 6, pp. 211-222; DOI 10.1109/OJCS.2024.3523183; IEEE document 10816528; Statistical Validity of Neural-Net Benchmarks; Alain Hadges (ORCID 0009-0003-7996-6528), Harrisburg University of Science & Technology, Harrisburg, PA, USA; Srikar Bellur, Department of Data Analytics, Harrisburg University of Science & Technology, Harrisburg, PA, USA; https://ieeexplore.ieee.org/document/10816528/; keywords: Bayesian credible interval, benchmark essay, comparison, factorial experiment, hyper-parameters, machine learning |
| spellingShingle | Alain Hadges; Srikar Bellur; Statistical Validity of Neural-Net Benchmarks; IEEE Open Journal of the Computer Society; Bayesian credible interval; benchmark essay; comparison; factorial experiment; hyper-parameters; machine learning |
| title | Statistical Validity of Neural-Net Benchmarks |
| title_sort | statistical validity of neural net benchmarks |
| topic | Bayesian credible interval; benchmark essay; comparison; factorial experiment; hyper-parameters; machine learning |
| url | https://ieeexplore.ieee.org/document/10816528/ |
| work_keys_str_mv | AT alainhadges statisticalvalidityofneuralnetbenchmarks AT srikarbellur statisticalvalidityofneuralnetbenchmarks |