Statistical Validity of Neural-Net Benchmarks
Claims of better, faster, or more efficient neural-net designs often hinge on low single-digit percentage improvements (or less) in accuracy or speed over competing designs. Benchmark comparisons have been based on a number of different metrics, such as recall, the best of five runs, the median of five runs, Top-1, Top-5, BLEU, ROC, RMS, etc. These metrics implicitly assume comparable metric distributions. Conspicuous by their absence are measures of the statistical validity of these benchmark comparisons. This study examined neural-net benchmark metric distributions and found researcher degrees of freedom that may affect comparison validity. A benchmark essay is developed and proposed for benchmarking and comparing reasonably expected neural-net performance metrics while minimizing researcher degrees of freedom. The essay includes an estimate of the effects and interactions of hyper-parameter settings on a neural-net's benchmark metrics as a measure of its optimization complexity.
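The abstract's point about single-run comparisons can be made concrete. The sketch below is purely illustrative and not from the paper: the model names, accuracy figures, and the bootstrap-interval procedure are all assumptions (the paper itself proposes Bayesian credible intervals); it only shows how a "best of five runs" verdict can disagree with a distribution-aware one.

```python
# Illustrative sketch (not from the paper): two hypothetical models whose
# "best of five" comparison and a distribution-aware comparison disagree.
# All accuracy figures and the bootstrap procedure are assumptions.
import random

random.seed(0)

# Five benchmark runs per model (hypothetical top-1 accuracies).
model_a = [0.912, 0.905, 0.897, 0.908, 0.901]
model_b = [0.915, 0.884, 0.879, 0.890, 0.886]

# "Best of five" declares B the winner on a single lucky run...
print(max(model_a), max(model_b))  # 0.912 vs 0.915

# ...but an interval estimate on the mean difference tells another story:
# model A's runs are consistently higher, B's peak is an outlier.
def bootstrap_diff(a, b, n=10_000):
    """95% bootstrap percentile interval for mean(a) - mean(b)."""
    diffs = []
    for _ in range(n):
        ra = [random.choice(a) for _ in a]  # resample runs with replacement
        rb = [random.choice(b) for _ in b]
        diffs.append(sum(ra) / len(ra) - sum(rb) / len(rb))
    diffs.sort()
    return diffs[int(0.025 * n)], diffs[int(0.975 * n)]

lo, hi = bootstrap_diff(model_a, model_b)
print(f"95% bootstrap interval for mean(A) - mean(B): [{lo:.3f}, {hi:.3f}]")
```

Under these made-up numbers, B "wins" the best-of-five comparison while the interval on the mean difference favors A, which is exactly the kind of researcher degree of freedom the study examines.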
| Main Authors: | Alain Hadges, Srikar Bellur |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Open Journal of the Computer Society |
| Subjects: | Bayesian credible interval; benchmark essay; comparison; factorial experiment; hyper-parameters; machine learning |
| Online Access: | https://ieeexplore.ieee.org/document/10816528/ |
| _version_ | 1850271730490146816 |
|---|---|
| author | Alain Hadges; Srikar Bellur |
| author_facet | Alain Hadges; Srikar Bellur |
| author_sort | Alain Hadges |
| collection | DOAJ |
| description | Claims of better, faster, or more efficient neural-net designs often hinge on low single-digit percentage improvements (or less) in accuracy or speed over competing designs. Benchmark comparisons have been based on a number of different metrics, such as recall, the best of five runs, the median of five runs, Top-1, Top-5, BLEU, ROC, RMS, etc. These metrics implicitly assume comparable metric distributions. Conspicuous by their absence are measures of the statistical validity of these benchmark comparisons. This study examined neural-net benchmark metric distributions and found researcher degrees of freedom that may affect comparison validity. A benchmark essay is developed and proposed for benchmarking and comparing reasonably expected neural-net performance metrics while minimizing researcher degrees of freedom. The essay includes an estimate of the effects and interactions of hyper-parameter settings on a neural-net's benchmark metrics as a measure of its optimization complexity. |
| format | Article |
| id | doaj-art-ea33a80f195c40cfa155b8e148d6ba9d |
| institution | OA Journals |
| issn | 2644-1268 |
| language | English |
| publishDate | 2025-01-01 |
| publisher | IEEE |
| record_format | Article |
| series | IEEE Open Journal of the Computer Society |
| spelling | doaj-art-ea33a80f195c40cfa155b8e148d6ba9d (indexed 2025-08-20T01:52:07Z); eng; IEEE; IEEE Open Journal of the Computer Society; ISSN 2644-1268; 2025-01-01; vol. 6, pp. 211-222; DOI 10.1109/OJCS.2024.3523183; IEEE document 10816528; Statistical Validity of Neural-Net Benchmarks; Alain Hadges (ORCID 0009-0003-7996-6528), Harrisburg University of Science & Technology, Harrisburg, PA, USA; Srikar Bellur, Department of Data Analytics, Harrisburg University of Science & Technology, Harrisburg, PA, USA; https://ieeexplore.ieee.org/document/10816528/; keywords: Bayesian credible interval, benchmark essay, comparison, factorial experiment, hyper-parameters, machine learning |
| spellingShingle | Alain Hadges; Srikar Bellur; Statistical Validity of Neural-Net Benchmarks; IEEE Open Journal of the Computer Society; Bayesian credible interval; benchmark essay; comparison; factorial experiment; hyper-parameters; machine learning |
| title | Statistical Validity of Neural-Net Benchmarks |
| title_sort | statistical validity of neural net benchmarks |
| topic | Bayesian credible interval; benchmark essay; comparison; factorial experiment; hyper-parameters; machine learning |
| url | https://ieeexplore.ieee.org/document/10816528/ |
| work_keys_str_mv | AT alainhadges statisticalvalidityofneuralnetbenchmarks AT srikarbellur statisticalvalidityofneuralnetbenchmarks |