Uncertainty quantification from ensemble variance scaling laws in deep neural networks
Quantifying the uncertainty from machine learning analyses is critical to their use in the physical sciences. In this work we focus on uncertainty inherited from the initialization distribution of neural networks. We compute the mean $\mu_{\mathcal{L}}$ and variance $\sigma_{\mathcal{L}}^2$ of the test loss $\mathcal{L}$ for an ensemble of multi-layer perceptrons with neural tangent kernel initialization in the infinite-width limit, and compare empirically to the results from finite-width networks for three example tasks: MNIST classification, CIFAR classification, and calorimeter energy regression. We observe scaling laws as a function of training set size $N_\mathcal{D}$ for both $\mu_{\mathcal{L}}$ and $\sigma_{\mathcal{L}}$, but find that the coefficient of variation $\epsilon_{\mathcal{L}} \equiv \sigma_{\mathcal{L}}/\mu_{\mathcal{L}}$ becomes independent of $N_\mathcal{D}$ at both infinite and finite width for sufficiently large $N_\mathcal{D}$. This implies that the coefficient of variation of a finite-width network may be approximated by its infinite-width value, and may in principle be calculable using finite-width perturbation theory.
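The quantities in the abstract are straightforward to estimate empirically: train many networks that differ only in their random initialization, record each one's test loss, and form the ensemble statistics. The NumPy sketch below is a minimal illustration of that procedure, not the authors' code; the synthetic 1-D regression task, the two-layer tanh network with NTK-style $1/\sqrt{\text{fan-in}}$ scaling, the seed count, and the hyperparameters are all illustrative assumptions.

```python
# Hedged sketch: estimate the ensemble mean mu_L, standard deviation sigma_L,
# and coefficient of variation eps_L = sigma_L / mu_L of the test loss over
# the initialization distribution. All sizes and hyperparameters are
# illustrative assumptions, not values from the paper.
import numpy as np

rng_data = np.random.default_rng(0)

def make_data(n, noise=0.1):
    # Synthetic 1-D regression task standing in for the paper's benchmarks.
    x = rng_data.uniform(-1.0, 1.0, size=(n, 1))
    y = np.sin(3.0 * x) + noise * rng_data.normal(size=(n, 1))
    return x, y

def init_mlp(rng, d_in=1, width=128, d_out=1):
    # NTK parameterization: unit-variance Gaussian weights; the 1/sqrt(fan-in)
    # factors are applied explicitly in the forward pass.
    return {"W1": rng.normal(size=(d_in, width)),
            "W2": rng.normal(size=(width, d_out))}

def forward(p, x):
    h = np.tanh(x @ p["W1"] / np.sqrt(p["W1"].shape[0]))
    return h @ p["W2"] / np.sqrt(p["W2"].shape[0])

def train(p, x, y, lr=0.5, steps=3000):
    # Full-batch gradient descent on 0.5 * MSE, with gradients by hand.
    n, d_in = x.shape
    width = p["W1"].shape[1]
    for _ in range(steps):
        h = np.tanh(x @ p["W1"] / np.sqrt(d_in))
        pred = h @ p["W2"] / np.sqrt(width)
        d_pred = (pred - y) / n                      # dL/dpred for 0.5*MSE
        g_W2 = h.T @ d_pred / np.sqrt(width)
        d_h = d_pred @ p["W2"].T / np.sqrt(width)
        g_W1 = x.T @ (d_h * (1.0 - h ** 2)) / np.sqrt(d_in)
        p["W1"] -= lr * g_W1
        p["W2"] -= lr * g_W2
    return p

def test_loss(p, x, y):
    return float(np.mean((forward(p, x) - y) ** 2))

# Fixed data; only the initialization seed varies across ensemble members.
x_tr, y_tr = make_data(256)    # N_D = 256 training points
x_te, y_te = make_data(2000)
losses = []
for seed in range(20):         # 20 ensemble members, differing only in init
    rng = np.random.default_rng(1000 + seed)
    params = train(init_mlp(rng), x_tr, y_tr)
    losses.append(test_loss(params, x_te, y_te))
losses = np.array(losses)

mu_L = losses.mean()
sigma_L = losses.std(ddof=1)
print(f"mu_L = {mu_L:.4g}  sigma_L = {sigma_L:.4g}  "
      f"eps_L = {sigma_L / mu_L:.3f}")
```

To probe the scaling behavior the paper reports, one would repeat this over a range of $N_\mathcal{D}$ values and check that $\mu_{\mathcal{L}}$ and $\sigma_{\mathcal{L}}$ follow power laws while $\epsilon_{\mathcal{L}}$ plateaus at large $N_\mathcal{D}$.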
| Main Authors: | Ibrahim Elsharkawy, Benjamin Hooberman, Yonatan Kahn |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IOP Publishing, 2025-01-01 |
| Series: | Machine Learning: Science and Technology |
| Subjects: | uncertainty quantification; neural network ensemble; neural tangent kernel; infinite-width limit; scaling laws |
| Online Access: | https://doi.org/10.1088/2632-2153/adf7fe |
| _version_ | 1849222172145352704 |
|---|---|
| author | Ibrahim Elsharkawy; Benjamin Hooberman; Yonatan Kahn |
| author_facet | Ibrahim Elsharkawy; Benjamin Hooberman; Yonatan Kahn |
| author_sort | Ibrahim Elsharkawy |
| collection | DOAJ |
| description | Quantifying the uncertainty from machine learning analyses is critical to their use in the physical sciences. In this work we focus on uncertainty inherited from the initialization distribution of neural networks. We compute the mean $\mu_{\mathcal{L}}$ and variance $\sigma_{\mathcal{L}}^2$ of the test loss $\mathcal{L}$ for an ensemble of multi-layer perceptrons with neural tangent kernel initialization in the infinite-width limit, and compare empirically to the results from finite-width networks for three example tasks: MNIST classification, CIFAR classification, and calorimeter energy regression. We observe scaling laws as a function of training set size $N_\mathcal{D}$ for both $\mu_{\mathcal{L}}$ and $\sigma_{\mathcal{L}}$, but find that the coefficient of variation $\epsilon_{\mathcal{L}} \equiv \sigma_{\mathcal{L}}/\mu_{\mathcal{L}}$ becomes independent of $N_\mathcal{D}$ at both infinite and finite width for sufficiently large $N_\mathcal{D}$. This implies that the coefficient of variation of a finite-width network may be approximated by its infinite-width value, and may in principle be calculable using finite-width perturbation theory. |
| format | Article |
| id | doaj-art-e04b1bf240a046deb9f062a97d19b5d3 |
| institution | Kabale University |
| issn | 2632-2153 |
| language | English |
| publishDate | 2025-01-01 |
| publisher | IOP Publishing |
| record_format | Article |
| series | Machine Learning: Science and Technology |
| spelling | doaj-art-e04b1bf240a046deb9f062a97d19b5d3; 2025-08-26T07:37:09Z; eng; IOP Publishing; Machine Learning: Science and Technology; ISSN 2632-2153; 2025-01-01; Vol. 6, No. 3, 035040; 10.1088/2632-2153/adf7fe; Uncertainty quantification from ensemble variance scaling laws in deep neural networks; Ibrahim Elsharkawy (https://orcid.org/0009-0000-2743-6785), Benjamin Hooberman, Yonatan Kahn (https://orcid.org/0000-0002-9379-1838), all: Department of Physics, University of Illinois Urbana-Champaign, Urbana, IL, United States of America; Kahn also: Department of Physics, University of Toronto, Toronto, ON, Canada, and Vector Institute, Toronto, ON, Canada; https://doi.org/10.1088/2632-2153/adf7fe; uncertainty quantification; neural network ensemble; neural tangent kernel; infinite-width limit; scaling laws |
| spellingShingle | Ibrahim Elsharkawy; Benjamin Hooberman; Yonatan Kahn; Uncertainty quantification from ensemble variance scaling laws in deep neural networks; Machine Learning: Science and Technology; uncertainty quantification; neural network ensemble; neural tangent kernel; infinite-width limit; scaling laws |
| title | Uncertainty quantification from ensemble variance scaling laws in deep neural networks |
| title_full | Uncertainty quantification from ensemble variance scaling laws in deep neural networks |
| title_fullStr | Uncertainty quantification from ensemble variance scaling laws in deep neural networks |
| title_full_unstemmed | Uncertainty quantification from ensemble variance scaling laws in deep neural networks |
| title_short | Uncertainty quantification from ensemble variance scaling laws in deep neural networks |
| title_sort | uncertainty quantification from ensemble variance scaling laws in deep neural networks |
| topic | uncertainty quantification; neural network ensemble; neural tangent kernel; infinite-width limit; scaling laws |
| url | https://doi.org/10.1088/2632-2153/adf7fe |
| work_keys_str_mv | AT ibrahimelsharkawy uncertaintyquantificationfromensemblevariancescalinglawsindeepneuralnetworks AT benjaminhooberman uncertaintyquantificationfromensemblevariancescalinglawsindeepneuralnetworks AT yonatankahn uncertaintyquantificationfromensemblevariancescalinglawsindeepneuralnetworks |