Uncertainty quantification from ensemble variance scaling laws in deep neural networks

Quantifying the uncertainty from machine learning analyses is critical to their use in the physical sciences. In this work we focus on uncertainty inherited from the initialization distribution of neural networks. We compute the mean $\mu_{\mathcal{L}}$ and variance $\sigma_{\mathcal{L}}^2$ of the test loss $\mathcal{L}$ for an ensemble of multi-layer perceptrons with neural tangent kernel initialization in the infinite-width limit, and compare empirically to the results from finite-width networks for three example tasks: MNIST classification, CIFAR classification, and calorimeter energy regression. We observe scaling laws as a function of training set size $N_\mathcal{D}$ for both $\mu_{\mathcal{L}}$ and $\sigma_{\mathcal{L}}$, but find that the coefficient of variation $\epsilon_{\mathcal{L}} \equiv \sigma_{\mathcal{L}}/\mu_{\mathcal{L}}$ becomes independent of $N_\mathcal{D}$ at both infinite and finite width for sufficiently large $N_\mathcal{D}$. This implies that the coefficient of variation of a finite-width network may be approximated by its infinite-width value, and may in principle be calculable using finite-width perturbation theory.
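
The quantity studied in the abstract is straightforward to estimate empirically. As a minimal sketch (not the authors' implementation), the following Python/PyTorch snippet trains an ensemble of small multi-layer perceptrons that differ only in their random initialization on a toy regression task, then reports the ensemble mean $\mu_{\mathcal{L}}$, standard deviation $\sigma_{\mathcal{L}}$, and coefficient of variation $\epsilon_{\mathcal{L}}$ of the test loss. The architecture, optimizer settings, and synthetic data are illustrative assumptions, and PyTorch's default initialization is only a stand-in for the NTK parameterization used in the paper.

import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy regression data, fixed for all ensemble members: y = sin(3x) + noise.
N_D, N_test, d_in = 256, 512, 1
x_train = torch.rand(N_D, d_in) * 4 - 2
y_train = torch.sin(3 * x_train) + 0.1 * torch.randn(N_D, 1)
x_test = torch.rand(N_test, d_in) * 4 - 2
y_test = torch.sin(3 * x_test)

def make_mlp(width=128):
    # Plain two-hidden-layer MLP; width and depth are illustrative choices.
    return nn.Sequential(nn.Linear(d_in, width), nn.ReLU(),
                         nn.Linear(width, width), nn.ReLU(),
                         nn.Linear(width, 1))

def test_loss_for_seed(seed, steps=200, lr=1e-2):
    # Re-seeding here varies only the initialization: the data are fixed
    # above, and full-batch gradient descent is otherwise deterministic.
    torch.manual_seed(seed)
    model = make_mlp()
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(steps):
        opt.zero_grad()
        loss_fn(model(x_train), y_train).backward()
        opt.step()
    with torch.no_grad():
        return loss_fn(model(x_test), y_test).item()

# Ensemble of 20 independent initializations at a single training-set size.
losses = torch.tensor([test_loss_for_seed(s) for s in range(20)])
mu = losses.mean().item()    # mu_L
sigma = losses.std().item()  # sigma_L
print(f"mu_L = {mu:.4f}  sigma_L = {sigma:.4f}  eps_L = {sigma / mu:.3f}")

Repeating this at several values of $N_\mathcal{D}$ would trace out the scaling laws the paper reports; the single $N_\mathcal{D}$ above merely illustrates the ensemble estimate of $\epsilon_{\mathcal{L}}$.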

Bibliographic Details
Main Authors: Ibrahim Elsharkawy (ORCID: 0009-0000-2743-6785), Benjamin Hooberman, Yonatan Kahn (ORCID: 0000-0002-9379-1838)
Affiliations: Department of Physics, University of Illinois Urbana-Champaign, Urbana, IL, United States of America (all authors); Department of Physics, University of Toronto, Toronto, ON, Canada, and Vector Institute, Toronto, ON, Canada (Y. Kahn)
Format: Article
Language: English
Published: IOP Publishing, 2025-01-01
Series: Machine Learning: Science and Technology, Vol. 6, No. 3, Article 035040
ISSN: 2632-2153
Subjects: uncertainty quantification; neural network ensemble; neural tangent kernel; infinite-width limit; scaling laws
Online Access: https://doi.org/10.1088/2632-2153/adf7fe