Local kernel renormalization as a mechanism for feature learning in overparametrized convolutional neural networks
Abstract: Empirical evidence shows that fully-connected neural networks in the infinite-width limit (lazy training) eventually outperform their finite-width counterparts in most computer vision tasks; on the other hand, modern architectures with convolutional layers often achieve optimal performances in the finite-width regime. In this work, we present a theoretical framework that provides a rationale for these differences in one-hidden-layer networks; we derive an effective action in the so-called proportional limit for an architecture with one convolutional hidden layer and compare it with the result available for fully-connected networks. Remarkably, we identify a completely different form of kernel renormalization: whereas the kernel of the fully-connected architecture is just globally renormalized by a single scalar parameter, the convolutional kernel undergoes a local renormalization, meaning that the network can select the local components that will contribute to the final prediction in a data-dependent way. This finding highlights a simple mechanism for feature learning that can take place in overparametrized shallow convolutional neural networks, but not in shallow fully-connected architectures or in locally connected neural networks without weight sharing.
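To make the abstract's central contrast concrete, the sketch below writes the two forms of kernel renormalization side by side. It is a schematic reading of the claim, not notation quoted from the paper: the order parameters $\bar{Q}$ and $\bar{Q}_i$, the patch kernels $K_i$, and the patch index $i$ are assumed conventions from this line of work (Bayesian shallow networks in the proportional limit), and the local form is written in a simplified diagonal, per-patch shape; the paper's actual order parameter may couple pairs of patches.

```latex
% Schematic contrast between global and local kernel renormalization.
% Assumed notation (not quoted from the paper): K is the infinite-width
% (NNGP) kernel, K_i its restriction to patch i, and \bar{Q} / \bar{Q}_i
% are order parameters fixed by the saddle point of the effective action.
\documentclass{article}
\usepackage{amsmath}
\begin{document}
Fully connected, one hidden layer (global renormalization: a single
scalar rescales the whole kernel, so no input region is singled out):
\begin{equation}
  K_R(x, y) = \bar{Q}\, K(x, y).
\end{equation}
One convolutional hidden layer (local renormalization: each patch $i$
carries its own data-dependent weight, so the network can amplify or
suppress local components of the prediction):
\begin{equation}
  K_R(x, y) = \sum_{i} \bar{Q}_i\, K_i(x_i, y_i),
\end{equation}
where $x_i$ denotes the $i$-th patch of input $x$.
\end{document}
```

On this reading, feature learning in the shallow convolutional case amounts to learning the patch weights $\bar{Q}_i$, whereas the single scalar $\bar{Q}$ of the fully-connected case cannot prefer one input region over another.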
| Main Authors: | R. Aiudi, R. Pacelli, P. Baglioni, A. Vezzani, R. Burioni, P. Rotondo |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Nature Portfolio, 2025-01-01 |
| Series: | Nature Communications |
| Online Access: | https://doi.org/10.1038/s41467-024-55229-3 |
| collection | DOAJ |
|---|---|
| id | doaj-art-fb21f72bd9de43adbe285d81a46a30f8 |
| institution | OA Journals |
| issn | 2041-1723 |
| affiliations | R. Aiudi, A. Vezzani, R. Burioni, P. Rotondo: Dipartimento di Scienze Matematiche, Fisiche e Informatiche, Università degli Studi di Parma; R. Pacelli: INFN, sezione di Padova; P. Baglioni: INFN, sezione di Milano Bicocca |
| citation | Nature Communications 16(1), pp. 1-10, 2025-01-01, doi:10.1038/s41467-024-55229-3 |