Local kernel renormalization as a mechanism for feature learning in overparametrized convolutional neural networks
Abstract: Empirical evidence shows that fully-connected neural networks in the infinite-width limit (lazy training) eventually outperform their finite-width counterparts in most computer vision tasks; on the other hand, modern architectures with convolutional layers often achieve optimal performances in the finite-width regime. In this work, we present a theoretical framework that provides a rationale for these differences in one-hidden-layer networks; we derive an effective action in the so-called proportional limit for an architecture with one convolutional hidden layer and compare it with the result available for fully-connected networks. Remarkably, we identify a completely different form of kernel renormalization: whereas the kernel of the fully-connected architecture is just globally renormalized by a single scalar parameter, the convolutional kernel undergoes a local renormalization, meaning that the network can select the local components that will contribute to the final prediction in a data-dependent way. This finding highlights a simple mechanism for feature learning that can take place in overparametrized shallow convolutional neural networks, but not in shallow fully-connected architectures or in locally connected neural networks without weight sharing.
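To make the abstract's central contrast concrete, the sketch below writes the two forms of kernel renormalization side by side. It is a schematic reading of the claim, not notation quoted from the paper: the order parameters $\bar{Q}$ and $\bar{Q}_i$, the patch kernels $K_i$, and the patch index $i$ are assumed conventions from this line of work (Bayesian shallow networks in the proportional limit), and the local form is written in a simplified diagonal, per-patch shape; the paper's actual order parameter may couple pairs of patches.

```latex
% Schematic contrast between global and local kernel renormalization.
% Assumed notation (not quoted from the paper): K is the infinite-width
% (NNGP) kernel, K_i its restriction to patch i, and \bar{Q} / \bar{Q}_i
% are order parameters fixed by the saddle point of the effective action.
\documentclass{article}
\usepackage{amsmath}
\begin{document}
Fully connected, one hidden layer (global renormalization: a single
scalar rescales the whole kernel, so no input region is singled out):
\begin{equation}
  K_R(x, y) = \bar{Q}\, K(x, y).
\end{equation}
One convolutional hidden layer (local renormalization: each patch $i$
carries its own data-dependent weight, so the network can amplify or
suppress local components of the prediction):
\begin{equation}
  K_R(x, y) = \sum_{i} \bar{Q}_i\, K_i(x_i, y_i),
\end{equation}
where $x_i$ denotes the $i$-th patch of input $x$.
\end{document}
```

On this reading, feature learning in the shallow convolutional case amounts to learning the patch weights $\bar{Q}_i$, whereas the single scalar $\bar{Q}$ of the fully-connected case cannot prefer one input region over another.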
| Main Authors: | R. Aiudi, R. Pacelli, P. Baglioni, A. Vezzani, R. Burioni, P. Rotondo |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Nature Portfolio, 2025-01-01 |
| Series: | Nature Communications |
| Online Access: | https://doi.org/10.1038/s41467-024-55229-3 |
| collection | DOAJ |
|---|---|
| id | doaj-art-fb21f72bd9de43adbe285d81a46a30f8 |
| institution | OA Journals |
| issn | 2041-1723 |
| affiliations | R. Aiudi, A. Vezzani, R. Burioni, P. Rotondo: Dipartimento di Scienze Matematiche, Fisiche e Informatiche, Università degli Studi di Parma; R. Pacelli: INFN, sezione di Padova; P. Baglioni: INFN, sezione di Milano Bicocca |
| citation | Nature Communications 16(1), pp. 1-10, 2025-01-01, doi:10.1038/s41467-024-55229-3 |