Exploring Flatter Loss Landscape Surface via Sharpness-Aware Minimization with Linear Mode Connectivity

Bibliographic Details
Main Authors: Hailun Liang, Haowen Zheng, Hao Wang, Liu He, Haoyi Lin, Yanyan Liang
Format: Article
Language: English
Published: MDPI AG 2025-04-01
Series: Mathematics
Online Access: https://www.mdpi.com/2227-7390/13/8/1259
Description
Summary: The Sharpness-Aware Minimization (SAM) optimizer links flatness to generalization, on the premise that loss basins with lower sharpness tend to generalize better. However, SAM requires manual tuning of the open ball radius, which complicates its practical application. To address this, we propose a method inspired by linear mode connectivity that uses two differently initialized models as endpoints to determine the optimal open ball radius automatically. Specifically, we introduce a distance regularization term between the two models during training, which encourages them to approach each other and thereby adjusts the open ball radius dynamically. We design an optimization algorithm called "Twin Stars Entwined" (TSE), whose stopping condition is defined by the models' linear connectivity, i.e., training stops once the two models have converged to within a sufficiently small distance of each other. As the models iteratively reduce their distance, they converge to a flatter region of the loss landscape. Our approach complements SAM by dynamically identifying flatter regions and exploring the geometric properties of multiple connected loss basins. Instead of searching for a single large-radius basin, we identify a group of connected basins as potential optimization targets. Experiments across multiple models and in varied noise environments showed that our method achieved performance on par with state-of-the-art techniques.
ISSN: 2227-7390
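
Illustrative sketch: The summary describes two differently initialized models that are pulled together by a distance regularizer, with the SAM radius tied to their separation and training stopped once they are close. The Python below is a rough sketch of that idea on a toy problem, not the paper's TSE algorithm. Everything in it is an assumption made for illustration: the toy loss, the names train_twins, sam_grad and barrier, the penalty (lam / 2) * ||w_a - w_b||^2, the rule rho = rho_scale * distance, and all default values. The barrier helper uses one common definition of the loss barrier along a linear path (largest excess of the path loss over the interpolated endpoint losses).

```python
import math

def loss(w):
    """Toy non-convex loss with minima wherever every coordinate is +1 or -1."""
    return sum((x * x - 1.0) ** 2 for x in w)

def grad(w):
    """Analytic gradient of the toy loss."""
    return [4.0 * x * (x * x - 1.0) for x in w]

def distance(w_a, w_b):
    """Euclidean distance between two parameter vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(w_a, w_b)))

def sam_grad(w, rho):
    """SAM-style gradient: the gradient evaluated at w + rho * g / ||g||."""
    g = grad(w)
    norm = math.sqrt(sum(x * x for x in g))
    if norm == 0.0 or rho == 0.0:
        return g
    return grad([x + rho * gx / norm for x, gx in zip(w, g)])

def train_twins(w_a, w_b, lr=0.01, lam=0.5, rho_scale=0.5, tol=1e-3, max_steps=10000):
    """Train two models jointly until they are closer than tol (assumed stopping rule)."""
    w_a, w_b = list(w_a), list(w_b)
    steps = 0
    while steps < max_steps and distance(w_a, w_b) >= tol:
        # Assumption: the perturbation radius shrinks with the twins' separation.
        rho = rho_scale * distance(w_a, w_b)
        g_a, g_b = sam_grad(w_a, rho), sam_grad(w_b, rho)
        # The gradient of (lam / 2) * ||w_a - w_b||^2 w.r.t. w_a is lam * (w_a - w_b).
        new_a = [x - lr * (gx + lam * (x - y)) for x, y, gx in zip(w_a, w_b, g_a)]
        new_b = [y - lr * (gy + lam * (y - x)) for x, y, gy in zip(w_a, w_b, g_b)]
        w_a, w_b = new_a, new_b
        steps += 1
    return w_a, w_b, steps

def barrier(w_a, w_b, n=11):
    """Largest excess of the loss on the segment over the interpolated endpoint losses."""
    l_a, l_b = loss(w_a), loss(w_b)
    worst = 0.0
    for i in range(n):
        t = i / (n - 1)
        w_t = [(1.0 - t) * x + t * y for x, y in zip(w_a, w_b)]
        worst = max(worst, loss(w_t) - ((1.0 - t) * l_a + t * l_b))
    return worst

if __name__ == "__main__":
    w_a, w_b, steps = train_twins([1.5, 0.8], [0.6, 1.4])
    print(steps, distance(w_a, w_b), barrier(w_a, w_b))
```

In this toy setting the regularizer and the shrinking radius interact simply: as the twins approach each other, rho goes to zero and each update reduces to plain gradient descent with a coupling term. Whether the published method sets the radius this way is not stated in the summary.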