Leveraging Vision Foundation Model via PConv-Based Fine-Tuning with Automated Prompter for Defect Segmentation

In industrial scenarios, image segmentation is essential for accurately identifying defect regions. Recently, the emergence of foundation models driven by powerful computational resources and large-scale training data has brought about a paradigm shift in deep learning-based image segmentation. The...

Full description

Saved in:

Bibliographic Details
Main Authors:	Yifan Jiang, Jinshui Chen, Jiangang Lu
Format:	Article
Language:	English
Published:	MDPI AG 2025-04-01
Series:	Sensors
Subjects:	defect segmentation vision foundation model segment anything model low-rank adaptation partial convolution automated prompter
Online Access:	https://www.mdpi.com/1424-8220/25/8/2417
Tags:	Add Tag No Tags, Be the first to tag this record!

Description
Summary:	In industrial scenarios, image segmentation is essential for accurately identifying defect regions. Recently, the emergence of foundation models driven by powerful computational resources and large-scale training data has brought about a paradigm shift in deep learning-based image segmentation. The Segment Anything Model (SAM) has shown exceptional performance across various downstream tasks, owing to its vast semantic knowledge and strong generalization capabilities. However, the feature distribution discrepancy, reliance on manually labeled prompts, and limited category information of SAM reduce its scalability in industrial settings. To address these issues, we propose PA-SAM, an industrial defect segmentation framework based on SAM. Firstly, to bridge the gap between SAM’s pre-training data and distinct characteristics of industrial defects, we introduce a parameter-efficient fine-tuning (PEFT) technique incorporating lightweight Multi-Scale Partial Convolution Aggregation (MSPCA) into Low-Rank Adaptation (LoRA), named MSPCA-LoRA, which effectively enhances the image encoder’s sensitivity to prior knowledge biases, while maintaining PEFT efficiency. Furthermore, we present the Image-to-Prompt Embedding Generator (IPEG), which utilizes image embeddings to autonomously create high-quality prompt embeddings for directing mask segmentation, eliminating the limitations of manually provided prompts. Finally, we apply effective refinements to SAM’s mask decoder, transforming SAM into an end-to-end semantic segmentation framework. On two real-world defect segmentation datasets, PA-SAM achieves mean Intersections over Union of 73.87% and 68.30%, as well as mean Dice coefficients of 84.90% and 80.22%, outperforming other state-of-the-art algorithms, further demonstrating its robust generalization and application potential.
ISSN:	1424-8220

Leveraging Vision Foundation Model via PConv-Based Fine-Tuning with Automated Prompter for Defect Segmentation

Similar Items