Leveraging Vision Foundation Model via PConv-Based Fine-Tuning with Automated Prompter for Defect Segmentation

Bibliographic Details
Main Authors: Yifan Jiang, Jinshui Chen, Jiangang Lu
Format: Article
Language: English
Published: MDPI AG, 2025-04-01
Series: Sensors
Online Access: https://www.mdpi.com/1424-8220/25/8/2417
Affiliation (all authors): State Key Laboratory of Industrial Control Technology, College of Control Science and Engineering, Zhejiang University, Hangzhou 310027, China
Volume/Issue: 25(8), article 2417
ISSN: 1424-8220
DOI: 10.3390/s25082417
Collection: DOAJ
Abstract: In industrial scenarios, image segmentation is essential for accurately identifying defect regions. Recently, foundation models, enabled by powerful computational resources and large-scale training data, have brought about a paradigm shift in deep learning-based image segmentation. The Segment Anything Model (SAM) has shown exceptional performance across various downstream tasks, owing to its vast semantic knowledge and strong generalization capabilities. However, the discrepancy between the feature distributions of SAM's pre-training data and industrial images, its reliance on manually labeled prompts, and its limited category information reduce its scalability in industrial settings. To address these issues, we propose PA-SAM, an industrial defect segmentation framework based on SAM. First, to bridge the gap between SAM's pre-training data and the distinct characteristics of industrial defects, we introduce MSPCA-LoRA, a parameter-efficient fine-tuning (PEFT) technique that incorporates lightweight Multi-Scale Partial Convolution Aggregation (MSPCA) into Low-Rank Adaptation (LoRA); it enhances the image encoder's sensitivity to prior-knowledge biases while maintaining PEFT efficiency. Second, we present the Image-to-Prompt Embedding Generator (IPEG), which generates high-quality prompt embeddings directly from image embeddings to guide mask segmentation, removing the need for manually provided prompts. Finally, we refine SAM's mask decoder, turning SAM into an end-to-end semantic segmentation framework. On two real-world defect segmentation datasets, PA-SAM achieves a mean Intersection over Union (mIoU) of 73.87% and 68.30% and a mean Dice coefficient of 84.90% and 80.22%, respectively, outperforming other state-of-the-art algorithms and demonstrating its robust generalization and application potential.
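The abstract reports results as mean Intersection over Union and mean Dice coefficient. As a reminder of how the two metrics are defined and how they relate, here is a minimal sketch that computes both from flattened integer label maps. The function name `iou_and_dice`, averaging over classes (rather than over images), and skipping classes absent from both maps are illustrative assumptions; this record does not state the paper's averaging convention.

```python
from typing import List, Sequence, Tuple


def iou_and_dice(pred: Sequence[int], target: Sequence[int], num_classes: int) -> Tuple[float, float]:
    """Return (mIoU, mean Dice), averaged over classes that occur in pred or target.

    pred and target are flattened label maps of equal length (hypothetical helper).
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    ious: List[float] = []
    dices: List[float] = []
    for c in range(num_classes):
        tp = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        fp = sum(1 for p, t in zip(pred, target) if p == c and t != c)
        fn = sum(1 for p, t in zip(pred, target) if p != c and t == c)
        if tp + fp + fn == 0:
            continue  # class absent from both maps: skip instead of scoring it 0 or 1
        ious.append(tp / (tp + fp + fn))  # IoU = |P ∩ G| / |P ∪ G|
        dices.append(2 * tp / (2 * tp + fp + fn))  # Dice = 2|P ∩ G| / (|P| + |G|)
    if not ious:
        raise ValueError("no class occurs in pred or target")
    return sum(ious) / len(ious), sum(dices) / len(dices)


# Toy 1x4 "image": class 0 = background, class 1 = defect.
miou, mdice = iou_and_dice([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
print(f"mIoU={miou:.4f}  mDice={mdice:.4f}")  # mIoU=0.5833  mDice=0.7333
```

Per class, Dice = 2·IoU / (1 + IoU), so Dice is never below IoU; this is consistent with each dataset's mean Dice figure exceeding its mIoU.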
Keywords: defect segmentation; vision foundation model; segment anything model; low-rank adaptation; partial convolution; automated prompter
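The keywords low-rank adaptation and partial convolution name the two generic building blocks that the abstract combines into MSPCA-LoRA. This record does not describe how the paper wires them together or what its multi-scale aggregation looks like, so the sketch below shows only the textbook forms, separately: a LoRA-adapted linear map (frozen W0 plus a trainable low-rank update B·A scaled by alpha/r) and a partial convolution, assuming the term refers to the FasterNet-style PConv that convolves only a subset of channels and passes the rest through (the same name is also used for a mask-aware convolution in image inpainting). The function names, the 1D token-axis simplification, and the list-based matrices are hypothetical choices for illustration, not the authors' code.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, x: Vector) -> Vector:
    """Dense matrix-vector product."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


def lora_forward(w0: Matrix, a: Matrix, b: Matrix, x: Vector, alpha: float) -> Vector:
    """LoRA: y = W0 x + (alpha / r) * B (A x).

    W0 (d_out x d_in) stays frozen; only A (r x d_in) and B (d_out x r) are trained.
    """
    r = len(a)  # rank of the update
    base = matvec(w0, x)
    delta = matvec(b, matvec(a, x))
    return [y + (alpha / r) * d for y, d in zip(base, delta)]


def partial_conv1d(feats: Matrix, kernel: List[Matrix], n_conv: int) -> Matrix:
    """Partial convolution (FasterNet-style), simplified to 1D along the token axis.

    feats: [channels][length]. Only the first n_conv channels are convolved
    (cross-correlation with zero padding, as in deep-learning frameworks);
    kernel has shape [n_conv][n_conv][k] with odd k. The remaining channels
    are passed through unchanged.
    """
    length = len(feats[0])
    k = len(kernel[0][0])
    pad = k // 2
    out: Matrix = []
    for o in range(n_conv):
        row: Vector = []
        for t in range(length):
            acc = 0.0
            for i in range(n_conv):
                for j in range(k):
                    pos = t + j - pad
                    if 0 <= pos < length:  # zero padding at the sequence edges
                        acc += kernel[o][i][j] * feats[i][pos]
            row.append(acc)
        out.append(row)
    out.extend(list(ch) for ch in feats[n_conv:])  # identity on untouched channels
    return out


# Zero-initialised B: the adapted layer starts out identical to the frozen one.
print(lora_forward([[1, 0], [0, 1]], [[1, 1]], [[0], [0]], [2, 3], alpha=1.0))  # [2.0, 3.0]
print(partial_conv1d([[1, 2, 3], [4, 5, 6]], [[[1, 1, 1]]], n_conv=1))  # [[3.0, 6.0, 5.0], [4, 5, 6]]
```

Convolving n_conv of C channels costs roughly (n_conv/C)² of a full convolution's multiply-accumulates, which is what makes PConv cheap. Whether MSPCA runs several kernel sizes in parallel for its multi-scale aggregation, and where it sits relative to the LoRA matrices, is not stated in this record.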