Leveraging Vision Foundation Model via PConv-Based Fine-Tuning with Automated Prompter for Defect Segmentation
In industrial scenarios, image segmentation is essential for accurately identifying defect regions. Recently, the emergence of foundation models driven by powerful computational resources and large-scale training data has brought about a paradigm shift in deep learning-based image segmentation. The...
| Main Authors: | Yifan Jiang, Jinshui Chen, Jiangang Lu |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-04-01 |
| Series: | Sensors |
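| Subjects: | defect segmentation; vision foundation model; segment anything model; low-rank adaptation; partial convolution; automated prompter |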
| Online Access: | https://www.mdpi.com/1424-8220/25/8/2417 |
| _version_ | 1850179976055226368 |
|---|---|
| author | Yifan Jiang, Jinshui Chen, Jiangang Lu |
| author_sort | Yifan Jiang |
| collection | DOAJ |
| description | In industrial scenarios, image segmentation is essential for accurately identifying defect regions. Recently, the emergence of foundation models driven by powerful computational resources and large-scale training data has brought about a paradigm shift in deep learning-based image segmentation. The Segment Anything Model (SAM) has shown exceptional performance across various downstream tasks, owing to its vast semantic knowledge and strong generalization capabilities. However, SAM’s feature distribution discrepancy, reliance on manually labeled prompts, and limited category information reduce its scalability in industrial settings. To address these issues, we propose PA-SAM, an industrial defect segmentation framework based on SAM. First, to bridge the gap between SAM’s pre-training data and the distinct characteristics of industrial defects, we introduce a parameter-efficient fine-tuning (PEFT) technique that incorporates lightweight Multi-Scale Partial Convolution Aggregation (MSPCA) into Low-Rank Adaptation (LoRA), named MSPCA-LoRA, which enhances the image encoder’s sensitivity to prior knowledge biases while maintaining PEFT efficiency. Furthermore, we present the Image-to-Prompt Embedding Generator (IPEG), which uses image embeddings to autonomously create high-quality prompt embeddings for directing mask segmentation, eliminating the limitations of manually provided prompts. Finally, we apply effective refinements to SAM’s mask decoder, transforming SAM into an end-to-end semantic segmentation framework. On two real-world defect segmentation datasets, PA-SAM achieves mean Intersection over Union (mIoU) scores of 73.87% and 68.30% and mean Dice coefficients of 84.90% and 80.22%, outperforming other state-of-the-art algorithms and demonstrating its robust generalization and application potential. |
| format | Article |
| id | doaj-art-a013b420643b4243b2fed32decdd9705 |
| institution | OA Journals |
| issn | 1424-8220 |
| language | English |
| publishDate | 2025-04-01 |
| publisher | MDPI AG |
| record_format | Article |
| series | Sensors |
| spelling | doaj-art-a013b420643b4243b2fed32decdd9705; 2025-08-20T02:18:20Z; eng; MDPI AG; Sensors; 1424-8220; 2025-04-01; 25(8), 2417; 10.3390/s25082417; Leveraging Vision Foundation Model via PConv-Based Fine-Tuning with Automated Prompter for Defect Segmentation; Yifan Jiang, Jinshui Chen, Jiangang Lu (all: State Key Laboratory of Industrial Control Technology, College of Control Science and Engineering, Zhejiang University, Hangzhou 310027, China); https://www.mdpi.com/1424-8220/25/8/2417; defect segmentation; vision foundation model; segment anything model; low-rank adaptation; partial convolution; automated prompter |
| title | Leveraging Vision Foundation Model via PConv-Based Fine-Tuning with Automated Prompter for Defect Segmentation |
| topic | defect segmentation; vision foundation model; segment anything model; low-rank adaptation; partial convolution; automated prompter |
| url | https://www.mdpi.com/1424-8220/25/8/2417 |
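
The description reports mean Intersection over Union and mean Dice coefficients but this record does not define them. The following is a minimal sketch, assuming the common per-class definitions (IoU = TP / (TP + FP + FN), Dice = 2TP / (2TP + FP + FN)) averaged over the classes that occur. The function name `mean_iou_and_dice`, the toy labels, and the averaging choice are illustrative assumptions, not the authors' evaluation code.

```python
from typing import List, Sequence, Tuple


def mean_iou_and_dice(
    pred: Sequence[int], target: Sequence[int], num_classes: int
) -> Tuple[float, float]:
    """Compute class-averaged IoU and Dice for flattened label maps.

    Assumption: classes absent from both prediction and ground truth
    are skipped rather than counted as 0 or 1.
    """
    ious: List[float] = []
    dices: List[float] = []
    for c in range(num_classes):
        tp = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        fp = sum(1 for p, t in zip(pred, target) if p == c and t != c)
        fn = sum(1 for p, t in zip(pred, target) if p != c and t == c)
        denom = tp + fp + fn
        if denom == 0:
            continue  # class c does not occur at all; skip it
        ious.append(tp / denom)
        dices.append(2 * tp / (2 * tp + fp + fn))
    return sum(ious) / len(ious), sum(dices) / len(dices)


# Toy example: 8 pixels, 3 classes (0 = background, 1 and 2 = defect types).
pred = [0, 0, 1, 1, 1, 0, 2, 2]
target = [0, 0, 1, 1, 0, 0, 2, 1]
miou, mdice = mean_iou_and_dice(pred, target, num_classes=3)
print(f"mIoU={miou:.4f}, mDice={mdice:.4f}")
```

For any single class, Dice is at least as large as IoU, which is consistent with the reported Dice values being higher than the corresponding IoU values.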