Comparative Analysis of Vision Foundation Models for Building Segmentation in Aerial Imagery
| Main Authors: | , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Copernicus Publications, 2025-05-01 |
| Series: | The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences |
| Online Access: | https://isprs-archives.copernicus.org/articles/XLVIII-M-6-2025/23/2025/isprs-archives-XLVIII-M-6-2025-23-2025.pdf |
| Summary: | Visual Foundation Models (VFMs) demonstrate impressive generalization capabilities for image segmentation and classification tasks, leading to their increasing adoption in the remote sensing field. This study investigates the performance of VFMs in zero-shot building segmentation from aerial imagery using two model pipelines: Grounded-SAM and SAM+CLIP. Grounded-SAM integrates a Grounding DINO backbone with the Segment Anything Model (SAM), while SAM+CLIP first employs SAM to generate masks and then applies Contrastive Language-Image Pre-training (CLIP) for classification (both pipelines are sketched below). The evaluation, performed on the WHU building dataset using Precision, Recall, F1-score, and Intersection over Union (IoU), showed that Grounded-SAM achieved an F1-score of 0.83 and an IoU of 0.71, while SAM+CLIP achieved an F1-score of 0.65 and an IoU of 0.49. Grounded-SAM excelled at accurately delineating partially occluded and irregularly shaped buildings, whereas SAM+CLIP segmented larger buildings well but struggled to delineate smaller ones. Given the impressive zero-shot performance of VFMs, future efforts to refine these models through fine-tuning or few-shot learning could significantly expand their application in remote sensing. |
| ISSN: | 1682-1750, 2194-9034 |
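
The record contains no code, but the first pipeline named in the summary can be illustrated with a minimal sketch built from the public GroundingDINO and segment-anything packages: Grounding DINO proposes boxes for an open-vocabulary text prompt, and SAM converts each box into a mask. The config and checkpoint paths, the `"building"` prompt wording, and both thresholds are assumptions for illustration, not details taken from the paper.

```python
# Hypothetical sketch of the Grounded-SAM pipeline described in the summary.
# Paths, prompt, and thresholds are assumptions.
import numpy as np
import torch
from groundingdino.util import box_ops
from groundingdino.util.inference import load_model, load_image, predict
from segment_anything import sam_model_registry, SamPredictor

device = "cuda" if torch.cuda.is_available() else "cpu"

dino = load_model("GroundingDINO_SwinT_OGC.py", "groundingdino_swint_ogc.pth")
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth").to(device)
predictor = SamPredictor(sam)

def grounded_sam_buildings(image_path: str) -> np.ndarray:
    """Return a binary building mask for one aerial image."""
    image_source, image = load_image(image_path)  # (HWC uint8, DINO tensor)
    # 1. Grounding DINO: open-vocabulary boxes for the "building" prompt.
    boxes, logits, phrases = predict(model=dino, image=image,
                                     caption="building",
                                     box_threshold=0.35, text_threshold=0.25)
    h, w = image_source.shape[:2]
    if boxes.shape[0] == 0:
        return np.zeros((h, w), dtype=bool)
    # Boxes arrive as normalized cxcywh; convert to absolute xyxy for SAM.
    boxes_xyxy = box_ops.box_cxcywh_to_xyxy(boxes) * torch.tensor([w, h, w, h])
    # 2. SAM: one mask per detected box, prompted with the box coordinates.
    predictor.set_image(image_source)
    prompt_boxes = predictor.transform.apply_boxes_torch(
        boxes_xyxy, image_source.shape[:2]).to(device)
    masks, _, _ = predictor.predict_torch(point_coords=None, point_labels=None,
                                          boxes=prompt_boxes,
                                          multimask_output=False)
    # Union the per-box masks into a single building map.
    return masks.any(dim=0).squeeze(0).cpu().numpy()
```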
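The second pipeline reverses the order: SAM proposes class-agnostic masks for the whole tile, and CLIP classifies a crop around each mask as building or background. Again a sketch under stated assumptions: the checkpoint path, the CLIP variant, the text prompts, and the 0.5 acceptance threshold are illustrative choices, not the paper's settings.

```python
# Minimal sketch of the SAM+CLIP pipeline described in the summary.
# Checkpoint path, CLIP variant, prompts, and threshold are assumptions.
import numpy as np
import torch
from PIL import Image
import clip  # https://github.com/openai/CLIP
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

device = "cuda" if torch.cuda.is_available() else "cpu"

# 1. SAM generates candidate masks for the whole aerial tile.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth").to(device)
mask_generator = SamAutomaticMaskGenerator(sam)

# 2. CLIP scores each mask crop against building / background prompts.
clip_model, clip_preprocess = clip.load("ViT-B/32", device=device)
prompts = clip.tokenize(["an aerial photo of a building roof",
                         "an aerial photo of background terrain"]).to(device)

def sam_clip_buildings(image: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Return a binary building mask for an RGB uint8 aerial image."""
    building_mask = np.zeros(image.shape[:2], dtype=bool)
    for record in mask_generator.generate(image):
        x, y, w, h = (int(v) for v in record["bbox"])
        if w == 0 or h == 0:
            continue
        crop = Image.fromarray(image[y:y + h, x:x + w])
        crop_input = clip_preprocess(crop).unsqueeze(0).to(device)
        with torch.no_grad():
            logits_per_image, _ = clip_model(crop_input, prompts)
            building_prob = logits_per_image.softmax(dim=-1)[0, 0].item()
        if building_prob >= threshold:  # keep masks CLIP labels "building"
            building_mask |= record["segmentation"]
    return building_mask
```

Classifying bounding-box crops rather than masked pixels is one plausible design choice here; it keeps surrounding context that CLIP can use, which matters for small buildings, the case where the summary reports SAM+CLIP struggling.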
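Finally, the four metrics named in the summary have their standard definitions, stated below assuming pixel-wise counts of true positives ($TP$), false positives ($FP$), and false negatives ($FN$) against the WHU ground-truth masks (the counting convention is an assumption; the record does not say how counts are aggregated):

```latex
\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad
\mathrm{Recall} = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2\,\mathrm{Precision}\cdot\mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}, \qquad
\mathrm{IoU} = \frac{TP}{TP + FP + FN}
```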