Enhanced Object Detection in Thangka Images Using Gabor, Wavelet, and Color Feature Fusion


Bibliographic Details
Main Authors: Yukai Xian, Yurui Lee, Te Shen, Ping Lan, Qijun Zhao, Liang Yan
Format: Article
Language: English
Published: MDPI AG 2025-06-01
Series: Sensors
Online Access: https://www.mdpi.com/1424-8220/25/11/3565
Description
Summary: Thangka image detection poses unique challenges due to complex iconography, densely packed small-scale elements, and stylized color–texture compositions. Existing detectors often struggle to capture both global structures and local details and rarely leverage domain-specific visual priors. To address this, we propose a frequency- and prior-enhanced detection framework based on YOLOv11, specifically tailored for Thangka images. We introduce a Learnable Lifting Wavelet Block (LLWB) to decompose features into low- and high-frequency components, while LLWB_Down and LLWB_Up enable frequency-guided multi-scale fusion. To incorporate chromatic and directional cues, we design a Color-Gabor Block (CGBlock), a dual-branch attention module based on HSV histograms and Gabor responses, and embed it via the Color-Gabor Cross Gate (C2CG) residual fusion module. Furthermore, we redesign all detection heads with decoupled branches and introduce center-ness prediction, alongside an additional shallow detection head to improve recall for ultra-small targets. Extensive experiments on a curated Thangka dataset demonstrate that our model achieves 89.5% mAP@0.5, 59.4% mAP@[0.5:0.95], and 84.7% recall, surpassing all baseline detectors while maintaining a compact size of 20.9 M parameters. Ablation studies validate the individual and synergistic contributions of each proposed component. Our method provides a robust and interpretable solution for fine-grained object detection in complex heritage images.
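The LLWB described above builds on the lifting scheme for wavelet decomposition. The paper's block learns its predict/update steps; as a point of reference only, the sketch below shows the classic fixed Haar lifting it generalizes (split into even/odd samples, predict the detail, update the approximation), which yields exactly the low-/high-frequency split the abstract mentions. All function names here are illustrative, not from the paper.

```python
import numpy as np

def haar_lifting_forward(x):
    """One level of the Haar lifting scheme: split, predict, update.

    Returns (low, high): the smoothed approximation and the local detail,
    each half the length of the (even-length) input signal.
    """
    even, odd = x[0::2].astype(float), x[1::2].astype(float)
    high = odd - even       # predict step: detail = odd minus its prediction
    low = even + high / 2   # update step: keeps the pairwise averages
    return low, high

def haar_lifting_inverse(low, high):
    """Invert the lifting steps in reverse order to recover the signal."""
    even = low - high / 2
    odd = high + even
    x = np.empty(even.size + odd.size)
    x[0::2], x[1::2] = even, odd
    return x

signal = np.array([4, 6, 10, 12, 8, 6, 5, 5])
low, high = haar_lifting_forward(signal)
restored = haar_lifting_inverse(low, high)
```

Because each lifting step is trivially invertible, the transform is lossless regardless of what the predict/update operators are, which is what makes replacing them with learnable layers (as in LLWB) attractive.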
ISSN: 1424-8220