Semantic-Injected Bidirectional Multiscale Flow Estimation Network for Infrared and Visible Image Registration
Main Authors: | , , , , |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2025-01-01 |
Series: | IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing |
Subjects: | |
Online Access: | https://ieeexplore.ieee.org/document/10833818/ |
Summary: | Infrared and visible image registration ensures that spatial positions are consistent across modalities. Cross-modal images contain objects at different scales and cluttered backgrounds. However, most existing image registration methods adopt the same alignment strategy for all objects, which leads to insufficient multiscale feature representation and inaccurate registration of foreground objects. To address these issues, we propose a semantic-injected bidirectional multiscale flow estimation (SI-BMFE) network for infrared and visible image registration. SI-BMFE leverages feature complementarity across different scales and employs a pretrained segmentation network to extract the spatial positions of foreground objects, improving registration accuracy. Specifically, we first design a bidirectional multiscale feature enhancement (BMFE) module that integrates complementary features across different scales, effectively extracting both global structures and local details. BMFE drives the network to coarsely align the infrared and visible images. Then, the semantic-injected flow estimation (SFE) module is introduced to estimate multilevel deformation fields for fine-grained registration of different objects. SFE uses a pretrained segmentation network to obtain the spatial locations of foreground objects. These object location cues help the network distinguish different foreground objects from the background and focus on them. SFE thus exploits semantic knowledge to promote fine alignment of different foreground objects and improve the accuracy of cross-modal image registration. Extensive experiments demonstrate that the proposed method outperforms state-of-the-art registration networks on both the MSRS and RoadScene infrared and visible image registration datasets. |
ISSN: | 1939-1404 2151-1535 |
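
The summary refers to estimated deformation fields and to foreground location cues from a segmentation network. As a rough illustration of those two ideas only, and not the authors' SI-BMFE implementation, the sketch below warps a grayscale image with a dense per-pixel flow field using bilinear sampling, then scores alignment with an error that weights foreground pixels more heavily. The function names (`bilinear_sample`, `warp`, `masked_l1`), the `fg_weight` parameter, and the weighting scheme itself are hypothetical choices made for this example.

```python
from typing import List, Tuple

Image = List[List[float]]               # H x W grayscale image
Flow = List[List[Tuple[float, float]]]  # H x W of (dx, dy) offsets in pixels


def bilinear_sample(img: Image, x: float, y: float) -> float:
    """Sample img at a fractional (x, y), clamping to the image border."""
    h, w = len(img), len(img[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = x - x0, y - y0
    top = (1 - ax) * img[y0][x0] + ax * img[y0][x1]
    bottom = (1 - ax) * img[y1][x0] + ax * img[y1][x1]
    return (1 - ay) * top + ay * bottom


def warp(img: Image, flow: Flow) -> Image:
    """Backward-warp: output[r][c] is img sampled at (c + dx, r + dy)."""
    h, w = len(img), len(img[0])
    return [
        [bilinear_sample(img, c + flow[r][c][0], r + flow[r][c][1])
         for c in range(w)]
        for r in range(h)
    ]


def masked_l1(a: Image, b: Image, mask: List[List[int]],
              fg_weight: float = 2.0) -> float:
    """Weighted mean absolute error; foreground pixels (mask == 1) count
    fg_weight times as much as background pixels (assumed weighting)."""
    num = den = 0.0
    for row_a, row_b, row_m in zip(a, b, mask):
        for va, vb, m in zip(row_a, row_b, row_m):
            wgt = fg_weight if m else 1.0
            num += wgt * abs(va - vb)
            den += wgt
    return num / den


# Usage: shift a horizontal ramp half a pixel to the left.
ramp = [[float(c) for c in range(3)] for _ in range(3)]
half_shift = [[(0.5, 0.0)] * 3 for _ in range(3)]
warped = warp(ramp, half_shift)
```

In a learned registration network the flow would be predicted rather than fixed by hand, and the mask would come from a segmentation model; here both are plain Python lists so the arithmetic is easy to follow.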