TBKIN: Threshold-based explicit selection for enhanced cross-modal semantic alignments.
Vision-language models aim to seamlessly integrate visual and linguistic information for multi-modal tasks, demanding precise semantic alignments between image-text pairs while minimizing the influence of irrelevant data. While existing methods leverage intra-modal and cross-modal knowledge to enhan...
| Main Authors: | Zihan Guo, Xiang Shen, Chongqing Chen |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Public Library of Science (PLoS), 2025-01-01 |
| Series: | PLoS ONE |
| Online Access: | https://doi.org/10.1371/journal.pone.0325543 |
Similar Items
- AlignFusionNet: Efficient Cross-Modal Alignment and Fusion for 3D Semantic Occupancy Prediction
  by: Ziyi Xu, et al. Published: (2025-01-01)
- Semantic enhancement and cross-modal interaction fusion for sentiment analysis in social media
  by: Guangyu Mu, et al. Published: (2025-01-01)
- Enhancing Word Embeddings for Improved Semantic Alignment
  by: Julian Szymański, et al. Published: (2024-12-01)
- Sequence, gaze, and modal semantics: modal verb selection in German permission inquiries
  by: Zinken Jörg, et al. Published: (2025-01-01)
- Optimal Transport Posterior Alignment for Cross-lingual Semantic Parsing
  by: Tom Sherborne, et al. Published: (2023-11-01)