Heterogeneous Graph Neural Network with Multi-View Contrastive Learning for Cross-Lingual Text Classification


Bibliographic Details
Main Authors: Xun Li, Kun Zhang
Format: Article
Language: English
Published: MDPI AG 2025-03-01
Series: Applied Sciences
Subjects:
Online Access: https://www.mdpi.com/2076-3417/15/7/3454
Description
Summary: The cross-lingual text classification task remains a long-standing challenge: a classifier is trained on high-resource source languages and then applied to texts in low-resource target languages, bridging linguistic gaps while maintaining accuracy. Most existing methods achieve strong performance by relying on multilingual pretrained language models to transfer knowledge across languages. However, little attention has been paid to factors beyond semantic similarity, which degrades classification performance in the target languages. This study proposes a novel framework that integrates a heterogeneous graph neural network with multi-view contrastive learning for cross-lingual text classification. A heterogeneous graph connects document and word nodes through several edge types, including Part-of-Speech tagging, dependency, similarity, and translation edges, capturing both syntactic and semantic knowledge. A Graph Attention Network aggregates information from neighboring nodes. Furthermore, the study devises a multi-view contrastive learning strategy that enhances the model by pulling positive examples closer together and pushing negative examples further apart. Extensive experiments show that the framework outperforms the previous state-of-the-art model, with improvements of 2.20% in accuracy and 1.96% in F1-score on the XGLUE and Amazon Review datasets, respectively. These findings demonstrate that the proposed model benefits the cross-lingual text classification task overall.
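The multi-view contrastive objective described in the abstract — pulling positive pairs of node embeddings together while pushing negatives apart — is commonly realized as an InfoNCE-style loss. The sketch below is an illustrative NumPy implementation of that general idea, not the authors' code: the function name, the temperature value, and the toy embeddings are all assumptions made here for demonstration.

```python
import numpy as np

def info_nce_loss(view_a, view_b, temperature=0.1):
    """InfoNCE-style contrastive loss between two views of the same nodes.

    Row i of view_a and row i of view_b form a positive pair; every other
    row in view_b acts as a negative. Embeddings are L2-normalised so the
    dot product equals cosine similarity.
    """
    a = view_a / np.linalg.norm(view_a, axis=1, keepdims=True)
    b = view_b / np.linalg.norm(view_b, axis=1, keepdims=True)
    logits = a @ b.T / temperature                 # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))            # positives lie on the diagonal

# Toy check: embeddings from an aligned view (small perturbation of the
# anchor) should yield a much lower loss than an unrelated, shuffled view.
rng = np.random.default_rng(0)
anchor = rng.normal(size=(8, 16))
aligned = anchor + 0.01 * rng.normal(size=(8, 16))
unrelated = rng.normal(size=(8, 16))
assert info_nce_loss(anchor, aligned) < info_nce_loss(anchor, unrelated)
```

Minimizing this loss increases the similarity of each positive pair relative to all negatives in the batch, which matches the abstract's description of the contrastive strategy; in the paper's setting the two views would come from different graph views rather than random perturbations.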
ISSN: 2076-3417