Question-matching approach based on gradual machine learning

Question matching attempts to determine whether the intentions of two different questions are similar. Recently, with the development of large-scale pretrained deep neural network (DNN) language models, state-of-the-art question-matching performance has been achieved. However, due to the independent...


Bibliographic Details
Main Authors: Xuejian HE, Anqi CHEN, Zhiqiang GUO, Zhiru WANG, Qun CHEN
Format: Article
Language: zho
Published: Science Press 2025-01-01
Series: 工程科学学报
Subjects:
Online Access: http://cje.ustb.edu.cn/article/doi/10.13374/j.issn2095-9389.2023.11.05.002
author Xuejian HE
Anqi CHEN
Zhiqiang GUO
Zhiru WANG
Qun CHEN
collection DOAJ
description Question matching attempts to determine whether the intentions of two different questions are similar. Recently, large-scale pretrained deep neural network (DNN) language models have achieved state-of-the-art question-matching performance. However, because these models rely on the independent and identically distributed (i.i.d.) assumption, their performance in real-world scenarios is limited by the adequacy of the training data and by the distribution drift between the target and training data. In this study, we propose a novel gradual machine learning (GML)-based approach for Chinese question matching. Beginning with initially labeled instances, this approach gradually labels target instances in order of increasing hardness via iterative factor inference on a factor graph. The proposed solution first extracts diverse semantic features from different perspectives and then constructs a factor graph by fusing the extracted features to facilitate gradual learning from easy to hard. In feature modeling, we extract and model two complementary types of features: 1) TF-IDF-based keyword features, which capture the shallow semantic similarity between two questions; 2) DNN-based deep semantic features, which capture the latent semantic similarity between two questions. We model keyword features as unary factors in the factor graph, which define their influence on the matching status of the two questions. The DNN-based features comprise global and local features: the global features correspond to a question pair’s matching probability as estimated by a DNN model, and the local features correspond to the semantic similarity between two neighboring question pairs, estimated from their vector representations in the DNN’s embedding space. To facilitate gradual inference, we model the DNN-based global and local features as unary and binary factors, respectively, in the factor graph.
Finally, we implement a GML solution for question matching based on an open-source GML inference engine. We validated the efficacy of the proposed approach through a comparative study on two open-source Chinese benchmark datasets, LCQMC and the BQ corpus. Extensive experiments demonstrate that, compared with pure deep learning models, the proposed solution effectively improves the accuracy of question matching, and its performance advantage generally increases as the amount of labeled training data decreases. Our experiments also demonstrate that the performance of the proposed solution is very robust with respect to key algorithmic parameters, indicating its applicability in real-world scenarios. In addition, our work on the GML solution is orthogonal to existing deep learning-based question-matching algorithms because our solution can easily accommodate and leverage other deep language models.
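The easy-to-hard labeling order described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation and does not use their GML inference engine: the function name `gradual_label`, the weights `w_dnn`, `w_kw`, and `w_nb`, the logit mapping of the evidence, and the toy data are all assumptions made for illustration. Exact factor-graph inference is replaced here by a simple additive log-odds score, in which the unary evidence (DNN match probability and keyword similarity) is combined with binary evidence from already-labeled neighboring pairs, and the most confident unlabeled pair is labeled at each step.

```python
import math


def logit(p, eps=1e-6):
    """Map a probability-like value in (0, 1) to log-odds, clipping extremes."""
    p = min(max(p, eps), 1 - eps)
    return math.log(p / (1 - p))


def gradual_label(dnn_prob, keyword_sim, neighbors, seed_labels,
                  w_dnn=1.0, w_kw=1.0, w_nb=1.0):
    """Label question pairs one at a time, easiest (most confident) first.

    dnn_prob:    pair id -> DNN match probability (unary evidence)
    keyword_sim: pair id -> keyword similarity in (0, 1) (unary evidence)
    neighbors:   pair id -> list of (neighbor id, similarity) (binary evidence)
    seed_labels: pair id -> 0/1 for the initially labeled pairs
    All weights are hypothetical; a real system would learn them.
    """
    labels = dict(seed_labels)
    order = []
    unlabeled = set(dnn_prob) - set(labels)
    while unlabeled:
        best_id, best_score = None, 0.0
        for i in sorted(unlabeled):
            # Unary evidence from the two feature types.
            score = w_dnn * logit(dnn_prob[i]) + w_kw * logit(keyword_sim[i])
            # Binary evidence: labeled neighbors pull toward their own label.
            for j, sim in neighbors.get(i, []):
                if j in labels:
                    score += w_nb * sim * (1 if labels[j] == 1 else -1)
            if best_id is None or abs(score) > abs(best_score):
                best_id, best_score = i, score
        # Commit the easiest instance; it becomes evidence for later ones.
        labels[best_id] = 1 if best_score >= 0 else 0
        order.append(best_id)
        unlabeled.remove(best_id)
    return labels, order


# Toy data (invented): pair "b" is ambiguous alone but neighbors pair "a".
dnn_prob = {"a": 0.95, "b": 0.55, "c": 0.10}
keyword_sim = {"a": 0.9, "b": 0.5, "c": 0.2}
neighbors = {"b": [("a", 2.0)]}
labels, order = gradual_label(dnn_prob, keyword_sim, neighbors, seed_labels={})
print(order)   # labeling order, easiest first
print(labels)  # final 0/1 match decisions
```

In this toy run the ambiguous pair "b" is labeled last, after its confidently labeled neighbor "a" has contributed evidence, which is the intuition behind labeling in order of increasing hardness.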
format Article
id doaj-art-96bd59785b094c7587fbdeed03d68801
institution OA Journals
issn 2095-9389
language zho
publishDate 2025-01-01
publisher Science Press
record_format Article
series 工程科学学报
spelling Question-matching approach based on gradual machine learning. 工程科学学报, Science Press, ISSN 2095-9389, 2025-01-01, vol. 47, no. 1, pp. 79-90. DOI: 10.13374/j.issn2095-9389.2023.11.05.002. Article ID: 231105-0002. Record doaj-art-96bd59785b094c7587fbdeed03d68801, indexed 2025-08-20T01:58:38Z.
Xuejian HE: Henan Forestry Vocational College, Luoyang 471002, China
Anqi CHEN: School of Software, Northwestern Polytechnical University, Xi’an 710072, China
Zhiqiang GUO: Henan Forestry Vocational College, Luoyang 471002, China
Zhiru WANG: School of Computer Science, Northwestern Polytechnical University, Xi’an 710072, China
Qun CHEN: School of Software, Northwestern Polytechnical University, Xi’an 710072, China
title Question-matching approach based on gradual machine learning
topic natural language understanding
Chinese question matching
gradual machine learning
natural language pretraining model
factor graph inference
url http://cje.ustb.edu.cn/article/doi/10.13374/j.issn2095-9389.2023.11.05.002