Enhancing Text Similarity Measurement with Hybrid Siamese Neural Networks and Lexical Features

Bibliographic Details
Main Author: Bei Zhou
Format: Article
Language: English
Published: Bilijipub publisher 2025-03-01
Series: Advances in Engineering and Intelligence Systems
Online Access: https://aeis.bilijipub.com/article_218018_edc285458ddacd0913c93d26caca7639.pdf
Description
Summary: Accurate measurement of text similarity is important in many text-centric applications, including text clustering, information retrieval, and question-answering systems. This study aims to improve the accuracy of deep learning models in measuring text similarity. To this end, it proposes a hybrid approach that integrates a Siamese neural network with lexical similarity features. The Siamese network comprises two parallel sub-networks, each consisting of a word embedding layer and a deep neural network. The study constructs a range of models by combining three deep neural network variants (CNN, LSTM, Bi-LSTM) with two types of word embedding models and lexical similarity features. Evaluation on three datasets shows that the hybrid Siamese model using convolutional networks and lexical features achieves higher Pearson correlation and lower mean squared error (MSE) than models reported in the literature. These results indicate an improvement in the accuracy of text similarity assessment. The combined Siamese network model incorporating a convolutional network, lexical features, and the cross-embedding layer (SNN_CNN_feat) achieved the highest correlation (0.7590) and the lowest MSE (1.0235).
ISSN: 2821-0263
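
The summary names two evaluation measures, Pearson correlation and mean squared error, and refers to lexical similarity features without listing them in this record. The following is a minimal sketch of how such quantities could be computed in plain Python. The Jaccard token overlap is an assumed stand-in for a lexical feature, not necessarily one used in the article, and the function names and sample scores are hypothetical.

```python
import math


def jaccard_similarity(text_a: str, text_b: str) -> float:
    """Token-set overlap between two texts.

    Assumption: used here only as an example of a lexical similarity
    feature; the record does not say which features the article uses.
    """
    tokens_a = set(text_a.lower().split())
    tokens_b = set(text_b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0  # two empty texts are treated as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def pearson_correlation(predicted, gold) -> float:
    """Pearson correlation between predicted and gold similarity scores."""
    n = len(predicted)
    if n != len(gold) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_p = sum(predicted) / n
    mean_g = sum(gold) / n
    covariance = sum((p - mean_p) * (g - mean_g) for p, g in zip(predicted, gold))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_g = sum((g - mean_g) ** 2 for g in gold)
    if var_p == 0 or var_g == 0:
        raise ValueError("correlation is undefined for constant input")
    return covariance / math.sqrt(var_p * var_g)


def mean_squared_error(predicted, gold) -> float:
    """Average squared difference between predicted and gold scores."""
    if len(predicted) != len(gold) or not predicted:
        raise ValueError("need two non-empty, equal-length sequences")
    return sum((p - g) ** 2 for p, g in zip(predicted, gold)) / len(predicted)


if __name__ == "__main__":
    # Invented scores for illustration only; not data from the article.
    gold_scores = [4.5, 1.0, 3.2, 2.8]
    model_scores = [4.1, 1.6, 3.0, 2.2]
    print("Pearson:", pearson_correlation(model_scores, gold_scores))
    print("MSE:", mean_squared_error(model_scores, gold_scores))
    print("Jaccard:", jaccard_similarity("the cat sat", "the cat ran"))
```

Higher Pearson correlation and lower MSE both indicate predictions closer to the gold similarity scores, which is the direction of improvement the summary reports.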