Design of hybrid machine learning and TF-IDF models to discard irrelevant reviews on public transport stations

Users' opinions on social media about city landmarks are valuable tools for the responsible authorities. However, if the name of a city landmark is similar to a slightly related but different item, then users may be confused and inadvertently comment on the wrong item. Public transport stations...

Bibliographic Details
Main Authors: Manuel Méndez, Mercedes G. Merayo, Manuel Núñez
Format: Article
Language: English
Published: Taylor & Francis Group 2025-03-01
Series: Journal of Information and Telecommunication
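Subjects: Machine learning; deep learning; natural language processing; classification of comments on social networks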
Online Access: https://www.tandfonline.com/doi/10.1080/24751839.2025.2472503
_version_ 1849763371616829440
author Manuel Méndez
Mercedes G. Merayo
Manuel Núñez
author_facet Manuel Méndez
Mercedes G. Merayo
Manuel Núñez
author_sort Manuel Méndez
collection DOAJ
description Users' opinions on social media about city landmarks are valuable tools for the responsible authorities. However, if the name of a city landmark is similar to a slightly related but different item, then users may be confused and inadvertently comment on the wrong item. Public transport stations are a good example of this situation because the stations' names often refer to their location (e.g. a square, a neighbourhood, a hospital, etc.). In this paper, we use artificial intelligence models to develop a classification system that distinguishes reviews referring to the station itself from those that do not. To achieve this, we apply Natural Language Processing (NLP) techniques to numerically represent words and phrases, and artificial intelligence models to classify the text once it is numerically represented. Our experiments show that the combination of Term Frequency-Inverse Document Frequency (TF-IDF) and machine learning models, such as Support Vector Machine and Random Forest, yields the best results overall. To establish a precise setting for evaluating our system, we consider reviews on Google Maps about Madrid metro stations. However, our methodology should be easily extrapolated to other transport networks.
format Article
id doaj-art-c10cf4d6fa0f4ea6b6b637ee84756c0c
institution DOAJ
issn 2475-1839
2475-1847
language English
publishDate 2025-03-01
publisher Taylor & Francis Group
record_format Article
series Journal of Information and Telecommunication
spelling doaj-art-c10cf4d6fa0f4ea6b6b637ee84756c0c
2025-08-20T03:05:26Z
eng
Taylor & Francis Group
Journal of Information and Telecommunication
2475-1839
2475-1847
2025-03-01
124
10.1080/24751839.2025.2472503
Design of hybrid machine learning and TF-IDF models to discard irrelevant reviews on public transport stations
Manuel Méndez, Mercedes G. Merayo, Manuel Núñez
Design and Testing of Reliable Systems Research Group, Complutense University of Madrid, Madrid, Spain (affiliation of all three authors)
https://www.tandfonline.com/doi/10.1080/24751839.2025.2472503
Machine learning
deep learning
natural language processing
classification of comments on social networks
spellingShingle Manuel Méndez
Mercedes G. Merayo
Manuel Núñez
Design of hybrid machine learning and TF-IDF models to discard irrelevant reviews on public transport stations
Journal of Information and Telecommunication
Machine learning
deep learning
natural language processing
classification of comments on social networks
title Design of hybrid machine learning and TF-IDF models to discard irrelevant reviews on public transport stations
title_sort design of hybrid machine learning and tf idf models to discard irrelevant reviews on public transport stations
topic Machine learning
deep learning
natural language processing
classification of comments on social networks
url https://www.tandfonline.com/doi/10.1080/24751839.2025.2472503
work_keys_str_mv AT manuelmendez designofhybridmachinelearningandtfidfmodelstodiscardirrelevantreviewsonpublictransportstations
AT mercedesgmerayo designofhybridmachinelearningandtfidfmodelstodiscardirrelevantreviewsonpublictransportstations
AT manuelnunez designofhybridmachinelearningandtfidfmodelstodiscardirrelevantreviewsonpublictransportstations
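
The description field of this record outlines a two-step approach: represent each review numerically with TF-IDF, then classify it as referring to the station or not. The following is a minimal sketch of that idea, not the authors' implementation. It uses only the Python standard library, and it swaps in a simple nearest-centroid classifier with cosine similarity in place of the Support Vector Machine and Random Forest models named in the record. The toy reviews, the labels `station` and `other`, and all function names are invented for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and extract alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def fit_idf(docs):
    """Smoothed inverse document frequency for every term seen in training."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}


def tfidf(tokens, idf):
    """L2-normalised TF-IDF vector as a dict; unknown terms are ignored."""
    counts = Counter(t for t in tokens if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def centroid(vectors):
    """Component-wise mean of a list of sparse vectors."""
    total = Counter()
    for vec in vectors:
        for t, w in vec.items():
            total[t] += w
    return {t: w / len(vectors) for t, w in total.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def train(reviews, labels):
    """Fit IDF weights and one centroid per label (stand-in for SVM/RF)."""
    docs = [tokenize(r) for r in reviews]
    idf = fit_idf(docs)
    by_label = {}
    for doc, label in zip(docs, labels):
        by_label.setdefault(label, []).append(tfidf(doc, idf))
    return idf, {label: centroid(vs) for label, vs in by_label.items()}


def predict(review, idf, centroids):
    """Return the label whose centroid is most similar to the review."""
    vec = tfidf(tokenize(review), idf)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Invented toy data: reviews about the station itself vs. its surroundings.
REVIEWS = [
    "the platform was clean and the train arrived on time",
    "escalators broken again and the metro platform is crowded",
    "ticket machines at this station never work",
    "great tapas and friendly waiters in the square",
    "the hospital staff treated my mother very well",
    "lovely park with a playground for the kids",
]
LABELS = ["station", "station", "station", "other", "other", "other"]

IDF, CENTROIDS = train(REVIEWS, LABELS)

if __name__ == "__main__":
    print(predict("the train was late and the platform crowded", IDF, CENTROIDS))
    print(predict("friendly waiters and great tapas", IDF, CENTROIDS))
```

With real data, the TF-IDF vectors would be passed to a trained classifier such as an SVM or a Random Forest, and the model would be evaluated on held-out reviews. The centroid rule here only keeps the sketch dependency-free.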