Supervised Sentiment Analysis of Indirect Qualitative Student Feedback for Unbiased Opinion Mining

Bibliographic Details
Main Authors: Smitha Bidadi Anjan Prasad, Raja Praveen Kumar Nakka
Format: Article
Language: English
Published: MDPI AG 2023-12-01
Series: Engineering Proceedings
Online Access: https://www.mdpi.com/2673-4591/59/1/15
Description
Summary: In the education domain, feedback from students and other stakeholders has received increasing attention in recent years as a means of raising educational standards. As a result, numerous instruments and strategies have been developed for obtaining student input and assessing faculty performance, as well as other facets of education. There are two main methods for collecting feedback from students: direct and indirect. In the direct method, feedback is collected by distributing a questionnaire and recording the responses. The limitation of this method is that it may not reveal the students' true experience, and it leaves room for bias in how the questionnaire is collected and assessed. To overcome this limitation, the indirect method can be followed, in which feedback is gathered from social media posts, since students are active on social media and use it to express their opinions. To address the problem of manually annotating large volumes of data, this paper proposes a machine learning method that uses the Sentiment140 dataset as the training set to automate the annotation of tweets. The same method can be used to label any qualitative data. In total, 5000 tweets were scraped and considered for this study. Various pre-processing methods, including byte-order-mark removal, hashtag removal, stop word removal, and tokenization, were applied to the data. The cleaned data were then vectorized using term frequency-inverse document frequency (TF-IDF) with trigrams; using trigrams allows the features to capture negation, which matters for sentiment analysis. The vectorized data were then processed using various machine learning algorithms to classify the polarity of the tweets, and the algorithms were compared on F1-score, recall, accuracy, and precision. The Ridge Classifier performed best, with a 94.16% F1-score, 94% precision, 94% recall, and 95.16% accuracy.
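The summary names the pre-processing steps and the TF-IDF trigram representation but not how they were implemented. The following is a minimal, standard-library-only Python sketch under stated assumptions, not the authors' code: the stop-word list is a hypothetical toy list that deliberately keeps negation words such as "not"; hashtag removal drops the whole #tag; tokenization is a simple lowercase regex; and "trigrams" is read as all n-grams of length 1 to 3 (the `max_n` parameter). The IDF uses the smoothed form ln((1 + N) / (1 + df(t))) + 1 followed by L2 normalization, which matches scikit-learn's default `TfidfVectorizer` weighting. The names `STOP_WORDS`, `preprocess`, `ngrams`, and `tfidf` are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption). Negation
# words such as "not" and "no" are intentionally left out so that n-grams
# can still capture phrases like "not very good".
STOP_WORDS = {"the", "a", "an", "is", "was", "and", "of", "to", "this", "i"}


def preprocess(text):
    """Clean one tweet and return its list of tokens."""
    text = text.replace("\ufeff", "")              # byte-order-mark removal
    text = re.sub(r"#\w+", " ", text)               # hashtag removal (whole tag)
    tokens = re.findall(r"[a-z']+", text.lower())   # simple regex tokenization
    return [t for t in tokens if t not in STOP_WORDS]  # stop-word removal


def ngrams(tokens, max_n=3):
    """All contiguous n-grams of length 1..max_n, joined with spaces."""
    grams = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def tfidf(docs, max_n=3):
    """Return one {term: weight} dict per document (smoothed IDF, L2-normalized)."""
    counts_per_doc = [Counter(ngrams(preprocess(d), max_n)) for d in docs]
    n_docs = len(docs)

    # Document frequency: in how many documents each n-gram appears.
    df = Counter()
    for counts in counts_per_doc:
        df.update(counts.keys())

    # Smoothed IDF: idf(t) = ln((1 + N) / (1 + df(t))) + 1
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in df}

    vectors = []
    for counts in counts_per_doc:
        vec = {t: tf * idf[t] for t, tf in counts.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


if __name__ == "__main__":
    tweets = [
        "\ufeffThe lectures were not very good #exams",
        "The lectures were very good",
    ]
    vectors = tfidf(tweets)
    # Trigram features of the first tweet, including the negated phrase.
    print(sorted(t for t in vectors[0] if len(t.split()) == 3))
    # -> ['lectures were not', 'not very good', 'were not very']
```

Keeping "not" out of the stop-word list is what lets "not very good" survive as a feature distinct from "very good"; a standard English stop-word list that includes negations would erase that signal before the trigrams are built. For the classification step, one plausible route (not confirmed as the authors' setup) is scikit-learn's `TfidfVectorizer(ngram_range=(1, 3))` feeding `sklearn.linear_model.RidgeClassifier`, with precision, recall, F1-score, and accuracy read from `sklearn.metrics.classification_report`.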
ISSN:2673-4591