Text Classification by Genre Based on Rhythm Features

Bibliographic Details
Main Authors: Ksenia Vladimirovna Lagutina, Nadezhda Stanislavovna Lagutina, Elena Igorevna Boychuk
Format: Article
Language: English
Published: Yaroslavl State University 2021-10-01
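Series: Моделирование и анализ информационных систем (Modeling and Analysis of Information Systems)
Subjects: stylometry, natural language processing, rhythm features, genres, text classification
Online Access: https://www.mais-journal.ru/jour/article/view/1528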
author Ksenia Vladimirovna Lagutina
Nadezhda Stanislavovna Lagutina
Elena Igorevna Boychuk
collection DOAJ
description The article analyzes the rhythm of texts in six genres: fiction novels, advertisements, scientific articles, reviews, tweets, and political articles. The authors identified lexico-grammatical figures in the texts (anaphora, epiphora, diacope, aposiopesis, etc.) that serve as markers of text rhythm. From these figures they calculated statistical features that describe the rhythm quantitatively and structurally. The resulting text model was visualized for statistical analysis using boxplots and heat maps, which showed differences in rhythm between genres. The boxplots showed that almost all genres differ from one another in the overall density of rhythm features, and the heat maps showed distinct rhythm patterns across genres. The rhythm features were then used to classify texts into six genres in two ways: a binary classification for each genre, separating that genre from the remaining ones, and a multi-class classification of the whole corpus into six genres at once. Two text corpora, one in English and one in Russian, were used for the experiments. Each corpus contains 100 fiction novels, scientific articles, advertisements and tweets, 50 reviews and political articles, i.e. a total of 500 texts. The high classification quality achieved with neural networks showed that rhythm features are a good marker for most genres, especially fiction. The experiments were carried out using the ProseRhythmDetector software tool for Russian and English. The text corpora contain 300 texts for each language.
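The idea of turning lexico-grammatical figures into statistical features can be illustrated with a minimal Python sketch. It is not the ProseRhythmDetector implementation: the function names, the naive sentence splitter, and the simplified definitions (anaphora as two adjacent sentences opening with the same word, epiphora as two adjacent sentences closing with the same word, density as count per sentence) are all assumptions made for illustration only.

```python
import re


def split_sentences(text):
    """Naive splitter (assumption): break after ., ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def words(sentence):
    """Lowercased word tokens; the pattern also matches Cyrillic letters."""
    return re.findall(r"\w+", sentence.lower())


def rhythm_features(text):
    """Count simplified anaphora and epiphora over adjacent sentence pairs.

    Hypothetical definitions for illustration, not the authors' method:
    anaphora = neighbouring sentences start with the same word,
    epiphora = neighbouring sentences end with the same word.
    """
    tokenized = [words(s) for s in split_sentences(text)]
    sentences = [w for w in tokenized if w]  # drop sentences without words
    anaphora = 0
    epiphora = 0
    for prev, cur in zip(sentences, sentences[1:]):
        if prev[0] == cur[0]:
            anaphora += 1
        if prev[-1] == cur[-1]:
            epiphora += 1
    n = max(len(sentences), 1)  # avoid division by zero on empty input
    return {
        "sentences": len(sentences),
        "anaphora": anaphora,
        "epiphora": epiphora,
        "anaphora_density": anaphora / n,
        "epiphora_density": epiphora / n,
    }


sample = "We came to win. We came to stay. Nobody wanted to stay."
features = rhythm_features(sample)
```

Per-text dictionaries of this kind could then be stacked into a feature matrix and passed to any classifier, either once per genre (one genre against the rest) or once for all six genres.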
format Article
id doaj-art-2dd6ca4034d144288910b3860f3e0981
institution DOAJ
issn 1818-1015
2313-5417
language English
publishDate 2021-10-01
publisher Yaroslavl State University
record_format Article
series Моделирование и анализ информационных систем
doi 10.18255/1818-1015-2021-3-280-291
volume 28
issue 3
pages 280-291
affiliation Ksenia Vladimirovna Lagutina: P.G. Demidov Yaroslavl State University
Nadezhda Stanislavovna Lagutina: P.G. Demidov Yaroslavl State University
Elena Igorevna Boychuk: Yaroslavl State Pedagogical University named after K.D. Ushinsky
title Text Classification by Genre Based on Rhythm Features
topic stylometry
natural language processing
rhythm features
genres
text classification
url https://www.mais-journal.ru/jour/article/view/1528