Estilometría TIP: enhanced text analysis tool with customisable metrics for Spanish texts
Stylometric analysis is a tool used across the social sciences and humanities, aiding disciplines like education, psychology, history, anthropology, and linguistics. However, most tools are developed for English, limiting their effectiveness for Spanish texts, which involve complex inflections. This pape...
Main Authors: | Francisco J. Carreras-Riudavets, Zenón J. Hernández-Figueroa |
---|---|
Format: | Article |
Language: | English |
Published: | Taylor & Francis Group, 2025-12-01 |
Series: | Cogent Arts & Humanities |
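Subjects: | Computational linguistics; stylometric analysis; text analysis tools; readability metrics; morphology; Computer Science (General) |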
Online Access: | https://www.tandfonline.com/doi/10.1080/23311983.2025.2451513 |
_version_ | 1841527664937009152 |
---|---|
author | Francisco J. Carreras-Riudavets, Zenón J. Hernández-Figueroa |
author_facet | Francisco J. Carreras-Riudavets, Zenón J. Hernández-Figueroa |
author_sort | Francisco J. Carreras-Riudavets |
collection | DOAJ |
description | Stylometric analysis is a tool used across the social sciences and humanities, aiding disciplines like education, psychology, history, anthropology, and linguistics. However, most tools are developed for English, limiting their effectiveness for Spanish texts, which involve complex inflections. This paper addresses this gap by introducing Estilometría TIP, a web-based tool specifically designed for the stylometric analysis of Spanish texts. Estilometría TIP overcomes the challenges posed by Spanish’s inflected forms through two primary functionalities. First, it offers customizable metrics: researchers can define and compute their own metrics using a configuration file, allowing them to tailor their analyses to specific research needs across different fields. This feature dynamically adjusts the user interface, adding or modifying menus to facilitate seamless exploration of customized results. Second, Estilometría TIP incorporates Lexicon TIP, a highly accurate lexical recognition service for Spanish with an accuracy of over 99.8%. Lexicon TIP draws on a comprehensive database of more than 320,000 lemmas and 8 million inflected forms, accounting for variations in number, gender, superlatives, diminutives, augmentatives, derogatory terms, and verb conjugations. Two key algorithms enhance this functionality: prefix detection, which accurately identifies prefixed words (e.g. ‘predeterminar’), and enclitic pronoun identification, which handles verb forms combined with enclitic pronouns (e.g. ‘comiéndotelas’). |
format | Article |
id | doaj-art-36649c6ad62d4cc1b5bc17969fbb997c |
institution | Kabale University |
issn | 2331-1983 |
language | English |
publishDate | 2025-12-01 |
publisher | Taylor & Francis Group |
record_format | Article |
series | Cogent Arts & Humanities |
spelling | doaj-art-36649c6ad62d4cc1b5bc17969fbb997c; 2025-01-15T09:34:59Z; eng; Taylor & Francis Group; Cogent Arts & Humanities; 2331-1983; 2025-12-01; 12; 1; 10.1080/23311983.2025.2451513; Estilometría TIP: enhanced text analysis tool with customisable metrics for Spanish texts; Francisco J. Carreras-Riudavets (0), Zenón J. Hernández-Figueroa (1); (0) Research Institute of Text Analysis and Applications (IATEXT), University of Las Palmas de Gran Canaria, Las Palmas de G.C, Spain; (1) Research Institute of Text Analysis and Applications (IATEXT), University of Las Palmas de Gran Canaria, Las Palmas de G.C, Spain; https://www.tandfonline.com/doi/10.1080/23311983.2025.2451513; Computational linguistics; stylometric analysis; text analysis tools; readability metrics; morphology; Computer Science (General) |
title | Estilometría TIP: enhanced text analysis tool with customisable metrics for Spanish texts |
title_sort | estilometria tip enhanced text analysis tool with customisable metrics for spanish texts |
topic | Computational linguistics; stylometric analysis; text analysis tools; readability metrics; morphology; Computer Science (General) |
url | https://www.tandfonline.com/doi/10.1080/23311983.2025.2451513 |
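
The description mentions enclitic pronoun identification for forms such as ‘comiéndotelas’. The record does not say how Lexicon TIP implements it, so the following is only a minimal sketch of one possible approach, under stated assumptions: a tiny hypothetical lexicon (`KNOWN_VERB_FORMS`), a fixed list of Spanish enclitic pronouns, and the simplification that removing the written accent recovers the bare verb form. The function name `split_enclitics` is likewise illustrative and not part of the tool.

```python
# Hypothetical toy lexicon of verb forms (assumption for illustration only;
# the description cites a database of about 8 million inflected forms).
KNOWN_VERB_FORMS = {"comiendo", "comer", "da", "dar", "diciendo"}

# Spanish enclitic pronouns that can attach to the end of a verb form.
ENCLITICS = ("nos", "los", "las", "les", "os", "me", "te", "se", "lo", "la", "le")

# Attaching pronouns often adds a written accent (comiendo -> comiéndotelas),
# so acute accents are removed before the lexicon lookup (a simplification).
ACCENT_MAP = str.maketrans("áéíóú", "aeiou")


def split_enclitics(word, max_clitics=3):
    """Return (verb_form, [pronouns]) if word is a known verb form followed by
    one or more enclitic pronouns, otherwise None."""

    def search(stem, found):
        base = stem.translate(ACCENT_MAP)
        # Accept only when at least one pronoun has been removed
        # and the remaining stem is a known verb form.
        if found and base in KNOWN_VERB_FORMS:
            return base, found
        if len(found) == max_clitics:
            return None
        # Try peeling one more pronoun off the end, backtracking on failure.
        for clitic in ENCLITICS:
            if stem.endswith(clitic) and len(stem) > len(clitic):
                result = search(stem[: -len(clitic)], [clitic] + found)
                if result is not None:
                    return result
        return None

    return search(word.lower(), [])


print(split_enclitics("comiéndotelas"))  # ('comiendo', ['te', 'las'])
print(split_enclitics("dámelo"))         # ('da', ['me', 'lo'])
print(split_enclitics("comiendo"))       # None
```

This sketch ignores orthographic changes at the boundary (for example ‘vámonos’, where the verb drops its final ‘s’) and any constraints on pronoun order, both of which a production analyser would need to handle.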