Estilometría TIP: enhanced text analysis tool with customisable metrics for Spanish texts

Stylometric analysis is used across the social sciences and humanities, aiding disciplines such as education, psychology, history, anthropology, and linguistics. However, most tools are developed for English, which limits their effectiveness for Spanish texts with their complex inflections. This pape...

Bibliographic Details
Main Authors: Francisco J. Carreras-Riudavets, Zenón J. Hernández-Figueroa
Format: Article
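Language: English
Published: Taylor & Francis Group 2025-12-01
Series: Cogent Arts & Humanities
Subjects: Computational linguistics; stylometric analysis; text analysis tools; readability metrics; morphology; Computer Science (General)
Online Access: https://www.tandfonline.com/doi/10.1080/23311983.2025.2451513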
_version_ 1841527664937009152
author Francisco J. Carreras-Riudavets
Zenón J. Hernández-Figueroa
collection DOAJ
description Stylometric analysis is used across the social sciences and humanities, aiding disciplines such as education, psychology, history, anthropology, and linguistics. However, most tools are developed for English, which limits their effectiveness for Spanish texts with their complex inflections. This paper addresses this gap by introducing Estilometría TIP, a web-based tool designed specifically for the stylometric analysis of Spanish texts. Estilometría TIP overcomes the challenges posed by Spanish inflected forms through two primary functionalities. First, it offers customizable metrics: researchers can define and compute their own metrics using a configuration file, tailoring their analyses to specific research needs across different fields. This feature dynamically adjusts the user interface, adding or modifying menus so that customized results can be explored directly. Second, Estilometría TIP incorporates Lexicon TIP, a lexical recognition service for Spanish with an accuracy of over 99.8%. Lexicon TIP draws on a database of more than 320,000 lemmas and 8 million inflected forms, accounting for variations in number, gender, superlatives, diminutives, augmentatives, derogatory terms, and verb conjugations. Two key algorithms enhance this functionality: prefix detection, which identifies prefixed words (e.g. ‘predeterminar’), and enclitic pronoun identification, which handles verb forms combined with enclitic pronouns (e.g. ‘comiéndotelas’).
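The two algorithms named in the description can be illustrated with a minimal Python sketch. This is not the Lexicon TIP implementation, which this record does not describe; the tiny lexicon, the prefix list, and the function names `split_enclitics` and `split_prefix` are all hypothetical, chosen only to show the general idea of peeling affixes off a word and checking the remainder against a list of known forms.

```python
import unicodedata

# Hypothetical toy lexicon; a real service would query a full database
# of lemmas and inflected forms.
KNOWN_FORMS = {"comiendo", "dar", "determinar"}
# Spanish enclitic pronouns (illustrative list).
ENCLITICS = ("nos", "les", "las", "los", "me", "te", "se", "le", "lo", "la", "os")
# A few common prefixes (illustrative, far from exhaustive).
PREFIXES = ("anti", "pre", "des", "sub", "re")


def remove_acute(word):
    """Drop acute accents only, leaving other marks (e.g. the tilde of ñ)."""
    decomposed = unicodedata.normalize("NFD", word)
    kept = "".join(ch for ch in decomposed if ch != "\u0301")
    return unicodedata.normalize("NFC", kept)


def split_enclitics(word):
    """Return (verb_form, [pronouns]) if the word is a known verb form
    followed by one or more enclitic pronouns, else None."""

    def search(stem, pronouns):
        # The written accent is often added only because pronouns were
        # attached, so the lookup uses the unaccented stem.
        candidate = remove_acute(stem)
        if pronouns and candidate in KNOWN_FORMS:
            return candidate, pronouns
        for clitic in ENCLITICS:
            if stem.endswith(clitic) and len(stem) > len(clitic):
                found = search(stem[: -len(clitic)], [clitic] + pronouns)
                if found:
                    return found
        return None

    return search(word.lower(), [])


def split_prefix(word):
    """Return (prefix, base) if removing a listed prefix leaves a known form."""
    word = word.lower()
    for prefix in sorted(PREFIXES, key=len, reverse=True):
        base = word[len(prefix):]
        if word.startswith(prefix) and base in KNOWN_FORMS:
            return prefix, base
    return None


print(split_enclitics("comiéndotelas"))
print(split_prefix("predeterminar"))
```

With this toy lexicon, the record's own examples decompose into `comiendo` + `te` + `las` and `pre` + `determinar`. Words whose remainder is not in the lexicon return `None`, which is why the quality of such a sketch depends entirely on the coverage of the underlying word list.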
format Article
id doaj-art-36649c6ad62d4cc1b5bc17969fbb997c
institution Kabale University
issn 2331-1983
language English
publishDate 2025-12-01
publisher Taylor & Francis Group
record_format Article
series Cogent Arts & Humanities
spelling doaj-art-36649c6ad62d4cc1b5bc17969fbb997c
2025-01-15T09:34:59Z
eng
Taylor & Francis Group
Cogent Arts & Humanities
2331-1983
2025-12-01
12
1
10.1080/23311983.2025.2451513
Estilometría TIP: enhanced text analysis tool with customisable metrics for Spanish texts
Francisco J. Carreras-Riudavets (Research Institute of Text Analysis and Applications (IATEXT), University of Las Palmas de Gran Canaria, Las Palmas de G.C, Spain)
Zenón J. Hernández-Figueroa (Research Institute of Text Analysis and Applications (IATEXT), University of Las Palmas de Gran Canaria, Las Palmas de G.C, Spain)
https://www.tandfonline.com/doi/10.1080/23311983.2025.2451513
Computational linguistics
stylometric analysis
text analysis tools
readability metrics
morphology
Computer Science (General)
title Estilometría TIP: enhanced text analysis tool with customisable metrics for Spanish texts
topic Computational linguistics
stylometric analysis
text analysis tools
readability metrics
morphology
Computer Science (General)
url https://www.tandfonline.com/doi/10.1080/23311983.2025.2451513