A Novel Spanish Dataset for Financial Education Text Simplification Targeting Visually Impaired Individuals

Bibliographic Details
Main Authors: Nelson Perez-Rojas, Saul Calderon-Ramirez, Martin Solis, Mario Alberto Romero-Sandoval, Monica Arias-Monge, Horacio Saggion
Format: Article
Language: English
Published: IEEE 2025-01-01
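Series: IEEE Access
Subjects: Automatic text simplification; lexical simplification; word complexity; lexical complexity prediction
Online Access: https://ieeexplore.ieee.org/document/10994808/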
_version_ 1850239940866080768
author Nelson Perez-Rojas
Saul Calderon-Ramirez
Martin Solis
Mario Alberto Romero-Sandoval
Monica Arias-Monge
Horacio Saggion
collection DOAJ
description Automatic Text Simplification (ATS) is a crucial task in natural language processing, aimed at making texts more comprehensible, particularly for specific groups such as individuals with visual impairments. One of the primary challenges in developing models for ATS is the scarcity of data, especially in Spanish. This manuscript introduces a novel dataset tailored for Spanish speakers with visual impairments, consisting of 5,314 pairs of original and simplified sentences created using established simplification rules. Additionally, we evaluate the feasibility of augmenting this dataset using large language models such as Generative Pre-trained Transformer (GPT)-3, TUNER, and Multilingual T5 (mT5). We compare the simplifications generated by these models with our dataset to assess their effectiveness in data augmentation. The characteristics of our dataset and the findings from these comparisons are discussed in detail. The dataset is publicly available on Hugging Face at https://huggingface.co/datasets/saul1917/FEINA.
format Article
id doaj-art-8fc1ae49662e4e02a26670baff2ab723
institution OA Journals
issn 2169-3536
language English
publishDate 2025-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj-art-8fc1ae49662e4e02a26670baff2ab723
Last updated: 2025-08-20T02:01:00Z
Language: eng
Publisher: IEEE
Series: IEEE Access
ISSN: 2169-3536
Published: 2025-01-01
Volume: 13
Pages: 87472-87484
DOI: 10.1109/ACCESS.2025.3568693
Document number: 10994808
Title: A Novel Spanish Dataset for Financial Education Text Simplification Targeting Visually Impaired Individuals
Authors:
Nelson Perez-Rojas [0] https://orcid.org/0000-0001-8929-3249
Saul Calderon-Ramirez [1] https://orcid.org/0000-0001-9993-4388
Martin Solis [2] https://orcid.org/0000-0003-4750-1198
Mario Alberto Romero-Sandoval [3]
Monica Arias-Monge [4] https://orcid.org/0000-0003-0836-0775
Horacio Saggion [5] https://orcid.org/0000-0003-0016-7807
Affiliations:
[0] Doctorado en Ciencias Naturales para el Desarrollo (DOCINADE), Instituto Tecnológico de Costa Rica, Universidad Nacional, Universidad Estatal a Distancia, Costa Rica
[1] Escuela de Ingeniería en Computación, Instituto Tecnológico de Costa Rica, Cartago, Costa Rica
[2] Escuela de Administración de Empresas, Instituto Tecnológico de Costa Rica, Cartago, Costa Rica
[3] Maestría en Computación, Instituto Tecnológico de Costa Rica, Cartago, Costa Rica
[4] Instituto de Investigaciones Psicológicas, Universidad de Costa Rica, San José, Costa Rica
[5] Departamento de Tecnologías de la Información y las Comunicaciones, Universidad Pompeu Fabra, Barcelona, Spain
topic Automatic text simplification
lexical simplification
word complexity
lexical complexity prediction