A Novel Spanish Dataset for Financial Education Text Simplification Targeting Visually Impaired Individuals
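Automatic Text Simplification (ATS) is a crucial task in natural language processing, aimed at making texts more comprehensible, particularly for specific groups such as individuals with visual impairments. One of the primary challenges in developing models for ATS is the scarcity of data, especially in Spanish. This manuscript introduces a novel dataset tailored for Spanish speakers with visual impairments, consisting of 5,314 pairs of original and simplified sentences created using established simplification rules. Additionally, we evaluate the feasibility of augmenting this dataset using large language models such as Generative Pre-trained Transformer 3 (GPT-3), TUNER, and Multilingual T5 (mT5). We compare the simplifications generated by these models with our dataset to assess their effectiveness in data augmentation. The characteristics of our dataset and the findings from these comparisons are discussed in detail. The dataset is publicly available on Hugging Face at https://huggingface.co/datasets/saul1917/FEINA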
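| Main Authors: | Nelson Perez-Rojas, Saul Calderon-Ramirez, Martin Solis, Mario Alberto Romero-Sandoval, Monica Arias-Monge, Horacio Saggion |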
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Access |
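| Subjects: | Automatic text simplification; lexical simplification; word complexity; lexical complexity prediction |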
| Online Access: | https://ieeexplore.ieee.org/document/10994808/ |
| _version_ | 1850239940866080768 |
|---|---|
| author | Nelson Perez-Rojas; Saul Calderon-Ramirez; Martin Solis; Mario Alberto Romero-Sandoval; Monica Arias-Monge; Horacio Saggion |
| author_sort | Nelson Perez-Rojas |
| collection | DOAJ |
| description | Automatic Text Simplification (ATS) is a crucial task in natural language processing, aimed at making texts more comprehensible, particularly for specific groups such as individuals with visual impairments. One of the primary challenges in developing models for ATS is the scarcity of data, especially in Spanish. This manuscript introduces a novel dataset tailored for Spanish speakers with visual impairments, consisting of 5,314 pairs of original and simplified sentences created using established simplification rules. Additionally, we evaluate the feasibility of augmenting this dataset using large language models such as Generative Pre-trained Transformer 3 (GPT-3), TUNER, and Multilingual T5 (mT5). We compare the simplifications generated by these models with our dataset to assess their effectiveness in data augmentation. The characteristics of our dataset and the findings from these comparisons are discussed in detail. The dataset is publicly available on Hugging Face at https://huggingface.co/datasets/saul1917/FEINA |
| format | Article |
| id | doaj-art-8fc1ae49662e4e02a26670baff2ab723 |
| institution | OA Journals |
| issn | 2169-3536 |
| language | English |
| publishDate | 2025-01-01 |
| publisher | IEEE |
| record_format | Article |
| series | IEEE Access |
| spelling | Record: doaj-art-8fc1ae49662e4e02a26670baff2ab723; indexed 2025-08-20T02:01:00Z; language: eng; publisher: IEEE; series: IEEE Access; ISSN: 2169-3536; date: 2025-01-01; volume 13, pages 87472-87484; DOI: 10.1109/ACCESS.2025.3568693; IEEE Xplore document: 10994808; title: A Novel Spanish Dataset for Financial Education Text Simplification Targeting Visually Impaired Individuals; authors: Nelson Perez-Rojas [0] (https://orcid.org/0000-0001-8929-3249), Saul Calderon-Ramirez [1] (https://orcid.org/0000-0001-9993-4388), Martin Solis [2] (https://orcid.org/0000-0003-4750-1198), Mario Alberto Romero-Sandoval [3], Monica Arias-Monge [4] (https://orcid.org/0000-0003-0836-0775), Horacio Saggion [5] (https://orcid.org/0000-0003-0016-7807); affiliations: [0] Doctorado en Ciencias Naturales para el Desarrollo (DOCINADE), Instituto Tecnológico de Costa Rica, Universidad Nacional, Universidad Estatal a Distancia, Costa Rica; [1] Escuela de Ingeniería en Computación, Instituto Tecnológico de Costa Rica, Cartago, Costa Rica; [2] Escuela de Administración de Empresas, Instituto Tecnológico de Costa Rica, Cartago, Costa Rica; [3] Maestría en Computación, Instituto Tecnológico de Costa Rica, Cartago, Costa Rica; [4] Instituto de Investigaciones Psicológicas, Universidad de Costa Rica, San José, Costa Rica; [5] Departamento de Tecnologías de la Información y las Comunicaciones, Universidad Pompeu Fabra, Barcelona, Spain; URL: https://ieeexplore.ieee.org/document/10994808/; keywords: Automatic text simplification; lexical simplification; word complexity; lexical complexity prediction |
| title | A Novel Spanish Dataset for Financial Education Text Simplification Targeting Visually Impaired Individuals |
| title_sort | novel spanish dataset for financial education text simplification targeting visually impaired individuals |
| topic | Automatic text simplification; lexical simplification; word complexity; lexical complexity prediction |
| url | https://ieeexplore.ieee.org/document/10994808/ |