Mixtec–Spanish Parallel Text Dataset for Language Technology Development
This article introduces a freely available Spanish–Mixtec parallel corpus designed to foster natural language processing (NLP) development for an indigenous language that remains digitally low-resourced. The dataset, comprising 14,587 sentence pairs, covers Mixtec variants from Guerrero (Tlacoachist...
| Main Authors: | Hermilo Santiago-Benito, Diana-Margarita Córdova-Esparza, Juan Terven, Noé-Alejandro Castro-Sánchez, Teresa García-Ramirez, Julio-Alejandro Romero-González, José M. Álvarez-Alvarado |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-06-01 |
| Series: | Data |
| Online Access: | https://www.mdpi.com/2306-5729/10/7/94 |
Similar Items
- Automatic grammatical tagger for a Spanish–Mixtec parallel corpus
  by: Hermilo Santiago-Benito, et al.
  Published: (2025-02-01)
- Mixtec Sound Change Database 2.0: Integrating Tone Change
  by: Sandra Auderset
  Published: (2025-06-01)
- Mixtec social memory in Late Renaissance Rome: Ulisse Aldrovandi, Tommaso de’ Cavalieri, and “the skull of an Indian king”
  by: Davide Domenici
  Published: (2024-12-01)
- La classification de la diversité de maïs des Mixtèques et des Chatines de la Sierra Sur, Oaxaca Mexique
  by: Quetzalcóatl Orozco-Ramírez, et al.
  Published: (2021-11-01)
- Harmony search for hyperparameters optimization of a low resource language transformer model trained with a novel parallel corpus Ocelotl Nahuatl – Spanish
  by: Máximo Enrique Pacheco Martínez, et al.
  Published: (2024-12-01)