LLM performance on mathematical reasoning in Catalan language
Large Language Models (LLMs) have revolutionized Artificial Intelligence (AI), with OpenAI's reasoning models and, more recently, DeepSeek reshaping the landscape. Despite AI being a strategic priority in Europe, the region lags behind global leaders. Spain's ALIA initiative, trained in Spanish and Catalan, seeks to bridge this gap. We assess ALIA and DeepSeek's performance against top LLMs using a dataset of high-school-level mathematical problems in Catalan from the Kangaroo Mathematics Competition. These exams are multiple-choice, with five options. We compiled each LLM's solution and the reasoning behind their answers. The results indicate that ALIA underperforms compared to all other evaluated LLMs, scoring worse than random guessing. Furthermore, it frequently failed to provide complete reasoning, while models like DeepSeek achieved up to 96% accuracy. Open-source LLMs are as powerful as closed ones for this task. These findings underscore challenges in European AI competitiveness and highlight the need to distill knowledge from large models into smaller, more efficient ones for specialized applications.
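As context for the abstract's "worse than random guessing" claim, the following is a minimal, hypothetical Python sketch (not the authors' evaluation code; the question count and option labels are assumptions) showing how accuracy on five-option multiple-choice items can be scored and why the random-guess baseline sits at roughly 20%, the reference point that ALIA reportedly falls below and that DeepSeek's reported 96% far exceeds.

```python
import random

# Five answer options per question, as described in the abstract.
OPTIONS = ["A", "B", "C", "D", "E"]

def accuracy(predicted, correct):
    """Fraction of questions whose predicted option matches the answer key."""
    return sum(p == c for p, c in zip(predicted, correct)) / len(correct)

if __name__ == "__main__":
    random.seed(0)
    # Simulated answer key and a purely random guesser, for baseline comparison only.
    answer_key = [random.choice(OPTIONS) for _ in range(1000)]
    random_guesses = [random.choice(OPTIONS) for _ in range(1000)]
    # Expected value is 1/5 = 20%.
    print(f"Random-guess accuracy: {accuracy(random_guesses, answer_key):.1%}")
```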
| Main Authors: | Lamyae Rhomrasi, Yusef Ahsini, Arnau Igualde-Sáez, Ricardo Vinuesa, Sergio Hoyas, José Pedro García-Sabater, Màrius J. Fullana-i-Alfonso, J. Alberto Conejero |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Elsevier, 2025-03-01 |
| Series: | Results in Engineering |
| Subjects: | Large language model; Mathematical reasoning; Benchmarks; Kangaroo contest |
| Online Access: | http://www.sciencedirect.com/science/article/pii/S2590123025004475 |
| author | Lamyae Rhomrasi, Yusef Ahsini, Arnau Igualde-Sáez, Ricardo Vinuesa, Sergio Hoyas, José Pedro García-Sabater, Màrius J. Fullana-i-Alfonso, J. Alberto Conejero |
|---|---|
| collection | DOAJ |
| description | Large Language Models (LLMs) have revolutionized Artificial Intelligence (AI), with OpenAI's reasoning models and, more recently, DeepSeek reshaping the landscape. Despite AI being a strategic priority in Europe, the region lags behind global leaders. Spain's ALIA initiative, trained in Spanish and Catalan, seeks to bridge this gap. We assess ALIA and DeepSeek's performance against top LLMs using a dataset of high-school-level mathematical problems in Catalan from the Kangaroo Mathematics Competition. These exams are multiple-choice, with five options. We compiled each LLM's solution and the reasoning behind their answers. The results indicate that ALIA underperforms compared to all other evaluated LLMs, scoring worse than random guessing. Furthermore, it frequently failed to provide complete reasoning, while models like DeepSeek achieved up to 96% accuracy. Open-source LLMs are as powerful as closed ones for this task. These findings underscore challenges in European AI competitiveness and highlight the need to distill knowledge from large models into smaller, more efficient ones for specialized applications. |
| format | Article |
| id | doaj-art-0b218c2621cb4d6f867c8e4d01866a23 |
| institution | DOAJ |
| issn | 2590-1230 |
| language | English |
| publishDate | 2025-03-01 |
| publisher | Elsevier |
| record_format | Article |
| series | Results in Engineering |
| doi | 10.1016/j.rineng.2025.104366 |
| source details | Results in Engineering, vol. 25, article 104366, 2025-03-01 |
| affiliations | Lamyae Rhomrasi, Yusef Ahsini, Sergio Hoyas, J. Alberto Conejero: Instituto Universitario de Matemática Pura y Aplicada, Universitat Politècnica de València, 46022, Valencia, Spain. Arnau Igualde-Sáez, José Pedro García-Sabater: Grupo de Investigación en Reingeniería, Organización, trabajo en Grupo y Logística Empresarial - ROGLE, Universitat Politècnica de València, 46022, Valencia, Spain. Ricardo Vinuesa: FLOW, Engineering Mechanics, KTH Royal Institute of Technology, SE-100 44, Stockholm, Sweden. Màrius J. Fullana-i-Alfonso (corresponding author): Institut Universitari de Matemàtica Multidisciplinària, Universitat Politècnica de València, 46022, Valencia, Spain. |
| title | LLM performance on mathematical reasoning in Catalan language |
| topic | Large language model Mathematical reasoning Benchmarks Kangaroo contest |
| url | http://www.sciencedirect.com/science/article/pii/S2590123025004475 |