LLM performance on mathematical reasoning in Catalan language

Large Language Models (LLMs) have revolutionized Artificial Intelligence (AI), with OpenAI's reasoning models and, more recently, DeepSeek reshaping the landscape. Despite AI being a strategic priority in Europe, the region lags behind global leaders. Spain's ALIA initiative, trained in Spanish and Catalan, seeks to bridge this gap. We assess ALIA and DeepSeek's performance against top LLMs using a dataset of high-school-level mathematical problems in Catalan from the Kangaroo Mathematics Competition. These exams are multiple-choice, with five options. We compiled each LLM's solution and the reasoning behind their answers. The results indicate that ALIA underperforms compared to all other evaluated LLMs, scoring worse than random guessing. Furthermore, it frequently failed to provide complete reasoning, while models like DeepSeek achieved up to 96% accuracy. Open-source LLMs are as powerful as closed ones for this task. These findings underscore challenges in European AI competitiveness and highlight the need to distill knowledge from large models into smaller, more efficient ones for specialized applications.

Bibliographic Details
Main Authors: Lamyae Rhomrasi, Yusef Ahsini, Arnau Igualde-Sáez, Ricardo Vinuesa, Sergio Hoyas, José Pedro García-Sabater, Màrius J. Fullana-i-Alfonso, J. Alberto Conejero
Format: Article
Language: English
Published: Elsevier, 2025-03-01
Series: Results in Engineering
Subjects: Large language model; Mathematical reasoning; Benchmarks; Kangaroo contest
Online Access: http://www.sciencedirect.com/science/article/pii/S2590123025004475
author Lamyae Rhomrasi
Yusef Ahsini
Arnau Igualde-Sáez
Ricardo Vinuesa
Sergio Hoyas
José Pedro García-Sabater
Màrius J. Fullana-i-Alfonso
J. Alberto Conejero
collection DOAJ
description Large Language Models (LLMs) have revolutionized Artificial Intelligence (AI), with OpenAI's reasoning models and, more recently, DeepSeek reshaping the landscape. Despite AI being a strategic priority in Europe, the region lags behind global leaders. Spain's ALIA initiative, trained in Spanish and Catalan, seeks to bridge this gap. We assess ALIA and DeepSeek's performance against top LLMs using a dataset of high-school-level mathematical problems in Catalan from the Kangaroo Mathematics Competition. These exams are multiple-choice, with five options. We compiled each LLM's solution and the reasoning behind their answers. The results indicate that ALIA underperforms compared to all other evaluated LLMs, scoring worse than random guessing. Furthermore, it frequently failed to provide complete reasoning, while models like DeepSeek achieved up to 96% accuracy. Open-source LLMs are as powerful as closed ones for this task. These findings underscore challenges in European AI competitiveness and highlight the need to distill knowledge from large models into smaller, more efficient ones for specialized applications.
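The abstract describes scoring each model's multiple-choice answers (five options per problem) against a random-guessing baseline of 1/5 (20%). The following is a minimal sketch of that kind of accuracy comparison; the sample problem, the always_b stand-in, and the accuracy helper are hypothetical illustrations and do not reproduce the authors' actual evaluation pipeline.

# Hypothetical sketch: multiple-choice accuracy vs. a random-guessing baseline.
# The problem data and model stand-in below are illustrative placeholders only.

from typing import Callable

# Each Kangaroo-style problem has five options (A-E) and one correct key.
problems = [
    {"statement": "Quant fa 2 + 2?", "options": ["3", "4", "5", "6", "7"], "answer": "B"},
    # ... more problems in Catalan would go here ...
]

RANDOM_BASELINE = 1 / 5  # expected accuracy when guessing among five options


def accuracy(get_model_answer: Callable[[dict], str]) -> float:
    """Fraction of problems where the chosen option letter matches the answer key."""
    correct = sum(1 for p in problems if get_model_answer(p) == p["answer"])
    return correct / len(problems)


def always_b(problem: dict) -> str:
    """Trivial stand-in for an LLM call; a real evaluation would prompt the model."""
    return "B"


if __name__ == "__main__":
    acc = accuracy(always_b)
    print(f"accuracy: {acc:.2%} (random-guessing baseline: {RANDOM_BASELINE:.0%})")

A model that scores below RANDOM_BASELINE, as reported for ALIA, performs worse than picking an option uniformly at random.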
format Article
id doaj-art-0b218c2621cb4d6f867c8e4d01866a23
institution DOAJ
issn 2590-1230
language English
publishDate 2025-03-01
publisher Elsevier
record_format Article
series Results in Engineering
spelling doaj-art-0b218c2621cb4d6f867c8e4d01866a23 (indexed 2025-08-20T02:43:54Z)
Results in Engineering (Elsevier), ISSN 2590-1230, vol. 25, article 104366, published 2025-03-01, DOI: 10.1016/j.rineng.2025.104366
LLM performance on mathematical reasoning in Catalan language
Author affiliations:
Lamyae Rhomrasi: Instituto Universitario de Matemática Pura y Aplicada, Universitat Politècnica de València, 46022, Valencia, Spain
Yusef Ahsini: Instituto Universitario de Matemática Pura y Aplicada, Universitat Politècnica de València, 46022, Valencia, Spain
Arnau Igualde-Sáez: Grupo de Investigación en Reingeniería, Organización, trabajo en Grupo y Logística Empresarial - ROGLE, Universitat Politècnica de València, 46022, Valencia, Spain
Ricardo Vinuesa: FLOW, Engineering Mechanics, KTH Royal Institute of Technology, SE-100 44, Stockholm, Sweden
Sergio Hoyas: Instituto Universitario de Matemática Pura y Aplicada, Universitat Politècnica de València, 46022, Valencia, Spain
José Pedro García-Sabater: Grupo de Investigación en Reingeniería, Organización, trabajo en Grupo y Logística Empresarial - ROGLE, Universitat Politècnica de València, 46022, Valencia, Spain
Màrius J. Fullana-i-Alfonso (corresponding author): Institut Universitari de Matemàtica Multidisciplinària, Universitat Politècnica de València, 46022, Valencia, Spain
J. Alberto Conejero: Instituto Universitario de Matemática Pura y Aplicada, Universitat Politècnica de València, 46022, Valencia, Spain
Online access: http://www.sciencedirect.com/science/article/pii/S2590123025004475
Keywords: Large language model; Mathematical reasoning; Benchmarks; Kangaroo contest
title LLM performance on mathematical reasoning in Catalan language
topic Large language model
Mathematical reasoning
Benchmarks
Kangaroo contest
url http://www.sciencedirect.com/science/article/pii/S2590123025004475