MathOdyssey: Benchmarking Mathematical Problem-Solving Skills in Large Language Models Using Odyssey Math Data

Abstract: Large language models (LLMs) have significantly advanced natural language understanding and demonstrated strong problem-solving abilities. Despite these successes, most LLMs still struggle with solving mathematical problems due to the intricate reasoning required. To support rigorous evaluation of mathematical reasoning in LLMs, we introduce the “MathOdyssey” dataset - a curated collection of 387 expert-generated mathematical problems spanning high school, university, and Olympiad-level topics. Each problem is accompanied by a detailed solution and categorized by difficulty level, subject area, and answer type. The dataset was developed through a rigorous multi-stage process involving contributions from subject experts, peer review, and standardized formatting. We provide detailed metadata and a standardized schema to facilitate consistent use in downstream applications. To demonstrate the dataset’s utility, we evaluate several representative LLMs and report their performance across problem types. We release MathOdyssey as an open-access resource to enable reproducible and fine-grained assessment of mathematical capabilities in LLMs and to foster further research in mathematical reasoning and education.
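The abstract describes each record as carrying a problem statement, a detailed solution, and labels for difficulty level, subject area, and answer type under a standardized schema. As a rough, hypothetical illustration only (the actual field names and schema are defined in the published data descriptor, not here), a record with those attributes could be represented and filtered in Python as sketched below.

# Minimal sketch, assuming hypothetical field names drawn from the
# attributes listed in the abstract; this is NOT the published
# MathOdyssey schema.
from dataclasses import dataclass

@dataclass
class MathProblem:
    problem: str       # problem statement
    solution: str      # detailed worked solution
    difficulty: str    # e.g. "high school", "university", "Olympiad"
    subject: str       # subject area, e.g. "algebra"
    answer_type: str   # e.g. "numerical", "expression", "proof"

# One toy record, plus a simple filter by difficulty level.
records = [
    MathProblem(
        problem="Solve x^2 - 5x + 6 = 0.",
        solution="Factor as (x - 2)(x - 3) = 0, so x = 2 or x = 3.",
        difficulty="high school",
        subject="algebra",
        answer_type="numerical",
    ),
]

olympiad = [r for r in records if r.difficulty == "Olympiad"]
print(f"{len(records)} record(s) loaded, {len(olympiad)} at Olympiad level")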

Bibliographic Details
Main Authors: Meng Fang (Department of Computer Science, University of Liverpool), Xiangpeng Wan (NetMind.AI), Fei Lu (Department of Mathematics, Johns Hopkins University), Fei Xing (Mathematica Policy Research), Kai Zou (NetMind.AI)
Format: Article
Language: English
Published: Nature Portfolio, 2025-08-01
Series: Scientific Data
ISSN: 2052-4463
Online Access: https://doi.org/10.1038/s41597-025-05283-3