SLM-MATRIX: a multi-agent trajectory reasoning and verification framework for enhancing language models in materials data extraction

Abstract Small Language Models offer an efficient alternative for structured information extraction. We present SLM-MATRIX, a multi-path collaborative reasoning and verification framework based on SLMs, designed to extract material names, numerical values, and physical units from materials science l...

Bibliographic Details
Main Authors: Xin Li, Zhixuan Huang, Shu Quan, Cheng Peng, Xiaoming Ma
Format: Article
Language: English
Published: Nature Portfolio 2025-07-01
Series: npj Computational Materials
Online Access: https://doi.org/10.1038/s41524-025-01719-x
collection DOAJ
description Abstract Small Language Models (SLMs) offer an efficient alternative for structured information extraction. We present SLM-MATRIX, a multi-path collaborative reasoning and verification framework based on SLMs, designed to extract material names, numerical values, and physical units from materials science literature. The framework integrates three complementary reasoning paths: a multi-agent collaborative path, a generator–discriminator path, and a dual cross-verification path. SLM-MATRIX achieves an accuracy of 92.85% on the BulkModulus dataset and 77.68% on the MatSynTriplet dataset, outperforming conventional methods and single-path models on both. Moreover, experiments on general reasoning benchmarks such as GSM8K and SVAMP validate the framework’s strong generalization capability. Ablation studies evaluate the effects of agent number, Mixture-of-Agents (MoA) depth, and discriminator design on overall performance. Overall, SLM-MATRIX presents an effective approach for high-quality material information extraction in resource-constrained settings and offers new insights into structured scientific text understanding tasks.
institution Kabale University
issn 2057-3960
citation npj Computational Materials, vol. 11, no. 1, pp. 1-17, 2025-07-01, https://doi.org/10.1038/s41524-025-01719-x
affiliation Xin Li, Zhixuan Huang, Shu Quan, Cheng Peng, Xiaoming Ma: Environmental Finance Lab, School of Environment and Energy, Peking University Shenzhen Graduate School