Performance evaluation of GPT-4o on South Korean national exams for building mechanical equipment maintenance
| Main Authors: | , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Nature Portfolio, 2025-08-01 |
| Series: | Scientific Reports |
| Subjects: | |
| Online Access: | https://doi.org/10.1038/s41598-025-16118-x |
| Summary: | This study evaluates the applicability of large language models (LLMs) in mechanical equipment maintenance in buildings by assessing GPT-4o’s performance on two national certification exams in South Korea: Engineer Energy Management (EEM) and Engineer Air-Conditioning Refrigerating Machinery (EACRM). GPT-4o achieved average scores of 80.6 and 81.25 on the EEM and EACRM exams, respectively, passing all five attempts. The model performed well on both non-calculation and calculation problems and demonstrated high consistency, with an average response consistency of 97%. Despite these strengths, three key limitations were identified: weak advanced reasoning, difficulty in solving legal questions, and poor interpretation of scientific figures. Experimental results indicate that advanced reasoning can be improved using reasoning-optimized models, while legal question accuracy can be significantly enhanced with retrieval-augmented generation (RAG). However, figure interpretation remains dependent on advancements in visual recognition capabilities. These findings suggest that GPT-4o possesses foundational knowledge applicable to mechanical equipment maintenance in buildings but also highlight the need to address certain limitations for practical implementation. This study provides a foundation for future research on integrating LLMs into industrial applications, such as maintenance management software, to enhance maintenance efficiency and address workforce shortages. |
|---|---|
| ISSN: | 2045-2322 |