Beyond words: evaluating large language models in transportation planning
The rapid advancement of Generative Artificial Intelligence (GenAI) in 2023 has catalyzed transformative shifts across various industries, including urban transportation planning. This study evaluates the applicability of Large Language Models (LLMs) in transportation decision-making, focusing on tw...
| Main Authors: | Shaowei Ying, Zhenlong Li, Manzhu Yu |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Taylor & Francis Group, 2025-04-01 |
| Series: | Geo-spatial Information Science |
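| Subjects: | Large language models (LLM); transportation planning; geospatial AI; generative AI; congestion pricing |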
| Online Access: | https://www.tandfonline.com/doi/10.1080/10095020.2025.2493073 |
| _version_ | 1850143203388293120 |
|---|---|
| author | Shaowei Ying; Zhenlong Li; Manzhu Yu |
| author_facet | Shaowei Ying; Zhenlong Li; Manzhu Yu |
| author_sort | Shaowei Ying |
| collection | DOAJ |
| description | The rapid advancement of Generative Artificial Intelligence (GenAI) in 2023 has catalyzed transformative shifts across various industries, including urban transportation planning. This study evaluates the applicability of Large Language Models (LLMs) in transportation decision-making, focusing on two hypotheses: (H1) out-of-the-box LLMs exhibit basic transportation knowledge and reasoning capabilities, enabling them to design and execute analytical workflows; and (H2) larger parameter models and fine-tuned models demonstrate superior accuracy and contextual understanding, outperforming smaller and general-purpose models. Using a three-level evaluation framework, we assessed GPT-4 and Phi-3-mini across (1) geospatial skills, (2) domain-specific transportation knowledge, and (3) real-world transport problem-solving in congestion pricing scenarios. Results confirm that while LLMs possess baseline geospatial and transportation reasoning abilities, their effectiveness varies by task complexity. GPT-4 outperformed Phi-3-mini across all evaluation levels, achieving 86% accuracy in GIS tasks, 81% in MATSim comprehension, and 91% in real-world transport decision support, while Phi-3-mini scored 43–72%. These findings highlight the advantages of larger models in structured decision-making tasks and their potential as analytical copilots for transportation planners. The study contributes to the ongoing scientific debate on the role of GenAI in transportation governance, reinforcing the need for fine-tuning and retrieval-augmented generation (RAG) to enhance LLM performance in structured analytics. Future research should explore newer LLMs, transport-specific fine-tuning, and hybrid AI architectures to improve AI-driven transportation planning and decision support. |
| format | Article |
| id | doaj-art-4b854cc5165043228cdf8fb3c2687bb5 |
| institution | OA Journals |
| issn | 1009-5020; 1993-5153 |
| language | English |
| publishDate | 2025-04-01 |
| publisher | Taylor & Francis Group |
| record_format | Article |
| series | Geo-spatial Information Science |
| spelling | doaj-art-4b854cc5165043228cdf8fb3c2687bb5; 2025-08-20T02:28:47Z; eng; Taylor & Francis Group; Geo-spatial Information Science; 1009-5020; 1993-5153; 2025-04-01; 123; 10.1080/10095020.2025.2493073; Beyond words: evaluating large language models in transportation planning; Shaowei Ying (0), Zhenlong Li (1), Manzhu Yu (2); Geoinformation and Big Data Research Laboratory, Department of Geography, The Pennsylvania State University, University Park, PA, USA (all three authors); https://www.tandfonline.com/doi/10.1080/10095020.2025.2493073; Large language models (LLM); transportation planning; geospatial AI; generative AI; congestion pricing |
| title | Beyond words: evaluating large language models in transportation planning |
| topic | Large language models (LLM); transportation planning; geospatial AI; generative AI; congestion pricing |
| url | https://www.tandfonline.com/doi/10.1080/10095020.2025.2493073 |