Beyond words: evaluating large language models in transportation planning

The rapid advancement of Generative Artificial Intelligence (GenAI) in 2023 has catalyzed transformative shifts across various industries, including urban transportation planning. This study evaluates the applicability of Large Language Models (LLMs) in transportation decision-making, focusing on tw...

Bibliographic Details
Main Authors: Shaowei Ying, Zhenlong Li, Manzhu Yu
Format: Article
Language: English
Published: Taylor & Francis Group 2025-04-01
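Series: Geo-spatial Information Science
Subjects: Large language models (LLM); transportation planning; geospatial AI; generative AI; congestion pricing
Online Access: https://www.tandfonline.com/doi/10.1080/10095020.2025.2493073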
author Shaowei Ying
Zhenlong Li
Manzhu Yu
collection DOAJ
description The rapid advancement of Generative Artificial Intelligence (GenAI) in 2023 has catalyzed transformative shifts across various industries, including urban transportation planning. This study evaluates the applicability of Large Language Models (LLMs) in transportation decision-making, focusing on two hypotheses: (H1) out-of-the-box LLMs exhibit basic transportation knowledge and reasoning capabilities, enabling them to design and execute analytical workflows; and (H2) larger parameter models and fine-tuned models demonstrate superior accuracy and contextual understanding, outperforming smaller and general-purpose models. Using a three-level evaluation framework, we assessed GPT-4 and Phi-3-mini across (1) geospatial skills, (2) domain-specific transportation knowledge, and (3) real-world transport problem-solving in congestion pricing scenarios. Results confirm that while LLMs possess baseline geospatial and transportation reasoning abilities, their effectiveness varies by task complexity. GPT-4 outperformed Phi-3-mini across all evaluation levels, achieving 86% accuracy in GIS tasks, 81% in MATSim comprehension, and 91% in real-world transport decision support, while Phi-3-mini scored 43–72%. These findings highlight the advantages of larger models in structured decision-making tasks and their potential as analytical copilots for transportation planners. The study contributes to the ongoing scientific debate on the role of GenAI in transportation governance, reinforcing the need for fine-tuning and retrieval-augmented generation (RAG) to enhance LLM performance in structured analytics. Future research should explore newer LLMs, transport-specific fine-tuning, and hybrid AI architectures to improve AI-driven transportation planning and decision support.
format Article
id doaj-art-4b854cc5165043228cdf8fb3c2687bb5
institution OA Journals
issn 1009-5020
1993-5153
language English
publishDate 2025-04-01
publisher Taylor & Francis Group
record_format Article
series Geo-spatial Information Science
spelling doaj-art-4b854cc5165043228cdf8fb3c2687bb5
2025-08-20T02:28:47Z
eng
Taylor & Francis Group
Geo-spatial Information Science
1009-5020
1993-5153
2025-04-01
123
10.1080/10095020.2025.2493073
Beyond words: evaluating large language models in transportation planning
Shaowei Ying, Geoinformation and Big Data Research Laboratory, Department of Geography, The Pennsylvania State University, University Park, PA, USA
Zhenlong Li, Geoinformation and Big Data Research Laboratory, Department of Geography, The Pennsylvania State University, University Park, PA, USA
Manzhu Yu, Geoinformation and Big Data Research Laboratory, Department of Geography, The Pennsylvania State University, University Park, PA, USA
https://www.tandfonline.com/doi/10.1080/10095020.2025.2493073
Large language models (LLM)
transportation planning
geospatial AI
generative AI
congestion pricing
title Beyond words: evaluating large language models in transportation planning
topic Large language models (LLM)
transportation planning
geospatial AI
generative AI
congestion pricing
url https://www.tandfonline.com/doi/10.1080/10095020.2025.2493073