Multimodal Data Fusion for Tabular and Textual Data: Zero-Shot, Few-Shot, and Fine-Tuning of Generative Pre-Trained Transformer Models

Bibliographic Details
Main Authors: Shadi Jaradat, Mohammed Elhenawy, Richi Nayak, Alexander Paz, Huthaifa I. Ashqar, Sebastien Glaser
Format: Article
Language: English
Published: MDPI AG, 2025-04-01
Series: AI
Subjects: Generative Pre-Trained Transformer (GPT); Large Language Model (LLM); traffic crash analysis; few-shot learning; zero-shot learning; Multimodal Data Fusion
Online Access: https://www.mdpi.com/2673-2688/6/4/72
author Shadi Jaradat
Mohammed Elhenawy
Richi Nayak
Alexander Paz
Huthaifa I. Ashqar
Sebastien Glaser
collection DOAJ
description In traffic safety analysis, previous research has often focused on tabular data or textual crash narratives in isolation, neglecting the potential benefits of a hybrid multimodal approach. This study introduces the Multimodal Data Fusion (MDF) framework, which fuses tabular data with textual narratives by leveraging advanced Large Language Models (LLMs), such as GPT-2, GPT-3.5, and GPT-4.5, using zero-shot (ZS), few-shot (FS), and fine-tuning (FT) learning strategies. We employed few-shot learning with GPT-4.5 to generate new labels for traffic crash analysis, such as driver fault, driver actions, and crash factors, alongside the existing label for severity. Our methodology was tested on crash data from the Missouri State Highway Patrol, demonstrating significant improvements in model performance. GPT-2 (fine-tuned) was used as the baseline model, against which more advanced models were evaluated. GPT-4.5 few-shot learning achieved 98.9% accuracy for crash severity prediction and 98.1% accuracy for driver fault classification. In crash factor extraction, GPT-4.5 few-shot achieved the highest Jaccard score (82.9%), surpassing GPT-3.5 and fine-tuned GPT-2 models. Similarly, in driver actions extraction, GPT-4.5 few-shot attained a Jaccard score of 73.1%, while fine-tuned GPT-2 closely followed with 72.2%, demonstrating that task-specific fine-tuning can achieve performance close to state-of-the-art models when adapted to domain-specific data. These findings highlight the superior performance of GPT-4.5 few-shot learning, particularly in classification and information extraction tasks, while also underscoring the effectiveness of fine-tuning on domain-specific datasets to bridge performance gaps with more advanced models. The MDF framework’s success demonstrates its potential for broader applications beyond traffic crash analysis, particularly in domains where labeled data are scarce and predictive modeling is essential.
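The Jaccard scores reported in the abstract measure set overlap between predicted and annotated labels. A minimal sketch, assuming each extraction task (crash factors, driver actions) is scored by comparing the model's output labels and the reference labels as plain sets; the example labels are illustrative, not from the study's data:

```python
def jaccard(pred, true):
    """Jaccard similarity of two label sets: |A ∩ B| / |A ∪ B|."""
    a, b = set(pred), set(true)
    if not a and not b:
        return 1.0  # both empty: treat as perfect agreement
    return len(a & b) / len(a | b)

# Hypothetical example: two of three reference crash factors recovered.
pred = {"speeding", "wet road"}
true = {"speeding", "wet road", "fatigue"}
score = jaccard(pred, true)  # 2 shared / 3 in the union ≈ 0.667
```

A corpus-level figure such as the 82.9% reported for crash factor extraction would then be an average of per-record scores like this one.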
format Article
id doaj-art-614a9eac696e4a21904862a2f4ea8434
institution DOAJ
issn 2673-2688
language English
publishDate 2025-04-01
publisher MDPI AG
record_format Article
series AI
spelling doaj-art-614a9eac696e4a21904862a2f4ea8434 (2025-08-20T03:14:20Z)
eng | MDPI AG | AI | ISSN 2673-2688 | 2025-04-01 | Vol. 6, No. 4, Art. 72 | doi:10.3390/ai6040072
Multimodal Data Fusion for Tabular and Textual Data: Zero-Shot, Few-Shot, and Fine-Tuning of Generative Pre-Trained Transformer Models
Shadi Jaradat: CARRS-Q, Queensland University of Technology, Kelvin Grove, Brisbane, QLD 4059, Australia
Mohammed Elhenawy: CARRS-Q, Queensland University of Technology, Kelvin Grove, Brisbane, QLD 4059, Australia
Richi Nayak: Centre for Data Science, Queensland University of Technology, Garden Point, Brisbane, QLD 4000, Australia
Alexander Paz: School of Civil Engineering, Queensland University of Technology, Brisbane, QLD 4000, Australia
Huthaifa I. Ashqar: Civil Engineering Department, Arab American University, Jenin P.O. Box 240, Palestine
Sebastien Glaser: CARRS-Q, Queensland University of Technology, Kelvin Grove, Brisbane, QLD 4059, Australia
https://www.mdpi.com/2673-2688/6/4/72
title Multimodal Data Fusion for Tabular and Textual Data: Zero-Shot, Few-Shot, and Fine-Tuning of Generative Pre-Trained Transformer Models
topic Generative Pre-Trained Transformer (GPT)
Large Language Model (LLM)
traffic crash analysis
few-shot learning
zero-shot learning
Multimodal Data Fusion
url https://www.mdpi.com/2673-2688/6/4/72
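The abstract describes fusing tabular crash fields with free-text narratives into a single input for an LLM. The paper does not specify the serialization, so the following is only a hypothetical sketch of that fusion step: structured fields are rendered as text and concatenated with the narrative to form one prompt. Field names and values are invented for illustration:

```python
def fuse_record(tabular: dict, narrative: str) -> str:
    """Serialize tabular fields and append the crash narrative,
    yielding one text prompt for a language model."""
    fields = "; ".join(f"{k}: {v}" for k, v in tabular.items())
    return f"Crash record: {fields}\nNarrative: {narrative}"

# Hypothetical record, not taken from the Missouri dataset.
prompt = fuse_record(
    {"weather": "rain", "road_type": "highway", "time": "night"},
    "Vehicle 1 hydroplaned and struck the guardrail.",
)
```

In a zero- or few-shot setting, a task instruction (e.g., "Classify crash severity") and any exemplar records would be prepended to prompts built this way.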