Early and Late Fusion for Multimodal Aggression Prediction in Dementia Patients: A Comparative Analysis

Bibliographic Details
Main Authors: Ioannis Galanakis, Rigas Filippos Soldatos, Nikitas Karanikolas, Athanasios Voulodimos, Ioannis Voyiatzis, Maria Samarakou
Format: Article
Language: English
Published: MDPI AG 2025-05-01
Series: Applied Sciences
Online Access: https://www.mdpi.com/2076-3417/15/11/5823
Description
Summary: Aggression in patients with dementia poses significant caregiving and clinical challenges. In this work, two fusion approaches, Early Fusion and Late Fusion, were compared for classifying aggression from audio and visual signals. Early Fusion combines the extracted features of the two modalities into one dataset before classification, while Late Fusion combines the prediction probabilities of standalone audio and visual classifiers using a meta-classifier. Both models were tested using a Random Forest classifier with five-fold cross-validation, and performance was compared on accuracy, precision, recall, F1-score, ROC-AUC, and inference time. The results show that Late Fusion outperforms Early Fusion in accuracy (0.876 vs. 0.828), recall (0.914 vs. 0.818), F1-score (0.867 vs. 0.835), and ROC-AUC (0.970 vs. 0.922), making it more suitable for high-sensitivity use cases such as healthcare and security. However, Early Fusion exhibited higher precision (0.852 vs. 0.824), indicating that Early Fusion is preferable when minimizing false positives is a requirement. Paired t-tests were applied for statistical comparison and indicate that only precision differs significantly, in favor of Early Fusion. Late Fusion also has a slightly lower inference time, which makes it suitable for use in real-time systems. These findings provide useful information on multimodal fusion strategies and their applicability to the detection of aggressive behavior, which can contribute to the development of efficient monitoring systems for dementia care.
ISSN: 2076-3417
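
The two fusion strategies named in the summary can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic (`make_classification`), the split into `X_audio` and `X_visual` is arbitrary, the helper names `early_fusion_scores` and `late_fusion_scores` are hypothetical, and the use of a Random Forest as the Late Fusion meta-classifier is an assumption, since the record does not say which meta-classifier was used. The sketch assumes scikit-learn and SciPy are available, and only accuracy is compared.

```python
import numpy as np
from scipy.stats import ttest_rel
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict

# Synthetic stand-in for the real audio/visual features (assumption).
X, y = make_classification(n_samples=300, n_features=30,
                           n_informative=12, random_state=0)
X_audio, X_visual = X[:, :15], X[:, 15:]


def make_rf(seed=0):
    """Random Forest with illustrative (not published) hyperparameters."""
    return RandomForestClassifier(n_estimators=50, random_state=seed)


def early_fusion_scores(X_a, X_v, y, n_splits=5, seed=0):
    """Concatenate both feature sets, then train one classifier per fold."""
    X_fused = np.hstack([X_a, X_v])
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = []
    for train, test in cv.split(X_fused, y):
        clf = make_rf(seed).fit(X_fused[train], y[train])
        scores.append(accuracy_score(y[test], clf.predict(X_fused[test])))
    return np.array(scores)


def late_fusion_scores(X_a, X_v, y, n_splits=5, seed=0):
    """Train one classifier per modality, then a meta-classifier on
    their predicted probabilities."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    inner = StratifiedKFold(n_splits=3, shuffle=True, random_state=seed)
    scores = []
    for train, test in cv.split(X_a, y):
        meta_train, meta_test = [], []
        for X_mod in (X_a, X_v):
            # Out-of-fold probabilities keep the meta-classifier from
            # training on predictions the base model made on its own data.
            oof = cross_val_predict(make_rf(seed), X_mod[train], y[train],
                                    cv=inner, method="predict_proba")[:, 1]
            base = make_rf(seed).fit(X_mod[train], y[train])
            meta_train.append(oof)
            meta_test.append(base.predict_proba(X_mod[test])[:, 1])
        meta = make_rf(seed).fit(np.column_stack(meta_train), y[train])
        pred = meta.predict(np.column_stack(meta_test))
        scores.append(accuracy_score(y[test], pred))
    return np.array(scores)


early = early_fusion_scores(X_audio, X_visual, y)
late = late_fusion_scores(X_audio, X_visual, y)

# Both functions use the same seeded folds, so the scores are paired.
t_stat, p_value = ttest_rel(late, early)
print("Early Fusion accuracy per fold:", early.round(3))
print("Late Fusion accuracy per fold: ", late.round(3))
print("Paired t-test p-value:", p_value)
```

On synthetic data the numbers carry no meaning for dementia care; the sketch only shows where the fusion happens in each pipeline and how per-fold scores from identical folds can be compared with a paired t-test.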