RecompGPT: Generative Pre-Trained Transformers-Assisted Interactive Human Gaze Pattern Learning and Distribution Modeling for Scene Recomposition

Bibliographic Details
Main Authors: Wang Shang, Nassiriah Binti Shaari, Nur Sauri Bin Yahaya, Liu Hao
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/11078273/
collection DOAJ
description We introduce a GPT-assisted approach to visual scene retargeting that learns human gaze behavior. RecompGPT uses a Generative Pre-Trained Transformer (GPT) to model human gaze dynamics, providing a mechanism for intelligent and interactive image reconstruction. The framework is hierarchical and uses the BING objectness metric to identify key visual elements in exhibits. Drawing on multimodal data, RecompGPT employs a novel Locality-Preserved and Observer-Like Active Learning (LOAL) strategy to incrementally generate Gaze Shift Paths (GSPs). LOAL is an active-learning algorithm that selects multiple representative patches from each scene image: it preserves the local distribution of image patches while choosing representative (or visually salient) regions in a way that mimics human gaze allocation. We then deploy GPT to learn the distribution of initial human gaze fixations across different scenes. The GSPs are subsequently refined by a multi-layer aggregation algorithm that encodes deep feature representations into a Gaussian Mixture Model (GMM) of human gaze patterns. The learned GMM guides scene recomposition by maximizing the posterior probability. Empirical evaluations, including interactive user studies, show that RecompGPT outperforms existing methods, with a 4.39% improvement in precision and a 50% reduction in testing time. RecompGPT combines algorithmic efficiency with human-centered aesthetics and can enhance the interactivity and immersion of virtual reality by aligning scene recomposition with human gaze patterns and cognitive preferences.
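The GMM-guided selection step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a diagonal-covariance GMM whose parameters are already fitted, treats each candidate recomposition as a short feature vector, and assumes a flat prior over candidates so that maximizing the posterior reduces to maximizing the GMM likelihood. The names `gmm_log_density` and `pick_recomposition`, and all numeric values, are hypothetical.

```python
import math


def gmm_log_density(x, weights, means, variances):
    """Log density of feature vector x under a diagonal-covariance GMM."""
    component_logs = []
    for w, mu, var in zip(weights, means, variances):
        log_p = math.log(w)
        for xi, mi, vi in zip(x, mu, var):
            # Log of a 1-D Gaussian density, summed over dimensions.
            log_p += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
        component_logs.append(log_p)
    # Log-sum-exp over components for numerical stability.
    top = max(component_logs)
    return top + math.log(sum(math.exp(c - top) for c in component_logs))


def pick_recomposition(candidates, weights, means, variances):
    """Return (index of best candidate, list of log-density scores)."""
    scores = [gmm_log_density(c, weights, means, variances) for c in candidates]
    best = max(range(len(candidates)), key=scores.__getitem__)
    return best, scores


# Made-up parameters standing in for a learned gaze-pattern GMM (assumption).
WEIGHTS = [0.6, 0.4]
MEANS = [[0.0, 0.0], [3.0, 3.0]]
VARIANCES = [[1.0, 1.0], [0.5, 0.5]]

# Made-up feature vectors for three candidate recompositions (assumption).
CANDIDATES = [[5.0, -4.0], [0.1, -0.2], [2.0, 1.0]]

best_index, scores = pick_recomposition(CANDIDATES, WEIGHTS, MEANS, VARIANCES)
print("best candidate:", best_index)
```

In this toy setup the candidate closest to a high-weight mixture component scores highest. A non-uniform prior over candidates could be added as an extra log term per candidate.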
id doaj-art-af262376e2cc47469b399c3c3ff6d573
institution DOAJ
issn 2169-3536
spelling doaj-art-af262376e2cc47469b399c3c3ff6d573 2025-08-20T03:12:09Z
Citation: IEEE Access, vol. 13, pp. 121362-121377, 2025-01-01, ISSN 2169-3536
DOI: 10.1109/ACCESS.2025.3588221
IEEE document number: 11078273
Author affiliations:
Wang Shang: School of Multimedia Technologies and Communication, University Utara Malaysia, Sintok, Malaysia
Nassiriah Binti Shaari: School of Multimedia Technologies and Communication, University Utara Malaysia, Sintok, Malaysia
Nur Sauri Bin Yahaya: School of Multimedia Technologies and Communication, University Utara Malaysia, Sintok, Malaysia
Liu Hao: Department of Medical Genetics, College of Basic Medicine and Forensic Medicine, Henan University of Science and Technology, Luoyang, China
ORCID listed in record: https://orcid.org/0009-0005-5003-9039
topic RecompGPT
human gaze behavior
GSP
recomposition
observer-like