Pyramidal Predictive Network V2: An Improved Predictive Architecture and Training Strategies for Future Perception Prediction

Bibliographic Details
Main Authors: Chaofan Ling, Junpei Zhong, Weihua Li, Ran Dong, Mingjun Dai
Format: Article
Language: English
Published: MDPI AG 2025-03-01
Series: Big Data and Cognitive Computing
Online Access: https://www.mdpi.com/2504-2289/9/4/79
Description
Summary: In this paper, we propose an improved version of the Pyramidal Predictive Network (PPNV2), a theoretical framework inspired by predictive coding, which addresses the limitations of its predecessor (PPNV1) in the task of future perception prediction. While PPNV1 employed a temporal pyramid architecture and demonstrated promising results, its innate signal processing led to aliasing in the prediction, restricting its application in robotic navigation. We analyze the signal dissemination and characteristic artifacts of PPNV1 and introduce architectural enhancements and training strategies to mitigate these issues. The improved architecture focuses on optimizing information dissemination and reducing aliasing in neural networks. We redesign the downsampling and upsampling components to enable the network to construct images more effectively from low-frequency-input Fourier features, replacing the simple concatenation of different inputs in the previous version. Furthermore, we refine the training strategies to alleviate input inconsistency between the training and testing phases. The enhanced model exhibits increased interpretability, higher prediction accuracy, and improved quality of predictions. The proposed PPNV2 offers a more robust and efficient approach to future video-frame prediction, overcoming the limitations of its predecessor and expanding its potential applications in various robotic domains, including pedestrian prediction, vehicle prediction, and navigation.
ISSN: 2504-2289