Spatiotemporal Feature Enhancement for Lip-Reading: A Survey

Bibliographic Details
Main Authors: Yinuo Ma, Xiao Sun
Format: Article
Language: English
Published: MDPI AG 2025-04-01
Series: Applied Sciences
Online Access: https://www.mdpi.com/2076-3417/15/8/4142
Description
Summary: Lip-reading, a crucial technique that recognizes human lip movement patterns to produce semantic output, has gained increasing attention due to its broad applications in public safety, healthcare, the military, and entertainment. Spatiotemporal feature enhancement techniques have played a significant role in advancing deep-learning-based lip-reading research. This paper presents a comprehensive review of recent advances in lip-reading methods by examining the key properties of diverse enhancement techniques, including spatial features, spatiotemporal convolution, attention mechanisms, pulse features, and audio-visual features. Six categories of spatiotemporal feature enhancement methods for lip-reading are presented according to their network structures, and each category is divided into subclasses based on differences in architecture, feature attributes, and application types. The review then discusses state-of-the-art spatiotemporal feature enhancement methods in depth, analyzes the challenges and limitations they face, and outlines future research directions. By comparing these techniques from different perspectives, the review reveals the limitations and intrinsic disparities among the categories, helping scholars pursue innovative paths in advancing lip-reading.
ISSN: 2076-3417