Apvit: ViT with adaptive patches for scene text recognition

Abstract: Scene texts in nature exhibit varied colors, which serve as a significant distinguishing feature that effectively suppresses background interference. In this study, color clustering is utilized as a prior guide to group patches, enhancing their spatial relationships. Additionally, patch siz...

Bibliographic Details
Main Authors:	Ning Zhang, Ce Li, Zongshun Wang, Jialin Ma, Zhiqiang Feng
Format:	Article
Language:	English
Published:	Springer 2025-03-01
Series:	Discover Applied Sciences
Subjects:	Adaptive patches; ViTs; Scene text recognition; Prune
Online Access:	https://doi.org/10.1007/s42452-025-06570-9

Similar Items

Interpretable Deep Learning for Diabetic Retinopathy: A Comparative Study of CNN, ViT, and Hybrid Architectures
by: Weijie Zhang, et al.
Published: (2025-05-01)

A New Pes Planus Automatic Diagnosis Method: ViT-OELM Hybrid Modeling
by: Derya Avcı
Published: (2025-03-01)

Turkish scene text recognition: Introducing extensive real and synthetic datasets and a novel recognition model
by: Serdar Yıldız
Published: (2024-12-01)

Tumor ViT-GRU-XAI: Advanced Brain Tumor Diagnosis Framework: Vision Transformer and GRU Integration for Improved MRI Analysis: A Case Study of Egypt
by: Mohammed Aly, et al.
Published: (2024-01-01)

ViT-RoT: Vision Transformer-Based Robust Framework for Tomato Leaf Disease Recognition
by: Sathiyamohan Nishankar, et al.
Published: (2025-06-01)

Cascaded Dual-Inpainting Network for Scene Text
by: Chunmei Liu
Published: (2025-07-01)

An Investigation on Prediction of Infrastructure Asset Defect with CNN and ViT Algorithms
by: Nam Lethanh, et al.
Published: (2025-05-01)

ViT-Based Face Diagnosis Images Analysis for Schizophrenia Detection
by: Huilin Liu, et al.
Published: (2024-12-01)

Unlocking the Potential of XAI for Improved Alzheimer’s Disease Detection and Classification Using a ViT-GRU Model
by: S. M. Mahim, et al.
Published: (2024-01-01)

A Robust Hybrid CNN+ViT Framework for Breast Cancer Classification Using Mammogram Images
by: Vasudha Rani Patheda, et al.
Published: (2025-01-01)

Text Font Correction and Alignment Method for Scene Text Recognition
by: Liuxu Ding, et al.
Published: (2024-12-01)

Binary and Multi-Class Classification of Colorectal Polyps Using CRP-ViT: A Comparative Study Between CNNs and QNNs
by: Jothiraj Selvaraj, et al.
Published: (2025-07-01)

A New Hybrid ConvViT Model for Dangerous Farm Insect Detection
by: Anil Utku, et al.
Published: (2025-02-01)

GIVTED-Net: GhostNet-Mobile Involution ViT Encoder-Decoder Network for Lightweight Medical Image Segmentation
by: Resha Dwika Hefni Al-Fahsi, et al.
Published: (2024-01-01)

Progressive Pruning of Light Dehaze Networks for Static Scenes
by: Byeongseon Park, et al.
Published: (2024-11-01)

MalFormer: A Novel Vision Transformer Model for Robust Malware Analysis
by: In-Woong Jeong, et al.
Published: (2025-01-01)

Empowering Efficient Spatio-Temporal Learning with a 3D CNN for Pose-Based Action Recognition
by: Ziliang Ren, et al.
Published: (2024-11-01)

An Ensemble Deep Learning Approach for Accurate Urinary Sediment Detection Using YOLOv9e and KD-YOLOX-ViT
by: Mansura Naznine, et al.
Published: (2025-01-01)

CNN-ViT: A multi-feature learning based approach for driver drowsiness detection
by: Madduri Venkateswarlu, et al.
Published: (2025-09-01)

Hangul Character Recognition of A New Hangul Dataset with Vision Transformers Model
by: Aurelia Shana, et al.
Published: (2024-12-01)

MAS-PD: Transferable Adversarial Attack Against Vision-Transformers-Based SAR Image Classification Task
by: Boshi Zheng, et al.
Published: (2025-01-01)

ACTFormer: A Transformer Network With Attention and Convolutional Synergy for Remote Sensing Scene Classification
by: Chao Xie, et al.
Published: (2025-01-01)

WindDefNet: A Multi-Scale Attention-Enhanced ViT-Inception-ResNet Model for Real-Time Wind Turbine Blade Defect Detection
by: Majad Mansoor, et al.
Published: (2025-05-01)

KSTRV1: A scene text recognition dataset for central Kurdish in (Arabic-Based) script
by: Sardar Omar Salih, et al.
Published: (2025-06-01)

A Hybrid VGG16‐ViT Approach With Image Processing Techniques for Improved White Blood Cell Classification and Disease Diagnosis: A Retrospective Study
by: Md Shahin Ali, et al.
Published: (2025-06-01)

LFEN: A language feature enhanced network for scene text recognition
by: Hui Chen, et al.
Published: (2025-01-01)

Leveraging text semantics for enhanced scene text image super-resolution
by: Li Chen, et al.
Published: (2025-06-01)

MAPE-ViT: multimodal scene understanding with novel wavelet-augmented Vision Transformer
by: Muhammad Waqas Ahmed, et al.
Published: (2025-05-01)

ViSwNeXtNet Deep Patch-Wise Ensemble of Vision Transformers and ConvNeXt for Robust Binary Histopathology Classification
by: Özgen Arslan Solmaz, et al.
Published: (2025-06-01)

Human Body Segmentation in Wide-Angle Images Based on Fast Vision Transformers
by: Xiao Yu, et al.
Published: (2024-01-01)

ViT-ISRGAN: A High-Quality Super-Resolution Reconstruction Method for Multispectral Remote Sensing Images
by: Yifeng Yang, et al.
Published: (2025-01-01)

Two-Stage Target Detection for Compact HFSWR With Space-to-Depth YOLOv8 and Multiframe ViT
by: Tong Wu, et al.
Published: (2025-01-01)

Advanced deep learning architectures for enhanced mammography classification: a comparative study of CNNs and ViT
by: Shubhi Sharma, et al.
Published: (2025-07-01)

Improving BI-RADS Mammographic Classification With Self-Supervised Vision Transformers and Cascade Learning
by: Abdelrahman Abdallah, et al.
Published: (2025-01-01)

ViT-Based Classification and Self-Supervised 3D Human Mesh Generation from NIR Single-Pixel Imaging
by: Carlos Osorio Quero, et al.
Published: (2025-05-01)

Scene Text Recognition That Eliminates Background and Character Noise Interference
by: Shancheng Tang, et al.
Published: (2025-03-01)

GaitTriViT and GaitVViT: Transformer-based methods emphasizing spatial or temporal aspects in gait recognition
by: Hongyun Sheng
Published: (2025-08-01)

CN2VF-Net: A Hybrid Convolutional Neural Network and Vision Transformer Framework for Multi-Scale Fire Detection in Complex Environments
by: Naveed Ahmad, et al.
Published: (2025-05-01)

Head and Hands Tunneling Pipeline for Enhancing Sign Language Recognition
by: Ganzorig Batnasan, et al.
Published: (2025-01-01)

BinaryViT: Binary Vision Transformer for Hyperspectral Image Classification
by: Xiang Hu, et al.
Published: (2025-01-01)