Showing 1 - 2 results of 2 for search 'shifted (patch OR batch) tokenization'

    Visual Automatic Localization Method Based on Multi-level Video Transformer by Qiping ZOU, Botao LI, Saian CHEN, Xi GUO, Taohong ZHANG

    Published 2024-11-01
    “…This approach divides the original video data into token sequences across four levels: 2D Patch, 3D Patch, Frame, and Clip, capturing a comprehensive range of spatial and temporal details. …”
    Article
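The snippet names four token levels (2D Patch, 3D Patch, Frame, Clip) but does not say how each is built. The NumPy sketch below shows one plausible way to cut a video tensor of shape (T, H, W, C) into raw token sequences at each level. It is an illustration under stated assumptions, not the authors' implementation. The patch size `p`, tubelet depth `t`, clip length `clip_len`, and all function names are hypothetical. The sketch also assumes every dimension divides evenly, and it omits the linear embedding and position encoding a real transformer would apply.

```python
import numpy as np

# Hypothetical helpers: one function per token level named in the abstract.

def patch_tokens_2d(video: np.ndarray, p: int) -> np.ndarray:
    """One token per non-overlapping p x p patch of every frame."""
    T, H, W, C = video.shape
    x = video.reshape(T, H // p, p, W // p, p, C)
    x = x.transpose(0, 1, 3, 2, 4, 5)  # (T, H/p, W/p, p, p, C)
    return x.reshape(T * (H // p) * (W // p), p * p * C)

def patch_tokens_3d(video: np.ndarray, p: int, t: int) -> np.ndarray:
    """One token per t x p x p spatio-temporal tubelet."""
    T, H, W, C = video.shape
    x = video.reshape(T // t, t, H // p, p, W // p, p, C)
    x = x.transpose(0, 2, 4, 1, 3, 5, 6)  # (T/t, H/p, W/p, t, p, p, C)
    return x.reshape((T // t) * (H // p) * (W // p), t * p * p * C)

def frame_tokens(video: np.ndarray) -> np.ndarray:
    """One token per whole frame."""
    return video.reshape(video.shape[0], -1)

def clip_tokens(video: np.ndarray, clip_len: int) -> np.ndarray:
    """One token per group of clip_len consecutive frames."""
    return video.reshape(video.shape[0] // clip_len, -1)

def multi_level_tokens(video: np.ndarray, p: int = 16, t: int = 2,
                       clip_len: int = 4) -> dict:
    """Return raw (unembedded) token sequences for all four levels."""
    T, H, W, _ = video.shape
    if H % p or W % p or T % t or T % clip_len:
        raise ValueError("video dimensions must be divisible by p, t and clip_len")
    return {
        "2d_patch": patch_tokens_2d(video, p),
        "3d_patch": patch_tokens_3d(video, p, t),
        "frame": frame_tokens(video),
        "clip": clip_tokens(video, clip_len),
    }

if __name__ == "__main__":
    video = np.random.rand(8, 32, 32, 3)  # 8 RGB frames of 32 x 32
    for level, tokens in multi_level_tokens(video).items():
        print(level, tokens.shape)  # (num_tokens, token_dim)
```

For an 8-frame 32 x 32 RGB input with the default parameters, the levels yield 32, 16, 8 and 2 tokens respectively. Each level re-partitions the same pixels, so the finer levels trade longer sequences for smaller tokens. That trade-off is presumably what lets a multi-level model capture both local spatial detail and longer-range temporal context.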