Airport Clearance Detection Based on Vision Transformer and Multi-Scale Feature Fusion