Robot Ground Media Classification Based on Hilbert–Huang Transform and Attention-Based Spatiotemporal Coupled Network