Spatiotemporal squeeze-and-excitation residual multiplier network for video action recognition