Text this: Learning a Mid-Level Representation for Multiview Action Recognition