Joint Adaptive Resolution Selection and Conditional Early Exiting for Efficient Video Recognition on Edge Devices