Encoding Longer-Term Contextual Information with Predictive Coding and Ego-Motion