Text this: Object detection and multimodal learning for product recommendations