Efficient GPT-4V level multimodal large language model for deployment on edge devices