Text this: Knowledge-driven semantic converting method of multimodal models toward a geospatial perspective