Text this: Monocular Object-Level SLAM Enhanced by Joint Semantic Segmentation and Depth Estimation