Text this: Monocular Image Depth Estimation Based on the Fusion of Transformer and CNN