Text this: Incorporating convolutional and transformer architectures to enhance semantic segmentation of fine-resolution urban images