Multimodal depression detection based on an attention graph convolution and transformer