ScaleFormer architecture for scale invariant human pose estimation with enhanced mixed features