Text this: Speech Emotion Recognition Using Multi-Scale Global–Local Representation Learning with Feature Pyramid Network