VSRDiff: Learning Inter-Frame Temporal Coherence in Diffusion Model for Video Super-Resolution