Speak, Read and Prompt: High-Fidelity Text-to-Speech with Minimal Supervision