Hybrid Reinforcement/Supervised Learning of Dialogue Policies from Fixed Data Sets