A hybrid CNN-LSTM model with adaptive instance normalization for one-shot singing voice conversion