Image experience prediction for historic districts using a CNN-transformer fusion model