Assessing the performance of zero-shot visual question answering in multimodal large language models for 12-lead ECG image interpretation