An Image Grid Can Be Worth a Video: Zero-Shot Video Question Answering Using a VLM