Leveraging Frozen Foundation Models and Multimodal Fusion for BEV Segmentation and Occupancy Prediction