A multi-modal transformer for predicting global minimum adsorption energy