A methodology for identification of the formulaic language most representative of high-frequency collocations