Text this: Benchmarking overlapping communication and computations with multiple streams for modern GPUs