Efficient Decomposition Method for Similar Text Data in Large Corpora