Fingerprint-Based Near-Duplicate Document Detection with Applications to SNS Spam Detection