Information extraction from massive Web pages based on node property and text content