WebIQ takes a three-part, query-based approach to the problem. First, it discovers data instances from the “Surface Web” (the part of the World Wide Web that can be reached through regular search engines such as Google). It does this by formulating extraction queries, posing them to a search engine, extracting data instances from the results, and then validating them. Second, it borrows existing data instances from other form labels, computes a validation score for each, and compares that score with the scores received by known non-instances of the label in question. Lastly, WebIQ validates the borrowed instances via the “Deep Web” (accessible through web form searches) by submitting the instances to the web forms and observing the responses received.
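The first two steps can be illustrated with a minimal Python sketch. It is not WebIQ's actual implementation: the query patterns, the naive pluralization, the function names `formulate_extraction_queries` and `accept_borrowed`, the rule of accepting a candidate only if it outscores every known non-instance, and the toy scores are all assumptions made for illustration. No search engine or web form is contacted.

```python
from typing import Callable, Iterable


def formulate_extraction_queries(label: str) -> list[str]:
    """Build phrase queries for a form label.

    The patterns and the naive "+s" pluralization are assumptions,
    not the patterns WebIQ itself uses.
    """
    noun = label.strip().lower() + "s"
    return [f'"{noun} such as"', f'"such {noun} as"', f'"{noun} including"']


def accept_borrowed(
    candidates: Iterable[str],
    non_instances: Iterable[str],
    score: Callable[[str], float],
) -> list[str]:
    """Keep borrowed candidates that score above every known non-instance.

    Using the maximum non-instance score as the cutoff is an assumed
    decision rule chosen for simplicity.
    """
    threshold = max((score(x) for x in non_instances), default=0.0)
    return [c for c in candidates if score(c) > threshold]


# Made-up scores standing in for a real validation measure.
toy_scores = {
    "Delta": 0.82,
    "United": 0.77,
    "Boeing": 0.10,
    "Economy": 0.15,
    "January": 0.05,
}

queries = formulate_extraction_queries("Airline")
accepted = accept_borrowed(
    ["Delta", "United", "Boeing"],  # instances borrowed from another label
    ["Economy", "January"],         # known non-instances of "Airline"
    toy_scores.__getitem__,
)
```

With these toy values the cutoff is the highest non-instance score, so only candidates that clearly outscore the known non-instances survive. A real system would replace `toy_scores` with a score derived from search results and would follow this step with the Deep Web check described above.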