CPSC 502: Paper Critiques


WebIQ uses a three-fold, query-based approach to acquiring data instances for web form labels. First, it discovers data instances from the “Surface Web” (the part of the World Wide Web that can be accessed using regular search engines such as Google): it formulates extraction queries, poses them to a search engine, extracts data instances from the results, and then validates them. Second, it borrows existing data instances from other form labels, computes a validation score for each, and compares this score with the scores of known non-instances of that particular label. Lastly, WebIQ validates the borrowed instances via the “Deep Web” (the content accessible only through web form searches) by submitting the instances to the web forms and observing the responses.
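The three steps can be sketched in Python as below. This is a minimal illustration under stated assumptions, not WebIQ's implementation. The search engine, its hit counts, and the web form are replaced by hypothetical stub callables (`search`, `hits`, `submit`). The extraction query uses an assumed "<label>s such as ..." pattern, and the validation score is a simple co-occurrence ratio chosen for illustration; the paper's actual queries and scoring may differ. A candidate is accepted only if it outscores every known non-instance of the label, which is one plausible reading of the comparison described above.

```python
import re
from typing import Callable, Dict, Iterable, List, Set

# Hypothetical stubs (assumptions, not a real API):
#   search(query)        -> result snippets from a Surface Web engine
#   hits(query)          -> number of results the engine reports for a query
#   submit(label, value) -> True if the Deep Web form returns a non-empty result page
Search = Callable[[str], List[str]]
Hits = Callable[[str], int]
Submit = Callable[[str, str], bool]


def score(label: str, instance: str, hits: Hits) -> float:
    """Assumed co-occurrence score: share of pages mentioning the instance that also mention the label."""
    alone = hits(f'"{instance}"')
    return hits(f'"{label}" "{instance}"') / alone if alone else 0.0


def validate(label: str, candidates: Iterable[str], non_instances: Iterable[str], hits: Hits) -> Set[str]:
    """Keep candidates whose score beats the best-scoring known non-instance."""
    bar = max((score(label, n, hits) for n in non_instances), default=0.0)
    return {c for c in candidates if score(label, c, hits) > bar}


def discover(label: str, search: Search, non_instances: Iterable[str], hits: Hits) -> Set[str]:
    """Step 1: pose an extraction query, pull candidates from snippets, then validate them."""
    pattern = re.compile(rf"{re.escape(label)}s such as ([^.]+)", re.IGNORECASE)
    found: Set[str] = set()
    for snippet in search(f'"{label}s such as"'):
        match = pattern.search(snippet)
        if match:
            # Split "A, B, and C" into ["A", "B", "C"].
            parts = re.split(r",?\s+and\s+|,\s*", match.group(1))
            found.update(p.strip() for p in parts if p.strip())
    return validate(label, found, non_instances, hits)


def borrow(label: str, others: Dict[str, Set[str]], non_instances: Iterable[str], hits: Hits) -> Set[str]:
    """Step 2: take instances already known for other labels and validate them for this label."""
    pool: Set[str] = set().union(*others.values())
    return validate(label, pool, non_instances, hits)


def deep_web_check(label: str, candidates: Iterable[str], submit: Submit) -> Set[str]:
    """Step 3: keep only the values for which the form returns results."""
    return {c for c in candidates if submit(label, c)}


if __name__ == "__main__":
    # Toy data standing in for real search-engine and form responses.
    snippets = {'"airlines such as"': ["Carriers and airlines such as Delta, United, and Lufthansa."]}
    counts = {
        '"Delta"': 100, '"airline" "Delta"': 60,
        '"United"': 200, '"airline" "United"': 80,
        '"Lufthansa"': 50, '"airline" "Lufthansa"': 40,
        '"Qantas"': 80, '"airline" "Qantas"': 48,
        '"Boston"': 500, '"airline" "Boston"': 50,
        '"Chicago"': 400, '"airline" "Chicago"': 20,
    }
    search = lambda q: snippets.get(q, [])
    hits = lambda q: counts.get(q, 0)
    submit = lambda label, value: value != "Lufthansa"  # pretend the form rejects one value

    found = discover("airline", search, ["Boston"], hits)
    borrowed = borrow("airline", {"departure city": {"Boston", "Chicago"}, "operator": {"Qantas"}}, ["Boston"], hits)
    print(sorted(found), sorted(borrowed))
    print(sorted(deep_web_check("airline", found | borrowed, submit)))
```

With the toy data, "Boston" scores 0.1 and sets the bar, so "Chicago" (0.05) is rejected during borrowing while "Qantas" (0.6) is kept; the simulated form then filters out "Lufthansa".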

