WC3(WC-triple):Wikipedia Category Comprehensiveness Checker (DBpedia 2016-10 version)


HELP page


WC3 (WC-triple: Wikipedia Category Comprehensiveness Checker; formerly named Wikipedia Category Consistency Checker) checks the comprehensiveness of Wikipedia category information by using DBpedia information.
WC3 stores SPARQL queries over DBpedia that represent Wikipedia categories. It compares the pages retrieved by each query with the pages annotated with the corresponding Wikipedia category to check the comprehensiveness of that category.

Fig. 1: Comparison between two page sets
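The comparison in Fig. 1 amounts to basic set operations on two page sets. The minimal sketch below assumes the group definitions implied by the precision and recall formulas on this page (Found, NotFound, Error). The function name and page names are hypothetical, and this is not WC3's actual implementation.

```python
def compare_page_sets(query_pages, category_pages):
    """Split pages into the three comparison groups (hypothetical helper).

    query_pages:    pages retrieved by the SPARQL query
    category_pages: pages annotated with the Wikipedia category
    """
    query_pages = set(query_pages)
    category_pages = set(category_pages)
    return {
        "Found": query_pages & category_pages,     # retrieved and in the category
        "NotFound": category_pages - query_pages,  # in the category, not retrieved
        "Error": query_pages - category_pages,     # retrieved, not in the category
    }


groups = compare_page_sets(
    query_pages=["Page_A", "Page_B", "Page_C"],
    category_pages=["Page_A", "Page_B", "Page_D"],
)
for name, pages in groups.items():
    print(name, sorted(pages))
# Found ['Page_A', 'Page_B']
# NotFound ['Page_D']
# Error ['Page_C']
```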

For example, the SPARQL query for the Wikipedia category "1991 births" is as follows.

SELECT DISTINCT ?s WHERE {
  ?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/ontology/Person> .
  ?s <http://dbpedia.org/ontology/birthDate> ?o1 . FILTER regex(str(?o1), "1991")
}
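The effect of this query can be illustrated without contacting DBpedia. The sketch below imitates the two triple patterns and the regex filter on a few in-memory records. The records, their prefixed names, and the function name are hypothetical stand-ins for DBpedia resources, and the sketch covers only this one query shape.

```python
import re

# Hypothetical, simplified records standing in for DBpedia resources.
records = [
    {"uri": "dbr:Person_A", "type": "dbo:Person", "birthDate": "1991-03-14"},
    {"uri": "dbr:Person_B", "type": "dbo:Person", "birthDate": "1990-07-01"},
    {"uri": "dbr:Place_C", "type": "dbo:Place"},
    {"uri": "dbr:Person_D", "type": "dbo:Person"},  # no birthDate extracted
]


def run_births_query(records, year):
    """Mimic: ?s rdf:type dbo:Person . ?s dbo:birthDate ?o1 . FILTER regex(?o1, year)."""
    results = []
    for r in records:
        birth = r.get("birthDate")
        # Both triple patterns must match, then the regex filter must succeed.
        if r.get("type") == "dbo:Person" and birth is not None and re.search(year, birth):
            results.append(r["uri"])
    return results


print(run_births_query(records, "1991"))  # ['dbr:Person_A']
```

A record such as dbr:Person_D, whose birth date was never extracted, is never returned, which is how extraction failures end up in the NotFound group.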

The query above retrieves pages from DBpedia whose type is Person and whose birth date contains the string "1991". Since most DBpedia metadata is extracted from infoboxes, the query effectively retrieves pages that use a person-related infobox template and whose birth date contains "1991". Most of these pages have "1991 births" as one of their Wikipedia categories. However, several pages lack this category annotation; they are candidates for adding "1991 births" as one of their Wikipedia categories.

In addition, several pages in the category cannot be retrieved by the query. There are two causes. The first is a failure of DBpedia to extract the metadata: most of these pages lack an infobox or have insufficient information in it, which suggests that editors should update the infobox. The second is an error in the Wikipedia category assignment. Especially for subcategories of "Births by year", many pages have an infobox birth date that differs from the year implied by the Wikipedia category.

Revising Wikipedia pages based on this information improves the comprehensiveness (coverage and appropriateness) of Wikipedia categories. You can check the comparison results for the Wikipedia category "1991 births".
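For "Births by year" subcategories, the second kind of error can be spotted by comparing the year in the category name with the year of the infobox birth date. The sketch below assumes category names of the form "N births" and ISO-style date strings such as "1990-05-02". The function name and input format are hypothetical, not part of WC3.

```python
import re


def birth_year_mismatches(category, infobox_birth_dates):
    """Return pages whose infobox birth year differs from the category year.

    category:            e.g. "1991 births" (assumed naming pattern)
    infobox_birth_dates: mapping page -> birth date string such as "1990-05-02"
    """
    m = re.fullmatch(r"(\d{1,4}) births", category)
    if m is None:
        raise ValueError(f"not a 'Births by year' category: {category!r}")
    year = int(m.group(1))
    mismatches = {}
    for page, date in infobox_birth_dates.items():
        found = re.match(r"(\d{1,4})-", date)  # leading year of the date string
        if found and int(found.group(1)) != year:
            mismatches[page] = int(found.group(1))
    return mismatches


print(birth_year_mismatches("1991 births", {"Page_A": "1991-03-14", "Page_B": "1990-05-02"}))
# {'Page_B': 1990}
```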

Initial SPARQL queries were generated automatically by checking the metadata common to the pages that belong to the target Wikipedia category (see the publications for details) and stored in the database. Users can update a stored SPARQL query when a new query has a better F-measure (the harmonic mean of precision and recall) than the stored one.
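The update rule can be written as a comparison of F-measures. This is a minimal sketch assuming that precision and recall have already been computed for both queries and that "better" means strictly higher; the function names are hypothetical.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (F1); 0.0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def should_replace(stored, candidate):
    """Replace the stored query only if the candidate has a strictly higher F-measure.

    stored, candidate: (precision, recall) tuples for the two queries.
    """
    return f_measure(*candidate) > f_measure(*stored)


# Hypothetical scores: stored query (0.9, 0.6) vs. a new query (0.8, 0.8).
print(should_replace((0.9, 0.6), (0.8, 0.8)))  # True
```

The harmonic mean penalizes imbalance, so a query with high precision but low recall (0.9, 0.6; F = 0.72) loses to a more balanced one (0.8, 0.8; F = 0.8).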

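Comparison results are categorized into the following three groups: Found (F), pages that are retrieved by the SPARQL query and belong to the category; NotFound (NF), pages that belong to the category but are not retrieved; and Error (E), pages that are retrieved but do not belong to the category.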

When the SPARQL query represents the Wikipedia category appropriately, the comprehensiveness of the target category can be evaluated by precision, |F| / (|F| + |E|), and recall, |F| / (|F| + |NF|). To increase comprehensiveness, the following points should be checked.

If the SPARQL query does not represent the category appropriately, the analysis results are less reliable.
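Precision and recall follow directly from the sizes of the three groups. The sketch below uses hypothetical counts, not results for any real category, and returns 0.0 when a denominator is empty (an assumed convention).

```python
def precision_recall(n_found, n_not_found, n_error):
    """precision = |F| / (|F| + |E|), recall = |F| / (|F| + |NF|)."""
    precision = n_found / (n_found + n_error) if n_found + n_error else 0.0
    recall = n_found / (n_found + n_not_found) if n_found + n_not_found else 0.0
    return precision, recall


# Hypothetical counts, not actual results for any category.
p, r = precision_recall(n_found=80, n_not_found=20, n_error=10)
print(f"precision={p:.3f} recall={r:.3f}")  # precision=0.889 recall=0.800
```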


The following procedure analyzes a Wikipedia category by using stored SPARQL queries.
  1. Enter the name of the category in the Category: text box and click the Load button. After you type the first two characters, candidate category names that start with those characters are shown as a list.
  2. If results are stored in the database, they are shown at the bottom of the page.
    Details of the Found, NotFound, and Error groups are displayed by clicking ▽.
  3. For each page, you can check the Wikipedia and/or DBpedia information by using the Wikipedia or DBpedia link in the table.
  4. Before starting a new search, clear both the Category: and SPARQL text boxes by clicking "Clear SPARQL".
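The candidate list in step 1 behaves like prefix completion. The sketch below assumes a case-sensitive match against a sorted list of category names and a hypothetical cap of 10 suggestions; WC3's actual lookup may work differently.

```python
import bisect


def complete(prefix, sorted_names, min_chars=2, limit=10):
    """Return up to `limit` category names starting with `prefix`.

    Suggestions appear only once at least `min_chars` characters are typed.
    sorted_names must be sorted in ascending order.
    """
    if len(prefix) < min_chars:
        return []
    start = bisect.bisect_left(sorted_names, prefix)  # first name >= prefix
    matches = []
    for name in sorted_names[start:]:
        if not name.startswith(prefix) or len(matches) == limit:
            break
        matches.append(name)
    return matches


names = sorted(["1990 births", "1991 births", "1991 deaths", "American films", "19th-century painters"])
print(complete("19", names))      # ['1990 births', '1991 births', '1991 deaths', '19th-century painters']
print(complete("1991 b", names))  # ['1991 births']
```

Because the list is sorted, all names sharing a prefix are contiguous, so the binary search finds the first candidate and the loop stops at the first non-match.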

When no SPARQL query is stored, the user can generate one with the automatic SPARQL query construction interface. After the user enters the name of the Wikipedia category and clicks the check button, WC3 generates a new SPARQL query by checking the metadata common to the pages that belong to the target Wikipedia category.
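The idea of "common metadata" can be sketched as counting (property, value) pairs across the category's pages and keeping the frequent ones as triple patterns. This is a simplified illustration, not the method described in the publications. The input format and the 0.8 threshold are assumptions, and the prefixed names (rdf:, dbo:, dbr:) would need PREFIX declarations on an endpoint that does not predefine them.

```python
from collections import Counter


def common_metadata(pages, min_ratio=0.8):
    """Return (property, value) pairs shared by at least `min_ratio` of the pages.

    pages: mapping page -> set of (property, value) pairs (hypothetical format).
    min_ratio is an assumed threshold, not a WC3 parameter.
    """
    counts = Counter()
    for pairs in pages.values():
        counts.update(set(pairs))  # count each pair at most once per page
    needed = min_ratio * len(pages)
    return sorted(pair for pair, n in counts.items() if n >= needed)


def to_sparql(pairs):
    """Turn each shared pair into a triple pattern on ?s."""
    patterns = [f"?s {prop} {value} ." for prop, value in pairs]
    return "SELECT DISTINCT ?s WHERE {\n  " + "\n  ".join(patterns) + "\n}"


pages = {
    "Page_A": {("rdf:type", "dbo:Person"), ("dbo:birthPlace", "dbr:Tokyo")},
    "Page_B": {("rdf:type", "dbo:Person"), ("dbo:birthPlace", "dbr:Osaka")},
    "Page_C": {("rdf:type", "dbo:Person")},
}
print(to_sparql(common_metadata(pages)))
```

This sketch only produces exact-value patterns; filters such as the regex on birth dates in the "1991 births" example would need additional handling.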

Another interface generates new SPARQL queries by modifying the queries of sibling categories. When the user clicks a link under Related Categories near Stored SPARQL Query, the system generates new SPARQL queries by replacing uncommon terms. The user can compare the appropriateness of the generated queries by clicking each generated SPARQL query.
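One plausible reading of "replacing uncommon terms" is to swap the words that differ between two sibling category names inside the stored query. The sketch below assumes both names have the same number of words; the function name and the example query string are hypothetical.

```python
def adapt_query(query, source_category, target_category):
    """Rewrite a stored query for a sibling category by swapping the terms
    that differ between the two category names (hypothetical helper).

    Example: "1991 births" -> "1992 births" swaps "1991" for "1992".
    Assumes both names have the same number of words.
    """
    src_words = source_category.split()
    dst_words = target_category.split()
    if len(src_words) != len(dst_words):
        raise ValueError("category names must have the same number of words")
    for old, new in zip(src_words, dst_words):
        if old != new:  # only the terms not shared by both names are replaced
            query = query.replace(old, new)
    return query


stored = '?s dbo:birthDate ?o1 . FILTER regex(str(?o1), "1991")'
print(adapt_query(stored, "1991 births", "1992 births"))
# ?s dbo:birthDate ?o1 . FILTER regex(str(?o1), "1992")
```

Plain string replacement is enough for year-style siblings; terms that also occur elsewhere in the query would need more careful matching.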