Fig. 1: Comparison between two page sets
SELECT DISTINCT ?s WHERE { ?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/ontology/Person> . ?s <http://dbpedia.org/ontology/birthDate> ?o1 . FILTER regex (?o1, "1991") }
This SPARQL query uses the DBpedia database to retrieve pages whose type is Person and whose birth date contains the string "1991". Since most DBpedia metadata are extracted from infoboxes, the query in effect retrieves pages that use a person-related infobox template and whose birth date contains "1991". Most of these pages have "1991 births" as one of their Wikipedia categories. However, several pages lack this category annotation; these pages are candidates for adding "1991 births" as one of their Wikipedia categories.
In addition, several pages in the category cannot be retrieved by the query. There are two cases. One is a failure of DBpedia's metadata extraction: most of these pages lack an infobox or have insufficient information in the infobox, which suggests that editors should update the infobox. The other is an error in the assigned Wikipedia categories: especially for subcategories of "Births by year", there are many pages whose birth date in the infobox differs from the one implied by the Wikipedia category. Revising the Wikipedia pages based on this information improves the comprehensiveness (coverage and appropriateness) of the Wikipedia category. You can check the comparison results for the Wikipedia category "1991 births".
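The comparison between the two page sets can be sketched with plain set operations. This is only an illustration, not WC3's implementation: the function name, the group labels, and the page titles are hypothetical.

```python
def compare_page_sets(category_pages, query_pages):
    """Split pages into three groups by membership in the two sets."""
    category_pages = set(category_pages)
    query_pages = set(query_pages)
    return {
        # Pages in the Wikipedia category that the query also retrieves.
        "both": category_pages & query_pages,
        # Pages the query retrieves that lack the category annotation.
        "query_only": query_pages - category_pages,
        # Pages in the category that the query cannot retrieve.
        "category_only": category_pages - query_pages,
    }


# Hypothetical page titles, for illustration only.
category = {"Page A", "Page B", "Page C"}
query = {"Page B", "Page C", "Page D"}
groups = compare_page_sets(category, query)
```

In this sketch, "query_only" pages would be candidates for adding the category, and "category_only" pages would be candidates for checking the infobox or the category assignment.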
Initial SPARQL queries were generated automatically by checking the common metadata of the pages that belong to the target Wikipedia category (see the papers for details) and are stored in the database. Users can update a stored SPARQL query when a new query has a better F-measure (the harmonic mean of precision and recall) than the stored one.
Comparison results are categorized into the following three groups.
When the SPARQL query represents the Wikipedia category appropriately, the comprehensiveness of the target Wikipedia category can be evaluated by precision (|F|/(|F|+|E|)) and recall (|F|/(|F|+|NF|)). To increase the comprehensiveness, it is better to check the following points.
If the SPARQL query does not represent the category well, the analysis results are less reliable.
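As a minimal sketch, the precision and recall formulas, together with the F-measure (the harmonic mean of the two), can be computed from the three group sizes as follows. The function names and the example counts are hypothetical; the arguments f, e, and nf stand for |F|, |E|, and |NF| as used in the formulas.

```python
def precision(f, e):
    """|F| / (|F| + |E|); returns 0.0 when both counts are zero."""
    return f / (f + e) if (f + e) else 0.0


def recall(f, nf):
    """|F| / (|F| + |NF|); returns 0.0 when both counts are zero."""
    return f / (f + nf) if (f + nf) else 0.0


def f_measure(p, r):
    """Harmonic mean of precision p and recall r."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Hypothetical counts, for illustration only.
p = precision(80, 20)
r = recall(80, 20)
score = f_measure(p, r)
```

A new query would replace the stored one only if its score is higher than the score of the stored query.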
When there is no stored SPARQL query, the user can generate one with the automatic SPARQL query construction interface. When the user inputs the name of the Wikipedia category and pushes the check button, WC3 generates a new SPARQL query by checking the common metadata of the pages that belong to the target Wikipedia category.
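One simple way to read "checking common metadata" is to keep the property-value pairs shared by most pages in the category and turn them into triple patterns. The sketch below illustrates that reading only. It is an assumption, not WC3's actual algorithm (see the papers for that), and the function names, the min_share threshold, and the sample data are hypothetical.

```python
from collections import Counter


def common_metadata(pages, min_share=0.8):
    """Return (property, value) pairs present on at least min_share of the pages."""
    counts = Counter()
    for pairs in pages.values():
        counts.update(set(pairs))  # count each pair at most once per page
    threshold = min_share * len(pages)
    return sorted(pair for pair, n in counts.items() if n >= threshold)


def build_query(pairs):
    """Build a SELECT query with one triple pattern per (property, value) pair."""
    patterns = " ".join(f"?s <{prop}> <{value}> ." for prop, value in pairs)
    return f"SELECT DISTINCT ?s WHERE {{ {patterns} }}"


RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
PERSON = "http://dbpedia.org/ontology/Person"

# Hypothetical metadata for three pages of one category.
pages = {
    "Page A": [(RDF_TYPE, PERSON), ("http://example.org/prop", "http://example.org/x")],
    "Page B": [(RDF_TYPE, PERSON)],
    "Page C": [(RDF_TYPE, PERSON)],
}
shared = common_metadata(pages)
query = build_query(shared)
```

This sketch handles only URI-valued properties; literal values such as birth dates would need a FILTER clause like the one in the example query for "1991 births".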
Another way to generate new SPARQL queries is to modify the SPARQL queries of sibling categories. When the user clicks a link under Related Categories near Stored SPARQL Query, the system generates new SPARQL queries by replacing the uncommon terms. The user can compare the appropriateness of the generated queries by clicking the generated SPARQL.
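The sibling-category approach can be sketched as a term replacement on the stored query. The sketch assumes that "uncommon terms" are the words that differ between the two category names, which may not match how WC3 selects them; the function name is hypothetical.

```python
def adapt_sibling_query(query, sibling_name, target_name):
    """Replace terms that differ between two category names inside a query."""
    sibling_terms = sibling_name.split()
    target_terms = target_name.split()
    if len(sibling_terms) != len(target_terms):
        raise ValueError("category names must have the same number of terms")
    for old, new in zip(sibling_terms, target_terms):
        if old != new:
            # Naive textual replacement; it would also touch URIs containing the term.
            query = query.replace(old, new)
    return query


stored = (
    "SELECT DISTINCT ?s WHERE { "
    "?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/ontology/Person> . "
    "?s <http://dbpedia.org/ontology/birthDate> ?o1 . "
    'FILTER regex (?o1, "1991") }'
)
generated = adapt_sibling_query(stored, "1991 births", "1992 births")
```

Here the stored query for "1991 births" yields a candidate query for "1992 births" in which only the year in the FILTER clause changes.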