QUASIMODO – A Commonsense Knowledge Base

In this work, done in collaboration with the Max Planck Institute for Informatics, we focused on discovering commonsense knowledge, i.e., knowledge shared by most humans, from various sources.

Commonsense knowledge is important in modern AI because it helps systems understand objects, human behaviors, and general concepts. However, it is challenging to capture: it is rarely expressed explicitly, and it is hard to distinguish from contextual knowledge.

For example, elephants are grey, but this fact is almost never stated explicitly. On Google, a quick search gives more than three times as many estimated results for “pink elephant” (7.3 million) as for “grey elephants” (2 million).

To tackle this problem, we devised novel ways of tapping into search-engine query logs and QA forums, and we corroborated candidate facts using statistical evidence from different sources, such as encyclopedias, image tags, and books. Our pipeline can be represented as follows:

Our main idea for obtaining fact candidates was to consider questions instead of statements. We leveraged human curiosity to extract salient knowledge about the world: the way a question is phrased implies facts about the world. For instance, “why is the sky blue” implies that the sky is actually blue.
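To make the idea concrete, here is a minimal sketch of how a “why is/are …” question can be turned into a candidate fact. It is an illustration only, not the extraction code used for Quasimodo: the function name `question_to_fact`, the regular expression, and the restriction to a single-word subject are all assumptions made for this example.

```python
import re
from typing import Optional, Tuple

# Hypothetical toy pattern: "why is/are [the] <subject> <property>".
# Assumption: the subject is a single word, optionally preceded by "the".
_WHY_BE = re.compile(r"^why (is|are) (?:the )?(\w+) (.+?)\??$", re.IGNORECASE)


def question_to_fact(question: str) -> Optional[Tuple[str, str, str]]:
    """Return the (subject, verb, property) triple implied by a question.

    Returns None when the question does not fit the toy pattern.
    """
    match = _WHY_BE.match(question.strip())
    if match is None:
        return None
    verb, subject, prop = match.groups()
    return (subject.lower(), verb.lower(), prop.lower())


if __name__ == "__main__":
    print(question_to_fact("why is the sky blue"))
    print(question_to_fact("Why are elephants grey?"))
```

A triple obtained this way is only a candidate; as described above, it still has to be corroborated against other sources before it is kept.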

We therefore constructed a dataset of questions. One of our sources was QA forums such as Reddit or Answer.com, from which we extracted all questions. We then used the autocompletion of a search engine such as Google or Bing to simulate access to the search-engine query log. We mainly focused on “why” and “how” questions (such as “why are elephants grey” or “how do birds fly”).
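The autocompletion step can be sketched as follows. This is a hedged illustration rather than our actual crawler: the prefix templates in `PREFIX_TEMPLATES`, the function `collect_questions`, and the `suggest` callable are hypothetical. In particular, `suggest` stands in for whatever autocompletion service is available and is passed in by the caller, so the sketch does not assume any real search-engine API.

```python
from typing import Callable, Iterable, List

# Assumed prefix templates; the text above only states that "why" and "how"
# questions were the main focus.
PREFIX_TEMPLATES = ("why is {s}", "why are {s}", "how do {s}", "how does {s}")


def collect_questions(
    subject: str, suggest: Callable[[str], Iterable[str]]
) -> List[str]:
    """Collect distinct question completions for a subject.

    `suggest` is a hypothetical callable mapping a query prefix to the
    completions proposed by an autocompletion service.
    """
    seen = set()
    questions = []
    for template in PREFIX_TEMPLATES:
        prefix = template.format(s=subject)
        for completion in suggest(prefix):
            normalized = completion.strip().lower()
            # Keep only completions that extend the prefix, without duplicates.
            if normalized.startswith(prefix) and normalized not in seen:
                seen.add(normalized)
                questions.append(normalized)
    return questions


if __name__ == "__main__":
    # Canned completions standing in for a real autocompletion service.
    canned = {
        "why are elephants": ["why are elephants grey", "why are elephants afraid of mice"],
        "how do elephants": ["how do elephants drink"],
    }
    print(collect_questions("elephants", lambda prefix: canned.get(prefix, [])))
```

Injecting `suggest` keeps the collection logic independent of any particular search engine and makes it easy to test with canned completions.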

In the end, we obtained a knowledge base ten times bigger than ConceptNet, a handmade knowledge base, and TupleKB, a knowledge base focusing on high-precision facts. Intrinsic and extrinsic evaluations demonstrated the performance of our approach.

The paper was accepted at CIKM 2019 and can be found on arXiv. Data and additional information can be accessed on the D5 website. The code used for the generation can be found on GitHub.

Smart Plans

Welcome to the page related to our work on smart plans.

This work was done by Nicoleta Preda, Fabian Suchanek, and Julien Romero.


Abstract

Many data sources provide access to their data through Web services (remote APIs). These Web services can be orchestrated together in execution plans in order to answer queries. In our paper, we show that some plans are guaranteed to deliver an answer to the query under certain conditions. We show that the problem of determining the existence of such plans is decidable, and we provide an algorithm for finding such plans. Finally, we conduct a proof of concept for our method on real Web services.
