The use of Semantic Web technologies has led to an increasing amount of structured data published on the Web. Despite advances in question answering systems, retrieving the desired information from structured sources is still a substantial challenge. Users and researchers still face difficulties in integrating and comparing the results and performance of their systems. openQA is an open source question answering framework that unifies approaches from several domain experts. The aim of openQA is to provide a common platform that promotes advances through easy integration and measurement of different approaches.
We propose a framework for combining different approaches to question answering. In the following, we explain our openQA framework, which consists of three main modules comprising several sub-modules. We claim that the presented architecture leverages the diversity of many existing question answering systems.
- Interpretation The first and crucial stage of the core module is the interpretation. Here, the framework attempts to generate a formal representation of the intention behind the input question. In doing so, it also determines how the input question will be processed by the rest of the system. A wide variety of techniques can be applied at this stage, such as tokenization, disambiguation, internationalization, logical forms, semantic role labels, question reformulation, coreference, relations and named entities, amongst others. Most of these technologies are well understood and are not discussed here. The interpretation stage can generate one or more interpretations of the same input in different formats, such as SPARQL, SQL or string tokens.
- Retrieval The retrieval stage extracts answers from sources according to the format or content of the delivered interpretation. For instance, the output of the interpretation can be: (1) a SPARQL query, which can be handled by a triple store; (2) a SQL query, which will be processed by a database; or (3) a set of keywords that can be sent to a document-based search engine. Specific interpretations can also be used for extracting answers from sources such as web services.
- Synthesization After a set of answer candidates has been found by the retrieval stage, a synthesization stage is required. Answers can come from different sources, so redundant or ambiguous results can be retrieved. This redundancy can be removed by the synthesis stage, which processes all information from the different retrieval sub-modules. Results that appear multiple times are fused together and annotated with an attribute recording their number of occurrences. The retrieved answer candidates can come from different explored graphs and in different formats: video, image, document and resource. The synthesis stage is also designed to cluster all related entity information. Result synthesis allows the framework to rank, cluster and estimate the confidence of the retrieved answer candidates.
- Resolution The last stage is the resolution. The resolution stage can apply filters to remove noise and can choose or re-arrange the most promising candidates. The use of machine learning is also encouraged: it allows the system to better select and rank the candidates synthesized by the previous stage and to produce better results in future queries.
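
The four stages above can be illustrated with a minimal Python sketch. It is not the openQA implementation or its API: the names `Interpretation`, `Candidate`, `interpret`, `retrieve`, `synthesize`, `resolve` and `answer` are hypothetical, the interpreters and retrievers are in-memory stubs, and ranking by occurrence count merely stands in for whatever ranking or learning method a real resolution stage would use.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Interpretation:
    fmt: str      # format label, e.g. "sparql", "sql" or "keywords"
    content: str  # the query or token string in that format


@dataclass
class Candidate:
    answer: str
    occurrences: int  # how many times retrieval returned this answer


def interpret(question, interpreters):
    """Interpretation: each interpreter yields one formal interpretation."""
    return [fn(question) for fn in interpreters]


def retrieve(interpretations, retrievers):
    """Retrieval: dispatch each interpretation to the retriever for its format."""
    answers = []
    for interp in interpretations:
        handler = retrievers.get(interp.fmt)
        if handler is not None:  # formats without a retriever are skipped
            answers.extend(handler(interp.content))
    return answers


def synthesize(answers):
    """Synthesization: fuse duplicate answers and record their occurrence count."""
    return [Candidate(a, n) for a, n in Counter(answers).items()]


def resolve(candidates, min_occurrences=1):
    """Resolution: filter out weak candidates and rank the remaining ones."""
    kept = [c for c in candidates if c.occurrences >= min_occurrences]
    return sorted(kept, key=lambda c: c.occurrences, reverse=True)


def answer(question, interpreters, retrievers, min_occurrences=1):
    interpretations = interpret(question, interpreters)
    raw_answers = retrieve(interpretations, retrievers)
    return resolve(synthesize(raw_answers), min_occurrences)


# Stub components (assumptions for illustration only).
interpreters = [
    lambda q: Interpretation("keywords", q.lower().rstrip("?")),
    lambda q: Interpretation("sparql", "SELECT ?capital WHERE { ... }"),  # placeholder query
]
retrievers = {
    "keywords": lambda content: ["Berlin", "Bonn"],  # stands in for a search engine
    "sparql": lambda content: ["Berlin"],            # stands in for a triple store
}

ranked = answer("What is the capital of Germany?", interpreters, retrievers)
strict = answer("What is the capital of Germany?", interpreters, retrievers,
                min_occurrences=2)
```

In this sketch, `ranked` lists "Berlin" before "Bonn" because two retrievers returned it, and `strict` keeps only "Berlin". Keeping each stage as a separate function with a plain data interface mirrors the idea that sub-modules from different systems can be swapped in independently.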