FWF Project P30360

Ontology-mediated Queries for Graph Databases

Principle Investigator: Mantas Šimkus

Introduction and Motivation

Modern software applications need to manage and access information with increasingly complex structure, and Knowledge Representation and Reasoning (KR&R) has proved to provide promising means to address this challenge. This area of Artificial Intelligence (AI) is developing techniques to enable automated reasoning about knowledge in a given application domain, captured in a suitably tailored machine-readable format. When knowledge with complex structure needs to be represented, building ontologies expressed in Description Logics (DLs) is one of the most popular choices [Baader et al., 2003]. For instance, DLs provide the foundations for the W3C-standardized languages for describing resources in the Semantic Web using the so-called OWL ontologies [OWL Working Group, 2009Horrocks, 2008]. Supported by efficient reasoners, DLs are being successfully used to capture complex knowledge in life sciences, e.g., to systematize thousands of medical terms in the SNOMED CT ontology (Systematized Nomenclature of Medicine—Clinical Terms), which is used in the health care systems of the US, the UK, and other countries [Spackman, 2000]. A major application of DLs is the data integration paradigm called Ontology-based Data Access (OBDA) [Poggi et al., 2008]. Here an ontology provides a unified view (a conceptual schema) of the application domain, reconciling the differences among the multiple data sources and providing machine-readable domain knowledge. A key reasoning task in OBDA is answering ontology-mediated queries (OMQs). Intuitively, in this setting a user query formulated using the vocabulary of the conceptual schema is answered by an OBDA engine by incorporating information from the various information sources, and addressing the possible information incompleteness by employing the represented domain knowledge to infer new facts when possible.

DLs allow to model the domain of interest using unary and binary relation symbols, called concept names and role names, respectively. The semantics to DL ontologies is given in terms of models, which are nothing else but relational databases with only unary and binary relations, satisfying the ontology seen as a collection of integrity constraints. These structures can be naturally represented as graphs, whose nodes are labeled with concept names and edges are labeled with role names. Such graphs are a basic object of investigations in graph databases (see, e.g., [Consens and Mendelzon, 1990Barceló et al., 2012aBarceló et al., 2015Barceló et al., 2014Bourhis et al., 2014Reutter, 2013Barceló et al., 2012b]), which are gaining increasing popularity as an alternative to traditional relational databases. Graph databases provide more flexibility as they do not require a database schema, which in classic relational databases imposes a very rigid structure on the data. E.g., RDF triple stores for Web data are concrete instances of graph databases. OBDA and graph databases are close in spirit in the sense that both deal with loosely structured data. However, they also have major differences, which this project aims to reconcile.

The first difference between OBDA and graph databases is the assumption made regarding completeness of information. Most DLs are fragments of classical first-order logic [Borgida, 1996], and thus have semantics that makes the open-world assumption. That is, the semantics of an ontology is given by the set of all its models. This view naturally supports modeling and reasoning in the presence of incomplete knowledge, and hence is adopted by the current OBDA approaches. In contrast, in the traditional (graph) database setting one makes the closed-world assumption; intuitively, it means that every fact that is not explicitly present in the database must be assumed to be false. This is natural when the information stored in a database is known to be complete. For example, a municipality-provided table of bus routes can be assumed complete, i.e. if the table does not include a route to a certain destination, then it can be assumed that there is no such route. It has been acknowledged by both DL researchers and database theorists that these assumptions are too strong, and that formalisms simultaneously supporting open-world and closed-world reasoning need to be developed [Lutz et al., 2012Motik and Rosati, 2010Casoli and Lenzerini, 1990Knorr et al., 2011Fan and Geerts, 2010Benedikt et al., 2016]. E.g., in an OBDA system that integrates information about a city’s restaurants and bus routes, it may be useful to view the municipality-provided data about existing bus routes as complete, while a crowd-sourced collection of restaurants as incomplete. Ideally, query answering algorithms should exploit this information to exclude irrelevant models and infer more informative answers. A recent approach to this challenge are extensions of DLs with closed predicates [Franconi et al., 2011Franconi et al., 2013Lutz et al., 2013Lutz et al., 2015Ngo et al., 2016], which allow the user to specify which relations of the input database should be considered as complete.

The second difference between OBDA and graph databases is the query languages that are used for accessing data. The DL community has mainly focused on OMQs based on conjunctive queries (CQs), which are the most common query language for relational databases. The computational complexity and efficient algorithms for such OMQs have received significant attention in the recent years (see, e.g., [Ortiz and Šimkus, 2012Bienvenu and Ortiz, 2015] for surveys). However, CQs lack navigational features, which are central in query languages for graph databases. E.g., the natural query to retrieve pairs (x,ysuch that y is a police station that is reachable from a bus stop y using one or more bus connections is not supported by the majority of OMQ answering approaches. In contrast, navigational queries like regular path queries (RPQs) in graph databases effortlessly support such and more complex queries.

Main Goals

In this project we will study the database-theoretic foundations of combining closed predicates and navigational features in ontology-mediated query answering, paving the way for very powerful techniques for intelligent data access, and going significantly beyond the current state-of-the-art. In particular, the main goal of this project is to study the relative expressiveness and the complexity of static analysis problems for ontology-mediated queries with navigational features and closed predicates (ONCs).

Characterizing Relative Expressiveness  This goal aims to compare the expressive power of ONCs with the expressive power of classic query languages. To this end, we will mainly develop translations from ONCs into (suitable variations) of Datalog. This study will clarify the relationship between the emerging and the classic approaches to data access, e.g., it will allow us to transfer existing results from the setting of Datalog to the setting of ONCs. This study will also open the way to reusing existing efficient Datalog engines for answering ONCs in real-life scenarios.

Complexity of Static Analysis Problems  We will study the computational complexity of reasoning tasks aimed at supporting the design of ONCs. In particular, we will investigate the query containment and the query emptiness problems, the most primitive static analysis problems that lie at the core query design and optimization techniques.

By studying the combination of navigational features and closed predicates, this project aims to provide substantial impact in at least two research areas: in KR&R, and in Database Theory. From the KR&R perspective, ONCs are superior to standard OMQs because they allow for a more flexible navigation of information, and allow to assert that certain parts of the knowledge are complete. From the Database Theory perspective, ONCs are superior to standard navigational query languages for graph databases because they allow to declare some relations as incomplete (in contrast to the default assumption that they are complete), and then interrelate the complete and incomplete relations by means of an ontology.

References

[Baader et al., 2003]   Baader, F., Calvanese, D., McGuinness, D. L., Nardi, D., and Patel-Schneider, P. F., editors (2003). The Description Logic Handbook: Theory, Implementation, and Applications.
Cambridge University Press.

[Barceló et al., 2015]   Barceló, P., Fontaine, G., and Lin, A. W. (2015). Expressive path queries on
graph with data. Logical Methods in Computer Science, 11(4).

[Barceló et al., 2012a]   Barceló, P., Libkin, L., Lin, A. W., and Wood, P. T. (2012a). Expressive
languages for path queries over graph-structured data. ACM Trans. Database Syst., 37(4):31.

[Barceló et al., 2014]   Barceló, P., Libkin, L., and Reutter, J. L. (2014). Querying regular graph
patterns. J. ACM, 61(1):8:1–8:54.

[Barceló et al., 2012b]   Barceló, P., Pérez, J., and Reutter, J. L. (2012b). Relative expressiveness of
nested regular expressions. In Proc. of the A. Mendelzon Int. WS on Found. of Data Management
(AMW 2012), pages 180–195. CEUR-WS.org.

[Benedikt et al., 2016]   Benedikt, M., Bourhis, P., ten Cate, B., and Puppis, G. (2016). Querying visible
and invisible information. In Proc. of LICS 2016, pages 297–306. ACM.

[Bienvenu and Ortiz, 2015]   Bienvenu, M. and Ortiz, M. (2015). Ontology-mediated query answering
with data-tractable description logics. In Proc. of Reasoning Web Summer School 2015, volume 9203 of
LNCS, pages 218–307. Springer.

[Borgida, 1996]   Borgida, A. (1996). On the relative expressiveness of description logics and predicate
logics. Artif. Intell., 82(1-2):353–367.

[Bourhis et al., 2014]   Bourhis, P., Krötzsch, M., and Rudolph, S. (2014). How to best nest regular
path queries. In Proc. of DL 2014, volume 1193 of CEUR Workshop Proceedings, pages 404–415.
CEUR-WS.org.

[Cadoli and Lenzerini, 1990]   Cadoli, M. and Lenzerini, M. (1990). The complexity of closed world
reasoning and circumscription. In Proc. of AAAI’90, pages 550–555. AAAI Press / The MIT Press.

[Consens and Mendelzon, 1990]   Consens, M. P. and Mendelzon, A. O. (1990). Graphlog: a visual
formalism for real life recursion. In Proc. of PODS’90, pages 404–416. ACM Press.

[Fan and Geerts, 2010]   Fan, W. and Geerts, F. (2010). Relative information completeness. ACM Trans.
Database Syst., 35(4):27:1–27:44.

[Franconi et al., 2011]   Franconi, E., Ibáñez-García, Y. A., and Seylan, I. (2011). Query answering
with DBoxes is hard. Electr. Notes Theor. Comput. Sci., 278:71–84.

[Franconi et al., 2013]   Franconi, E., Kerhet, V., and Ngo, N. (2013). Exact query reformulation over
databases with first-order and description logics ontologies. J. Artif. Intell. Res. (JAIR), 48:885–922.

[Horrocks, 2008]   Horrocks, I. (2008). Ontologies and the semantic web. Commun. ACM,
51(12):58–67.

[Knorr et al., 2011]   Knorr, M., Alferes, J. J., and Hitzler, P. (2011). Local closed world reasoning
with description logics under the well-founded semantics. Artif. Intell., 175(9-10):1528–1554.

[Lutz et al., 2012]   Lutz, C., Seylan, I., and Wolter, F. (2012). Mixing open and closed world
assumptionin ontology-based data access: Non-uniform data complexity. In Proc. of DL 2012.
CEUR-WS.org.

[Lutz et al., 2013]   Lutz, C., Seylan, I., and Wolter, F. (2013). Ontology-based data access with closed
predicates is inherently intractable(sometimes). In Proc. of IJCAI 2013. IJCAI/AAAI.

[Lutz et al., 2015]   Lutz, C., Seylan, I., and Wolter, F. (2015). Ontology-mediated queries with closed
predicates. In Proc. of IJCAI 2015, pages 3120–3126. AAAI Press.

[Motik and Rosati, 2010]   Motik, B. and Rosati, R. (2010). Reconciling description logics and rules.
J. ACM, 57(5).

[Ngo et al., 2016]   Ngo, N., Ortiz, M., and Šimkus, M. (2016). Closed predicates in description logics:
Results on combined complexity. In Proc. of KR 2016.

[Ortiz and Šimkus, 2012]   Ortiz, M. and Šimkus, M. (2012). Reasoning and query answering in
description logics. In Proc. of the Reasoning Web Summer School 2012, volume 7487 of LNCS, pages
1–53. Springer.

 [OWL Working Group, 2009]   OWL
Working Group, W. (27 October 2009). OWL 2 Web Ontology Language: Document Overview. W3C
Recommendation. Available at http://www.w3.org/TR/owl2-overview/.

[Poggi et al., 2008]   Poggi, A., Lembo, D., Calvanese, D., De Giacomo, G., Lenzerini, M., and Rosati,
R. (2008). Linking data to ontologies. J. Data Semantics, 10:133–173.

[Reutter, 2013]   Reutter, J. L. (2013). Containment of nested regular expressions. CoRR,
abs/1304.2637.

[Spackman, 2000]   Spackman, K. (2000). Managing clinical terminology hierarchies using algorithmic
calculation of subsumption: Experience with SNOMED-RT. J. of the Amer. Med. Informatics Ass.