FWF Project P30873

KtoAPP: Compiling Knowledge into Applications

Principal Investigator: Mantas Šimkus

Project Description

Software development is notoriously time-consuming and error-prone, and thus automated tools that aid software creation are of great importance to society. In particular, in many key areas it would be of great value to have automated tools for rapidly creating highly customized yet highly reliable data-centric applications. For example, when coping with a natural disaster or other unexpected crisis, we would like to have situation-specific yet reliable software for responders and the general public to report crisis-related information. Another example is one-off clinical trials, in which researchers need reliable tools to collect data reported by trial participants, all in a very knowledge-intensive and highly regulated domain. To ensure the quality of data, the applications for data reporting should be tailored to the specific trial and to the medical conditions and needs of the participants. These examples share a challenging combination of requirements, which current technologies cannot fulfill without very costly development efforts: (i) a very complex problem domain, (ii) a need for reliability, and (iii) a need for situation-specific customizations that make the application accessible to users with little or no training.

We believe that this challenge can now be overcome thanks to the major advances in Knowledge Representation and Reasoning (KR&R) and other areas of Artificial Intelligence (AI) in the last decade. Large amounts of complex human knowledge have been captured in machine-readable formats, and very scalable and reliable systems for automated inference over this knowledge are now available. The core idea of our approach is to exploit domain knowledge given as two components: (1) a knowledge base (KB) that captures, in a machine-readable language, the general knowledge about a given problem domain (e.g., a KB of clinical terms), and (2) a focus specification that captures the requirements of a specific application (e.g., a description of a clinical trial).

The main goal of this project is to develop foundational KR&R techniques to automatically compile a given knowledge base and a focus specification into a reliable yet very accessible application, tailored for reporting data about the entities described in the focus specification. 

We envision a four-stage process for compiling a knowledge base and a focus specification into an application:

I. Preparatory Stage  In this stage, domain experts create a KB K=(O,A) for the given application domain. The KB contains an ontology O and a collection A of actions, and it is written in a specially designed knowledge representation language. The ontology O describes the relevant entities of the application domain and formalizes, in a logic-based language, the known relationships between these entities. The actions in A enable the evolution of the data that is to be managed by the target applications: they are pieces of program code that describe the precise ways in which users can add and modify data.
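
To make the shape of such a KB concrete, the following minimal Python sketch is purely illustrative: the actual knowledge representation language is to be designed in this project, and all names such as TrialParticipant, hasSymptom, and report_symptom are hypothetical. It shows an ontology with two inclusion axioms and one action consisting of a precondition and an effect:

    from dataclasses import dataclass, field

    @dataclass
    class Inclusion:
        """Ontology axiom: every instance of 'sub' is an instance of 'sup'."""
        sub: str
        sup: str

    @dataclass
    class Action:
        """A data-manipulating action: it may be applied only when its
        precondition holds, and it then adds the listed facts."""
        name: str
        precondition: str
        adds: list = field(default_factory=list)

    # O: a tiny ontology fragment for a clinical domain (hypothetical vocabulary).
    ontology = [
        Inclusion("TrialParticipant", "Patient"),
        Inclusion("Patient", "Person"),
    ]

    # A: a single action that lets a user record a symptom for a participant.
    actions = [
        Action(
            name="report_symptom",
            precondition="TrialParticipant(x) and Symptom(y)",
            adds=["hasSymptom(x, y)"],
        ),
    ]

    kb = (ontology, actions)  # K = (O, A)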

II. Focusing Stage  When a need for a particular application emerges, domain experts create a focus specification. It contains a (possibly highly incomplete) list of entities that are relevant for the target application, and may also contain additional situation-specific actions. The focus specification is written in a machine-readable language and uses the vocabulary of the KB constructed in the previous stage.
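
Continuing the illustrative sketch above, and reusing its hypothetical Action class, a focus specification might look as follows; the entity names and the extra action are again invented for illustration only:

    # A focus specification for one concrete clinical trial (names hypothetical).
    # It lists the KB entities the target application must keep track of and adds
    # one situation-specific action on top of those already provided by the KB.
    focus_specification = {
        "relevant_entities": ["TrialParticipant", "Symptom", "hasSymptom"],
        "additional_actions": [
            Action(
                name="withdraw_participant",
                precondition="TrialParticipant(x)",
                adds=["Withdrawn(x)"],
            ),
        ],
    }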

III. Compilation Stage  In this stage, a specially designed software tool (a compiler) takes the KB together with the focus specification as input and, if no errors occur, outputs a fully functional application. It incorporates into the application only the functionality that is strictly necessary to keep track of the relevant data, i.e., data about the entities described in the focus specification. Importantly, the compiler employs automated reasoning not only to configure the application but also to provide significant correctness guarantees. Thus the compiler produces a very reliable yet highly customized application from a KB that may describe thousands of different entities. This valuable functionality is provided by the compiler's two key components, the focusing component and the verification component, working in tandem:

  • The focusing component computes from the input KB and the focus specification a new (ideally, much smaller) KB, which suitably conceals the entities whose exclusion is either stated explicitly in the focus specification or can be inferred from it. Roughly speaking, the resulting KB is the basis for an automatically derived database schema and database integrity constraints for the intended target application. The resulting KB is then sent to the verification component.

  • The verification component employs tailored software verification techniques to check the correctness of the data-manipulating actions to be supported by the target application. If the verification component identifies a problem (e.g., a sequence of actions that could lead the application's data into an undesired state), suitable feedback is sent to the focusing component. The focusing component may use this feedback in an attempt to modify its output. If that turns out to be impossible, detailed feedback regarding errors in the initial KB or the focus specification is sent to the developers.

If the focusing and verification components find no errors, the code generation component produces from the restricted KB the program code for the target application (this code is written in a standard programming language, so an existing compiler can be used to build the final application), as well as the code (e.g., SQL statements) that creates the application's initial database together with its integrity constraints.
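
The compile-time interplay of focusing, verification, and code generation described above can be pictured as the following minimal Python sketch. It only illustrates the control flow; the function names (compile_kb, focus, verify, generate_code) are hypothetical placeholders, not the project's actual interface:

    def compile_kb(kb, focus_spec, focus, verify, generate_code, max_rounds=5):
        """Run the focusing/verification tandem and, if no problems remain,
        hand the restricted KB over to code generation.

        The components 'focus', 'verify', and 'generate_code' are passed in
        as functions; they stand in for the compiler's actual components."""
        feedback = None
        for _ in range(max_rounds):
            # Focusing: compute a restricted KB covering only the entities
            # relevant to the focus specification, possibly taking earlier
            # verification feedback into account.
            restricted_kb = focus(kb, focus_spec, feedback)
            if restricted_kb is None:
                # Focusing cannot repair the reported problems; return the
                # feedback so it can be shown to the developers.
                return {"status": "error", "feedback": feedback}

            # Verification: check that no sequence of actions can lead the
            # application's data into an undesired state.
            problems = verify(restricted_kb)
            if not problems:
                # Code generation: program code for the application plus the
                # statements (e.g., SQL) that set up the initial database
                # with its integrity constraints.
                return {"status": "ok", "artifacts": generate_code(restricted_kb)}

            feedback = problems

        return {"status": "error", "feedback": feedback}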

IV. Run-time Stage  In this stage, the generated application is employed by end users to collect and explore data about the entities described in the focus specification. Since the application’s scope is limited to the necessary minimum, one can successfully use the application with little or no training.

Summary of Objectives

The key ingredient in our vision of compiling KBs into applications is the tandem of focusing and verification. By studying the theoretical foundations of focusing and verification in KBs, this project will provide a deep understanding of how such a tandem can be realized. In particular, we will study KBs that are based on various Description Logics, which are prominent languages for writing ontologies. An important goal of our study will be to ensure the compositionality of focusing and verification: to enable the compilation process described above, the output of the focusing component must be a valid input for the verification component. The objectives of this project will involve work that connects several areas of computer science: KR&R, Database Theory, and Software Verification. In the long run, the techniques developed during this project will help to dramatically increase the quality of data-centric applications and to reduce their development costs.