Datalog


Datalog is a declarative logic programming language that syntactically is a subset of Prolog. It is often used as a query language for deductive databases. In recent years, Datalog has found new application in data integration, information extraction, networking, program analysis, security, cloud computing and machine learning.
Its origins date back to the beginning of logic programming, but it became prominent as a separate area around 1977 when Hervé Gallaire and Jack Minker organized a workshop on logic and databases. David Maier is credited with coining the term Datalog.

Features, limitations and extensions

Unlike in Prolog, statements of a Datalog program can be stated in any order. Furthermore, Datalog queries on finite sets are guaranteed to terminate, so Datalog does not have Prolog's cut operator. This makes Datalog a fully declarative language.
In contrast to Prolog, Datalog
  1. disallows complex terms as arguments of predicates, e.g., p is admissible but not p,
  2. imposes certain stratification restrictions on the use of negation and recursion,
  3. requires that every variable that appears in the head of a clause also appears in a nonarithmetic positive literal in the body of the clause,
  4. requires that every variable appearing in a negative literal in the body of a clause also appears in some positive literal in the body of the clause
Query evaluation with Datalog is based on first-order logic, and is thus sound and complete. However, Datalog is not Turing complete, and is thus used as a domain-specific language that can take advantage of efficient algorithms developed for query resolution. Indeed, various methods have been proposed to efficiently perform queries, e.g., the Magic Sets algorithm, tabled logic programming or SLG resolution.
Some widely used database systems include ideas and algorithms developed for Datalog. For example, the standard includes recursive queries, and the Magic Sets algorithm is implemented in IBM's DB2. Moreover, Datalog engines are behind specialised database systems such as Intellidimension's database for the semantic web.
Several extensions have been made to Datalog, e.g., to support aggregate functions, to allow object-oriented programming, or to allow disjunctions as heads of clauses. These extensions have significant impacts on the definition of Datalog's semantics and on the implementation of a corresponding Datalog interpreter.

Fragments

Datalog programs may or may not use negation in rule bodies: Datalog programs with negation are often required to use it as stratified negation to ensure that the semantics are well-defined. Datalog programs may or may not use inequalities between variables in rule bodies.
Some syntactic fragments of Datalog have been defined, such as the following :
Another restriction concerns the use of recursion: nonrecursive Datalog is defined by disallowing recursion in the definition of Datalog programs. This fragment of Datalog is as expressive as unions of conjunctive queries, but writing queries as nonrecursive Datalog can be more concise.

Expressiveness

The boundedness problem for Datalog asks, given a Datalog program, whether it is bounded, i.e., the maximal recursion depth reached when evaluating the program on an input database can be bounded by some constant. In other words, this question asks whether the Datalog program could be rewritten as a nonrecursive Datalog program. Solving the boundedness problem on arbitrary Datalog programs is undecidable, but it can be made decidable by restricting to some fragments of Datalog.

Example

These two lines define two facts, i.e. things that always hold:

parent.
parent.

This is what they mean: xerces is a parent of brooke and brooke is a parent of damocles. The names are written in lowercase because strings beginning with an uppercase letter stand for variables.
These two lines define rules, which define how new facts can be inferred from known facts.

ancestor :- parent.
ancestor :- parent, ancestor.

meaning:
This line is a query:

?- ancestor.

It asks the following: Who are all the X that xerces is an ancestor of? It would return brooke and damocles when posed against a Datalog system containing the facts and rules described above.
More about rules: a rule has a :- symbol in the middle: the part to the left of this symbol is the head of the rule, the part to the right is the body. A rule reads like this: is known to be true if it is known that is true. Uppercase letters in rules stand for variables: in the example, we don't know who X or Y are, but some X is the ancestor of some Y if that X is the parent of that Y. The ordering of the clauses is irrelevant in Datalog, in contrast to Prolog, which depends on the ordering of clauses for computing the result of the query call.
Datalog distinguishes between extensional predicate symbols and intensional predicate symbols. In the example above ancestor is an intensional predicate symbol, and parent is extensional. Predicates may also be defined by facts and rules and therefore neither be purely extensional nor intensional, but any Datalog program can be rewritten into an equivalent program without such predicate symbols with duplicate roles.

Systems implementing Datalog

Here is a short list of systems that are either based on Datalog or provide a Datalog interpreter:

Free software/open source

Non-free software