Inductive logic Programming 2018

Machine Learning Technology Artificial Intelligence Technology Natural Language Processing Technology Semantic Web Technology Ontology Technology Reasoning Technology Knowledge Information Technology Collecting AI Conference Papers Digital Transformation Technology

In the previous article, we discussed ILP2017. In this issue, I will discuss ILP2018 held in Ferrara, Italy.

Inductive logic programming (ILP) is a subfield of machine learning that relies on logic programming as a unified expression language for representing examples, background knowledge, and hypotheses. With its powerful expressive form based on first-order predicate logic, ILP provides an excellent vehicle for multi-relational learning and data mining.

Launched in 1991, the ILP conference series is the premier international forum for learning from structured or semi-structured relational data. Originally focused on the introduction of logic programs, over the years the scope of research has expanded significantly to include logic, multi-relational data mining, statistical relational learning, graph and tree mining, other learning (non – proposed) logic-based knowledge representation frameworks, statistical learning and other probabilistic Investigate the intersections of the approaches with the

Learning and Inference with Constraints

Probabilistic modeling has been a dominant approach in Machine Learning research. As the field evolves, thc problems of interest become increasingly challenging and complex. Making complex decisions in real world problems often involves assigning values to sets of interdependent variables where the expressive dependency structure can influence, or even dictate, what assignments are possible. However, incorporating nonlocal depcndencies in a probabilistic model can lead to intractable training and inference. This paper presents Constraints Conditional Models (CCMs), a framework that augments probabilistic models with declarative constraints as a way to support decisions in an expressive output space while maintaining modularity and tractability of training. We further show that declarative constraints can be used to take advantage of unlabeled data when training the probabilistic model.

The game of Bridge: a challenge for ILP

Designs of champion-level systems dedicated to a game have been considered as milestones for Artificial Intelligence. Such a success has not yet happened for the game of Bridge because (i) Bridge is a partially observable game (ii) a Bridge player must be able to explain at some point the meaning of his actions to his opponents. This paper presents a simple supervised learning problem in Bridge: given a ‘limit hand’, should a player bid or not, only considering his hand and the context of his decision. We describe this problem and some of its candidate modelisations. We then experiment state of the art propositional machine learning and ILP systems on this problem. Results of these preliminary experiments show that ILP systems are competitive or even outperform propositional Machine Learning systems. ILP systems are moreover able to build explicit models that have been validated by expert Bridge players.

Large-Scale Assessment of Deep Relational Machines

Deep Relational Machines (or DRMs) present a simple way for incorporating complex domain knowledge into deep networks. In a DRM this knowledge is introduced through relational features: in the original formulation of [1], the features are selected by an ILP engine using domain knowledge encoded as logic programs. More recently, in [2], DRMs appear to achieve good performance without the need of feature-selection by an ILP engine (the features are simply drawn randomly from a space of relevant features). The reports so far on DRMs though have been deficient on three counts: (a) They have been tested on very small amounts of data (7 datasets, not all independent, altogether with few 1000s of instances); (b) The background knowledge involved has been modest, involving few 10s of predicates; and (c) Performance assessment has been only on classification tasks. In this paper we rectify each of these shortcomings by testing on datasets from the biochemical domain involving 100s of 1000s of instances; industrial-strength background predicates involving multiple hierarchies of complex definitions; and on classification and regression tasks. Our results provide substantially reliable evidence of the predictive capabilities of DRMs; along with a significant improvement in predictive performance with the incorporation of domain knowledge. We propose the new datasets and results as updated benchmarks for comparative studies in neural-symbolic modelling.

Lifted Relational Team Embeddings for Predictive Sports Analytics

We investigate the use of relational learning in domain of predictive sports analytics, for which we propose a team embedding concept expressed in the language of Lifted relational neural networks, a framework for learning of latent relational structures. On a large dataset of soccer results, we compare different relational learners against strong current methods from the domain to show some very promising results of the relational approach when combined with embedding learning.

Online Learning of Weighted Relational Rules for Complex Event Recognition

Systems for symbolic complex event recognition detect occurrences of events in time using a set of event definitions in the form of logical rules. The Event Calculus is a temporal logic that has been used as a basis in event recognition applications, providing among others, connections to techniques for learning such rules from data. We advance the state-of-the-art by combining an existing online algorithm for learning crisp relational structure with an online method for weight learning in Markov Logic Networks (MLN). The result is an algorithm that learns complex event patterns in the form of Event Calculus theories in the MLN semantics. We evaluate our approach on a challenging real-world application for activity recognition and show that it outperforms both its crisp predecessor and competing online MLN learners in terms of predictive performance, at the price of a small increase in training time.

Context-Aware End-To-End Relation Extracting from Clinical Texts with Attention-based Bi-Tree-GRUs

Extracting clinical entities and their relations from clinical texts is a preliminary task for constructing medical knowledge graph. Existing end-to-end models for extracting entity and relation have limited performance in clinical text because they rarely take both latent syntactic information and the effect of context information into account. Thus, this paper proposed a context-aware end-toend neural model for extracting relations between entities from clinical texts with two level attention. We show that entity-level attention effectively acquires more syntactic information by learning a weighted sum of child word nodes rooted at the target entities. Meanwhile, sub sentence-level attention in an effort to capture the interactions between the target entity pairs and context entity pairs by assigning weights of each context representation within one sentence. Experiments on real-world clinical texts from Shanghai Ruijin Hospital demonstrate that our model significantly gains better performance in the application of clinical texts compared with existing joint models.

Sampling-Based SAT/ASP Multi-Model Optimization as a Framework for Probabilistic Inference

This paper proposes multi-model optimization through SAT witness or answer set sampling, with common probabilistic reasoning tasks as primary use cases (including deduction-style probabilistic inference and hypothesis weight learning). Our approach enhances a state-of-the-art SAT/ASP solving algorithm with Gradient Descent as branching literal decision approach, and optionally a cost backtracking mechanism. Sampling of models using these methods minimizes a task-specific, user-provided multi-model cost function while adhering to given logical background knowledge (either a Boolean formula in CNF or a normal logic program under stable model semantics). Features of the framework include its relative simplicity and high degree of expressiveness, since arbitrary differentiable cost functions and background knowledge can be provided.

Heuristic Based Induction of Answer Set Programs, From Default theories to Combinatorial problems

Significant research has been conducted in recent years to extend Inductive Logic Programming (ILP) methods to induce Answer Set Programs (ASP). These methods perform an exhaustive search for the correct hypothesis by encoding an ILP problem instance as an ASP program. Exhaustive search, however, results in loss of scalability. In addition, the language bias employed in these methods is overly restrictive too. In this paper we extend our previous work on learning stratified answer set programs that have a single stable model to learning arbitrary (i.e., non-stratified) ones with multiple stable models. Our extended algorithm is a greedy FOIL-like algorithm, capable of inducing non-monotonic logic programs, examples of which includes programs for combinatorial problems such as graph-coloring and N-queens. To the best of our knowledge, this is the first heuristic-based ILP algorithm to induce answer set programs with multiple stable models.

The Complexity and Generality of Learning Answer Set Programs

Traditionally most of the work in the field of Inductive Logic Programming (ILP) has addressed the problem of learning Prolog programs. On the other hand, Answer Set Programming is increasingly being used as a powerful language for knowledge representation and reasoning, and is also gaining increasing attention in industry. Consequently, the research activity in ILP has widened to the area of Answer Set Programming, witnessing the proposal of several new learning frameworks that have extended ILP to learning answer set programs. In this paper, we investigate the theoretical properties of these existing frameworks for learning programs under the answer set semantics. Specifically, we present a detailed analysis of the computational complexity of each of these frameworks with respect to the two decision problems of deciding whether a hypothesis is a solution of a learning task and deciding whether a learning task has any solutions. We introduce a new notion of generality of a learning framework, which enables us to define a framework to be more general than another in terms of being able to distinguish one ASP hypothesis solution from a set of incorrect ASP programs. Based on this notion, we formally prove a generality relation over the set of existing frameworks for learning programs under answer set semantics. In particular, we show that our recently proposed framework, Context-dependent Learning from Ordered Answer Sets, is more general than brave induction, induction of stable models, and cautious induction, and maintains the same complexity as cautious induction, which has the highest complexity of these frameworks

Using Deep Learning Platforms to Perform Inference over Large Knowledge Bases

Semantic Web knowledge representation standards, and in particular RDF and OWL, often come endowed with a formal semantics which is considered to be of fundamental importance for the field. Reasoning, i.e., the drawing of logical inferences from knowledge expressed in such standards, is traditionally based on logical deductive methods and algorithms which can be proven to be sound and complete and terminating, i.e. correct in a very strong sense. For various reasons, though, in particular, the scalability issues arising from the ever-increasing amounts of Semantic Web data available and the inability of deductive algorithms to deal with noise in the data, it has been argued that alternative means of reasoning should be investigated which bear high promise for high scalability and better robustness. From this perspective, deductive algorithms can be considered the gold standard regarding correctness against which alternative methods need to be tested. In this paper, we show that it is possible to train a Deep Learning system on RDF knowledge graphs, such that it is able to perform reasoning over new RDF knowledge graphs, with high precision and recall compared to the deductive gold standard.

Learning efficient logic programs

When machine learning programs from data, we ideally want to learn efficient rather than inefficient programs. However, existing inductive logic programming (ILP) techniques cannot distinguish between the efficiencies of programs, such as permutation sort (n!) and merge sort $O (n l o g n)$

$O (n l o g n)$

. To address this limitation, we introduce Metaopt, an ILP system which iteratively learns lower cost logic programs, each time further restricting the hypothesis space. We prove that given sufficiently large numbers of examples, Metaopt converges on minimal cost programs, and our experiments show that in practice only small numbers of examples are needed. To learn minimal time-complexity programs, including non-deterministic programs, we introduce a cost function called tree cost which measures the size of the SLD-tree searched when a program is given a goal. Our experiments on programming puzzles, robot strategies, and real-world string transformation problems show that Metaopt learns minimal cost programs. To our knowledge, Metaopt is the first machine learning approach that, given sufficient numbers of training examples, is guaranteed to learn minimal cost logic programs, including minimal time-complexity programs.

Derivation reduction of metarules in meta-interpretive learning

Meta-interpretive learning (MIL) is a form of inductive logic programming. MIL uses second-order Horn clauses, called metarules, as a form of declarative bias. Metarules define the structures of learnable programs and thus the hypothesis space. Deciding which metarules to use is a trade-off between efficiency and expressivity. The hypothesis space increases given more metarules, so we wish to use fewer metarules, but if we use too few metarules then we lose expressivity. A recent paper used Progol’s entailment reduction algorithm to identify irreducible, or minimal, sets of metarules. In some cases, as few as two metarules were shown to be sufficient to entail all hypotheses in an infinite language. Moreover, it was shown that compared to non-minimal sets, learning with minimal sets of metarules improves predictive accuracies and lowers learning times. In this paper, we show that entailment reduction can be too strong and can remove metarules necessary to make a hypothesis more specific. We describe a new reduction technique based on derivations. Specifically, we introduce the derivation reduction problem, the problem of finding a finite subset of a Horn theory from which the whole theory can be derived using SLD-resolution. We describe a derivation reduction algorithm which we use to reduce sets of metarules. We also theoretically study whether certain sets of metarules can be derivationally reduced to minimal finite subsets. Our experiments compare learning with entailment and derivation reduced sets of metarules. In general, using derivation reduced sets of metarules outperforms using entailment reduced sets of metarules, both in terms of predictive accuracies and learning times.

How much can experimental cost be reduced in active learning of agent strategies?

In science, experiments are empirical observations allowing for the arbitration of competing hypotheses and knowledge acquisition. For a scientist that aims at learning an agent strategy, performing experiments induces energy costs. To that extent, the efficiency of a learning process relies on the number of experiments performed. We study in this article how the cost of experimentation can be reduced with active learning to learn efficient agent strategies. We consider an extension of the meta-interpretive learning framework that allocates a Bayesian posterior distribution over the hypothesis space. At each iteration, the learner queries the label of the instance with maximum entropy, it is the most discriminative over the remaining competing hypotheses, and thus achieves the highest shrinkage of the version space. We study the theoretical framework and evaluate the gain on the cost of experimentation for the task of learning regular grammars and agent strategies: our results demonstrate that the number of experiments to perform to reach an arbitrary accuracy level can at least be halved.

Which background knowledge is relevant?

Selection of appropriate background knowledge is critical within the application of Inductive Logic Programming. Traditionally such selection is entirely dependent on human beings, who provide primitive predicates to be used by an ILP system for formulating hypotheses. In this paper we consider issues relating to possible automatic selection of background definitions from a large library of pre-existing predicates. In particular, we consider the effect of the generality of background definitions on error. In our experiments we introduce randomly defined extensional background predicates with varying levels of generality and measure the effects on omission and commission errors in the family relation domain. Our results indicate increasing generality of background predicates leads to a sharp increase in commission errors with a corresponding rapid decrease in omission errors. In further work we aim to investigate Meta-Interpretive Learning systems which order the selection of background and invented predicates based on their generality

Efficiently Encoding Meta-Interpretive Learning by Answer Set Programming

Meta-Interpretive Learning (MIL) learns logic programs from examples by instantiating meta-rules, which is implemented by the Metagol system using Prolog. Based on previous work wrt. solving MIL by employing Answer Set Programming, in this work-in-progress paper we describe a modification of a previous MIL-encoding which prunes the search space more effectively by simulating a top-down search as performed by Prolog. Initial experiments show that our new encoding can significantly speed up the inductive learning process.

Targeted End-to-end Knowledge Graph Decomposition Abstract

Knowledge graphs are networks with annotated nodes and edges, representing different relations between the network nodes. Learning from such graphs is becoming increasingly important as numerous real-life systems can be represented as knowledge graphs, where properties of selected types of nodes or edges are learned. This paper presents a fully autonomous approach to targeted knowledge graph decomposition, advancing the state-of-the-art HINMINE network decomposition methodology. In this methodology, weighted edges between the nodes of a selected node type are constructed via different typed triplets, each connecting two nodes of the same type through an intermediary node of a different type. The final product of such a decomposition is a weighted homogeneous network of the selected node type. HINMINE is advanced by reformulating the supervised network decomposition problem as a combinatorial optimization problem, and by solving it by a differential evolution approach. The proposed approach is tested on node classification tasks on two real-life knowledge graphs. The experimental results demonstrate that the proposed end-to-end learning approach is much faster and as accurate as the exhaustive search approach.

Diagnostics of Trains with Semantic Diagnostics Rules

Industry today employs rule-based diagnostic systems to minimize the maintenance cost and downtime of equipment. Rules are typically used to process signals from sensors installed in equipment by filtering, aggregating, and combining sequences of time-stamped measurements recorded by the sensors. Such rules are often data-dependent in the sense that they rely on specific characteristics of individual sensors and equipment. This dependence poses significant challenges in rule authoring, reuse, and maintenance by engineers especially when the rules require domain knowledge. In this work we propose an approach to address these problems by relying on the well-known Ontology-Based Data Access approach: we propose to use ontologies to mediate the sensor signals and the rules. To this end, we propose a semantic rule language, SDRL, where signals are first class citizens. Our language offers a balance of expressive power, usability, and efficiency: it captures most of Siemens data-driven diagnostic rules, significantly simplifies authoring of diagnostic tasks, and allows to efficiently rewrite semantic rules from ontologies to data and execute over data. We implemented our approach in a semantic diagnostic system and evaluated it. For evaluation we developed a use case of rail systems at Siemens and conducted experiments to demonstrate both usability and efficiency of our solution.

Concept Learning in Engineering based on Refinement Operator

Semantic interoperability has been acknowledged as a challenge for industrial engineering due to the heterogeneity of data models in the involved software tools. In this paper, we show how to learn declarative class definitions of engineering objects in the XML-based data format AutomationML (AML). Specifically, we transform AML document to the description logic OWL 2 DL and use the DL-Learner framework to learn the concepts of named classes. Moreover, we extend the ALC refinement operator in DL-Learner by exploiting the syntax specification of AML and show significant better learning performance.

Mining the Web of Data with Metaqueries

The Web of Data uses the World Wide Web (WWW) infrastructure to represent and interrelate data sources. These sources are referred to as knowledge graphs (KGs) and correspond to huge collections of facts in the form of RDF triples. The analysis of data contained in a KG is propaedeutic to several crucial KG curation tasks, notably the automated completion of the graph, which pose several challenges due to the open and distributed environment of the WWW infrastructure. However, KG mining could take advantage of some useful meta-information about the data to be analyzed, for instance, the schema of the KG when available. In this paper, we resort to the notion of a metaquery, proposed in the 90s as a template for patterns one is interested to discover in a relational database. We propose to extend this notion to the novel context of the Web of Data, in particular to the case of KG mining. A distinguishing feature of metaquerying problems is the use of a second-order logic language. In this paper, we present a metaquery language based on second-order Description Logics but implementable with standard technologies underlying the Web of Data, and briefly describe mechanisms for answering such metaqueries in the context of interest. Keywords: Metaquerying · Knowledge Graphs · Rule Mining.

Hierarchical Representation Learning on Relational Data

Representation learning of knowledge graphs aims to project both entities and relations as vectors in a continuous low-dimensional space. Relation Hierarchical Structure (RHS), which is constructed by a generalization relationship named subRelationOf between relations, can improve the overall performance of knowledge representation learning. However, most of the existing methods ignore this critical information, and a straightforward way of considering RHS may have a negative effect on the embeddings and thus reduce the model performance. In this paper, we propose a novel method named TransRHS, which is able to incorporate RHS seamlessly into the embeddings. More specifically, TransRHS encodes each relation as a vector together with a relation-specific sphere in the same space. Our TransRHS employs the relative positions among the vectors and spheres to model the subRelationOf, which embodies the inherent generalization relationships among relations. We evaluate our model on two typical tasks, i.e., link prediction and triple classification. The experimental results show that our TransRHS model significantly outperforms all baselines on both tasks, which verifies that the RHS information is significant to representation learning of knowledge graphs, and TransRHS can effectively and efficiently fuse RHS into knowledge graph embeddings.

Explaining Black-box Classifiers with ILP – Empowering LIME with Aleph to Approximate Non-linear Decisions with Relational Rules

We propose an adaption of the explanation-generating system LIME. While LIME relies on sparse linear models, we explore how richer explanations can be generated. As application domain we use images which consist of a coarse representation of ancient graves. The graves are divided into two classes and can be characterised by meaningful features and relations. This domain was generated in analogy to a classic concept acquisition domain researched in psychology. Like LIME, our approach draws samples around a simplified representation of the instance to be explained. The samples are labelled by a generator – simulating a black-box classifier trained on the images. In contrast to LIME, we feed this information to the ILP system Aleph. We customised Aleph’s evaluation function to take into account the similarity of instances. We applied our approach to generate explanations for different variants of the ancient graves domain. We show that our explanations can involve richer knowledge thereby going beyond the expressiveness of sparse linear models

Was the Year 2000 a Leap Year? Step-wise Narrowing Theories with Metagol

Many people believe that every fourth year is a leap year. However, this rule is too general: year X is a leap year if X is divisible by 4 except if X is divisible by 100 except if X is divisible by 400. We call such a theory with alternating generalisation and specialisation a step-wise narrowed theory. We present and evaluate an extension to the ILP system Metagol which facilitates learning such theories. We enabled Metagol to learn over-general theories by allowing a limited number of false positives during learning. This variant is iteratively applied on a learning task. For each iteration after the first, positive examples are the false positives from the previous iteration and negative examples are the true positives from the previous iteration. Iteration continues until no more false positives are present. Then, the theories are combined to a single step-wise narrowed theory. We evaluate the usefulness of our approach in the leap year domain. We can show that our approach finds solutions with fewer clauses, higher accuracy, and in shorter time.

Using Binary Decision Diagrams to Enumerate Inductive Logic Programming Solutions

This paper proposes a method for efficiently enumerating all solutions of a given ILP problem. Inductive Logic Programming (ILP) is a machine learning technique that assumes that all data, background knowledge, and hypotheses can be represented by first-order logic. The solution of an ILP problem is called a hypothesis. Most ILP studies propose methods that find only one hypothesis per problem. In contrast, the proposed method uses Binary Decision Diagrams (BDDs) to enumerate all hypotheses of a given ILP problem. A BDD yields a very compact graph representation of a Boolean function. Once all hypotheses are enumerated, the user can select the preferred hypothesis or compare the hypotheses using an evaluation function. We propose an efficient recursive algorithm for constructing a BDD that represents the set of all hypotheses. In addition, we propose an efficient method to get the best hypothesis given an evaluation function. We empirically show the practicality of our approach on ILP problems using real data.

GPU-Accelerated Hypothesis Cover Set Testing for Learning in Logic

ILP learners are commonly implemented to consider sequentially each training example for each of the hypotheses tested. Computing the cover set of a hypothesis in this way is costly, and introduces a major bottleneck in the learning process. This computation can be implemented more efficiently through the use of data level parallelism. Here we propose a GPU-accelerated approach to this task for propositional logic and for a subset of first order logic. This approach can be used with one’s strategy of choice for the exploration of the hypothesis space. At present, the hypothesis language is limited to logic formulae using unary and binary predicates, such as those covered by certain types of description logic. The approach is tested on a commodity GPU and datasets of up to 200 million training examples, achieving run times of below 30ms per cover set computation.

Tensor-Based Abduction in Horn Propositional Programs

This paper proposes an algorithm for computing solutions of abductive Horn propositional tasks using third-order tensors. We first introduce the notion of explanatory operator, a single-step operation based on inverted implication, and prove that minimal abductive solutions of a given Horn propositional task can be correctly computed using this operator. We then provide a mapping of Horn propositional programs into third-order tensors, which builds upon recent work on matrix representation of Horn programs. We finally show how this mapping can be used to compute the explanatory operator by tensor multiplication

Learning Dynamics with Synchronous, Asynchronous and General Semantics

Learning from interpretation transition (LFIT) automatically constructs a model of the dynamics of a system from the observation of its state transitions. So far, the systems that LFIT handles are restricted to synchronous deterministic dynamics, i.e., all variables update their values at the same time and, for each state of the system, there is only one possible next state. However, other dynamics exist in the field of logical modeling, in particular the asynchronous semantics which is widely used to model biological systems. In this paper, we focus on a method that learns the dynamics of the system independently of its semantics. For this purpose, we propose a modeling of multi-valued systems as logic programs in which a rule represents what can occurs rather than what will occurs. This modeling allows us to represent non-determinism and to propose an extension of LFIT in the form of a semantics free algorithm to learn from discrete multi-valued transitions, regardless of their update schemes. We show through theoretical results that synchronous, asynchronous and general semantics are all captured by this method. Practical evaluation is performed on randomly generated systems and benchmarks from biological literature to study the scalabil-ity of this new algorithm regarding the three aforementioned semantics.

Using Reachability Properties of Logic Program for Revising Biological Models

Learning system dynamics from the observations of state transitions has many applications in bioinformatics. It can correspond to the identification of the mutual influence of genes and can help to understand their interactions. A model can be automatically learned from time series data by using methods like Learning from interpretation transition (LFIT). This method learns an exact model if all transitions of the systems are used as input. However, in real biological data, such complete data sets are usually not accessible and we have to learn a system with partial observations. Usually, biologists also provide with a priori knowledge about the system dynamics in the form of temporal properties. When building models, keeping critical properties valid is one of the major concerns and model checking plays a role in the verification of such desired properties. Our research aims at providing a model checking approach to revise logic programs thanks to temporal properties. In this paper, as a first step, we propose a method that can exploit reachability properties to fit such a model.

Schema mapping discovery from example data using ILP

We frame the task of schema mapping discovery for Data Exchange as an ILP problem to discuss resulting challenges and advantages. The challenges stem from the necessity of learning complex mappings in practice as we illustrate with simple examples.

In the next article, we will discuss ILP2019.