Inductive Logic Programming 2017 Papers


In the previous article, we discussed ILP2016. In this issue, we describe the 27th International Conference on Inductive Logic Programming, ILP2017, held in Orléans, France, in September 2017.

The 12 full papers presented were carefully reviewed and selected from a large number of submissions.
Inductive logic programming (ILP) is a subfield of machine learning that originally relied on logic programming as a unified representation language for expressing examples, background knowledge, and hypotheses. Thanks to its strong representation formalism, based on first-order logic, ILP provides an excellent means for multi-relational learning and data mining, and more generally for learning from structured data.

Topics covered include robot control, knowledge bases and medicine, statistical machine learning in image recognition, relational learning, logic-based event recognition systems, learning Boltzmann machine classifiers from relational data, parallel inductive logic programming, learning from interpretation transition (LFIT), Lifted Relational Neural Networks (LRNNs), and the use of Word2Vec-style vector embeddings. Details are given below.

Robot grasping depends on the specific manipulation scenario: the object, its properties, task and grasp constraints. Object-task affordances facilitate semantic reasoning about pre-grasp configurations with respect to the intended tasks, favoring good grasps. We employ probabilistic rule learning to recover such object-task affordances for task-dependent grasping from realistic video data.
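As a rough illustration of the kind of output such probabilistic rule learning could produce, the sketch below scores candidate grasp regions against hypothetical object-task affordance rules. The predicates, probabilities, and rule format are invented for this example and are not taken from the paper.

```python
# Hypothetical probabilistic affordance rules; format and numbers are
# invented for illustration, not taken from the paper.
RULES = [
    # (probability, task, required object property, preferred grasp region)
    (0.82, "pour", "has_handle", "handle"),
    (0.64, "pour", "is_cup", "rim"),
    (0.91, "hand_over", "is_tool", "body"),
]

def rank_grasps(task, object_properties, candidate_regions):
    """Score each candidate grasp region by its best matching rule."""
    scores = {}
    for region in candidate_regions:
        best = 0.0
        for prob, rule_task, prop, rule_region in RULES:
            if rule_task == task and prop in object_properties and rule_region == region:
                best = max(best, prob)
        scores[region] = best
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

print(rank_grasps("pour", {"has_handle", "is_cup"}, ["handle", "rim", "body"]))
```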

Many applications, such as knowledge base completion and automated diagnosis of patients, only have access to positive examples and lack the negative examples required by standard relational learning techniques, and thus suffer under the closed-world assumption. The corresponding propositional problem is known as Positive and Unlabeled (PU) learning. In this field, it is known that using the label frequency (the fraction of true positive examples that are labeled) makes learning easier. This notion has not yet been explored in the relational domain. The goal of this work is twofold: (1) to explore whether using the label frequency is also useful when working with relational data, and (2) to propose a method for estimating the label frequency from relational positive and unlabeled data. Our experiments confirm the usefulness of knowing the label frequency and of our estimate.
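The label-frequency idea is easiest to see in the propositional PU setting. The sketch below follows the well-known Elkan and Noto recipe on synthetic data: a classifier is trained to predict labeled vs. unlabeled, the label frequency c is estimated as the average score of the labeled examples, and scores are rescaled by 1/c. The data, model choice, and variable names are illustrative only, not the relational estimator proposed in the paper.

```python
# Propositional PU sketch of the label-frequency idea (Elkan & Noto style).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)        # true (hidden) labels
c_true = 0.3                                          # true label frequency
s = y * (rng.random(2000) < c_true)                   # only some positives get labeled

clf = LogisticRegression().fit(X, s)                  # models P(s=1 | x)
c_hat = clf.predict_proba(X[s == 1])[:, 1].mean()     # estimate c on the labeled examples

p_y = np.clip(clf.predict_proba(X)[:, 1] / c_hat, 0, 1)   # P(y=1|x) = P(s=1|x) / c
print(f"estimated label frequency: {c_hat:.2f} (true {c_true})")
```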

Medical data is particularly interesting as a subject for relational data mining due to the complex interactions which exist between different entities. Furthermore, the ambiguity of medical imaging causes interpretation to be complex and error-prone, and thus particularly amenable to improvement through automated decision support. Probabilistic Inductive Logic Programming (PILP) is a particularly well-suited tool for this task, since it makes it possible to combine the relational nature of this field with the ambiguity inherent in human interpretation of medical imaging. This work presents a PILP setting for breast cancer data, where several clinical and demographic variables were collected retrospectively, and new probabilistic variables and rules reflecting domain knowledge were introduced. A PILP predictive model was built automatically from this data and experiments show that it can not only match the predictions of a team of experts in the area, but also consistently reduce the error rate of malignancy prediction, when compared to other non-relational techniques.

Statistical machine learning is widely used in image classification. However, most techniques (1) require many images to achieve high accuracy and (2) do not provide support for reasoning below the level of classification, and so are unable to support secondary reasoning, such as inferring the existence and position of light sources and other objects outside the image. In recent work an Inductive Logic Programming approach called Logical Vision (LV) was shown to overcome some of these limitations. LV uses Meta-Interpretive Learning combined with low-level extraction of high-contrast points sampled from the image to learn recursive logic programs describing the image. This paper extends LV by using (a) richer background knowledge enabling secondary reasoning from raw images, such as light reflection, which can itself be learned and used for resolving visual ambiguities that cannot easily be modelled using statistical approaches, (b) a wider class of background models representing classical 2D shapes such as circles and ellipses, and (c) primitive-level statistical estimators to handle noise in real images. Our results indicate that the new noise-robust version of LV is able to handle secondary reasoning tasks in real images from little data, much like the scientific discovery process of humans. Specifically, using a single example (i.e. one-shot LV), it converges to an accuracy at least comparable to that of a thirty-shot statistical machine learner when predicting hidden light sources. Moreover, we demonstrate that the learned theory can be used to identify ambiguities in the convexity/concavity of objects such as craters.
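The kind of secondary geometric reasoning described here can be illustrated with a toy calculation: given sampled points on a detected circular object and their intensities, the direction of a hidden light source can be estimated as the intensity-weighted mean of the surface normals. The sketch below uses synthetic data and is not the LV system itself.

```python
# Toy estimate of a hidden light-source direction from shading on a circle.
import numpy as np

rng = np.random.default_rng(1)
true_light = np.array([np.cos(np.pi / 4), np.sin(np.pi / 4)])   # hidden light direction

angles = rng.uniform(0, 2 * np.pi, 200)
normals = np.stack([np.cos(angles), np.sin(angles)], axis=1)     # outward normals on the circle
# Lambertian-style shading with noise stands in for sampled high-contrast pixels.
intensity = np.clip(normals @ true_light, 0, None) + rng.normal(0, 0.05, 200)

est = (intensity[:, None] * normals).sum(axis=0)
est /= np.linalg.norm(est)                                       # intensity-weighted mean normal
print("estimated light direction:", est, "true:", true_light)
```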

Latent features learned by deep learning approaches have proven to be a powerful tool for machine learning. They serve as a data abstraction that makes learning easier by capturing regularities in data explicitly. Their benefits motivated their adaptation to the relational learning context. In our previous work, we introduced an approach that learns relational latent features by means of clustering instances and their relations. The major drawback of latent representations is that they are often black-box and difficult to interpret. This work addresses these issues and shows that (1) latent features created by clustering are interpretable and capture interesting properties of the data; (2) they identify local regions of instances that match well with the label, which partially explains their benefit; and (3) although the number of latent features generated by this approach is large, many of them are highly redundant and can be removed without hurting performance much.
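A minimal sketch of the clustering idea, under the assumption that instances are summarized by simple relational count features: k-means clusters the instances, and the resulting memberships serve as Boolean latent features for a downstream learner. All feature names and data are hypothetical.

```python
# Latent features from clustering: cluster memberships as Boolean features.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
# Hypothetical counts, e.g. number of "friend", "cites", "coauthor" links per instance.
counts = np.vstack([rng.poisson([5, 1, 0], size=(50, 3)),
                    rng.poisson([0, 4, 6], size=(50, 3))])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(counts)
latent = np.eye(2)[km.labels_]          # one-hot cluster membership = latent features
print("cluster sizes:", latent.sum(axis=0))
```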


Logic-based event recognition systems infer occurrences of events in time using a set of event definitions in the form of first-order rules. The Event Calculus is a temporal logic that has been used as a basis in event recognition applications, providing, among other things, direct connections to machine learning via Inductive Logic Programming (ILP). OLED is a recently proposed ILP system that learns event definitions in the form of Event Calculus theories, in a single pass over a data stream. In this work we present a version of OLED that allows for parallel, online learning. We evaluate our approach on a benchmark activity recognition dataset and show that we can reduce training times, achieving super-linear speed-ups on some occasions.
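OLED-style online learners decide when to commit to a clause refinement from a stream using a Hoeffding bound. A minimal sketch of that test is shown below; the scores, confidence parameter, and score range are placeholders.

```python
# Hoeffding-bound test for online clause specialization (illustrative values).
import math

def hoeffding_bound(score_range, n, delta=1e-5):
    """Error bound after n observations with confidence 1 - delta."""
    return math.sqrt(score_range ** 2 * math.log(1 / delta) / (2 * n))

def should_specialize(best_score, second_score, n, score_range=1.0, delta=1e-5):
    """Specialize only when the best refinement is significantly ahead of the runner-up."""
    return (best_score - second_score) > hoeffding_bound(score_range, n, delta)

print(should_specialize(0.81, 0.74, n=500),     # False: gap still within the bound
      should_specialize(0.81, 0.74, n=2000))    # True: enough examples have been seen
```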

We consider the problem of learning Boltzmann machine classifiers from relational data. Our goal is to extend the deep belief framework of RBMs to statistical relational models. This allows one to exploit the feature hierarchies and the non-linearity inherent in RBMs over the rich representations used in statistical relational learning (SRL). Specifically, we use lifted random walks to generate features for predicates that are then used to construct the observed features in the RBM, in a manner similar to Markov Logic Networks. We show empirically that this method of constructing an RBM is comparable to or better than the state-of-the-art probabilistic relational learning algorithms on six relational domains.
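As a loose illustration of turning relational structure into features for an RBM's visible layer, the sketch below counts predicate sequences along random walks over a tiny relational graph. The graph, predicates, and walk parameters are made up, and the lifting and RBM training of the paper are not shown.

```python
# Random-walk features over a toy relational graph (illustrative only).
import random
from collections import Counter, defaultdict

# edges: (source, predicate, target)
EDGES = [("ann", "advises", "bob"), ("bob", "coauthor", "carol"),
         ("carol", "cites", "dave"), ("ann", "coauthor", "carol")]

adjacency = defaultdict(list)
for src, pred, dst in EDGES:
    adjacency[src].append((pred, dst))

def random_walk_features(start, length=2, n_walks=200, seed=0):
    """Count predicate sequences seen along random walks starting at `start`."""
    rnd = random.Random(seed)
    feats = Counter()
    for _ in range(n_walks):
        node, path = start, []
        for _ in range(length):
            if not adjacency[node]:
                break
            pred, node = rnd.choice(adjacency[node])
            path.append(pred)
        if path:
            feats["->".join(path)] += 1   # e.g. "advises->coauthor"
    return feats

print(random_walk_features("ann"))
```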

In this study, we improve our parallel inductive logic programming (ILP) system to enable superlinear speedup. This improvement redesigns several features of our ILP learning system and parallel mechanism. The redesigned ILP learning system searches and gathers all rules that have the same evaluation. The redesigned parallel mechanism adds a communication protocol for sharing the evaluation of the identified rules, thereby realizing superlinear speedup.
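The communication pattern behind such a shared-evaluation protocol can be sketched as workers that publish the best evaluation found so far so that others can prune. The threading toy below only illustrates this pattern on dummy scores; it does not reproduce the system or its superlinear speedup.

```python
# Workers sharing the best evaluation so far to prune their own searches.
import threading

best = {"score": float("-inf")}
lock = threading.Lock()

def search(partition):
    """Evaluate one partition of candidate rule scores against the shared best."""
    for rule_score in partition:
        with lock:
            if rule_score <= best["score"]:
                continue                  # prune: cannot beat the shared best
            best["score"] = rule_score    # publish a new best evaluation

partitions = [[0.2, 0.5, 0.9], [0.4, 0.95, 0.6], [0.1, 0.8]]
threads = [threading.Thread(target=search, args=(p,)) for p in partitions]
for t in threads: t.start()
for t in threads: t.join()
print("best shared evaluation:", best["score"])
```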

Learning from interpretation transition (LFIT) automatically constructs a model of the dynamics of a system from the observation of its state transitions. So far, the systems that LFIT handles are restricted to discrete variables or suppose a discretization of continuous data. However, when working with real data, the discretization choices are critical for the quality of the model learned by LFIT. In this paper, we focus on a method that learns the dynamics of the system directly from continuous time-series data. For this purpose, we propose a modeling of continuous dynamics by logic programs composed of rules whose conditions and conclusions represent continuums of values.
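A toy version of rules over continua is sketched below: conditions and conclusions are intervals generalized from observed transitions of a synthetic one-variable system. For brevity the candidate condition intervals are fixed here, whereas the method described learns its intervals from the data.

```python
# Interval rules generalized from continuous state transitions (toy example).
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(0, 1, 200)
x_next = np.where(x < 0.5, x + 0.4, x - 0.3) + rng.normal(0, 0.01, 200)

rules = []
for lo, hi in [(0.0, 0.5), (0.5, 1.0)]:            # assumed candidate condition intervals
    mask = (x >= lo) & (x < hi)
    if mask.any():
        rules.append(((lo, hi), (x_next[mask].min(), x_next[mask].max())))

for cond, concl in rules:
    print(f"x_t in [{cond[0]:.2f}, {cond[1]:.2f}) -> x_t+1 in [{concl[0]:.2f}, {concl[1]:.2f}]")
```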

Lifted Relational Neural Networks (LRNNs) describe relational domains using weighted first-order rules which act as templates for constructing feed-forward neural networks. While previous work has shown that using LRNNs can lead to state-of-the-art results in various ILP tasks, these results depended on hand-crafted rules. In this paper, we extend the framework of LRNNs with structure learning, thus enabling a fully automated learning process. Similarly to many ILP methods, our structure learning algorithm proceeds in an iterative fashion by top-down searching through the hypothesis space of all possible Horn clauses, considering the predicates that occur in the training examples as well as invented soft concepts entailed by the best weighted rules found so far. In the experiments, we demonstrate the ability to automatically induce useful hierarchical soft concepts leading to deep LRNNs with a competitive predictive power.
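To give a feel for the template idea, the sketch below evaluates one hypothetical weighted rule as a network unit: a soft AND aggregates the body atoms of each grounding and a soft OR pools over groundings. The weights, predicates, and the particular activation and pooling functions are illustrative choices, not the ones specified in the paper.

```python
# One LRNN-style unit: soft AND over a rule body, soft OR over its groundings.
import numpy as np

def soft_and(truth_values, weight):
    """Sigmoid of a weighted conjunction-like aggregation of body truth values."""
    return 1 / (1 + np.exp(-(weight * (np.sum(truth_values) - len(truth_values) + 1))))

def soft_or(values):
    return float(np.max(values))          # max-pooling over groundings

# Hypothetical weighted rule: 1.5 : likes(X, Y) :- friends(X, Z), likes(Z, Y)
groundings = [np.array([0.9, 0.8]),       # truth values of the body atoms per grounding
              np.array([0.3, 0.95])]
unit_output = soft_or([soft_and(g, weight=1.5) for g in groundings])
print("likes(X, Y) activation:", round(unit_output, 3))
```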

We present a method to prune hypothesis spaces in the context of inductive logic programming. The main strategy of our method consists in removing hypotheses that are equivalent to already considered hypotheses. The distinguishing feature of our method is that we use learned domain theories to check for equivalence, in contrast to existing approaches which only prune isomorphic hypotheses. Specifically, we use such learned domain theories to saturate hypotheses and then check if these saturations are isomorphic. While conceptually simple, we experimentally show that the resulting pruning strategy can be surprisingly effective in reducing both computation time and memory consumption when searching for long clauses, compared to approaches that only consider isomorphism.
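A propositional toy of the pruning idea is given below: each hypothesis body is saturated by forward chaining with a (learned) domain theory, and hypotheses with identical saturations are treated as equivalent. Real ILP hypotheses are first-order, which is why the paper checks isomorphism of saturations rather than plain set equality.

```python
# Saturate hypothesis bodies with a domain theory, then compare saturations.
DOMAIN_THEORY = [({"parent", "male"}, "father"),   # parent & male -> father
                 ({"father"}, "parent")]           # father -> parent

def saturate(atoms):
    """Forward-chain the domain theory until no new atoms are derived."""
    atoms = set(atoms)
    changed = True
    while changed:
        changed = False
        for body, head in DOMAIN_THEORY:
            if body <= atoms and head not in atoms:
                atoms.add(head)
                changed = True
    return frozenset(atoms)

h1 = {"parent", "male"}
h2 = {"father", "male"}
print(saturate(h1) == saturate(h2))   # True: h2 can be pruned as equivalent to h1
```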

Computing similarity in high-dimensional vector spaces is a long-standing problem that has recently seen significant progress with the invention of the word2vec algorithm. It has usually been found that using an embedded representation results in much better performance for the task being addressed. It is not known whether embeddings can similarly improve performance with data of the kind considered by Inductive Logic Programming (ILP), in which data that appear dissimilar on the surface can be similar to each other given domain (background) knowledge. In this paper, using several ILP classification benchmarks, we investigate whether embedded representations are similarly helpful for problems where there is a sufficient amount of background knowledge. We use tasks for which we have domain expertise about the relevance of the available background knowledge and consider two subsets of background predicates (“sufficient” and “insufficient”). For each subset, we obtain a baseline representation consisting of Boolean-valued relational features. Next, a vector embedding specifically designed for classification is obtained. Finally, we examine the predictive performance of widely-used classification methods with and without the embedded representation. With sufficient background knowledge we find no statistical evidence of improved performance with an embedded representation. With insufficient background knowledge, our results provide empirical evidence that, for the specific case of using deep networks, an embedded representation could be useful.
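The shape of this comparison can be mimicked on synthetic data as below: a classifier is evaluated on Boolean-valued relational features and on a dense embedding of the same matrix. A generic truncated SVD stands in here for the classification-driven embedding used in the paper, so this is only a sketch of the comparison, not of their method.

```python
# Boolean relational features vs. a dense embedding of the same data.
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(4)
bool_feats = rng.integers(0, 2, size=(500, 200)).astype(float)   # stand-in relational features
y = (bool_feats[:, :5].sum(axis=1) > 2).astype(int)              # target depends on a few features

embedded = TruncatedSVD(n_components=20, random_state=0).fit_transform(bool_feats)

for name, X in [("boolean", bool_feats), ("embedded", embedded)]:
    acc = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5).mean()
    print(f"{name:8s} accuracy: {acc:.3f}")
```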

In the next article, we will discuss ILP2018.
