Inductive Logic Programming 2019


In the previous article, we discussed ILP2018. In this issue, we will discuss the 29th International Conference on Inductive Logic Programming, held in Plovdiv, Bulgaria, September 3-5, 2019.

Inductive logic programming (ILP) is a subfield of machine learning that relies on logic programming as a unified expression language for representing examples, background knowledge, and hypotheses. With its powerful expressive form based on first-order predicate logic, ILP provides an excellent vehicle for multi-relational learning and data mining.

Launched in 1991, the ILP conference series is the premier international forum for learning from structured or semi-structured relational data. Originally focused on the induction of logic programs, over the years its scope has expanded significantly to cover all aspects of learning in logic, multi-relational data mining, statistical relational learning, graph and tree mining, learning in other (non-propositional) logic-based knowledge representation frameworks, and the intersections with statistical learning and other probabilistic approaches.

Relational data models are usually used by researchers and companies when a single-table model is not enough to describe their system. When it comes to classification, there are then mainly two options: apply the corresponding relational version of classification algorithms, or use a propositionalization technique to transform the relational database into a single-table representation before classification. In this work, we evaluate a fast and simple propositionalization algorithm called Wordification. This technique uses the table name, attribute name, and value to create a feature. Each feature is treated as a word, and the instances of the database are represented by a Bag-Of-Words (BOW) model. A weighting scheme is then used to weight the features for each instance. The original implementation of Wordification only explored the TF-IDF, term-frequency, and binary weighting schemes. However, work in the text classification and data mining fields shows that the proper choice of weighting scheme can boost classification. Therefore, we empirically evaluated different term weighting approaches with Wordification. Our results show that the right combination of weighting scheme and classification algorithm can significantly improve classification performance.
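
To make the wordification idea concrete, here is a minimal sketch (our own illustration, not the paper's implementation) in which hypothetical customer/order tables are flattened into bag-of-words documents and weighted with TF-IDF via scikit-learn; the table and attribute names are invented for the example.

```python
# Minimal sketch of the wordification idea (not the paper's implementation):
# every (table, attribute, value) combination linked to an instance becomes a
# "word", each instance becomes a bag-of-words document, and a standard
# weighting scheme (here TF-IDF via scikit-learn) turns it into a feature vector.
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical relational data: a main table and a linked detail table.
customers = [
    {"id": 1, "segment": "premium"},
    {"id": 2, "segment": "basic"},
]
orders = [
    {"customer_id": 1, "category": "books"},
    {"customer_id": 1, "category": "music"},
    {"customer_id": 2, "category": "books"},
]

def wordify(customer):
    """Turn one instance and its related rows into a pseudo-document."""
    words = [f"customer_segment_{customer['segment']}"]
    for o in orders:
        if o["customer_id"] == customer["id"]:
            words.append(f"order_category_{o['category']}")
    return " ".join(words)

docs = [wordify(c) for c in customers]
X = TfidfVectorizer(token_pattern=r"\S+").fit_transform(docs)
print(X.toarray())  # one weighted feature vector per customer
```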

Many forms of inductive logic programming (ILP) use metarules, second-order Horn clauses, to define the structure of learnable programs and thus the hypothesis space. Deciding which metarules to use for a given learning task is a major open problem and is a trade-off between efficiency and expressivity: the hypothesis space grows given more metarules, so we wish to use fewer metarules, but if we use too few metarules then we lose expressivity. In this paper, we study whether fragments of metarules can be logically reduced to minimal finite subsets. We consider two traditional forms of logical reduction: subsumption and entailment. We also consider a new reduction technique called derivation reduction, which is based on SLD-resolution. We compute reduced sets of metarules for fragments relevant to ILP and theoretically show whether these reduced sets are reductions for more general infinite fragments. We experimentally compare learning with reduced sets of metarules on three domains: Michalski trains, string transformations, and game rules. In general, derivation-reduced sets of metarules outperform subsumption- and entailment-reduced sets, both in terms of predictive accuracies and learning times.
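
For readers unfamiliar with metarules, a few commonly used examples (standard in the meta-interpretive learning literature) are shown below; P, Q, R are second-order variables ranging over predicate symbols, and A, B, C are first-order variables. Derivation reduction then asks, roughly, whether some metarules in a set can be derived from the others by SLD-resolution and therefore dropped.

```latex
\begin{align*}
\textit{ident}:   \;& P(A,B) \leftarrow Q(A,B)\\
\textit{precon}:  \;& P(A,B) \leftarrow Q(A),\ R(A,B)\\
\textit{postcon}: \;& P(A,B) \leftarrow Q(A,B),\ R(B)\\
\textit{chain}:   \;& P(A,B) \leftarrow Q(A,C),\ R(C,B)
\end{align*}
```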

General game playing (GGP) is a framework for evaluating an agent’s general intelligence across a wide range of tasks. In the GGP competition, an agent is given the rules of a game (described as a logic program) that it has never seen before. The task is for the agent to play the game, thus generating game traces. The winner of the GGP competition is the agent that gets the best total score over all the games. In this paper, we invert this task: a learner is given game traces and the task is to learn the rules that could produce the traces. This problem is central to inductive general game playing (IGGP). We introduce a technique that automatically generates IGGP tasks from GGP games. We introduce an IGGP dataset which contains traces from 50 diverse games, such as Sudoku, Sokoban, and Checkers. We claim that IGGP is difficult for existing inductive logic programming (ILP) approaches. To support this claim, we evaluate existing ILP systems on our dataset. Our empirical results show that most of the games cannot be correctly learned by existing systems. The best performing system solves only 40% of the tasks perfectly. Our results suggest that IGGP poses many challenges to existing approaches. Furthermore, because we can automatically generate IGGP tasks from GGP games, our dataset will continue to grow with the GGP competition, as new games are added every year. We therefore think that the IGGP problem and dataset will be valuable for motivating and evaluating future research.

Recently, world-class human players have been outperformed in a number of complex two-person games (Go, Chess, Checkers) by Deep Reinforcement Learning systems. However, the data efficiency of these learning systems is unclear, given that they appear to require far more training games to achieve such performance than any human player might experience in a lifetime. In addition, the resulting learned strategies are not in a form which can be communicated to human players. This contrasts with earlier research in Behavioural Cloning, in which single-agent skills were machine learned in a symbolic language, facilitating their being taught to human beings. In this paper, we consider Machine Discovery of human-comprehensible strategies for simple two-person games (Noughts-and-Crosses and Hexapawn). One advantage of considering simple games is that there is a tractable approach to calculating minimax regret. We use these games to compare Cumulative Minimax Regret for variants of both standard and deep reinforcement learning against two variants of a new Meta-interpretive Learning system called MIGO. In our experiments, the tested variants of both standard and deep reinforcement learning have consistently worse performance (higher cumulative minimax regret) than both variants of MIGO on Noughts-and-Crosses and Hexapawn. In addition, MIGO’s learned rules are relatively easy to comprehend, and are demonstrated to achieve significant transfer learning in both directions between Noughts-and-Crosses and Hexapawn.
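
As a rough sketch of the evaluation metric only (our own illustration; the game outcomes below are hypothetical), cumulative minimax regret accumulates, game by game, the gap between the minimax-optimal outcome and the outcome the learner actually achieved:

```python
# Illustration of the metric only (our own sketch; game values are hypothetical):
# accumulate, per training game, the gap between the minimax-optimal outcome
# and the outcome the learner actually achieved.
def cumulative_minimax_regret(optimal_values, achieved_values):
    """Both arguments are per-game outcomes, e.g. +1 win, 0 draw, -1 loss."""
    total, curve = 0.0, []
    for opt, got in zip(optimal_values, achieved_values):
        total += max(0.0, opt - got)   # regret is never negative
        curve.append(total)
    return curve

# Per-game regrets 1, 0, 1, 0 accumulate as follows:
print(cumulative_minimax_regret([0, 0, 1, 1], [-1, 0, 0, 1]))  # [1.0, 1.0, 2.0, 2.0]
```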

Children learn through play. We introduce the analogous idea of learning programs through play. In this approach, a program induction system (the learner) is given a set of tasks and initial background knowledge. Before solving the tasks, the learner enters an unsupervised playing stage where it creates its own tasks to solve, tries to solve them, and saves any solutions (programs) to the background knowledge. After the playing stage is finished, the learner enters the supervised building stage where it tries to solve the user-supplied tasks and can reuse solutions learnt whilst playing. The idea is that playing allows the learner to discover reusable general programs on its own, which can then help solve the user-supplied tasks. We claim that playing can improve learning performance. We show that playing can reduce the textual complexity of target concepts, which in turn reduces the sample complexity of a learner. We implement our idea in Playgol, a new inductive logic programming system. We experimentally test our claim on two domains: robot planning and real-world string transformations. Our experimental results suggest that playing can substantially improve learning performance. We think that the idea of playing (or, more verbosely, unsupervised bootstrapping for supervised program induction) is an important contribution to the problem of developing program induction approaches that self-discover background knowledge (BK).
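
A minimal sketch of the play-then-build workflow as we read it from the abstract (the function names `induce` and `sample_play_task` and the set-based background knowledge are placeholders, not Playgol's actual interface):

```python
# Sketch of the play-then-build workflow as described above (our paraphrase;
# `induce`, `sample_play_task` and the set-based background knowledge are
# hypothetical placeholders, not Playgol's actual interface).
def play_then_build(user_tasks, background, induce, sample_play_task, n_play=100):
    # 1) Playing stage (unsupervised): invent tasks, keep any solved programs.
    for _ in range(n_play):
        play_task = sample_play_task()
        program = induce(play_task, background)
        if program is not None:
            background = background | {program}   # grow the BK with reusable programs
    # 2) Building stage (supervised): solve the user-supplied tasks,
    #    reusing whatever was learnt while playing.
    return [induce(task, background) for task in user_tasks]
```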

In this paper, we thoroughly explore the use of ILP for generating explainable, negative, group, and context-aware recommendations. ILP provides recommendation rules in an if-then logical format that allows us to form a clear and concise explanation to accompany the suggested items. It can indirectly derive negative rules which tell us not to recommend certain products to users. It also emphasizes the use of universal representations, which enables us to suggest items to a group of users who share the same interests. Additionally, ILP requires no re-training if new contexts (e.g., location, time, and mood) are added to the system to generate context-aware recommendations (CARS); only the relevant predicates and settings need to be specified. In this paper, we also propose an explainability evaluation in terms of transparency, by comparing the items/features appearing in the explanation with the features present in the user’s review. The negative, group, and dynamic recommendations can be evaluated using standard measures.
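
Purely for illustration (the predicates below are hypothetical and not taken from the paper), the induced rules could look like the following, where the first rule yields an explainable, context-aware recommendation and the second a negative one; because the user variable U is universally quantified, the same rule also covers a whole group of users sharing the relevant properties.

```latex
% Hypothetical rule shapes; predicate names are illustrative only.
\begin{align*}
\mathit{recommend}(U, I) \leftarrow{} & \mathit{likes\_genre}(U, G),\ \mathit{has\_genre}(I, G),\ \mathit{context}(U, \mathit{evening})\\
\mathit{do\_not\_recommend}(U, I) \leftarrow{} & \mathit{disliked}(U, A),\ \mathit{similar\_to}(I, A)
\end{align*}
```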

In this paper, we propose a recommender system using pairwise comparisons as the main source of information in the user preference elicitation process. We use a logic-based approach implemented in APARELL, an inductive learner modelling the user’s preferences in description logic. A within-subject preliminary user study with a large dataset from a real-world domain (car retail) was conducted to compare a pairwise comparison interface to one using a standard product list search. The results show the users’ preference for the interface based on pairwise comparisons, which has proven significantly better in a number of ways.

Machine Learning (ML) approaches can achieve impressive results, but many lack transparency or have difficulties handling data of high structural complexity. The class of ML known as Inductive Logic Programming (ILP) draws on the expressivity and rigour of subsets of First Order Logic to represent both data and models. When Description Logics (DL) are used, the approach can be applied directly to knowledge represented as ontologies. ILP output is a prime candidate for explainable artificial intelligence, the expense being computational complexity. We have recently demonstrated how a critical component of ILP learners in DL, namely cover set testing, can be sped up through the use of concurrent processing. Here we describe the first prototype of an ILP learner in DL that benefits from this use of concurrency. The result is a fast, scalable tool that can be applied directly to large ontologies.
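
The general idea of parallelising cover set testing can be sketched as follows (our own toy illustration, not the authors' DL implementation; `covers` stands in for a reasoner's instance check):

```python
# Toy illustration of concurrent cover set testing (not the authors' DL code):
# whether a hypothesis covers an example can be checked per example
# independently, so the checks are farmed out to a process pool.
from concurrent.futures import ProcessPoolExecutor

def covers(hypothesis, example):
    """Placeholder for a reasoner call; in the DL setting this would be an
    instance check of `example` against the concept `hypothesis`."""
    return example % hypothesis == 0   # toy stand-in

def cover_set_size(hypothesis, examples, workers=4):
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(covers, [hypothesis] * len(examples), examples))

if __name__ == "__main__":
    print(cover_set_size(3, list(range(1, 101))))   # 33 examples covered
```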

Deep learning has been shown to achieve impressive results in several domains like computer vision and natural language processing. A key element of this success has been the development of new loss functions, like the popular cross-entropy loss (see also "Overview of Cross-Entropy and Related Algorithms and Implementation Examples"), which has been shown to provide faster convergence and to reduce the vanishing gradient problem in very deep structures. While the cross-entropy loss is usually justified from a probabilistic perspective, this paper shows an alternative and more direct interpretation of this loss in terms of t-norms and their associated generator functions, and derives a general relation between loss functions and t-norms. In particular, the presented work shows intriguing results leading to the development of a novel class of loss functions. These losses can be exploited in any supervised learning task and could lead to faster convergence rates than the commonly employed cross-entropy loss.
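
The connection can be summarised compactly. A continuous Archimedean t-norm T has an additive generator g (strictly decreasing, with g(1)=0) such that T(x,y) = g⁻¹(g(x)+g(y)). Defining the loss as the generator applied to the conjunction (t-norm) of the probabilities p_i that the model assigns to the correct labels gives

```latex
L \;=\; g\big(T(p_1,\dots,p_n)\big) \;=\; \sum_{i=1}^{n} g(p_i),
\qquad\text{and with } g(x) = -\log x \text{ (product t-norm):}\quad
L \;=\; -\sum_{i=1}^{n} \log p_i ,
```

which is exactly the cross-entropy loss; choosing other generators yields the alternative loss family studied in the paper.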

While deep networks have been enormously successful over the last decade, they rely on flat-feature vector representations, which makes them unsuitable for richly structured domains such as those arising in applications like social network analysis. Such domains rely on relational representations to capture complex relationships between entities and their attributes. Thus, we consider the problem of learning neural networks for relational data. We distinguish ourselves from current approaches that rely on expert hand-coded rules by learning relational random-walk-based features to capture local structural interactions and the resulting network architecture. We further exploit parameter tying of the network weights of the resulting relational neural network, where instances of the same type share parameters. Our experimental results across several standard relational data sets demonstrate the effectiveness of the proposed approach over multiple neural net baselines as well as state-of-the-art statistical relational models.

Machine Ethics is a newly emerging interdisciplinary field which is concerned with adding an ethical dimension to artificially intelligent (AI) agents. In this paper we address the problem of representing and acquiring rules of codes of ethics in the online customer service domain. The proposed solution approach relies on the non-monotonic features of Answer Set Programming (ASP) and applies ILP. The approach is illustrated by means of examples taken from preliminary tests conducted with a couple of state-of-the-art ILP algorithms for learning ASP rules.

Systems for symbolic event recognition detect occurrences of events in streaming input using a set of event patterns in the form of temporal logical rules. Algorithms for online learning/revising such patterns should be capable of updating the current event pattern set without compromising the quality of the provided service, i.e. the system’s online predictive performance. Towards this, we present an approach based on Prediction with Expert Advice. The experts in our approach are logical rules representing event patterns, which are learnt online via a single-pass strategy. To handle the dynamic nature of the task, an Event Calculus-inspired prediction/event detection scheme allows us to incorporate commonsense principles into the learning process. We present a preliminary empirical assessment with promising results.
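
The prediction-with-expert-advice machinery can be sketched generically as follows (our own illustration of the standard exponentially weighted forecaster, not the paper's exact update rule; the rule-experts here are toy lambdas):

```python
# Generic sketch of prediction with expert advice (the standard exponentially
# weighted forecaster, not the paper's exact update); the rule-experts here
# are toy lambdas standing in for learned event patterns.
import math

def weighted_prediction(rules, weights, observation):
    """Each rule maps an observation to 0/1; return the weighted majority vote."""
    score = sum(w * r(observation) for r, w in zip(rules, weights))
    return 1 if score >= 0.5 * sum(weights) else 0

def update_weights(rules, weights, observation, truth, eta=0.5):
    """Exponential (Hedge-style) update: experts that predicted wrongly lose weight."""
    return [w * math.exp(-eta * abs(r(observation) - truth))
            for r, w in zip(rules, weights)]

rules = [lambda x: int(x > 5), lambda x: int(x % 2 == 0)]   # two toy rule-experts
weights = [1.0, 1.0]
for obs, truth in [(6, 1), (4, 0), (8, 1)]:
    print(weighted_prediction(rules, weights, obs), "truth:", truth)
    weights = update_weights(rules, weights, obs, truth)
```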

Simulated data generated from an accurate modelling tool can demonstrate real-life events. This approach mimics pipeline operation without the need for maintaining the original anomaly record, which is already scarce in the pipeline industry. The synthetic data carries precise signatures, and the shape of the curve depicts the type of alarm. Learning rules can be inferred from this parametric data, and the examples are construed from the threshold levels. The issue is addressed by considering a method that lessens data handling and the associated complexity of the problem. Probabilistic ILP can be the most appropriate candidate for classifying anomalies of this nature. A logic program addresses this issue by learning the parameters of a program given the structure (the rules), using the ability to incorporate probability in logic programming, interpreting the examples for the target predicate, and refining background knowledge for the dissemination of discoveries. This synthetic data also develops a direct link with ILP parameter learning for competing hypotheses. The modelling tool will receive feedback, then check and tune the hypothesis until exclusivity is evolved. This falls in the domain of closed-loop active learning [1].

Recent advances in learning description logic (DL) concepts usually employ a downward refinement operator for traversing the search space and constructing hypotheses. However, theoretical research has proved that an ideal refinement operator does not exist for expressive DLs, including the language ALC. In this paper, we propose a Rapid Restart Hill Climbing (RRHC) algorithm that selects a node for expansion by traversing the search tree in a hill-climbing manner and rapidly restarts with one-step backtracking after each expansion. We provide an implementation of RRHC in the DL-Learner framework and compare its performance with CELOE using standard benchmarks.
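
Our loose reading of the search pattern, as a hedged sketch only (not the authors' DL-Learner implementation; the refinement and scoring functions are placeholders):

```python
# Loose sketch of the search pattern as we read it from the abstract (not the
# authors' DL-Learner implementation); `refine` is a placeholder refinement
# operator and `score` a heuristic over (hashable) hypothesis nodes.
def rrhc_like(root, refine, score, iterations=100):
    tree = {root: []}                        # node -> children
    for _ in range(iterations):
        node = root
        while tree[node]:                    # hill-climb towards the best child
            node = max(tree[node], key=score)
        children = refine(node)              # expand the selected node ...
        tree[node] = children
        for c in children:
            tree.setdefault(c, [])
        # ... then restart the traversal from the root on the next iteration
    return max(tree, key=score)              # best hypothesis encountered
```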

Real-world data are often noisy and fuzzy. Most traditional logical machine learning methods require the data to be first discretized or pre-processed before being able to produce useful output. Such a shortcoming often limits their application to real-world data. On the other hand, neural networks are generally known to be robust against noisy data. However, a fully trained neural network does not provide easily understandable rules that can be used to understand the underlying model. In this paper, we propose a Differentiable Learning from Interpretation Transition (δ-LFIT) algorithm that can simultaneously output logic programs fully explaining the state transitions, and also learn from data containing noise and error.

Computer modelling now forms a standard part of chemists’ armoury. While some models are based on a combination of precise measurements and quantum theory, others try to link what knowledge is available about a given chemical compound with its observed properties through Machine Learning (ML) models. Composition and structure are always the main aspects reflected in the training data, but ML approaches differ in the use of additional features signalling the presence of certain substructures that have been deemed of relevance. At one end, connectionist approaches of the ‘Deep Learning’ (DL) kind rely on multiple layers of neurons to combine the original aspects of the representation through non-linear functions into features of increasingly higher levels of complexity and abstraction; at the other extreme, more traditional statistical ML approaches rely on the explicit provision of such features identified at an earlier stage and added to a given data set through pre-processing.

Propositionalization is the process of summarizing relational data into a tabular (attribute-value) format. The resulting table can then be used by any propositional learner. This approach makes it possible to apply a wide variety of learning methods to relational data. However, the transformation from relational to propositional format is generally not lossless: different relational structures may be mapped onto the same feature vector. At the same time, features may be introduced that are not needed for the learning task at hand. In general, it is hard to define a feature space that contains all and only those features that are needed for the learning task. This paper presents LazyBum, a system that can be considered a lazy version of the recently proposed OneBM method for propositionalization. LazyBum interleaves OneBM’s feature construction method with a decision tree learner. This learner both uses and guides the propositionalization process. It indicates when and where to look for new features. This approach is similar to what has elsewhere been called dynamic propositionalization. In an experimental comparison with the original OneBM and with two other recently proposed propositionalization methods (nFOIL and MODL, which respectively perform dynamic and static propositionalization), LazyBum achieves a comparable accuracy with a lower execution time on most of the datasets.
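
The dynamic-propositionalization loop can be sketched roughly as follows (a simplification of our own; LazyBum's actual feature construction over joined tables and its coupling with the tree learner are considerably more involved):

```python
# Rough sketch of a dynamic-propositionalization loop (our own simplification,
# not LazyBum's code): candidate features are added to the propositional table
# one at a time, and the tree learner's cross-validated score decides whether
# a feature is kept and whether to keep asking for new ones.
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

def dynamic_propositionalization(candidate_features, instances, labels, patience=2):
    """candidate_features: list of (name, fn) pairs; each fn maps an instance to a value."""
    table = [[] for _ in instances]
    kept, best, stale = [], 0.0, 0
    for name, fn in candidate_features:          # features are proposed one at a time
        for row, inst in zip(table, instances):
            row.append(fn(inst))                 # extend the propositional table
        score = cross_val_score(DecisionTreeClassifier(random_state=0),
                                table, labels, cv=3).mean()
        if score > best:                         # the tree found the feature useful
            best, stale = score, 0
            kept.append(name)
        else:                                    # feature did not help: drop it again
            for row in table:
                row.pop()
            stale += 1
            if stale >= patience:                # stop asking for new features
                break
    return kept, best
```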

This paper provides a new algorithm of computing a least generalization of a set of atoms. Our algorithm is based on the notion of anti-combination that is the inverse substitution of a combined substitution. In contrast to an anti-unification algorithm that computes a least generalization of two atoms, anti-combination can compute a least generalization of (more than two) atoms in parallel. We evaluate the proposed algorithm using randomly generated data and show that anti-combination outperforms the iterative application of an anti-unification algorithm in general.
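
For reference, the classical pairwise anti-unification that the paper uses as a baseline can be written very compactly; atoms are encoded here as nested tuples, and each disagreeing pair of subterms is mapped to a fresh variable (the same variable for repeated pairs):

```python
# Compact sketch of pairwise anti-unification (the iterative baseline the
# paper compares against), with atoms encoded as nested tuples like
# ("p", "a", ("f", "b")).
def lgg(atom1, atom2, mapping=None):
    if mapping is None:
        mapping = {}
    if atom1 == atom2:
        return atom1
    if (isinstance(atom1, tuple) and isinstance(atom2, tuple)
            and atom1[0] == atom2[0] and len(atom1) == len(atom2)):
        # same predicate/functor and arity: generalize argument-wise
        return (atom1[0],) + tuple(lgg(a, b, mapping)
                                   for a, b in zip(atom1[1:], atom2[1:]))
    # disagreement: replace by a variable, reusing it for the same pair
    return mapping.setdefault((atom1, atom2), f"X{len(mapping)}")

# lgg(p(a, f(a)), p(b, f(b))) = p(X0, f(X0))
print(lgg(("p", "a", ("f", "a")), ("p", "b", ("f", "b"))))
```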

Probabilistic logical models deal effectively with uncertain relations and entities typical of many real world domains. In the field of probabilistic logic programming usually the aim is to learn these kinds of models to predict specific atoms or predicates of the domain, called target atoms/predicates. However, it might also be useful to learn classifiers for interpretations as a whole: to this end, we consider the models produced by the inductive constraint logic system, represented by sets of integrity constraints, and we propose a probabilistic version of them. Each integrity constraint is annotated with a probability, and the resulting probabilistic logical constraint model assigns a probability of being positive to interpretations. To learn both the structure and the parameters of such probabilistic models we propose the system PASCAL for “probabilistic inductive constraint logic”. Parameter learning can be performed using gradient descent or L-BFGS. PASCAL has been tested on 11 datasets and compared with a few statistical relational systems and a system that builds relational decision trees (TILDE): we demonstrate that this system achieves better or comparable results in terms of area under the precision–recall and receiver operating characteristic curves, in a comparable execution time.

The field of statistical relational learning aims at unifying logic and probability to reason and learn from data. Perhaps the most successful paradigm in the field is probabilistic logic programming: the enabling of stochastic primitives in logic programming, which is now increasingly seen to provide a declarative background to complex machine learning applications. While many systems offer inference capabilities, the more significant challenge is that of learning meaningful and interpretable symbolic representations from data. In that regard, inductive logic programming and related techniques have paved much of the way for the last few decades.
Unfortunately, a major limitation of this exciting landscape is that much of the work is limited to finite-domain discrete probability distributions. Recently, a handful of systems have been extended to represent and perform inference with continuous distributions. The problem, of course, is that classical solutions for inference are either restricted to well-known parametric families (e.g., Gaussians) or resort to sampling strategies that provide correct answers only in the limit. When it comes to learning, moreover, inducing representations remains entirely open, other than “data-fitting” solutions that force-fit points to aforementioned parametric families.
In this paper, we take the first steps towards inducing probabilistic logic programs for continuous and mixed discrete-continuous data, without being pigeon-holed to a fixed set of distribution families. Our key insight is to leverage techniques from piecewise polynomial function approximation theory, yielding a principled way to learn and compositionally construct density functions. We test the framework and discuss the learned representations.
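
A very rough numerical illustration of the piecewise-polynomial intuition (our own sketch, not the paper's learner): estimate an empirical density, split its support into contiguous pieces, and fit a low-degree polynomial per piece.

```python
# Very rough numerical sketch of the piecewise-polynomial idea (our own
# illustration, not the paper's learner).
import numpy as np

def piecewise_poly_density(samples, n_pieces=4, degree=2, bins=40):
    hist, edges = np.histogram(samples, bins=bins, density=True)  # empirical density
    centers = 0.5 * (edges[:-1] + edges[1:])
    pieces = []
    for idx in np.array_split(np.arange(bins), n_pieces):         # contiguous blocks
        coeffs = np.polyfit(centers[idx], hist[idx], degree)      # local polynomial fit
        pieces.append(((centers[idx][0], centers[idx][-1]), coeffs))
    return pieces

rng = np.random.default_rng(0)
for (a, b), coeffs in piecewise_poly_density(rng.normal(size=2000)):
    print(f"[{a:.2f}, {b:.2f}] -> {np.round(coeffs, 3)}")
```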

A key feature of inductive logic programming (ILP) is its ability to learn first-order programs, which are intrinsically more expressive than propositional programs. In this paper, we introduce techniques to learn higher-order programs. Specifically, we extend meta-interpretive learning (MIL) to support learning higher-order programs by allowing higher-order definitions to be used as background knowledge. Our theoretical results show that learning higher-order programs, rather than first-order programs, can reduce the textual complexity required to express programs, which in turn reduces the size of the hypothesis space and the sample complexity. We implement our idea in two new MIL systems: one in Prolog and one in ASP. Both systems support learning higher-order programs and higher-order predicate invention, such as inventing functions for map/3 and conditions for filter/3. We conduct experiments on four domains (robot strategies, chess playing, list transformations, and string decryption) that compare learning first-order and higher-order programs. Our experimental results support our theoretical claims and show that, compared to learning first-order programs, learning higher-order programs can significantly improve predictive accuracies and reduce learning times.
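
As a language-neutral analogy in Python (the paper itself works with Prolog/ASP and MIL, not Python): reusing a higher-order abstraction such as map makes the program textually much smaller than spelling out the recursion first-order style, which is the effect that shrinks the hypothesis space.

```python
# Analogy only (Python rather than the paper's Prolog/ASP): "apply drop-last
# to every sublist" written with explicit first-order-style recursion versus
# reusing the higher-order map abstraction.

def droplast(xs):
    return xs[:-1]

# first-order style: the traversal has to be spelled out
def droplast_all_fo(lists):
    if not lists:
        return []
    return [droplast(lists[0])] + droplast_all_fo(lists[1:])

# higher-order style: map-like reuse, textually much smaller
def droplast_all_ho(lists):
    return list(map(droplast, lists))

assert droplast_all_fo([[1, 2, 3], [4, 5]]) == droplast_all_ho([[1, 2, 3], [4, 5]])
```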

Meta-interpretive learning (MIL) is a form of inductive logic programming that learns logic programs from background knowledge and examples. We claim that adding types to MIL can improve learning performance. We show that type checking can reduce the MIL hypothesis space by a cubic factor. We introduce two typed MIL systems: Metagol_T and HEXMIL_T, implemented in Prolog and Answer Set Programming (ASP), respectively. Both systems support polymorphic types and can infer the types of invented predicates. Our experimental results show that types can substantially reduce learning times.

We introduce a new application for inductive logic programming: learning the semantics of programming languages from example evaluations. In this short paper, we explored a simplified task in this domain using the Metagol meta-interpretive learning system. We highlighted the challenging aspects of this scenario, including abstracting over function symbols, nonterminating examples, and learning non-observed predicates, and proposed extensions to Metagol helpful for overcoming these challenges, which may prove useful in other domains.

We present Louise, a new Meta-Interpretive Learner that performs efficient multi-clause learning, implemented in Prolog. Louise is efficient enough to learn programs that are too large to be learned with the current state-of-the-art MIL system, Metagol. Louise learns by first constructing the most general program in the hypothesis space of a MIL problem and then reducing this “Top program” by Plotkin’s program reduction algorithm. In this extended abstract we describe Louise’s learning approach and experimentally demonstrate that Louise can learn programs that are too large to be learned by our implementation of Metagol, Thelma.
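
The reduction step can be sketched generically (our own illustration, given an entailment oracle; Plotkin's actual algorithm and Louise's Top program construction involve more machinery): clauses that are entailed by the rest of the program are redundant and can be removed.

```python
# Generic redundancy-elimination sketch (not Louise's code): drop any clause
# that is entailed by the remaining clauses, given an `entails(program, clause)`
# oracle (e.g. backed by a resolution prover).
def reduce_program(clauses, entails):
    reduced = list(clauses)
    for c in list(clauses):
        rest = [d for d in reduced if d != c]
        if rest and entails(rest, c):   # c is logically redundant
            reduced = rest
    return reduced
```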

In this paper we seek to identify data instances with a low value of some objective (or cost) function. Normally posed as optimisation problems, our interest is in problems that have the following characteristics: (a) optimal, or even near-optimal solutions are very rare; (b) it is expensive to obtain the value of the objective function for large numbers of data instances; and (c) there is domain knowledge in the form of experience, rules-of-thumb, constraints and the like, which is difficult to translate into the usual constraints for numerical optimisation procedures. Here we investigate the use of Inductive Logic Programming (ILP) to construct models within a procedure that progressively attempts to increase the number of near-optimal solutions. Using ILP in this manner requires a change in focus from discriminatory models (the usual staple for ILP) to generative models. Using controlled datasets, we investigate the use of probability-sampling of solutions based on the estimated cost of clauses found using ILP. Specifically, we compare the results obtained against: (a) simple random sampling; and (b) generative deep network models that use a low-level encoding and automatically construct higher-level features. Our results suggest: (1) Against each of the alternatives, probability-sampling from ILP-constructed models contain more near-optimal solutions; (2) The key to the effectiveness of ILP-constructed models is the availability of domain knowledge. We also demonstrate the use of ILP in this manner on two real-world problems from the area of drug-design (predicting solubility and binding affinity), using domain knowledge of chemical ring structures and functional groups. Taken together, our results suggest that generative modelling using ILP can be very effective for optimisation problems where: (a) the number of training instances to be used is restricted, and (b) there is domain knowledge relevant to low-cost solutions.
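
The sampling step described above might look roughly like this (our own hedged illustration; `clause_cost` and `matches` are hypothetical stand-ins for the ILP model's clause cost estimates and coverage test):

```python
# Hedged sketch of probability-sampling guided by ILP-estimated clause costs
# (not the authors' procedure): instances matched by low-cost clauses are
# sampled more often, via a Boltzmann-style weighting.
import math
import random

def sample_candidates(instances, clause_cost, matches, k=10, temperature=1.0):
    """clause_cost: dict clause -> estimated cost; matches(instance) -> clauses it satisfies."""
    def est_cost(x):
        costs = [clause_cost[c] for c in matches(x)]
        return min(costs) if costs else max(clause_cost.values())
    weights = [math.exp(-est_cost(x) / temperature) for x in instances]
    return random.choices(instances, weights=weights, k=k)
```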

Given the recent proliferation of disinformation online, there has been growing research interest in automatically debunking rumors, false claims, and ‘fake news’. A number of fact-checking initiatives have been launched, but the whole enterprise remains in a state of crisis: by the time a claim is finally fact-checked, it could have reached millions of users. An arguably more promising direction is to focus on fact-checking entire news outlets, which can be done in advance.

The motion of a human walker carries information about multiple attributes of the individual with many potential applications. Along with manifesting biologically intrinsic properties and personally identifiable attributes, one’s gait also changes dynamically based on one’s emotions [1] and state of health [2]. Of particular interest to the medical community is the manifestation in gait of neurodegenerative diseases, such as Parkinson’s [3] and musculoskeletal disorders, e.g. osteoarthritis [4]. The corresponding changes are usually depicted as a deviation of established clinical gait metrics from normative ranges. In clinical practice, specific gait metrics [5] that describe the spatio-temporal aspects of gait per gait cycle are used for disease diagnosis, rehabilitation progress monitoring and evaluation of the physiotherapy intervention

In the next article, we will discuss ILP2021; the conference skipped a year due to the coronavirus pandemic.
