Inductive logic programming 2020-2021 Papers

Machine Learning Technology Artificial Intelligence Technology Natural Language Processing Technology Semantic Web Technology Ontology Technology Reasoning Technology Knowledge Information Technology Collecting AI Conference Papers Digital Transformation Technology Navigation of this Blog

In the previous article, we discussed the ILP 2019. This time we will discuss the ILP 2021, which was skipped a year due to the corona pandemic.

Inductive logic programming (ILP) is a branch of machine learning that focuses on learning logical representations from relational data.The ILP conference series was started in 1991 and is the leading international forum on learning from structured or semi-structured relational data, multi-relational learning, and data mining. Initially focused on induction of logic programs, over the years it has greatly expanded its research horizons to include all aspects of logic learning, statistical relational learning, graph and tree mining, learning other (non-propositional) logic-based knowledge representation frameworks, exploring intersections with statistical learning and other probabilistic approaches, etc. Submissions are welcome.

Learning Explanations for Biological Feedback with Delays Using an Event Calculus

We propose the identification of feedback mechanisms in biological systems by learning logical rules in R. Thomas’ Kinetic Logic (Thomas and D’Ari in Biological feedback. CRC Press, 1990). The principal advantages claimed for Kinetic Logic are that it captures an important class of regulatory networks at an appropriate level of precision, and that the representation is close to that used routinely by biologists, with a well-understood relationship to a differential description. In this paper we present a formalisation of Kinetic Logic as a labelled transition system and provide a provably correct implementation in a modified form of the Event Calculus. The behaviour of a system is then a logical consequence of the core-axioms of a (modified) Event Calculus C, the axioms K implementing Kinetic Logic and the axioms H describing the system. This formulation allows us to specify system identification in the manner adopted in Inductive Logic Programming (ILP), namely, given C, K, system behaviour S and possibly some additional domain-knowledge B, find H s.t. $B \land C \land K \land H ⊨ S$

$B \land C \land K \land H ⊨ S$

. Identifying a suitable Kinetic Logic hypothesis requires the simultaneous identification of definite clauses for: (a) logical definitions relating the occurrence of events to values of fluents; (b) delays in changes of the values of fluents arising from the occurrence of events; and possibly (c) exceptions to changes in fluent values, arising from asynchronous behaviour inherent to the system. We use a standard ILP engine for (a), and special-purpose abduction procedures for (b) and (c). We demonstrate this combination of induction and abduction on several canonical feedback patterns described by Thomas, and to identify the regulatory mechanism in two well-known biological problems (immune-response and phage-infection).

Learning Any Memoryless Discrete Semantics for Dynamical Systems Represented by Logic Programs

Learning from interpretation transition (LFIT) automatically constructs a model of the dynamics of a system from the observation of its state transitions. So far the systems that LFIT handled were mainly restricted to synchronous deterministic dynamics. However, other dynamics exist in the field of logical modeling, in particular the asynchronous semantics which is widely used to model biological systems. In this paper, we propose a modeling of discrete memory-less multi-valued dynamic systems as logic programs in which a rule represents what can occur rather than what will occur. This modeling allows us to represent non-determinism and to propose an extension of LFIT to learn regardless of the update schemes, allowing to capture a large range of semantics. We also propose a second algorithm which is able to learn a whole system dynamics, including its semantics, in the form of a single propositional logic program with constraints. We show through theoretical results the correctness of our approaches. Practical evaluation is performed on benchmarks from biological literature.

Learning from Interpretation Transition Using Differentiable Logic Programming Semantics

The combination of learning and reasoning is an essential and challenging topic in neuro-symbolic research. Differentiable inductive logic programming is a technique for learning a symbolic knowledge representation from either complete, mislabeled, or incomplete observed facts using neural networks. In this paper, we propose a novel differentiable inductive logic programming system called differentiable learning from interpretation transition (D-LFIT) for learning logic programs through the proposed embeddings of logic programs, neural networks, optimization algorithms, and an adapted algebraic method to compute the logic program semantics. The proposed model has several characteristics, including a small number of parameters, the ability to generate logic programs in a curriculum-learning setting, and linear time complexity for the extraction of trained neural networks. The well-known bottom clause positionalization algorithm is incorporated when the proposed system learns from relational datasets. We compare our model with NN-LFIT, which extracts propositional logic rules from retuned connected networks, the highly accurate rule learner RIPPER, the purely symbolic LFIT system LF1T, and CILP++, which integrates neural networks and the propositionalization method to handle first-order logic knowledge. From the experimental results, we conclude that D-LFIT yields comparable accuracy with respect to the baselines when given complete, incomplete, and mislabeled data. Our experimental results indicate that D-LFIT not only learns symbolic logic programs quickly and precisely but also performs robustly when processing mislabeled and incomplete datasets.

Learning and Revising Dynamic Temporal Theories in the Full Discrete Event Calculus

This paper presents the first automatic method for learning and revising dynamic temporal theories in the full-fledged Discrete Event Calculus (DEC), where fluents may be temporarily released from the law of inertia and subject to qualitative or quantitative domain laws. This is done by proposing a reformulation of the DEC, called the eXploratory Event Calculus (XEC), which can be more efficiently handled by state-of-the-art answer set solvers, and which supports a range of different logical semantics and policy options for resolving conflicts relating to the truth value or release status of fluents. The paper shows how XEC outperforms DEC on standard reasoning benchmarks, and how it can be used with an ILP system XHAIL to provide the first proof-of-principle demonstration of theory learning and revision in the full-featured DEC.

Online Learning of Logic-Based Neural Network Structures

In this paper, we present two online structure learning algorithms for NeuralLog, NeuralLog+OSLR and NeuralLog+OMIL. NeuralLog is a system that compiles first-order logic programs into neural networks. Both learning algorithms are based on Online Structure Learner by Revision (OSLR). NeuralLog+OSLR is a port of OSLR to use NeuralLog as inference engine; while NeuralLog+OMIL uses the underlying mechanism from OSLR, but with a revision operator based on Meta-Interpretive Learning. We compared both systems with OSLR and RDN-Boost on link prediction in three different datasets: Cora, UMLS and UWCSE. Our experiments showed that NeuralLog+OMIL outperforms both the compared systems on three of the four target relations from the Cora dataset and in the UMLS dataset, while both NeuralLog+OSLR and NeuralLog+OMIL outperform OSLR and RDN-Boost on the UWCSE, assuming a good initial theory is provided.

Beneficial and Harmful Explanatory Machine Learning

Given the recent successes of Deep Learning in AI there has been increased interest in the role and need for explanations in machine learned theories. A distinct notion in this context is that of Michie’s definition of Ultra-Strong Machine Learning (USML). USML is demonstrated by a measurable increase in human performance of a task following provision to the human of a symbolic machine learned theory for task performance. A recent paper demonstrates the beneficial effect of a machine learned logic theory for a classification task, yet no existing work to our knowledge has examined the potential harmfulness of machine’s involvement for human comprehension during learning. This paper investigates the explanatory effects of a machine learned theory in the context of simple two person games and proposes a framework for identifying the harmfulness of machine explanations based on the Cognitive Science literature. The approach involves a cognitive window consisting of two quantifiable bounds and it is supported by empirical evidence collected from human trials. Our quantitative and qualitative results indicate that human learning aided by a symbolic machine learned theory which satisfies a cognitive window has achieved significantly higher performance than human self learning. Results also demonstrate that human learning aided by a symbolic machine learned theory that fails to satisfy this window leads to significantly worse performance than unaided human learning.

Top Program Construction and Reduction for Polynomial Time Meta-Interpretive Learning

Meta-Interpretive Learners, like most ILP systems, learn by searching for a correct hypothesis in the hypothesis space, the powerset of all constructible clauses. We show how this exponentially-growing search can be replaced by the construction of a Top program: the set of clauses in all correct hypotheses that is itself a correct hypothesis. We give an algorithm for Top program construction and show that it constructs a correct Top program in polynomial time and from a finite number of examples. We implement our algorithm in Prolog as the basis of a new MIL system, Louise, that constructs a Top program and then reduces it by removing redundant clauses. We compare Louise to the state-of-the-art search-based MIL system Metagol in experiments on grid world navigation, graph connectedness and grammar learning datasets and find that Louise improves on Metagol’s predictive accuracy when the hypothesis space and the target theory are both large, or when the hypothesis space does not include a correct hypothesis because of “classification noise” in the form of mislabelled examples. When the hypothesis space or the target theory are small, Louise and Metagol perform equally well.

Generating Contrastive Explanations for Inductive Logic Programming Based on a Near Miss Approach

In recent research, human-understandable explanations of machine learning models have received a lot of attention. Often explanations are given in form of model simplifications or visualizations. However, as shown in cognitive science as well as in early AI research, concept understanding can also be improved by the alignment of a given instance for a concept with a similar counterexample. Contrasting a given instance with a structurally similar example which does not belong to the concept highlights what characteristics are necessary for concept membership. Such near misses have been proposed by Winston (1970) as efficient guidance for learning in relational domains. We introduce an explanation generation algorithm for relational concepts learned with Inductive Logic Programming (textsc{GeNME}). The algorithm identifies near miss examples from a given set of instances and ranks these examples by their degree of closeness to a specific positive instance. A modified rule which covers the near miss but not the original instance is given as an explanation. We illustrate textsc{GeNME} with the well known family domain consisting of kinship relations, the visual relational Winston arches domain and a real-world domain dealing with file management. We also present a psychological experiment comparing human preferences of rule-based, example-based, and near miss explanations in the family and the arches domains

Human-like rule learning from images using one-shot hypothesis derivation

Unlike most computer vision approaches, which depend on hundreds or thousands of training images, humans can typically learn from a single visual example. Humans achieve this ability using background knowledge. Rule-based machine learning approaches such as Inductive Logic Programming (ILP) provide a framework for incorporating domain specific background knowledge. These approaches have the potential for human-like learning from small data or even one-shot learning , i.e. learning from a single positive example. By contrast, statistics based computer vision algorithms, including Deep Learning, have no general mechanisms for incorporating background knowledge. This paper presents an approach for one-shot rule learning called One-Shot Hypothesis Derivation (OSHD) based on using a logic program declarative bias. We apply this approach to two challenging human-like computer vision tasks: 1) Malayalam character recognition and 2) neurological diagnosis using retinal images. We compare our results with a state-of-the-art Deep Learning approach, called Siamese Network, developed for one-shot learning. The results suggest that our approach can generate human-understandable rules and outperforms the deep learning approach with a significantly higher average predictive accuracy

Machine learning of microbial interactions using Abductive ILP and Hypothesis Frequency/Compression Estimation

Interaction between species in microbial communities plays an important role in the functioning of all ecosystems, from cropland soils to human gut microbiota. Many statistical approaches have been proposed to infer these interactions from microbial abundance information. However, these statistical approaches have no general mechanisms for incorporating existing ecological knowledge in the inference process. We propose an Abductive/Inductive Logic Programming (A/ILP) framework to infer microbial interactions from microbial abundance data, by including logical descriptions of different types of interaction as background knowledge in the learning. This framework also includes a new mechanism for estimating the probability of each interaction based on the frequency and compression of hypotheses computed during the abduction process. This is then used to identify real interactions using a bootstrapping, re-sampling procedure. We evaluate our proposed framework on simulated data previously used to benchmark statistical interaction inference tools. Our approach has comparable accuracy to SparCC, which is one of the state-of-the-art statistical interaction inference algorithms, but with the the advantage of including ecological background knowledge. Our proposed framework opens up the opportunity of inferring ecological interaction information from diverse ecosystems that currently cannot be studied using other methods.

Symbolic DNN-Tuner

The application of deep learning models to increasingly complex contexts has led to a rise in the complexity of the models themselves. Due to this, there is an increase in the number of hyper-parameters (HPs) to be set and Hyper-Parameter Optimization (HPO) algorithms occupy a fundamental role in deep learning. Bayesian Optimization (BO) is the state-of-the-art of HPO for deep learning models. BO keeps track of past results and uses them to build a probabilistic model, building a probability density of HPs. This work aims to improve BO applied to Deep Neural Networks (DNNs) by an analysis of the results of the network on training and validation sets. This analysis is obtained by applying symbolic tuning rules, implemented in Probabilistic Logic Programming (PLP). The resulting system, called Symbolic DNN-Tuner, logically evaluates the results obtained from the training and the validation phase and, by applying symbolic tuning rules, fixes the network architecture, and its HPs, leading to improved performance. In this paper, we present the general system and its implementation. We also show its graphical interface and a simple example of execution.

Inclusion of Domain-Knowledge into GNNs using Mode-Directed Inverse Entailment

We present a general technique for constructing Graph Neural Networks (GNNs) capable of using multi-relational domain knowledge. The technique is based on mode-directed inverse entailment (MDIE) developed in Inductive Logic Programming (ILP). Given a data instance eB⊥B(e)Be⊥B(e) by a “bottom-graph” that can be converted into a form suitable for GNN implementations. This transformation allows a principled way of incorporating generic background knowledge into GNNs: we use the term `BotGNN’ for this form of graph neural networks. For several GNN variants, using real-world datasets with substantial background knowledge, we show that BotGNNs perform significantly better than both GNNs without background knowledge and a recently proposed simplified technique for including domain knowledge into GNNs. We also provide experimental evidence comparing BotGNNs favourably to multi-layer perceptrons (MLPs) that use features representing a “propositionalised” form of the background knowledge; and BotGNNs to a standard ILP based on the use of most-specific clauses. Taken together, these results point to BotGNNs as capable of combining the computational efficacy of GNNs with the representational versatility of ILP.

Using Domain-Knowledge to Assist Lead Discovery in Early-Stage Drug Design

We are interested in generating new small molecules which could act as inhibitors of a biological target, when there is limited prior information on target-specific inhibitors. This form of drug-design is assuming increasing importance with the advent of new disease threats for which known chemicals only provide limited information about target inhibition. In this paper, we propose the combined use of deep neural networks and Inductive Logic Programming (ILP) that allows the use of symbolic domain-knowledge (B) to explore the large space of possible molecules. Assuming molecules and their activities to be instances of random variables X and Y, the problem is to draw instances from the conditional distribution of X, given Y, B (D_X|Y,B). We decompose this into the constituent parts of obtaining the distributions D_X|B and D_Y|X,B, and describe the design and implementation of models to approximate the distributions. The design consists of generators (to approximate D_X|B and D_X|Y,B) and a discriminator (to approximate D_Y|X,B). We investigate our approach using the well-studied problem of inhibitors for the Janus kinase (JAK) class of proteins. We assume first that if no data on inhibitors are available for a target protein (JAK2), but a small numbers of inhibitors are known for homologous proteins (JAK1, JAK3 and TYK2). We show that the inclusion of relational domain-knowledge results in a potentially more effective generator of inhibitors than simple random sampling from the space of molecules or a generator without access to symbolic relations. The results suggest a way of combining symbolic domain-knowledge and deep generative models to constrain the exploration of the chemical space of molecules, when there is limited information on target-inhibitors. We also show how samples from the conditional generator can be used to identify potentially novel target inhibitors.

Learning Logic Programs Using Neural Networks by Exploiting Symbolic Invariance

Learning from Interpretation Transition (LFIT) is an unsupervised learning algorithm which learns the dynamics just by observing state transitions. LFIT algorithms have mainly been implemented in the symbolic method, but they are not robust to noisy or missing data. Recently, research works combining logical operations with neural networks are receiving a lot of attention, with most works taking an extraction based approach where a single neural network model is trained to solve the problem, followed by extracting a logic model from the neural network model. However most research work suffer from the combinatorial explosion problem when trying to scale up to solve larger problems. In particular a lot of the invariance that hold in the symbolic world are not getting utilized in the neural network field. In this work, we present a model that exploits symbolic invariance in our problem. We show that our model is able to scale up to larger tasks than previous work.

Synthetic Datasets and Evaluation Tools for Inductive Neural Reasoning

Logical rules are a popular knowledge representation language in many domains. Recently, neural networks have been proposed to support the complex rule induction process. However, we argue that existing datasets and evaluation approaches are lacking in various dimensions; for example, different kinds of rules or dependencies between rules are neglected. Moreover, for the development of neural approaches, we need large amounts of data to learn from and adequate, approximate evaluation measures. In this paper, we provide a tool for generating diverse datasets and for evaluating neural rule learning systems, including novel performance metrics.

autoBOT: evolving neuro‑symbolic representations for explainable low resource text classifcation

Learning from texts has been widely adopted throughout industry and science. While state-of-the-art neural language models have shown very promising results for text classification, they are expensive to (pre-)train, require large amounts of data and tuning of hundreds of millions or more parameters. This paper explores how automatically evolved text representations can serve as a basis for explainable, low-resource branch of models with competitive performance that are subject to automated hyperparameter tuning. We present autoBOT (automatic Bags-Of-Tokens), an autoML approach suitable for low resource learning scenarios, where both the hardware and the amount of data required for training are limited. The proposed approach consists of an evolutionary algorithm that jointly optimizes various sparse representations of a given text (including word, subword, POS tag, keyword-based, knowledge graph-based and relational features) and two types of document embeddings (non-sparse representations). The key idea of autoBOT is that, instead of evolving at the learner level, evolution is conducted at the representation level. The proposed method offers competitive classification performance on fourteen real-world classification tasks when compared against a competitive autoML approach that evolves ensemble models, as well as state-of-the-art neural language models such as BERT and RoBERTa. Moreover, the approach is explainable, as the importance of the parts of the input space is part of the final solution yielded by the proposed optimization procedure, offering potential for meta-transfer learning described in “Overview of Transfer Learning and Examples of Algorithms and Implementations“.

Non-Parametric Learning of Embeddings for Relational Data using Gaifman Locality Theorem

We consider the problem of full model learning from relational data. To this effect, we construct embeddings using symbolic trees learned in a non-parametric manner. The trees are treated as a decision-list of first order rules that are then partially grounded and counted over local neighborhoods of a Gaifman graph to obtain the feature representations. We propose the first method for learning these relational features using a Gaifman graph by using relational tree distances. Our empirical evaluation on real data sets demonstrates the superiority of our approach over handcrafted rules, classical rule-learning approaches, the state-of-the-art relational learning methods and embedding methods.

Mapping Across Relational Domains for Transfer Learning with Word Embeddings-based Similarity

Statistical machine learning models are a concise representation of probabilistic dependencies among the attributes of an object. Most of the models assume that training and testing data come from the same distribution. Transfer learning has emerged as an essential technique to handle scenarios where such an assumption does not hold, as it relies on leveraging the knowledge acquired in one or more learning tasks as a starting point to solve a new task. Statistical Relational Learning (SRL) extends statistical learning to represent and learn from data with several objects and their relations. In this way, SRL deals with data with a rich vocabulary composed of classes, objects, their properties, and relationships. When employing transfer learning to SRL, the primary challenge is to transfer the learned structure, mapping the vocabulary from a source domain to a different target domain. To address the problem of transferring across domains, we propose TransBoostler, which uses pre-trained word embeddings to guide the mapping as the name of a predicate usually has a semantic connotation that can be mapped to a vector space model. After transferring, TransBoostler employs theory revision operators further to adapt the mapped model to the target data. In the experimental results, TransBoostler has successfully transferred trees from a source to a distinct target domain, performing equal or better than previous work but requiring less training time.

Transfer Learning for Boosted Relational Dependency Networks Through a Genetic Algorithm

Machine learning aims at generalizing from observations to induce models that aid decisions when new observations arrive. However, traditional machine learning methods fail at finding patterns from several objects and their relationships. Statistical relational learning goes a step further to discover patterns from relational domains and deal with data under uncertainty. Most machine learning methods, SRL included, assume the training and test data come from the same distribution. Nonetheless, in several scenarios, this assumption does not hold. Transfer learning aims at acting on scenarios like that, leveraging learned knowledge from a source task to improve the performance in a target task when data is scarce. A costly challenge associated with transfer learning in relational domains is mapping from the source and target background knowledge language. This paper proposes GROOT, a framework that applies genetic algorithm-based solutions described in “Overview of genetic algorithms, application examples, and implementation examples” to discover the best mapping between the source and target tasks and adapt the transferred model. GROOT relies on a set of relational dependency trees built from the source data as a starting point to build the models for the target data. Over generations, individuals carry a possible mapping. They are submitted to genetic operators that recombine subtrees and revise the initial structure tree, enabling a prune or expansion of the branches. Experimental results conducted on Cora, IMDB, UW-CSE, and NELL datasets show that GROOT reaches results better than the baselines in most cases.

A Simulated Annealing Meta-heuristic for Concept Learning in Description Logics

Ontologies – providing an explicit schema for underlying data – often serve as background knowledge for machine learning approaches. Similar to ILP methods, concept learning utilizes such ontologies to learn concept expressions from examples in a supervised manner. This learning process is usually cast as a search process through the space of ontologically valid concept expressions, guided by heuristics. Such heuristics usually try to balance explorative and exploitative behaviors of the learning algorithms. While exploration ensures a good coverage of the search space, exploitation focuses on those parts of the search space likely to contain accurate concept expressions. However, at their extreme ends, both paradigms are impractical: A totally random explorative approach will only find good solutions by chance, whereas a greedy but myopic, exploitative attempt might easily get trapped in local optima. To combine the advantages of both paradigms, different meta-heuristics have been proposed. In this paper, we examine the Simulated Annealing meta-heuristic and how it can be used to balance the exploration-exploitation trade-off in concept learning. In different experimental settings, we analyse how and where existing concept learning algorithms can benefit from the Simulated Annealing meta-heuristic.

Embedding Models for Knowledge Graphs Induced by Clusters of Relations and Background Knowledge

Embedding models have been successfully exploited for predictive tasks on Knowledge Graphs (KGs). We propose TRANSROWL-HRS, which aims at making KG embeddings more semantically aware by exploiting the intended semantics in the KG. The method exploits schema axioms to encode knowledge that is observed as well as derived by reasoning. More knowledge is further exploited by relying on a successive hierarchical clustering process applied to relations, to make use of the several semantic meanings that the very same relation may have. An experimental evaluation on link prediction and triple classification tasks proves the improvement yielded by the proposed approach (coupled with different optimizers) compared to some baseline models.

Ontology Graph Embeddings and ILP for Financial Forecasting

There is a history of hybrid machine learning approaches where the result of an unsupervised learning algorithm is used to provide data annotation from which ILP can learn in the usual supervised manner. Here we consider the task of predicting the property of cointegration between the time series of stock price of two companies, which can be used to implement a robust pair-trading strategy that can remain profitable regardless of the overall direction in which the market evolves. We start with an original FinTech ontology of relations between companies and their managers, which we have previously extracted from SEC reports, quarterly filings that are mandatory for all US companies. When combined with stock price time series, these relations have been shown to help find pairs of companies suitable to pair trading. Here we use node2vec described in “Overview of Node2Vec, its algorithm and implementation examples” embeddings to produce clusters of companies and managers, which are then used as background predicates in addition to the relations linking companies and staff present in the ontology, and the values of the target predicate for a given time period. Progol is used to learn from this mixture of predicates combining numerical with structural relations of the entities represented in the data set to reveal rules with and predictive power.

Learning Programs by Learning from Failures

We describe an inductive logic programming (ILP) approach called learning from failures. In this approach, an ILP system (the learner) decomposes the learning problem into three separate stages: generate, test, and constrain. In the generate stage, the learner generates a hypothesis (a logic program) that satisfies a set of hypothesis constraints (constraints on the syntactic form of hypotheses). In the test stage, the learner tests the hypothesis against training examples. A hypothesis fails when it does not entail all the positive examples or entails a negative example. If a hypothesis fails, then, in the constrain stage, the learner learns constraints from the failed hypothesis to prune the hypothesis space, i.e. to constrain subsequent hypothesis generation. For instance, if a hypothesis is too general (entails a negative example), the constraints prune generalisations of the hypothesis. If a hypothesis is too specific (does not entail all the positive examples), the constraints prune specialisations of the hypothesis. This loop repeats until either (i) the learner finds a hypothesis that entails all the positive and none of the negative examples, or (ii) there are no more hypotheses to test. We introduce Popper, an ILP system that implements this approach by combining answer set programming and Prolog. Popper supports infinite problem domains, reasoning about lists and numbers, learning textually minimal programs, and learning recursive programs. Our experimental results on three domains (toy game problems, robot strategies, and list transformations) show that (i) constraints drastically improve learning performance, and (ii) Popper can outperform existing ILP systems, both in terms of predictive accuracies and learning times.

Lifting Symmetry Breaking Constraints with Inductive Logic Programming

Efficient omission of symmetric solution candidates is essential for combinatorial problem solving. Most of the existing approaches are instance-specific and focus on the automatic computation of Symmetry Breaking Constraints (SBCs) for each given problem instance. However, the application of such approaches to large-scale instances or advanced problem encodings might be problematic. Moreover, the computed SBCs are propositional and, therefore, can neither be meaningfully interpreted nor transferred to other instances. To overcome these limitations, we introduce a new model-oriented approach for Answer Set Programming that lifts the SBCs of small problem instances into a set of interpretable first-order constraints using the Inductive Logic Programming paradigm. Experiments demonstrate the ability of our framework to learn general constraints from instance-specific SBCs for a collection of combinatorial problems. The obtained results indicate that our approach significantly outperforms a state-of-the-art instance-specific method as well as the direct application of a solver.

Inductive Learning of Answer Set Programs for Autonomous Surgical Task Planning – Application to a Training Task for Surgeons

The quality of robot-assisted surgery can be improved and the use of hospital resources can be optimized by enhancing autonomy and reliability in the robot’s operation. Logic programming is a good choice for task planning in robot-assisted surgery because it supports reliable reasoning with domain knowledge and increases transparency in the decision making. However, prior knowledge of the task and the domain is typically incomplete, and it often needs to be refined from executions of the surgical task(s) under consideration to avoid sub-optimal performance. In this paper, we investigate the applicability of inductive logic programming for learning previously unknown axioms governing domain dynamics. We do so under answer set semantics for a benchmark surgical training task, the ring transfer. We extend our previous work on learning the immediate preconditions of actions and constraints, to also learn axioms encoding arbitrary temporal delays between atoms that are effects of actions under the event calculus formalism. We propose a systematic approach for learning the specifications of a generic robotic task under the answer set semantics, allowing easy knowledge refinement with iterative learning. In the context of 1000 simulated scenarios, we demonstrate the significant improvement in performance obtained with the learned axioms compared with the hand-written ones; specifically, the learned axioms address some critical issues related to the plan computation time, which is promising for reliable real-time performance during surgery.

Scalable Non-observational Predicate Learning in ASP

Recently, novel ILP systems under the answer set semantics have been proposed, some of which are robust to noise and scalable over large hypothesis spaces. One such system is FastLAS, which is significantly faster than other state-of-the-art ASPbased ILP systems. FastLAS is, however, only capable of Observational Predicate Learning (OPL), where the learned hypothesis defines predicates that are directly observed in the examples. It cannot learn knowledge that is indirectly observable, such as learning causes of observed events. This class of problems, known as non-OPL, is known to be difficult to handle in the context of non-monotonic semantics. Solving non-OPL learning tasks whilst preserving scalability is a challenging open problem. We address this problem with a new abductive method for translating examples of a non-OPL task to a set of examples, called possibilities, such that the original example is covered iff at least one of the possibilities is covered. This new method allows an ILP system capable of performing OPL tasks to be “upgraded” to solve non-OPL tasks. In particular, we present our new FastNonOPL system, which upgrades FastLAS with the new possibility generation. We compare it to other state-ofthe-art ASP-based ILP systems capable of solving non-OPL tasks, showing that FastNonOPL is significantly faster, and in many cases more accurate, than these other systems.

Probabilistic Inductive Constraint Logic

Probabilistic logical models deal efectively with uncertain relations and entities typical of many real world domains. In the feld of probabilistic logic programming usually the aim is to learn these kinds of models to predict specifc atoms or predicates of the domain, called target atoms/predicates. However, it might also be useful to learn classifers for interpretations as a whole: to this end, we consider the models produced by the inductive constraint logic system, represented by sets of integrity constraints, and we propose a probabilistic version of them. Each integrity constraint is annotated with a probability, and the resulting probabilistic logical constraint model assigns a probability of being positive to interpretations. To learn both the structure and the parameters of such probabilistic models we propose the system PASCAL for “probabilistic inductive constraint logic”. Parameter learning can be performed using gradient descent or L-BFGS described in “Limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) method“. PASCAL has been tested on 11 datasets and compared with a few statistical relational systems and a system that builds relational decision trees (TILDE): we demonstrate that this system achieves better or comparable results in terms of area under the precision–recall and receiver operating characteristic curves, in a comparable execution time.

Learning Hierarchical Probabilistic Logic Programs

Probabilistic logic programming (PLP) combines logic programs and probabilities. Due to its expressiveness and simplicity, it has been considered as a powerful tool for learning and reasoning in relational domains characterized by uncertainty. Still, learning the parameter and the structure of general PLP is computationally expensive due to the inference cost. We have recently proposed a restriction of the general PLP language called hierarchical PLP (HPLP) in which clauses and predicates are hierarchically organized. HPLPs can be converted into arithmetic circuits or deep neural networks and inference is much cheaper than for general PLP. In this paper we present algorithms for learning both the parameters and the structure of HPLPs from data. We first present an algorithm, called parameter learning for hierarchical probabilistic logic programs (PHIL) which performs parameter estimation of HPLPs using gradient descent and expectation maximization. We also propose structure learning of hierarchical probabilistic logic programming (SLEAHP), that learns both the structure and the parameters of HPLPs from data. Experiments were performed comparing PHIL and SLEAHP with PLP and Markov Logic Networks state-of-the art systems for parameter and structure learning respectively. PHIL was compared with EMBLEM, ProbLog2 and Tuffy and SLEAHP with SLIPCOVER, PROBFOIL+, MLB-BC, MLN-BT and RDN-B. The experiments on five well known datasets show that our algorithms achieve similar and often better accuracies but in a shorter time.

A Semantics for Hybrid Probabilistic Logic Programs with Function Symbols

Probabilistic Logic Programming (PLP) is a powerful paradigm for the representation of uncertain relations among objects. Recently, programs with continuous variables, also called hybrid programs, have been proposed and assigned a semantics. Hybrid programs are capable of representing real-world measurements but unfortunately the semantics proposal was imprecise so the definition did not assign a probability to all queries. In this paper, we remedy this and formally define a new semantics for hybrid programs. We prove that the semantics assigns a probability to all queries for a large class of programs.

An Asymptotic Analysis of Probabilistic Logic Programming, with Implications for Expressing Projective Families of Distributions

Probabilistic logic programming is a major part of statistical relational artificial intelligence, where approaches from logic and probability are brought together to reason about and learn from relational domains in a setting of uncertainty. However, the behaviour of statistical relational representations across variable domain sizes is complex, and scaling inference and learning to large domains remains a significant challenge. In recent years, connections have emerged between domain size dependence, lifted inference and learning from sampled subpopulations. The asymptotic behaviour of statistical relational representations has come under scrutiny, and projectivity was investigated as the strongest form of domain-size dependence, in which query marginals are completely independent of the domain size.
In this contribution we show that every probabilistic logic program under the distribution semantics is asymptotically equivalent to an acyclic probabilistic logic program consisting only of determinate clauses over probabilistic facts. We conclude that every probabilistic logic program inducing a projective family of distributions is in fact everywhere equivalent to a program from this fragment, and we investigate the consequences for the projective families of distributions expressible by probabilistic logic programs.
To facilitate the application of classical results from finite model theory, we introduce the abstract distribution semantics, defined as an arbitrary logical theory over probabilistic facts. This bridges the gap to the distribution semantics underlying probabilistic logic programming. In this representation, determinate logic programs correspond to quantifier-free theories, making asymptotic quantifier elimination results available for the setting of probabilistic logic programming.
This paper is under consideration for acceptance in TPLP.

Generative Clausal Networks: Relational Decision Trees as Probabilistic Circuits

In many real-world applications, the i.i.d. assumption does not hold and thus capturing the interactions between instances is essential for the task at hand. Recently, a clear connection between predictive modelling such as decision trees and probabilistic circuits, a form of deep probabilistic model, has been established although it is limited to propositional data. We introduce the first connection between relational rule models and probabilistic circuits, obtaining tractable inference from discriminative rule models while operating on the relational domain. Specifically, given a relational rule model, we make use of Mixed Sum-Product Networks (MSPNs)—a deep probabilistic architecture for hybrid domains—to equip them with a full joint distribution over the class and how (often) the rules fire. Our empirical evaluation shows that we can answer a wide range of probabilistic queries on relational data while being robust to missing, out-of-domain data and partial counts. We show that our method generalizes to different distributions outperforming strong baselines. Moreover, due to the clear probabilistic semantics of MSPNs we have informative model interpretations.

A First Step Towards Even More Sparse Encodings of Probability Distributions

Real world scenarios can be captured with lifted probability distributions. However, distributions are usually encoded in a table or list, requiring an exponential number of values. Hence, we propose a method for extracting first-order formulas from probability distributions that require significantly less values by reducing the number of values in a distribution and then extracting, for each value, a logical formula to be further minimized. This reduction and minimization allows for increasing the sparsity in the encoding while also generalizing a given distribution. Our evaluation shows that sparsity can increase immensely by extracting a small set of short formulas while preserving core information.

Feature Learning by Least Generalization

This paper provides an empirical study for feature learning based on induction. We encode image data into first-order expressions and compute their least generalization. An interesting question is whether the least generalization can extract a common pattern of input data. We also introduce two different methods for feature extraction based on symbolic manipulation. We perform experiments using the MNIST datasets and show that the proposed methods successfully capture features from training data and classify test data in around 90% accuracies. The results of this paper show potentials of induction and symbolic reasoning to feature learning or pattern recognition from raw data.

Automatic Conjecturing of P-Recursions Using Lifted Inference

Recent progress in lifted inference algorithms has made it possible to solve many non-trivial counting tasks from enumerative combinatorics in an automated fashion, by casting them as first-order model counting problems. Algorithms for this problem typically output a single number, which is the number of models of the first-order logic sentence in question on a given domain. However, in the combinatorics setting, we are more interested in obtaining a mathematical formula that holds for any given structure size. In this paper, we show that one can use lifted inference algorithms to conjecture linear recurrences with polynomial coefficients, one such class of formulas of interest.

Automatic Modeling of Dynamical Interactions Within Marine Ecosystems

Marine ecology models are used to study and anticipate population variations of plankton and microalgae species. These variations can have an impact on ecological niches, the economy or the climate. Our objective is the automation of the creation of such models. Learning From Interpretation Transition (LFIT) is a framework that aims at learning the dynamics of a system by observing its state transitions. LFIT provides explainable predictions in the form of logical rules. In this paper, we introduce a method that allows to extract an influence graph from a LFIT model. We also propose an heuristic to improve the model against noise in the data.

Polynomial Algorithm For Learning From Interpretation Transition

Learning from interpretation transition (LFIT) automatically constructs a model of the dynamics of a system from the observation of its state transitions. The previously proposed General Usage LFIT Algorithm (GULA) serves as the core block to several methods of the framework that capture different dynamics. But its exponential complexity limits the use of the whole framework to relatively small systems. In this paper, we introduce an approximated algorithm (PRIDE) which trades the completeness of GULA for a polynomial complexity. Both GULA and PRIDE source codes are available as open source at https://github.com/Tony-sama/pylfit under GPL-3.0 License.

Exploiting Temporal Relations of Events for Classification Tasks – Proof of Concept Study with Synthetic Data Sets