From the Proceedings of Inductive logic Programming 2010

Machine Learning Technology Artificial Intelligence Technology Natural Language Processing Technology Semantic Web Technology Ontology Technology Reasoning Technology Knowledge Information Technology Collecting AI Conference Papers Digital Transformation Technology

ILP 2010 20th International Conference, Inductive Logic Programming

In the previous article, we discussed ILP2019. In this article, we discuss the papers from the 20th International Conference on Inductive Logic Programming (ILP2010), held in Florence, Italy, June 27-30, 2010.

The ILP conference series started in 1991 and is the premier international fo- rum on logic-based approaches to machine learning. The conference has recently explored several intersections with statistical learning and other probabilistic approaches, expanding research horizons significantly.

The 20th edition was structured with invited talks, regular talks, a poster session, a panel session, and featured for the first time a tutorial day. The in- vited speakers were Michael Kifer, Avi Pfeffer, and David Poole. Abstracts of their talks can be found in this volume. Gianluigi Greco and Francesco Scarcello presented a tutorial on “Structural Decomposition Methods: Identifying Easy Instances of Hard Problems”; Volker Tresp presented a tutorial on “Multivari- ate Models for Relational Learning.” Ivan Bratko, Luc De Raedt, Peter Flach, Katsumi Inoue, Stephen Muggleton, David Poole, and Ashwin Srinivasan par- ticipated in a panel session highlighting successes and future trends of ILP 20 years after the first meeting.

The overall program featured 16 oral presentations and 15 poster presenta- tions. The presentations of both kind were selected on the basis of extended abstracts. Following the recent tradition of the conference, a selection of the papers accepted at ILP 2010 are published in this volume of the Lecture Notes in Artificial Intelligence series and in a special issue of the Machine Learning journal. From the initially submitted 44 extended abstracts (8 pages in LNCS format), 31 were accepted for presentation at the conference. Each submission was refereed by at least three Program Committee members and was accessible for additional comments by the entire Program Committee (except in the cases of conflict of interest) thanks to the open reviewing model supported by Easy- Chair. Out of the accepted contributions 5 were selected for the special issue, 11 were published as a long paper (16 pages), and 15 more as a short paper (8 pages) in the proceedings. These papers were prepared after the conference. Supplementary materials for some of the accepted papers can be retrieved from the conference website (http://ilp2010.dsi.unifi.it/).

ILP 2010 would not have taken place without the contribution of many peo- ple. We would like to thank the invited and tutorial speakers, the panelists, the Program Committee members, the additional reviewers, the authors of sub- mitted papers, the participants, the local organizers (especially Marco Lippi) as well as the sponsors (the Artificial Intelligence journal, the Machine Learning journal, the Association for Logic Programming, the University of Bari “Aldo Moro,” the PASCAL 2 Network of Excellence, and the Office of Naval Research Global) for their generous financial support.

Abstracts of Invited Talks

Rule Interchange Format: Logic Programming’s Second Wind?

Recent years have witnessed a strong upswing in the interest in rule systems technologies—both in their own right and in combination with existing Web standards. In particular, the Semantic Web is now seen as a vast playing field for rules within the academia as well as the industry. This renewed interest motivated the development of the Rule Interchange Format (RIF), a recent W3C Web standard for exchanging rules among different and dissimilar systems [1–5]. Despite its name, RIF is not merely a format: it is a collection of concrete rule languages, called RIF dialects, and a framework for defining new ones in harmony with each other. This includes formal specifications of the syntax, semantics, and XML serialization.

In this talk we argue that RIF is a major opportunity to re-introduce rule based technologies into the mainstream of knowledge representation and information processing, and to rekindle the interest in logic program- ming. First, we will introduce the main principles behind RIF and then discuss the application landscape that could emerge if this standard is embraced by the relevant communities: Logic Programming, Semantic Web, and Knowledge Representation. We will also reflect on the past of logic programming and speculate on how it could benefit from and contribute to RIF in the future.

Practical Probabilistic Programming

Probabilistic programming promises to make probabilistic modeling easier by making it possible to create models using the power of programming languages, and by applying general-purpose algorithms to reason about models. We present a new probabilistic programming language named Figaro that was designed with practicality and usability in mind. Figaro can represent models naturally that have been difficult to represent in other languages, such as probabilistic relational models and models with undirected relationships with arbitrary constraints. An important feature is that the Figaro language and reasoning algorithms are embedded as a library in Scala. We illustrate the use of Figaro through a case study.

Probabilistic Relational Learning and Inductive Logic Programming at a Global Scale

Building on advances in statistical-relational AI and the Se- mantic Web, this talk outlined how to create knowledge, how to evaluate knowledge that has been published, and how to go beyond the sum of human knowledge. If there is some claim of truth, it is reasonable to ask what evidence there is for that claim, and to not believe claims that do not provide evidence. Thus we need to publish data that can provide evidence. Given such data, we can also learn from it. This talk outlines how publishing ontologies, data, and probabilistic hypotheses/theories can let us base beliefs on evidence, and how the resulting world-wide mind can go beyond the aggregation of human knowledge. Much of the world’s data is relational, and we want to make probabilistic predictions in order to make rational decisions. Thus probabilistic relational learning and inductive logic programming need to be a foundation of the semantic web. This talk overviewed the technology behind this vision and the considerable technical and social problem that remain.

Research Papers

Learning Multi-class Theories in ILP

In this paper we investigate the lack of reliability and consistency of those binary rule learners in ILP that employ the one-vs-rest binarisation technique when dealing with multi-class domains. We show that we can learn a simple, consistent and reliable multi-class theory by combining the rules of the multiple one-vs-rest theories into one rule list or set. We experimentally show that our proposed methods produce coherent and accurate rule models from the rules learned by a well known ILP learner Aleph.

A Numerical Refinement Operator Based on Multi-Instance Learning

We present a numerical refinement operator based on multi- instance learning. In the approach, the task of handling numerical vari- ables in a clause is delegated to statistical multi-instance learning schemes. To each clause, there is an associated multi-instance classification model with the numerical variables of the clause as input. Clauses are built in a greedy manner, where each refinement adds new numerical variables which are used additionally to the numerical variables already known to the multi-instance model. In our experiments, we tested this approach with multi-instance learners available in the Weka workbench (like MI- SVMs). These clauses are used in a boosting approach that can take advantage of the margin information, going beyond standard covering procedures or the discrete boosting of rules, like in SLIPPER. The ap- proach is evaluated on the problem of hexose binding site prediction, a pharmacological application and mutagenicity prediction. In two of the three applications, the task is to find configurations of points with cer- tain properties in 3D space that characterize either a binding site or drug activity: the logical part of the clause constitutes the points with their properties, whereas the multi-instance model constrains the distances among the points. In summary, the new numerical refinement operator is interesting both theoretically as a new synthesis of logical and statisti- cal learning and practically as a new method for characterizing binding sites and pharmacophores in biochemical applications.

Not Far Away from Home: A Relational Distance-Based Approach to Understanding Images of Houses

Augmenting vision systems with high-level knowledge and reasoning can improve lower-level vision processes, by using richer and more structured in- formation. In this paper we tackle the problem of delimiting conceptual elements of street views based on spatial relations between lower-level components, e.g. the element ‘house’ is composed of windows and a door in a spatial arrangement. We use structured data: each concept can be seen as a graph representing spa- tial relations between components, e.g. in terms of right, up, close. We employ a distance-based approach between logical interpretations to match parts of images with known examples and provide an experimental evaluation on real images.

Approximate Inference for Logic Programs with Annotated Disjunctions

Logic Programs with Annotated Disjunctions (LPADs) are a promising language for Probabilistic Inductive Logic Programming. In order to develop efficient learning systems for LPADs, it is fundamental to have high-performing inference algorithms. The existing approaches take too long or fail for large problems. In this paper we adapt to LPAD the approaches for approximate inference that have been developed for ProbLog, namely k-best and Monte Carlo.

k-Best finds a lower bound of the probability of a query by identifying the k most probable explanations while Monte Carlo estimates the prob- ability by smartly sampling the space of programs. The two techniques have been implemented in the cplint suite and have been tested on real and artificial datasets representing graphs. The results show that both algorithms are able to solve larger problems often in less time than the exact algorithm.

Approximate Bayesian Computation for the Parameters of PRISM Programs

Probabilistic logic programming formalisms permit the definition of potentially very complex probability distributions. This com- plexity can often make learning hard, even when structure is fixed and learning reduces to parameter estimation. In this paper an approximate Bayesian computation (ABC) method is presented which computes ap- proximations to the posterior distribution over PRISM parameters. The key to ABC approaches is that the likelihood function need not be com- puted, instead a ‘distance’ between the observed data and synthetic data generated by candidate parameter values is used to drive the learning. This makes ABC highly appropriate for PRISM programs which can have an intractable likelihood function, but from which synthetic data can be readily generated. The algorithm is experimentally shown to work well on an easy problem but further work is required to produce acceptable results on harder ones.

Probabilistic Rule Learning

Traditionally, rule learners have learned deterministic rules from deterministic data, that is, the rules have been expressed as logical statements and also the examples and their classification have been purely logical. We upgrade rule learning to a probabilistic setting, in which both the examples themselves as well as their classification can be probabilistic. The setting is incorporated in the probabilistic rule learner ProbFOIL, which combines the principles of the relational rule learner FOIL with the probabilistic Prolog, ProbLog. We report also on some experiments that demonstrate the utility of the approach.

Interactive Discriminative Mining of Chemical Fragments

Structural activity prediction is one of the most important tasks in chemoinformatics. The goal is to predict a property of interest given structural data on a set of small compounds or drugs. Ideally, systems that address this task should not just be accurate, but they should also be able to identify an interpretable discriminative structure which describes the most discriminant structural elements with respect to some target.

The application of ILP in an interactive software for discriminative mining of chemical fragments is presented in this paper. In particular, it is described the coupling of an ILP system with a molecular visualisation software that allows a chemist to graphically control the search for interesting patterns in chemical fragments. Furthermore, we show how structural information, such as rings, functional groups such as carboxyls, amines, methyls, and esters, are integrated and exploited in the search.

MMRF for Proteome Annotation Applied to Human Protein Disease Prediction

Biological processes where every gene and protein participates is an essential knowledge for designing disease treatments. Nowa- days, these annotations are still unknown for many genes and proteins. Since making annotations from in-vivo experiments is costly, computational predictors are needed for different kinds of annotation such as metabolic pathway, interaction network, protein family, tissue, disease and so on. Biological data has an intrinsic relational structure, including genes and proteins, which can be grouped by many criteria. This hinders the possibility of finding good hypotheses when attribute-value representation is used. Hence, we propose the generic Modular Multi-Relational Framework (MMRF) to predict different kinds of gene and protein an- notation using Relational Data Mining (RDM). The specific MMRF application to annotate human protein with diseases verifies that group knowledge (mainly protein-protein interaction pairs) improves the pre- diction, particularly doubling the area under the precision-recall curve.

Extending ProbLog with Continuous Distributions

ProbLog is a recently introduced probabilistic extension of Prolog. The key contribution of this paper is that we extend ProbLog with abilities to specify continuous distributions and that we show how ProbLog’s exact inference mechanism can be modified to cope with such distributions. The resulting inference engine combines an interval calculus with a dynamic discretization algorithm into an effective solver.

Multivariate Prediction for Learning on the Semantic Web

One of the main characteristics of Semantic Web (SW) data is that it is notoriously incomplete: in the same domain a great deal might be known for some entities and almost nothing might be known for others. A popular example is the well known friend-of-a-friend data set where some members document exhaustive private and social information whereas, for privacy concerns and other reasons, almost nothing is known for other members. Although deductive reasoning can be used to complement factual knowledge based on the ontological background, still a tremendous number of potentially true statements remain to be uncovered. The paper is focused on the prediction of potential relationships and attributes by exploiting regularities in the data using statistical relational learning algorithms. We argue that multivariate prediction approaches are most suitable for dealing with the resulting high-dimensional sparse data matrix. Within the statistical framework, the approach scales up to large domains and is able to deal with highly sparse relationship data. A major goal of the presented work is to formulate an inductive learning approach that can be used by people with little machine learning background. We present experimental results using a friend-of- a-friend data set.

Learning Action Descriptions of Opponent Behaviour in the Robocup 2D Simulation Environment

The Robocup 2D simulation competition [13] proposes a dynamic environment where two opponent teams are confronted in a simplified soccer game. All major teams use a fixed algorithm to control its players. An unexpected opponent strategy, not previously considered by the developers, might result in winning all matches. To improve this we use ILP to learn action descriptions of opponent players; for learning on dynamic domains, we have to deal with the frame problem. The induced descriptions can be used to plan for desired field states. To show this we start with a simplified scenario where we learn the behaviour of a goal- keeper based on the actions of a shooter player. This description is used to plan for states where a goal can be scored. This result can directly be extended to a multiplayer environment.

Hypothesizing about Causal Networks with Positive and Negative Effects by Meta-level Abduction

Meta-level abduction discovers missing links and unknown nodes from incomplete networks to complete paths for observations. In this work, we extend applicability of meta-level abduction to deal with networks containing both positive and negative causal effects. Such networks appear in many domains including biology, in which inhibitory effects are important in signaling and metabolic pathways. Reasoning in networks with inhibition is inevitably nonmonotonic, and involves default assumptions in abduction. We show that meta-level abduction can consistently produce both positive and negative causal relations as well as invented nodes. Case studies of meta-level abduction are presented in p53 signal- ing networks, in which causal rules are abduced to suppress a tumor with a new protein and to stop DNA synthesis when damage is occurred.

BET: An Inductive Logic Programming Workbench

Existing ILP (Inductive Logic Programming) systems are implemented in different languages namely C, Progol, etc. Also, each system has its customized format for the input data. This makes it very tedious and time consuming on the part of a user to utilize such a system for experimental purposes as it demands a thorough understand- ing of that system and its input specification. In the spirit of Weka [1], we present a relational learning workbench called BET(Background + Examples = Theories), implemented in Java. The objective of BET is to shorten the learning curve of users (including novices) and to facilitate speedy development of new relational learning systems as well as quick integration of existing ILP systems. The standardized input format makes it easier to experiment with different relational learning algorithms on a common dataset.

Seeing the World through Homomorphism: An Experimental Study on Reducibility of Examples

We study reducibility of examples inseveral typical inductive logic programming benchmarks. The notion of reducibility that we use is related to theta-reduction, commonly used to reduce hypotheses in ILP. Whereas examples are usually not reducible on their own, they often become implicitly reducible when language for constructing hypotheses is fixed. We show that number of ground facts in a dataset can be almost halved for some real-world molecular datasets. Furthermore, we study the impact this has on a popular ILP system Aleph.

Learning Discriminant Rules as a Minimal Saturation Search

It is well known that for certain relational learning problems, traditional top-down search falls into blind search. Recent works in Inductive Logic Programming about phase transition and crossing plateau show that no general solution can face to all these difficulties. In this con- text, we introduce the notion of “minimal saturation” to build non-blind refinements of hypotheses in a bidirectional approach.

We present experimental results of this approach on some benchmarks inspired by constraint satisfaction problems. These problems can be specified in first order logic but most existing ILP systems fail to learn a correct definition, especially because they fall into blind search.

Variation of Background Knowledge in an Industrial Application of ILP

In several recent papers ILP has been applied to Systems Biology problems, in which it has been used to fill gaps in the descriptions of biological networks. In the present paper we describe two new applications of this type in the area of plant biology. These applications are of particular interest to the agrochemical industry in which improve- ments in plant strains can have benefits for modelling crop development. The background knowledge in these applications is extensive and is de- rived from public databases in a Prolog format using a new system called Ondex (developers BBSRC Rothamsted). In this paper we explore the question of how much of this background knowledge it is beneficial to include, taking into account accuracy increases versus increases in learning time. The results indicate that relatively shallow background knowledge is needed to achieve maximum accuracy.

Pruning Search Space for Weighted First Order Horn Clause Satisfiability

Many SRL models pose logical inference as weighted satisfiability solving. Performing logical inference after completely grounding clauses with all possible constants is computationally expensive and approaches such as LazySAT [8] utilize the sparseness of the domain to deal with this. Here, we investigate the efficiency of restricting the Knowledge Base (Σ) to the set of first order horn clauses. We propose an algorithm that prunes the search space for satisfiability in horn clauses and prove that the optimal solution is guaranteed to exist in the pruned space. The approach finds a model, if it exists, in polynomial time; otherwise it finds an interpretation that is most likely given the weights. We provide experimental evidence that our approach reduces the size of search space substantially.

Multi-relational Pattern Mining Based-on Combination of Properties with Preserving Their Structure in Examples

We propose an algorithm for multi-relational pattern mining through the problem established in WARMR. In order to overcome the combinatorial problem of large pattern space, another algorithm MAPIX restricts patterns into combination of basic patterns, called properties. A property is defined as a set of literals appeared in examples and is of an extended attribute-value form. Advantage of MAPIX is to make patterns from pattern fragments occurred in examples. Many patterns which are not appeared in examples are not tested. Although the range of patterns is clear and MAPIX enumerates them efficiently, a large part of patterns are out of the range. The proposing algorithm keeps the advantage and extends the way of combination of properties. The algorithm combines properties as they appeared in examples, we call it structure preserving combination.

Learning from Noisy Data Using a Non-covering ILP Algorithm

In this paper we describe the non-covering inductive logic programming program HYPER/N, concentrating mainly on noise han- dling as well as some other mechanisms that improve learning. We per- form some experiments with HYPER/N on synthetic weather data with artificially added noise, and on real weather data to learn to predict the movement of rain from radar rain images and synoptic data.

Can HOLL Outperform FOLL?

Learning first-order recursive theories remains a difficult learning task in a normal Inductive Logic Programming (ILP) setting, although numerous approaches have addressed it; using Higher-order Logic (HOL) avoids having to learn recursive clauses for such a task. It is one of the areas where Higher-order Logic Learning (HOLL), which uses the power of expressivity of HOL, can be expected to improve the learn- ability of a problem compared to First-order Logic Learning (FOLL). We present a first working implementation of λProgol, a HOLL system adapting the ILP system Progol and the HOL formalism λProlog, which was introduced in a poster last year [15]. We demonstrate that λProgol outperforms standard Progol when learning first-order recursive theories, by improving significantly the predictive accuracy of several worked ex- amples, especially when the learning examples are small with respect to the size of the data.

Incremental Learning of Relational Action Models in Noisy Environments

In the Relational Reinforcement Learning framework, we propose an algorithm that learns an action model (or an approximation of the transition func- tion) in order to predict the resulting state of an action in a given situation. This algorithm learns incrementally a set of first order rules in a noisy environment following a data-driven loop. Each time a new example is presented that contradicts the current action model, the model is revised (by generalization and/or specialization). As opposed to a previous version of our algorithm that operates in a noise-free context, we introduce here a number of indicators attached to each rule that allows to evaluate if the revision should take place immediately or should be delayed. We provide an empirical evaluation on usual RRL benchmarks.

When Does It Pay Off to Use Sophisticated Entailment Engines inILP?

Entailment is an important problem in computational logic particularly relevant to the Inductive Logic Programming (ILP) community as it is at the core of the hypothesis coverage test which is often the bottleneck of an ILP system. Despite developments in resolution heuristics and, more recently, in subsumption engines, most ILP systems simply use Prolog’s left-to-right, depth-first search selection function for SLD-resolution to perform the hypothesis coverage test.

We implemented two alternative selection functions for SLD-resolution: smallest predicate domain (SPD) and smallest variable domain (SVD); and developed a subsumption engine, Subsumer. These entailment engines were fully integrated into the ILP system ProGolem.

The performance of these four entailment engines is compared on a representative set of ILP datasets. As expected, on determinate datasets Prolog’s built-in resolution, is unrivalled. However, in the presence of even little non-determinism, its performance quickly degrades and a so- phisticated entailment engine is required.

Stochastic Refinement

The research presented in this paper is motivated by the following question. How can the generality order of clauses and the relevant concepts such as refinement be adapted to be used in a stochastic search? To address this question we introduce the concept of stochastic refinement operators and adapt a framework, called stochastic refinement search. In this paper we introduce stochastic refinements of a clause as a probability distribution over a set of clauses. This probability distribution can be viewed as a prior in a stochastic ILP search. We study the properties of a stochastic refinement search as two well known Markovian approaches: 1) Gibbs sampling algorithm and 2) random heuristic search. As a Gibbs sampling algorithm, a stochastic refinement search iteratively generates random samples from the hypothesis space accord- ing to a posterior distribution. We show that a minimum sample size can be set so that in each iteration a consistent clause is generated with a high probability. We study the stochastic refinement operators within the framework of random heuristic search and use this framework to characterise stochastic search methods in some ILP systems. We also study a special case of stochastic refinement search where refinement opera- tors are defined with respect to subsumption order relative to a bottom clause. This paper also provided some insights to explain the relative advantages of using stochastic lgglike operators as in the ILP systems Golem and ProGolem.

Fire! Firing Inductive Rules from Economic Geography for Fire Risk Detection

Wildfires can importantly affect the ecology and economy of large regions of the world. Effective prevention techniques are fundamental to mitigate their consequences. The design of such preemptive methods requires a deep understanding of the factors that increase the risk of fire, particularly when we can intervene on these factors. This is the case for the maintenance of ecological balances in the landscape that minimize the occurrence of wildfires. We use an inductive logic programming approach over detailed spatial datasets: one describing the landscape mosaic and characterizing it in terms of its use; and another describing polygonal areas where wildfires took place over several years. Our inductive process operates over a logic term representation of vectorial geographic data and uses spatial predicates to explore the search space, leveraging the framework of Spatial-Yap, its multi-dimensional indexing and tabling extensions. We show that the coupling of a logic-based spatial database with an inductive logic programming engine provides an elegant and powerful approach to spatial data mining.

Automating the ILP Setup Task: Converting User Advice about Specific Examples into General Background Knowledge

Inductive Logic Programming (ILP) provides an effective method of learning logical theories given a set of positive examples, a set of negative examples, a corpus of background knowledge, and specification of a search space (e.g., via mode definitions) from which to compose the theories. While specifying positive and negative examples is relatively straightforward, composing effective background knowledge and search-space definition requires detailed understanding of many aspects of the ILP process and limits the usability of ILP. We introduce two techniques to automate the use of ILP for a non-ILP expert. These techniques include automatic generation of background knowledge from user-supplied information in the form of a simple relevance language, used to describe important aspects of specific training examples, and an iterative-deepening-style search process.

Speeding Up Planning through Minimal Generalizations of Partially OrderedPlans

We present anovel strategy enabling to exploit existing plans in solving new similar planning tasks by finding a common generalized core of the existing plans. For this purpose we develop an operator yield- ing a minimal joint generalization of two partially ordered plans. In three planning domains we show a substantial speed-up of planning achieved when the planner starts its search space exploration from the learned common generalized core, rather than from scratch.

In the next article, we will discuss ISWC2011.