From Inductive Logic Programming 2012 Proceedings


ILP 2012: The 22nd International Conference on Inductive Logic Programming

In the previous article, we discussed ILP 2011. This issue discusses the proceedings of the 22nd International Conference on Inductive Logic Programming, ILP 2012, held in Dubrovnik, September 17-19, 2012. The ILP conference series began in 1991 and is the leading international forum in the field. Initially focused on the induction of logic programs, it has broadened its scope in recent years and attracted much attention and interest. It now covers all aspects of learning from structured data, including logic learning, multi-relational learning, data mining, statistical relational learning, graph and tree structure mining, and relational reinforcement learning.

This edition of the conference solicited three types of submissions:

  1. Long papers (12 pages) describing original mature work containing appropriate experimental evaluation and/or representing a self-contained theoretical contribution
  2. Short papers (6 pages) describing original work in progress, brief accounts of original ideas without conclusive experimental evaluation, and other relevant work of potentially high scientific interest but not yet qualifying for the above category
  3. Papers relevant to the conference topics and recently published or accepted for publication by a first-class conference such as ECML/PKDD, ICML, KDD, or ICDM, or by a journal such as MLJ, DMKD, or JMLR

We received 20 long and 21 short submissions, and one previously published paper. Each submission was reviewed by at least three Program Committee members. The short papers were evaluated on the basis of both the submitted manuscript and the presentation at the conference, and the authors of a subset of papers were invited to submit an extended version. Eventually, 18 papers were accepted for the present proceedings, nine papers were accepted for the Late-Breaking Papers volume published in the CEUR workshop proceedings series, and eight papers were invited for the Machine Learning journal special issue dedicated to ILP 2012 and subjected to further reviewing. The paper entitled “Plane-Based Object Categorization Using Relational Learning” by Reza Farid and Claude Sammut won the Machine Learning journal best student paper award and appears in the mentioned special issue. The prize was kindly sponsored by the Machine Learning journal (Springer).

The papers in this volume represent well the breadth of current ILP research topics, such as propositionalization, logical foundations, implementations, probabilistic ILP, applications in robotics and biology, grammatical inference, spatial learning, and graph-based learning.

The conference program included three invited talks. In the lecture entitled “Declarative Modeling for Machine Learning,” Luc De Raedt proposed applying the constraint programming methodology to machine learning and data mining, specifying machine learning and data mining problems as constraint satisfaction and optimization problems. In this way, applications and software that incorporate machine learning or data mining techniques can be developed by declaratively specifying what the machine learning or data mining problem is, rather than having to outline how the solution is to be computed.
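
To make the declarative idea concrete, here is a minimal Python sketch (our own illustration, not from the talk): a frequent-itemset mining task is specified purely by a predicate saying what counts as a solution, and a generic enumerate-and-test solver does the rest. The transaction data and the min_support threshold are invented for illustration.

```python
from itertools import chain, combinations

# Toy transaction database (each transaction is a set of items).
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
items = set().union(*transactions)

# Declarative part: state WHAT a solution is, not HOW to find it.
def frequency(itemset):
    return sum(1 for t in transactions if itemset <= t)

def is_solution(itemset, min_support=3):
    return len(itemset) > 0 and frequency(itemset) >= min_support

# Generic solver part: enumerate candidates and keep those satisfying the spec.
candidates = chain.from_iterable(
    combinations(sorted(items), k) for k in range(1, len(items) + 1))
solutions = [set(c) for c in candidates if is_solution(set(c))]
print(solutions)
```

A real declarative system would replace the naive enumeration with a constraint solver, leaving the problem specification unchanged.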

Ben Taskar’s talk “Geometry of Diversity and Determinantal Point Processes: Representation, Inference and Learning” discussed approaches to inference and learning in graphical models using determinantal point processes (DPPs) that offer tractable algorithms for exact inference, including computing marginals, computing certain conditional probabilities, and sampling. He presented recent work on a novel factorization and dual representation of DPPs that enables efficient inference for exponentially sized structured sets.
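
As a concrete illustration of why DPP inference is tractable, the following NumPy sketch (our own, not from the talk) computes exact inclusion marginals: given an L-ensemble kernel L, the marginal kernel is K = L(L + I)^{-1}, and the probability that a given subset is included in the sample is the determinant of the corresponding submatrix of K. The random kernel here is synthetic.

```python
import numpy as np

# L-ensemble kernel over 4 items: P(Y = S) is proportional to det(L_S).
rng = np.random.default_rng(0)
B = rng.normal(size=(4, 4))
L = B @ B.T + 0.1 * np.eye(4)   # symmetric positive definite

# Marginal kernel K = L (L + I)^{-1}; exact marginals come from determinants of K.
K = L @ np.linalg.inv(L + np.eye(4))

# P(i in Y) = K_ii, and P({i, j} in Y) = det of the 2x2 submatrix of K.
print("P(0 in Y)     =", K[0, 0])
print("P({0,1} in Y) =", np.linalg.det(K[np.ix_([0, 1], [0, 1])]))
```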

Geraint A. Wiggins spoke about “Learning and Creativity in the Global Workspace” and presented a model based on Baars’ Global Workspace account of consciousness that attempts to provide a general, uniform mechanism for information regulation. The key ideas involved are information content and entropy, expectation, learning of multi-dimensional, multi-level representations of data, and data-driven segmentation. The model was originally developed for music but can be generalized to language. Most importantly, it can account not only for perception and action but also for creativity, possibly serving as a model for original linguistic thought.
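
The following toy Python sketch (ours, not Wiggins’ actual model) illustrates two of these key ideas, information content and data-driven segmentation: the surprisal of each symbol under a bigram model is computed, and segment boundaries are hypothesized where surprisal peaks, i.e., where expectation is violated. The sequence and the mean-threshold rule are invented for illustration.

```python
import math
from collections import Counter

# Bigram model learned from a symbol sequence (here, characters).
seq = "abcabcabdabcabc"
bigrams = Counter(zip(seq, seq[1:]))
unigrams = Counter(seq[:-1])

# Surprisal (information content) of each symbol given its predecessor.
def surprisal(prev, sym):
    return -math.log2(bigrams[(prev, sym)] / unigrams[prev])

scores = [surprisal(p, s) for p, s in zip(seq, seq[1:])]

# Data-driven segmentation: boundaries where surprisal exceeds its mean.
mean = sum(scores) / len(scores)
boundaries = [i + 1 for i, s in enumerate(scores) if s > mean]
print(list(zip(seq[1:], [round(s, 2) for s in scores])))
print("segment boundaries after positions:", boundaries)
```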

Details are described below.

We present a robot agent that learns to exploit objects in its environment as tools, allowing it to solve problems that would otherwise be impossible to achieve. The agent learns by watching a single demonstration of tool use by a teacher and then experiments in the world with a variety of available tools. A variation of explanation-based learning (EBL) first identifies the most important sub-goals the teacher achieved using the tool. The action model constructed from this explanation is then refined by trial-and-error learning with a novel Inductive Logic Programming (ILP) algorithm that generates informative experiments while containing the search space to a practical number of experiments. Relational learning generalises across objects and tasks to learn the spatial and structural constraints that describe useful tools and how they should be employed. The system is evaluated in a simulated robot environment.

In Logic Programming, a thread of input/output variables is often used to carry state information through the body literals of the clauses that make up a logic program. When using Inductive Logic Programming (ILP) to synthesise logic programs, the standard refinement operators that define the search space cannot enforce this pattern and non-conforming clauses have to be discarded after being constructed. We present a new refinement operator that defines a search space that only includes Horn clauses that conform to this pattern of input/output variable threads, dramatically narrowing the search space and ILP run times. We further support our theoretical justification of the new operator with experimental results over a variety of datasets.
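
The paper defines the operator at the level of the refinement lattice; the Python sketch below only illustrates the pattern being enforced. Assuming the convention that each state-carrying literal takes its input and output states as its last two arguments (predicate names and clause encoding are invented), the check verifies that the state variables form a single thread; the new refinement operator generates only clauses that would pass such a check, instead of discarding non-conforming clauses after construction.

```python
# A clause body as a list of literals; each literal is (predicate, args).
# Assumed convention: each state-carrying literal takes an input state and
# an output state as its last two arguments, e.g. push(X, S0, S1).
body = [("push", ("A", "S0", "S1")),
        ("pop",  ("B", "S1", "S2")),
        ("peek", ("C", "S2", "S3"))]

def is_threaded(body, initial="S0"):
    """Check that state variables form a single chain S0 -> S1 -> ... -> Sn."""
    state = initial
    for _pred, args in body:
        s_in, s_out = args[-2], args[-1]
        if s_in != state:   # each input must be the previous output
            return False
        state = s_out
    return True

print(is_threaded(body))  # True
```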

Existing propositionalisation approaches mainly deal with categorical attributes. Few approaches deal with continuous attributes. A first solution is then to discretise numeric attributes to transform them into categorical ones. Alternative approaches dealing with numeric attributes consist in aggregating them with simple functions such as average, minimum, maximum, etc. We propose an approach dual to discretisation that reverses the processing of objects and thresholds, and whose discretisation corresponds to quantiles. Our approach is evaluated thoroughly on artificial data to characterize its behaviour with respect to two attribute-value learners, and on real datasets.
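
A rough NumPy illustration of the two routes (our reading, not the paper’s code): classical discretisation bins an individual’s secondary values and counts per bin, while the dual route fixes quantile thresholds on the whole population and counts, per threshold, how many of the individual’s values exceed it. All numbers are invented.

```python
import numpy as np

# Values of a numeric attribute in the secondary records of one individual.
values = np.array([2.0, 5.5, 7.1, 9.8, 12.3, 3.3])

# Classical route: discretise values into fixed bins and count per bin.
bins = np.histogram(values, bins=3, range=(0, 15))[0]

# Dual route (sketch): fix quantile thresholds on the population and, for
# each threshold, count how many of the individual's records exceed it.
population = np.array([1.0, 2.5, 4.0, 5.0, 6.5, 8.0, 9.5, 11.0, 13.0])
thresholds = np.quantile(population, [0.25, 0.5, 0.75])
features = [(values >= t).sum() for t in thresholds]

print("bin counts:       ", bins)
print("quantile features:", features)
```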

To date, ILP models in drug design have largely focussed on models in first-order logic that relate the two- or three-dimensional molecular structure of a potential drug (a ligand) to its activity (for example, inhibition of some protein). In modelling terms: (a) the models have largely been logic-based (although there have been some attempts at probabilistic models); (b) the models have been mostly of a discriminatory nature (they have been mainly used for classification tasks); and (c) data for concepts to be learned are usually provided explicitly: “hidden” or latent concept learning is rare. Each of these aspects imposes certain limitations on the use of such models for drug design. Here, we propose the use of “topic models” (more precisely, hierarchical Bayesian models) as a general and powerful modelling technique for drug design. Specifically, we use the feature-construction capabilities of a general-purpose ILP system to incorporate complex relational information into topic models for drug-like molecules. Our main interest in this paper is to describe computational tools to assist the discovery of drugs for malaria. To this end, we describe the construction of topic models using the GlaxoSmithKline Tres Cantos Antimalarial (TCAMS) dataset. This consists of about 13,000 inhibitors of the 3D7 strain of P. falciparum in human erythrocytes, obtained by screening approximately 2 million compounds. We investigate the discrimination of molecules into groups (for example, “more active” and “less active”). For this task, we present evidence suggesting that when it is important to maximise the detection of molecules with high activity (“hits”), topic-based classifiers may be better than those that operate directly on the feature-space representation of the molecules. Besides the applicability to modelling anti-malarials, an obvious utility of topic modelling as a technique for reducing the dimensionality of ILP-constructed feature spaces is also apparent.
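
A minimal scikit-learn sketch of the overall pipeline, with entirely synthetic stand-in data (the real work uses ILP-constructed features of TCAMS molecules): fit a topic model on Boolean feature counts, then train a discriminative classifier on the low-dimensional topic proportions rather than on the raw feature space.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Stand-in for ILP-constructed Boolean features of 200 molecules (synthetic).
X = rng.integers(0, 2, size=(200, 50))
y = rng.integers(0, 2, size=200)   # "more active" vs "less active"

# Fit a topic model on the feature counts and use the topic proportions
# as a reduced representation for the discriminative task.
lda = LatentDirichletAllocation(n_components=8, random_state=0)
Z = lda.fit_transform(X)

clf = LogisticRegression(max_iter=1000).fit(Z, y)
print("topic-space training accuracy:", clf.score(Z, y))
```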

For many tasks in fields like computer vision, computational biology and information extraction, popular probabilistic inference methods have been devised mainly for propositional models that contain only unary and pairwise clique potentials. In contrast, statistical relational approaches typically do not restrict a model’s representational power and use high-order potentials to capture the rich structure of relational domains. This paper aims to bring both worlds closer together.

We introduce pairwise Markov Logic, a subset of Markov Logic where each formula contains at most two atoms. We show that every non-pairwise Markov Logic Network (MLN) can be transformed or ‘reduced’ to a pairwise MLN. Thus, existing, highly efficient probabilistic inference methods can be employed for pairwise MLNs without the overhead of devising or implementing high-order variants. Experiments on two relational datasets confirm the usefulness of this reduction approach.

Over the last years there has been growing interest in models that combine first-order logic and probabilistic graphical models to describe large-scale domains, and in efficient ways to perform inference on these domains. The Prolog Factor Language (PFL) is an extension of the Prolog language that allows a natural representation of these first-order probabilistic models (either directed or undirected). PFL is also capable of solving probabilistic queries on these models through the implementation of four inference algorithms: variable elimination, belief propagation, lifted variable elimination, and lifted belief propagation. We show how these models can be easily represented using PFL and then perform a comparative study of the different inference algorithms on four artificial problems.
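
Of the four algorithms, (non-lifted) variable elimination is the easiest to sketch. The plain-Python toy below (not PFL itself, which is a Prolog extension) multiplies two factors and sums out a variable to obtain a marginal; the lifted variants exploit the symmetry of first-order models to avoid grounding and eliminating such factors one by one.

```python
from itertools import product

# A factor is (variables, table); the table maps value tuples to potentials.
def multiply(f1, f2):
    v1, t1 = f1
    v2, t2 = f2
    vs = v1 + [v for v in v2 if v not in v1]
    table = {}
    for assign in product([0, 1], repeat=len(vs)):
        a = dict(zip(vs, assign))
        table[assign] = t1[tuple(a[v] for v in v1)] * t2[tuple(a[v] for v in v2)]
    return vs, table

def sum_out(factor, var):
    vs, t = factor
    keep = [v for v in vs if v != var]
    table = {}
    for assign, p in t.items():
        key = tuple(val for v, val in zip(vs, assign) if v != var)
        table[key] = table.get(key, 0.0) + p
    return keep, table

# P(A) and P(B|A): eliminate A to get the marginal over B.
pa = (["A"], {(0,): 0.6, (1,): 0.4})
pba = (["A", "B"], {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.8})
print(sum_out(multiply(pa, pba), "A"))   # {(0,): 0.62, (1,): 0.38}
```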

Ordered graphs, each of whose vertices has a unique order on the edges incident to it, can represent graph-structured data such as Web pages, TeX sources, CAD data, and maps. In this paper, in order to design computational machine learning for such data, we propose an ordered graph pattern with ordered graph structures and structured variables. We define the ordered graph language of an ordered graph pattern g as the set of all ordered graphs obtained from g by replacing the structured variables in g with arbitrary ordered graphs. We present a polynomial-time pattern matching algorithm for determining whether or not a given ordered graph is contained in the ordered graph language of a given ordered graph pattern. We also implement the proposed algorithm and evaluate it by reporting and discussing experimental results.

Markov Logic Networks (MLNs) are a prominent statistical relational model that has been proposed as a unifying framework for statistical relational learning. As part of this unification, their authors proposed methods for converting other statistical relational learners into MLNs. For converting a first-order Bayes net into an MLN, it was suggested to moralize the Bayes net to obtain the structure of the MLN and then use the log of the conditional probability table entries to calculate the weights of the clauses. This conversion is exact for converting propositional Bayes nets to propositional Markov networks; however, it fails to perform well for the relational case. We theoretically analyze this conversion and introduce new methods of converting a Bayes net into an MLN. An extended empirical evaluation on five datasets indicates that our conversion method outperforms previous methods.
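
The conversion being analyzed is easy to sketch: each conditional probability table entry becomes one clause whose weight is the log of the entry. The toy CPT and predicate names below are invented; only the log-weight rule follows the original conversion proposal.

```python
import math

# CPT for P(Cancer | Smokes) in a toy Bayes net (hypothetical numbers).
cpt = {  # (smokes, cancer) -> probability
    (True,  True):  0.30, (True,  False): 0.70,
    (False, True):  0.05, (False, False): 0.95,
}

# Conversion sketch: one weighted clause per CPT entry, weight = log(p).
def literal(name, positive):
    return name if positive else "!" + name

clauses = [(math.log(p), f"{literal('Smokes(x)', s)} ^ {literal('Cancer(x)', c)}")
           for (s, c), p in cpt.items()]

for w, clause in clauses:
    print(f"{w:+.3f} : {clause}")
```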

We study a generalization of Plotkin’s least general generalization. We introduce a novel concept called bounded least general generalization w.r.t. a set of clauses and show an instance of it for which polynomial-time reduction procedures exist. We demonstrate the practical utility of our approach in experiments on several relational learning datasets.
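
For background, here is a compact sketch of Plotkin’s (unbounded) least general generalization over terms, which the paper generalizes: identical structure is kept, and each distinct pair of subterms is consistently replaced by one variable. The encoding of terms as nested tuples is our own choice.

```python
# Terms: constants/variables are strings; compounds are (functor, args) tuples.
table = {}  # maps a pair of distinct subterms to one shared variable

def lgg(s, t):
    """Plotkin's least general generalization of two terms (sketch)."""
    if s == t:
        return s
    if (isinstance(s, tuple) and isinstance(t, tuple)
            and s[0] == t[0] and len(s[1]) == len(t[1])):
        return (s[0], tuple(lgg(a, b) for a, b in zip(s[1], t[1])))
    # Distinct structures generalize to a variable, reused for repeated pairs.
    if (s, t) not in table:
        table[(s, t)] = f"V{len(table)}"
    return table[(s, t)]

# lgg(p(a, f(a)), p(b, f(b))) = p(V0, f(V0))
print(lgg(("p", ("a", ("f", ("a",)))), ("p", ("b", ("f", ("b",))))))
```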

In multi-relational data mining, data are represented in a relational form where the individuals of the target table are potentially related to several records in secondary tables via one-to-many relationships. In this paper, we introduce an itemset-based framework for constructing variables in secondary tables and evaluating their conditional information for the supervised classification task. We introduce a space of itemset-based models in the secondary table and conditional density estimation of the related constructed variables. A prior distribution is defined on this model space, resulting in a parameter-free criterion to assess the relevance of the constructed variables. A greedy algorithm is then proposed in order to explore the space of the considered itemsets. Experiments on multi-relational datasets confirm the advantage of the approach.
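
A minimal sketch of the kind of constructed variable involved (our illustration, not the paper’s criterion): an itemset over secondary-table attributes induces, for each individual in the target table, a count of related records matching every item in the itemset. The tables and attribute names are invented.

```python
# Primary table: customers; secondary table: their purchases (one-to-many).
purchases = {
    "alice": [{"cat": "book", "weekend": 1}, {"cat": "book", "weekend": 0}],
    "bob":   [{"cat": "food", "weekend": 1}],
    "carol": [{"cat": "book", "weekend": 1}, {"cat": "food", "weekend": 1}],
}

# An itemset over secondary attributes defines a constructed variable:
# the number of related records matching every item in the itemset.
def itemset_feature(records, itemset):
    return sum(1 for r in records if all(r[k] == v for k, v in itemset.items()))

itemset = {"cat": "book", "weekend": 1}
features = {cust: itemset_feature(recs, itemset) for cust, recs in purchases.items()}
print(features)  # {'alice': 1, 'bob': 0, 'carol': 1}
```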

Learning in Description Logics (DLs) has received increasing attention over the last decade. Several diverse approaches have been proposed, which however share the common feature of extending and adapting previous work in Concept Learning to the novel representation framework of DLs. In this paper we present a declarative modeling language for Concept Learning in DLs which relies on recent results in the fields of Knowledge Representation and Machine Learning. Based on second-order DLs, it allows for modeling Concept Learning problems as constructive DL reasoning tasks where the construction of the solution to the problem may be subject to optimality criteria.

This paper uses inductive logic programming (ILP) to identify a driver’s cognitive state in real driving situations, in order to determine whether a driver will be ready to select a suitable operation and recommended service in next-generation car navigation systems. We measure the driver’s eye movements and collect various data such as braking, acceleration, and steering angles, which are qualitatively interpreted and represented as background knowledge. A set of data about the driver’s degree of tension or relaxation, regarded as a training set, is obtained from the driver’s mental load based on resource-limited cognitive process analysis. Given such information, our ILP system has successfully produced logic rules that are qualitatively understandable for rule verification and are actively employed for user-oriented interface design. Realistic experiments were conducted to demonstrate the learning performance of this approach. Reasonable accuracy was achieved for an appropriate service supporting safe driving.

Opening doors is an essential task that a robot should perform. In this paper, we propose a logical approach to predict the action of opening doors, together with the action point where the action should be performed. The input of our system is a pair of bounding boxes of the door and door handle, together with background knowledge in the form of logical rules. Learning and inference are performed with the probabilistic programming language ProbLog. We evaluate our approach on a doors dataset and obtain encouraging results. Additionally, a comparison to a propositional decision tree shows the benefits of using a probabilistic programming language such as ProbLog.
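
To give a flavour of the probabilistic-logic computation involved, here is a plain-Python toy (not the paper’s ProbLog model) with invented detection probabilities: the probability of the open_door conclusion is the total weight of the possible worlds in which a deterministic rule fires. ProbLog computes this kind of quantity via knowledge compilation rather than explicit enumeration.

```python
from itertools import product

# Probabilistic facts (hypothetical detection probabilities for one image).
facts = [("door_detected", 0.9), ("handle_detected", 0.7)]

# Deterministic rule: open_door is predicted when both detections hold.
def rule(world):
    return world["door_detected"] and world["handle_detected"]

# Exact inference by enumerating possible worlds.
prob = 0.0
for values in product([True, False], repeat=len(facts)):
    world = {name: v for (name, _), v in zip(facts, values)}
    weight = 1.0
    for (name, p), v in zip(facts, values):
        weight *= p if v else (1 - p)
    if rule(world):
        prob += weight

print("P(open_door) =", prob)  # 0.9 * 0.7 = 0.63
```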

Discovering relational structure between input features in sequence labeling models has been shown to improve their accuracy in several problem settings. The problem of learning relational structure for sequence labeling can be posed as learning Markov Logic Networks (MLNs) for sequence labeling, which we abbreviate as Markov Logic Chains (MLCs). This objective can be solved efficiently and optimally in propositional space by a hierarchical-kernels-based approach, referred to as StructRELHKL, which we recently proposed. However, applying StructRELHKL in complex first-order settings is non-trivial and challenging. We present the challenges and possibilities for optimally and simultaneously learning the structure as well as the parameters of MLCs (as opposed to learning them separately and/or greedily). Here, we look into leveraging the StructRELHKL approach for optimizing the MLC learning steps to the extent possible. To this end, we categorize first-order MLC features based on their complexity and show that complex features can be constructed from simpler ones. We define a self-contained class of features called absolute features (AFs), which can be conjoined to yield complex MLC features. Our approach first generates a set of relevant AFs and then makes use of the StructRELHKL algorithm to learn their optimal conjunctions. We demonstrate the efficiency of our approach by evaluating it on a publicly available activity recognition dataset.
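
A small sketch of the feature-construction step as we read it (the feature names and tests are invented): absolute features are self-contained Boolean tests, and candidate complex features are their conjunctions; selecting an optimal subset of such conjunctions is what StructRELHKL is then used for.

```python
from itertools import combinations

# Absolute features (AFs): self-contained Boolean tests on a timestep's inputs.
afs = {
    "in_kitchen":  lambda x: x["room"] == "kitchen",
    "using_stove": lambda x: x["appliance"] == "stove",
    "is_morning":  lambda x: x["hour"] < 12,
}

# Complex features as conjunctions of AFs; generate all pairs as candidates,
# leaving selection of an optimal subset to the hierarchical-kernel learner.
candidates = {
    f"{a} & {b}": (lambda x, f=afs[a], g=afs[b]: f(x) and g(x))
    for a, b in combinations(afs, 2)
}

x = {"room": "kitchen", "appliance": "stove", "hour": 9}
print({name: feat(x) for name, feat in candidates.items()})
```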

A workmanlike, but nevertheless very effective combination of statistical and relational learning uses a statistical learner to construct models with features identified (quite often, separately) by a relational learner. This form of model-building has a long history in Inductive Logic Programming (ILP), with roots in the early 1990s with the LINUS system. Additional work has also been done in the field under the categories of propositionalisation and relational subgroup discovery, where a distinction has been made between elementary and non-elementary features, and statistical models have been constructed using one or the other kind of feature. More recently, constructing relational features has become an essential step in many model-building programs in the emerging area of Statistical Relational Learning (SRL). To date, not much work—theoretical or empirical—has been done on what kinds of relational features are sufficient to build good statistical models. On the face of it, the features that are needed are those that capture diverse and complex relational structure. This suggests that the feature-constructor should examine as rich a space as possible, in terms of relational descriptions. One example is the space of all possible features in first-order logic, given constraints of the problem being addressed. Practically, it may be intractable for a relational learner to search such a space effectively for features that may be potentially useful for a statistical learner. Additionally, the statistical learner may also be able to capture some kinds of complex structure by combining simpler features together. Based on these observations, we investigate empirically whether it is acceptable for a relational learner to examine a more restricted space of features than that actually necessary for the full statistical model. Specifically, we consider five sets of features, partially ordered by the subset relation, bounded on top by the set Fd, the set of features corresponding to definite clauses subject to domain-specific restrictions; and bounded at the bottom by Fe, the set of “elementary” features with substantial additional constraints. Our results suggest that: (a) For relational datasets used in the ILP literature, features from Fd may not be required; and (b) Models obtained with a standard statistical learner with features from subsets of features are comparable to the best obtained to date.

Children behave dishonestly as a way of managing problems in daily life. The primary interest of this paper is how children learn dishonesty and how one could model the human acquisition of dishonesty using machine learning techniques. We first observe the structural similarities between dishonest reasoning and induction, and then characterize the mental processes of dishonest reasoning using logic programming. We discuss how one develops behavioral rules for dishonest acts and refines them into more advanced rules.

Inverse entailment (IE) is a fundamental approach for hypothesis finding in explanatory induction. Some IE systems can find any hypothesis in full-clausal theories, but need several non-deterministic operators that treat the inverse relation of entailment. In contrast, inverse subsumption (IS) is an alternative approach that finds hypotheses with the inverse relation of subsumption. Recently, it has been shown that IE can be logically reduced to a new form of IS that ensures the completeness of finding hypotheses in full-clausal theories. On the other hand, it remains open how well this complete IS works from a practical point of view. For this analysis, we have implemented it with the heuristic lattice-search techniques used in state-of-the-art ILP systems. This paper first sketches our IS system, and then presents experimental results suggesting that the complete IS can practically find better hypotheses with high predictive accuracy.
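
Subsumption is the workhorse relation here. As a reference point, the brute-force Python check below (our illustration) tests whether one clause θ-subsumes another by searching over substitutions; real IS systems search the subsumption lattice with heuristics rather than testing candidates exhaustively. The example clauses are invented.

```python
from itertools import product

# A clause is a set of literals (pred, args); uppercase strings are variables.
def is_var(t):
    return isinstance(t, str) and t[0].isupper()

def subsumes(c, d):
    """Brute force: does some substitution theta make c(theta) a subset of d?"""
    variables = sorted({a for _, args in c for a in args if is_var(a)})
    terms = sorted({a for _, args in d for a in args})
    for values in product(terms, repeat=len(variables)):
        theta = dict(zip(variables, values))
        image = {(p, tuple(theta.get(a, a) for a in args)) for p, args in c}
        if image <= d:
            return True
    return False

c = {("p", ("X", "Y"))}
d = {("p", ("a", "b")), ("q", ("b",))}
print(subsumes(c, d))  # True: theta = {X: a, Y: b}
```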

In this paper, we present a concept of edge-contraction-based tree-structured patterns as a graph pattern suited to representing tree-structured data. A tree contraction pattern (TC-pattern) is an unordered tree-structured pattern common to given tree-structured data, which is obtained by merging every uncommon connected substructure into one vertex by edge contraction. In order to establish an algorithmic foundation for the discovery of knowledge from tree-structured data, we show that TC-patterns are learnable in polynomial time.
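
Edge contraction, the basic operation behind TC-patterns, is simple to state; the Python sketch below (our illustration) contracts one edge of a tree held as an adjacency map, merging the two endpoints into a single vertex.

```python
# A tree as an adjacency map; contracting edge (u, v) merges v into u.
def contract_edge(adj, u, v):
    adj[u].remove(v)
    adj[v].remove(u)
    for w in adj[v]:              # reattach v's remaining neighbours to u
        adj[w].remove(v)
        adj[w].append(u)
        adj[u].append(w)
    del adj[v]
    return adj

tree = {1: [2], 2: [1, 3, 4], 3: [2], 4: [2]}
print(contract_edge(tree, 2, 4))  # {1: [2], 2: [1, 3], 3: [2]}
```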

In the next article, we will discuss ILP 2016.
