From the Inductive Logic Programming 2011 Proceedings


ILP 2011: 21st International Conference on Inductive Logic Programming

In the previous article, we discussed ILP2010. This time, we will discuss ILP2011, which was held at Cumberland Lodge in the UK from July 31 to August 3, 2011, under the auspices of the Department of Computing of Imperial College London.

ILP 2011, held at Cumberland Lodge in Great Windsor Park, marked 20 years since the first ILP workshop in 1991. During this time, the ILP conference has evolved into the premier forum for research on logic-based machine learning. The format of the 21st International Conference on Inductive Logic Programming (ILP2011) was similar to that of previous conferences, and in particular to the format used at ILP2006. Submissions were requested in two phases. In the first phase, authors submitted 6-page short papers, which were posted on the conference website before the conference and presented there. In the second phase, the authors of selected papers were invited to submit long papers (up to 15 pages). These long papers were evaluated by the same reviewers to determine which would be published in the journal special issue and which in the proceedings.

In the first phase, 66 papers were submitted, and each was reviewed by three reviewers. Thirty-one of these papers were invited as full-length papers. Of the full-length papers, 5 were accepted for a special issue of the Machine Learning journal and 24 were accepted for the proceedings. In addition, one paper was nominated by the PC referees for the Application Prize (the "Michie ILP Application Prize", sponsored by Syngenta) and one for the Theory Prize (the "Turing ILP Theory Prize", sponsored by the Machine Learning journal). The papers in the proceedings reflect the diversity and vitality of current ILP research, including ILP theory, implementations, probabilistic ILP, biological applications, subgroup discovery, grammatical inference, relational kernels, Petri net learning, spatial learning, graph-based learning, and learning behavioural models.

In addition to the many technical papers presented this year, ILP 2011 featured invited talks by prominent artificial intelligence researchers Hector Geffner, Richard Sutton, and Toby Walsh.

Invited Talks

Planning is the model-based approach to autonomous behaviour where the action to do next is derived from a model. The main challenge in planning is computational, as all models, whether accommodating non-determinism and feedback or not, are intractable in the worst case. In the last few years, however, significant progress has been made resulting in algorithms that can produce plans effectively in a variety of settings. These developments have to do with the formulation and use of general inference techniques and transformations. In this talk, I review the inference techniques that have proven useful for solving individual planning instances, and discuss also the use of learning methods and transformations for solving complete planning domains. The former include the automatic derivation of heuristic functions to guide the search for plans, and the identification of helpful actions and landmarks. The latter include methods for deriving generalized policies and finite state controllers capable of dealing with changes in the initial situation and in the number of objects. I’ll also discuss the alternative ways in which learning can be used in planning and the challenges ahead.

Intelligence can be defined, informally, as knowing a lot and being able to use that knowledge flexibly to achieve one’s goals. In this sense it is clear that knowledge is central to intelligence. However, it is less clear exactly what knowledge is, what gives it meaning, and how it can be efficiently acquired and used. In this talk we re-examine aspects of these age-old questions in light of modern experience (and particularly in light of recent work in reinforcement learning). Such questions are not just of philosophical or theoretical import; they directly affect the practicality of modern knowledge-based systems, which tend to become unwieldy and brittle, difficult to change, as the knowledge base becomes large and diverse.

The key question for knowledge intensive intelligent systems is ‘What keeps the knowledge correct?’ and there have been essentially three kinds of answers: 1) people—human experts understand the knowledge and ensure that it matches their beliefs, 2) internal consistency—the system checks that its knowledge coheres, and removes inconsistencies, and 3) grounding in data, in which the system compares its knowledge with external data in some way and changes it as needed to match the data. All of these are valid and often useful ways to maintain correct knowledge, but, in practice, relying on people to maintain correctness has been the dominant approach, supplemented by checks for internal consistency. This approach is well suited to taking advantage of existing human expertise, but ultimately limited in its ability to scale to very large knowledge bases because of its reliance on people. The essence of this approach is that knowledge is essentially public, describing a state of affairs in the world (separate from the intelligent system) that is at least potentially accessible to people. This might be called the public-knowledge approach.

In this talk we consider an alternative to the public-knowledge approach that is based on keeping knowledge correct by grounding it in data. Consider the case in which the data involved is the ordinary data available during the routine operation of the intelligent system, without human intervention. This is the case of greatest interest because in it the system can correct and learn its knowledge autonomously, enabling scaling to very large knowledge bases (see Sutton 2009, 2001). If the system were a robot, this data would be simply whatever data was available through its sensors and about its motor actions. Knowledge grounded in such sensorimotor data may have no public semantics; it is tightly bound to the individual and need not be accessible to other observers in any useful way. Knowledge in this sensorimotor approach is essentially private and personal, in contrast to the public-knowledge approach described above.

Constraints can be exploited in paradigms outside of constraint programming. In particular, powerful global constraints can often be decomposed into small primitives and these decompositions can simulate complex propagation algorithms that perform sophisticated inference about a problem. We illustrate this approach with examples of exploiting constraints in propositional satisfiability (SAT), pseudo Boolean (PB) solving, integer linear programming (ILP) and answer set programming (ASP).
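
As a concrete (and deliberately simple) illustration of decomposing a global constraint into primitives, the sketch below encodes an AllDifferent constraint as CNF clauses for a SAT solver. This is a generic textbook decomposition, not necessarily one of those discussed in the talk; the variable-to-literal mapping is our own convention.

```python
# Illustrative sketch: decomposing an AllDifferent global constraint over CSP
# variables x_0..x_{n-1} with domain {0..d-1} into primitive CNF clauses over
# Booleans b[i][v] meaning "x_i takes value v".

def all_different_to_cnf(n, d):
    # Map Boolean b[i][v] to a positive DIMACS-style literal index.
    lit = lambda i, v: i * d + v + 1
    clauses = []
    # Each variable takes at least one value.
    for i in range(n):
        clauses.append([lit(i, v) for v in range(d)])
    # No two variables share the same value: pairwise "not both" clauses.
    for v in range(d):
        for i in range(n):
            for j in range(i + 1, n):
                clauses.append([-lit(i, v), -lit(j, v)])
    return clauses

if __name__ == "__main__":
    for clause in all_different_to_cnf(3, 3):
        print(clause)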

Special Issue Extended Abstracts

This paper presents a method for approximating posterior distributions over the parameters of a given PRISM program. A sequential approach is taken where the distribution is updated one datapoint at a time. This makes it applicable to online learning situations where data arrives over time. The method is applicable whenever the prior is a mixture of products of Dirichlet distributions. In this case the true posterior will be a mixture of very many such products. An approximation is effected by merging products of Dirichlet distributions. An analysis of the quality of the approximation is presented. Due to the heavy computational burden of this approach, the method has been implemented in the Mercury logic programming language. Initial results using a hidden Markov model are presented.
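
For readers unfamiliar with the conjugate machinery involved, the block below shows the standard Dirichlet update that such a sequential scheme builds on. The notation is generic rather than the paper's own, and the final remark about mixture growth is our paraphrase of why merging is needed.

```latex
% Standard conjugate Dirichlet update underlying the sequential scheme
% (generic notation, not the paper's). For one parameter vector:
\[
p(\theta) = \mathrm{Dir}(\theta \mid \alpha_1,\dots,\alpha_K)
\;\;\Longrightarrow\;\;
p(\theta \mid n) = \mathrm{Dir}(\theta \mid \alpha_1 + n_1,\dots,\alpha_K + n_K),
\]
% where n_k counts how often outcome k is used in the datapoint's explanation.
% With a mixture prior \( \sum_m w_m \prod_j \mathrm{Dir}(\theta_j \mid \alpha_{mj}) \),
% the exact posterior is again such a mixture, but with one component per
% (prior component, explanation) pair, which is why components are merged.
```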

Markov Logic Networks (MLNs) are a prominent model class that generalizes both first-order logic and undirected graphical models (Markov networks). The qualitative component of an MLN is a set of clauses and the quantitative component is a set of clause weights. Generative MLNs model the joint distribution of relationships and attributes. A state-of-the-art structure learning method is the moralization approach: learn a 1st-order Bayes net, then convert it to conjunctive MLN clauses. The moralization approach takes advantage of the high quality inference algorithms for MLNs and their ability to handle cyclic dependencies. A weakness of the moralization approach is that it leads to an unnecessarily large number of clauses. In this paper we show that using decision trees to represent conditional probabilities in the Bayes net is an effective remedy that leads to much more compact MLN structures. The accuracy of predictions is competitive with the unpruned model and in many cases superior.
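
As a rough illustration of the moralization idea, the standard Bayes-net-to-MLN conversion is sketched below; this is the textbook construction, not necessarily the paper's exact one, and the symbols are generic.

```latex
% Standard Bayes-net-to-MLN conversion sketch: each conditional-probability
% entry of a node X with parent configuration u becomes one conjunctive clause
\[
(X = x) \wedge (\mathrm{Pa}(X) = u)
\quad\text{with weight}\quad
w_{x,u} \;=\; \log P(X = x \mid \mathrm{Pa}(X) = u).
\]
% A tabular CPT therefore yields one clause per entry; representing the CPT as
% a decision tree lets a single clause cover many parent configurations, which
% is the source of the more compact MLN structures reported above.
```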

Like relational probabilistic models, the need for relational preference models naturally arises in real world applications involving multiple, heterogeneous, and richly interconnected objects. On the one hand, relational preferences should be represented as statements that are natural for human users to express. On the other hand, relational preference models should be endowed with a structure that supports tractable forms of reasoning and learning. This paper introduces the framework of conditional preference relational networks (CPR-nets), which maintains the spirit of the popular “CP-nets” by expressing relational preferences in a natural way using the ceteris paribus semantics. We show that acyclic CPR-nets support tractable inference for optimization and ranking tasks. In addition, we show that in the online learning model, tree-structured CPR-nets are efficiently learnable from both optimization tasks and ranking tasks. Our results are corroborated with experiments on a large-scale movie recommendation dataset.

ProbLog is a probabilistic extension of Prolog. Given the complexity of exact inference under ProbLog’s semantics, in many applications in machine learning approximate inference is necessary. Current approximate inference algorithms for ProbLog however require either dealing with large numbers of proofs or do not guarantee a low approximation error. In this paper we introduce a new approximate inference algorithm which addresses these shortcomings. Given a user-specified parameter k, this algorithm approximates the success probability of a query based on at most k proofs and ensures that the calculated probability p satisfies (1 − 1/e)·p* ≤ p ≤ p*, where p* is the highest probability that can be calculated based on any set of k proofs.

Recently, there has been an increasing interest in generative relational models that represent probabilistic patterns over both links and attributes. A key characteristic of relational data is that the value of a predicate often depends on values of the same predicate for related entities. In this paper we present a new approach to learning directed relational models which utilizes two key concepts: a pseudo likelihood measure that is well defined for recursive dependencies, and the notion of stratification from logic programming. An issue for modelling recursive dependencies with Bayes nets is redundant edges that increase the complexity of learning. We propose a new normal form for 1st-order Bayes nets that removes the redundancy, and prove that, assuming stratification, the normal form constraints involve no loss of modelling power. We incorporate these constraints in the learn-and-join algorithm of Khosravi et al., which is a state-of-the-art structure learning algorithm that upgrades propositional Bayes net learners for relational data. Empirical evaluation compares our approach to learning recursive dependencies with undirected models (Markov Logic Networks). The Bayes net approach is orders of magnitude faster, and learns more recursive dependencies, which lead to more accurate predictions.

Research Papers

Inductive Logic Programming can be used to provide automated support to help correct the errors identified by model checking, which in turn provides the relevant context for learning hypotheses that are meaningful within the domain of interest. Model checking and Inductive Logic Programming can thus be seen as two complementary approaches with much to gain from their integration. In this paper we present a general framework for such an integration, discuss its main characteristics and present an overview of its application.

There is a growing interest in the field of Probabilistic Inductive Logic Programming, which uses languages that integrate logic programming and probability. Many of these languages are based on the distribution semantics and recently various authors have proposed systems for learning the parameters (PRISM, LeProbLog, LFI-ProbLog and EMBLEM) or both the structure and the parameters (SEM-CP-logic) of these languages. EMBLEM for example uses an Expectation Maximization approach in which the expectations are computed on Binary Decision Diagrams. In this paper we present the algorithm SLIPCASE for “Structure LearnIng of ProbabilistiC logic progrAmS with Em over bdds”. It performs a beam search in the space of the language of Logic Programs with Annotated Disjunctions (LPAD) using the log likelihood of the data as the guiding heuristic. To estimate the log likelihood of theory refinements it performs a limited number of Expectation Maximization iterations of EMBLEM. SLIPCASE has been tested on three real world datasets and compared with SEM-CP-logic and Learning using Structural Motifs, an algorithm for Markov Logic Networks. The results show that SLIPCASE achieves higher areas under the precision-recall and ROC curves and is more scalable.
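
A schematic sketch of this kind of structure search is given below. It is only meant to convey the shape of a log-likelihood-guided beam search; `refinements` and `limited_em` are hypothetical placeholders standing in for LPAD refinement operators and a few EMBLEM iterations, not SLIPCASE's actual code.

```python
# Schematic beam search over theory refinements scored by log-likelihood.
# `initial_theory`, `data`, `refinements` and `limited_em` are placeholders.

def beam_search(initial_theory, data, refinements, limited_em,
                beam_width=5, max_steps=10):
    # Each beam entry is (score, theory); the score is the log-likelihood
    # estimated by running only a few EM iterations.
    beam = [(limited_em(initial_theory, data), initial_theory)]
    best_score, best_theory = beam[0]
    for _ in range(max_steps):
        candidates = []
        for _, theory in beam:
            for refined in refinements(theory):       # add/remove/modify a clause
                score = limited_em(refined, data)     # few EM iterations only
                candidates.append((score, refined))
        if not candidates:
            break
        candidates.sort(key=lambda entry: entry[0], reverse=True)
        beam = candidates[:beam_width]
        if beam[0][0] > best_score:
            best_score, best_theory = beam[0]
    return best_theory, best_score
```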

We propose an approach to subgroup discovery in relational databases containing numerical attributes. The approach is based on detecting bumps in histograms constructed from substitution sets resulting from matching a first-order query against the input relational database. The approach is evaluated on seven data sets, discovering interpretable subgroups. The subgroups’ rate of survival from the training split to the testing split varies among the experimental data sets, but at least on three of them it is very high.
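
A minimal illustration of the bump-detection step follows, assuming a simple equal-width histogram and treating any strict local maximum as a bump; the paper's actual procedure may differ.

```python
# Illustrative sketch: locating "bumps" (strict local maxima) in a histogram
# built from numerical values collected over substitution sets matched by a
# first-order query.

from collections import Counter

def histogram_bumps(values, n_bins=20):
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0
    counts = Counter(min(int((v - lo) / width), n_bins - 1) for v in values)
    hist = [counts.get(b, 0) for b in range(n_bins)]
    # A bump is a bin strictly higher than both neighbours.
    return [b for b in range(1, n_bins - 1)
            if hist[b] > hist[b - 1] and hist[b] > hist[b + 1]]

print(histogram_bumps([1, 1.1, 1.2, 5, 5.1, 5.2, 9]))
```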

In this paper we discuss the design of an Inductive Logic Programming (ILP) system in Answer Set Programming (ASP) and, more generally, the problem of integrating the two. We show how to formalise the learning problem as an ASP program and provide details on how the optimisation features of modern solvers can be adapted to derive preferred hypotheses.

The paper presents a new projection operator, named AC-projection, which exhibits good complexity properties as opposed to the graph isomorphism operator typically used in graph mining. We study the size and structure of the search space and some practical properties of the projection operator. These properties give us a specialization algorithm using simple local operations. Then we show experimentally that we can achieve an important performance gain with no or insignificant loss in the quality of the discovered patterns.

We propose an interleaved inductive abductive model for reasoning about complex spatiotemporal narratives. Typed Inductive Logic Programming (Typed-ILP) is used as a basis for learning the domain theory by generalising from observation data, whereas abductive reasoning is used for noisy data correction by scenario and narrative completion, thereby improving the inductive learning to get semantically meaningful event models. We apply the model to an airport domain consisting of video data for 15 turn-arounds from six cameras simultaneously monitoring logistical processes concerned with aircraft arrival, docking, departure etc., and a verbs data set with 20 verbs enacted in around 2500 vignettes. Our evaluation and demonstration focuses on the synergy afforded by the inductive abductive cycle, whereas our proposed model provides a blueprint for interfacing common-sense reasoning about space, events and dynamic spatiotemporal phenomena with quantitative techniques in activity recognition.

This work presents an optimized version of XMuSer, an ILP-based framework suitable for exploring temporal patterns available in multi-relational databases. XMuSer’s main idea consists of exploiting frequent sequence mining, an efficient method to learn temporal patterns in the form of sequences. The framework’s efficiency is grounded on a new coding methodology for temporal data and on the use of a predictive sequence miner. The framework selects and maps the most interesting sequential patterns into a new table, the sequence relation. In the last step of our framework, we use an ILP algorithm to learn a classification theory on the enlarged relational database that consists of the original multi-relational database and the new sequence relation.

We evaluate our framework by addressing three classification problems and by mapping each of three different types of sequential patterns: frequent, closed, or maximal. The experiments show that our ILP-based framework gains both from the descriptive power of the ILP algorithms and the efficiency of the sequential miners.

“Traditional” clustering, in a broad sense, aims at organizing objects into groups (clusters) whose members are “similar” to one another and “dissimilar” to objects belonging to other groups. In contrast, in conceptual clustering the underlying structure of the data together with the description language which is available to the learner is what drives cluster formation, thus providing intelligible descriptions of the clusters and facilitating their interpretation.

We present a novel conceptual clustering system for multi-relational data, based on the popular k-medoids algorithm. Although clustering is generally not straightforward to evaluate, experiments on several applications show promising results. Clusters generated without class information agree very well with the true class labels of the clusters’ members. Moreover, it was possible to obtain intelligible and meaningful descriptions of the clusters.
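
For context, the sketch below shows a generic k-medoids loop with a pluggable distance function (which, in a relational setting, could be a distance between first-order descriptions). It is illustrative only and not the paper's implementation.

```python
# Generic k-medoids sketch (PAM-style alternation) with a pluggable distance.
# Illustrative only; the distance for relational data would be user-supplied.

import random

def k_medoids(items, k, dist, max_iter=100, seed=0):
    rng = random.Random(seed)
    medoids = rng.sample(items, k)
    for _ in range(max_iter):
        # Assign every item to its nearest medoid.
        clusters = {m: [] for m in medoids}
        for x in items:
            clusters[min(medoids, key=lambda m: dist(x, m))].append(x)
        # Re-pick each cluster's medoid as the member minimising total distance.
        new_medoids = [min(members, key=lambda c: sum(dist(c, x) for x in members))
                       for members in clusters.values() if members]
        if set(new_medoids) == set(medoids):
            break
        medoids = new_medoids
    return clusters

# Example with a trivial numeric distance:
print(k_medoids([1, 2, 3, 10, 11, 12], 2, lambda a, b: abs(a - b)))
```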

This paper characterizes the expressive power of a subclass of first-order logical decision trees (FOLDTs) as a fragment of first-order logic. Specifically, using safe FOLDTs one can express precisely the Boolean combinations of safe existential sentences.

This paper investigates the problem of computing hypotheses in disjunctive normal form (DNF) for explanatory induction. This is contrasted to the usual setting of ILP, where hypotheses are obtained in conjunctive normal form (CNF), i.e., a set of clauses. We present two approaches to compute DNF hypotheses as well as several sound and complete algorithms. This problem naturally contains abduction from clausal theories, and can be related to model-based inductive reasoning, in which propositional reasoning methods such as SAT techniques and prime implicant computation can be utilized.

Abduction is one of the basic logical inferences (deduction, induction and abduction) and derives the best explanations for our observations. Statistical abduction attempts to define a probability distribution over explanations and to evaluate them by their probabilities. Logic-based probabilistic models (LBPMs) have been developed as a way to combine probabilities and logic, enabling us to perform statistical abduction. However, non-deterministic knowledge such as preference and frequency is difficult to represent by logic alone. Bayesian inference can reflect such knowledge in a prior distribution, and variational Bayes (VB) is known as an approximation method for it. In this paper, we propose VB for logic-based probabilistic models and show that our proposed method is efficient in evaluating abductive explanations about failure in a logic circuit and a metabolic pathway.
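
For reference, the generic variational lower bound that VB maximises is shown below; this is the standard form, whereas the paper's derivation is specific to logic-based probabilistic models.

```latex
% Generic variational lower bound maximised by VB (standard form):
\[
\log p(x)
\;\geq\;
\mathbb{E}_{q(z,\theta)}\bigl[\log p(x, z, \theta)\bigr]
\;-\;
\mathbb{E}_{q(z,\theta)}\bigl[\log q(z, \theta)\bigr],
\]
% where x denotes the observations, z the hidden explanations, and \theta the
% parameters; q is restricted to a tractable (e.g. factorised) family and the
% bound is tightened iteratively.
```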

Automatically extracting spatial information is a challenging novel task with many applications. We formalize it as an information extraction step required for a mapping from natural language to a formal spatial representation. Sentences may give rise to multiple spatial relations between words representing landmarks, trajectors and spatial indicators. Our contribution is to formulate the extraction task as a relational learning problem, for which we employ the recently introduced kLog framework. We discuss representational and modeling aspects, kLog’s flexibility for our task, and current experimental results.

The ILP system Progol is incomplete in not being able to generalise a single example to multiple clauses. This limitation is referred to as single-clause learning (SCL) in this paper. However, according to the Blumer bound, incomplete learners such as Progol can have higher predictive accuracy while using less search than more complete learners. This issue is particularly relevant in real-world problems, in which it is unclear whether the unknown target theory or its approximation is within the hypothesis space of the incomplete learner. This paper uses two real-world applications in systems biology to study whether it is necessary to have complete multi-clause learning (MCL) methods, which are computationally expensive but capable of deriving multi-clause hypotheses at the systems level. The experimental results show that in both applications there do exist datasets in which MCL has significantly higher predictive accuracies than SCL. On the other hand, MCL does not outperform SCL all the time, due to the existence of the target hypothesis or its approximations within the hypothesis space of SCL.
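
For reference, one standard statement of the Blumer bound mentioned above is given below; the closing remark about smaller hypothesis spaces is our paraphrase of the trade-off the abstract appeals to.

```latex
% Blumer bound (standard form): if a learner returns any hypothesis h from a
% finite space H that is consistent with m i.i.d. training examples, then with
% probability at least 1 - \delta its true error satisfies
\[
\mathrm{error}(h) \;\leq\; \frac{1}{m}\Bigl(\ln\lvert H\rvert + \ln\frac{1}{\delta}\Bigr).
\]
% An incomplete learner such as Progol searches a smaller H, so for the same
% number of examples the bound is tighter; this is the trade-off against
% completeness discussed above.
```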

Within ILP much effort has been put into designing methods that are complete for hypothesis finding. However, it is not clear whether completeness is important in real-world applications. This paper uses a simplified version of grammar learning to show how a complete method can improve on the learning results of an incomplete method. Seeing the necessity of having a complete method for real-world applications, we introduce a method called ⊤-directed theory co-derivation, which is shown to be correct (i.e. sound and complete). The proposed method has been implemented in the ILP system MC-TopLog and tested on grammar learning and the learning of game strategies. Compared to Progol5, an efficient but incomplete ILP system, MC-TopLog has higher predictive accuracies, especially when the background knowledge is severely incomplete.

This paper presents an approach to the integration of Relational Reinforcement Learning with Answer Set Programming and the Event Calculus. Our framework allows for background and prior knowledge formulated in a semantically expressive formal language and facilitates the computationally efficient constraining of the learning process by means of soft as well as compulsive (sub-)policies and (sub-)plans generated by an ASP-solver. As part of this, a new planning-based approach to Relational Instance-Based Learning is proposed. An empirical evaluation of our approach shows a significant improvement of learning efficiency and learning results in various benchmark settings.

Feature Terms are a generalization of first-order terms that were introduced in theoretical computer science in order to formalize object-oriented capabilities of declarative languages, and which have recently been receiving increased attention for their usefulness in structured machine learning applications. The main obstacle with feature terms (as well as other formal representation languages like Horn clauses or Description Logics) is that basic operations like subsumption have a very high computational cost. In this paper we model subsumption, antiunification and unification using constraint programming (CP), solving those operations in a more efficient way than using traditional methods.

Genetic Algorithms (GAs) are known for their capacity to explore large search spaces and, owing to this ability, they have been applied (to some extent) to Inductive Logic Programming (ILP). Although Estimation of Distribution Algorithms (EDAs) generally perform better than standard GAs, they have not been applied to ILP. This work presents EDA-ILP, an ILP system based on EDA and inverse entailment, and also its extension, REDA-ILP, which employs the Reduce algorithm on bottom clauses to considerably reduce the search space. Experiments on real-world datasets showed that both systems compared favourably to Aleph and to GA-ILP (a variant of EDA-ILP created by replacing the EDA with a standard GA). EDA-ILP also compared favourably to Progol-QG/GA (and its other variants) on phase transition benchmarks. Additionally, we found that REDA-ILP usually obtains simpler theories than EDA-ILP, more efficiently and with equivalent accuracies. These results show that EDAs provide a good basis for stochastic search in ILP.
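
To make the EDA idea concrete, here is a minimal UMDA-style sketch over fixed-length binary encodings (for instance, of which bottom-clause literals to keep). The fitness function and encoding are hypothetical placeholders; this is not EDA-ILP's implementation.

```python
# Minimal UMDA-style Estimation of Distribution Algorithm over binary strings.
# The fitness function is a placeholder for a clause-evaluation score.

import random

def umda(fitness, n_bits, pop_size=50, n_select=20, generations=30, seed=0):
    rng = random.Random(seed)
    probs = [0.5] * n_bits                     # one independent Bernoulli per bit
    best = None
    for _ in range(generations):
        pop = [[1 if rng.random() < p else 0 for p in probs] for _ in range(pop_size)]
        pop.sort(key=fitness, reverse=True)
        if best is None or fitness(pop[0]) > fitness(best):
            best = pop[0]
        elite = pop[:n_select]                 # re-estimate the distribution from the elite
        probs = [sum(ind[i] for ind in elite) / n_select for i in range(n_bits)]
    return best

# Toy fitness: prefer individuals matching a target mask.
target = [1, 0, 1, 1, 0, 0, 1, 0]
print(umda(lambda ind: -sum(a != b for a, b in zip(ind, target)), len(target)))
```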

We consider an agent which learns a relational action model in order to be able to predict the effects of its actions. The model consists of a set of STRIPS-like rules, i.e. rules predicting what changes in the current state when a given action is applied, as long as a set of preconditions is satisfied by the current state. Several rules can be associated with a given action, thereby allowing conditional effects to be modelled. Learning is online, as examples result from actions performed by the agent, and incremental, as the current action model is revised each time it is contradicted by unexpected effects resulting from the agent’s actions. The form of the model allows using it as an input to standard planners.

In this work, the learning unit IRALe is embedded in an integrated system able to i) learn an action model, ii) select its actions, and iii) plan to reach a goal. The agent uses the current action model to perform active learning, i.e. to select actions with the purpose of reaching states that will force a revision of the model, and uses its planning abilities to obtain a realistic evaluation of the accuracy of the model.
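
The following minimal sketch illustrates the STRIPS-like rule format and the revision trigger described above. The predicates, state representation, and helper names are hypothetical, and IRALe's actual revision operators are not reproduced.

```python
# STRIPS-like rules with preconditions and add/delete effects, used to predict
# the next state; a mismatch with the observed state triggers model revision.

def applicable(rule, state, action):
    return rule["action"] == action and rule["pre"] <= state

def predict(rules, state, action):
    for rule in rules:
        if applicable(rule, state, action):
            return (state - rule["delete"]) | rule["add"]
    return state                               # no rule fires: predict no change

rules = [{"action": "move(a,b)",
          "pre": {"clear(a)", "clear(b)", "on(a,table)"},
          "add": {"on(a,b)"},
          "delete": {"on(a,table)", "clear(b)"}}]

state = {"clear(a)", "clear(b)", "on(a,table)"}
predicted = predict(rules, state, "move(a,b)")
observed = {"clear(a)", "on(a,b)"}             # what the environment returned
if predicted != observed:
    print("prediction contradicted -> revise the action model")
else:
    print("model consistent with observation:", sorted(predicted))
```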

To date, the most expressive and understandable dynamic models of biological systems identified by ILP have employed qualitative differential equations, or QDEs. The QDE representation provides a direct and simple abstraction of quantitative ODEs. However, the representation does have several limitations, including the generation of spurious behaviour in simulation, and a lack of methods for handling concurrency, quantitative information or stochasticity. These issues are largely absent in the long-established qualitative representation of Petri nets. A flourishing area of Petri net models for biological systems now exists, which has almost entirely been concerned with hand-crafted models. In this paper we show that pure and extended Petri nets can be represented as special cases of systems in which transitions are defined using a combination of logical constraints and constraints on linear terms. Results from a well-known combinatorial algorithm for identifying pure Petri nets from data and from the ILP literature on inverting entailment form the basis of constructing a maximal set of such transition constraints given data and background knowledge. An ILP system equipped with a constraint solver is then used to determine the smallest subset of transition constraints that are consistent with the data. This has several advantages over using a specialised Petri net learner for biological system identification, most of which arise from the use of background knowledge. As a result: (a) search-spaces can be constrained substantially using semantic and syntactic constraints; (b) we can perform the hierarchical identification of Petri models of large systems by re-use of well-established network models; and (c) we can use a combination of abduction and data-based justification to hypothesize missing parts of a Petri net. We demonstrate these advantages on well-known metabolic and signalling networks.
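
As background, the sketch below shows the pure Petri net firing semantics referred to above: a transition consumes tokens from its input places and produces tokens in its output places, and is enabled only when every input place holds enough tokens. This is illustrative only; the paper encodes transitions as logical and linear constraints rather than as code.

```python
# Pure Petri net marking and transition firing.

def enabled(marking, transition):
    return all(marking.get(p, 0) >= n for p, n in transition["consume"].items())

def fire(marking, transition):
    if not enabled(marking, transition):
        raise ValueError("transition not enabled")
    new = dict(marking)
    for p, n in transition["consume"].items():
        new[p] -= n
    for p, n in transition["produce"].items():
        new[p] = new.get(p, 0) + n
    return new

# Toy metabolic step: A + B -> C
t = {"consume": {"A": 1, "B": 1}, "produce": {"C": 1}}
print(fire({"A": 2, "B": 1, "C": 0}, t))   # {'A': 1, 'B': 0, 'C': 1}
```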

In this paper we demonstrate that machine learning (using Abductive ILP) can generate plausible and testable food webs from ecological data. In this approach, unlike previous applications of Abductive ILP, the abductive predicate ‘eats’ is entirely undefined before the start of the learning. We also explore a new approach, called Hypothesis Frequency Estimation (HFE), for estimating probabilities for hypothetical ‘eats’ facts based on their frequency of occurrence when randomly sampling the hypothesis space. The results of cross-validation tests suggest that the trophic networks with probabilities have higher predictive accuracies compared to the networks without probabilities. The proposed trophic networks have been examined by domain experts and comparison with the literature shows that many of the links are corroborated by the literature. In particular, links ascribed with high frequency are shown to correspond well with those having multiple references in the literature. In some cases novel high frequency links are suggested, which could be tested.
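
A rough sketch of the Hypothesis Frequency Estimation idea follows: sample hypotheses (sets of abduced 'eats' facts) and score each fact by its sampling frequency. The sampler below is a hypothetical stand-in for the abductive ILP search, and the toy 'eats' facts are made up for illustration.

```python
# Estimate per-fact frequencies by sampling hypotheses from the hypothesis space.

import random
from collections import Counter

def hypothesis_frequencies(sample_hypothesis, n_samples=1000, seed=0):
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_samples):
        for fact in sample_hypothesis(rng):    # one sampled set of 'eats' facts
            counts[fact] += 1
    return {fact: c / n_samples for fact, c in counts.items()}

# Toy sampler choosing between two alternative explanations of the same data.
toy = lambda rng: ({"eats(ladybird, aphid)"} if rng.random() < 0.8
                   else {"eats(ladybird, collembola)"})
print(hypothesis_frequencies(toy))
```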

Hedge cue detection is a Natural Language Processing (NLP) task that consists of determining whether sentences contain hedges. These linguistic devices indicate that authors do not or cannot back up their opinions or statements with facts. This binary classification problem, i.e. distinguishing factual versus uncertain sentences, only recently received attention in the NLP community. We use kLog, a new logical and relational language for kernel-based learning, to tackle this problem. We present results on the CoNLL 2010 benchmark dataset that consists of a set of paragraphs from Wikipedia, one of the domains in which uncertainty detection has become important. Our approach shows competitive results compared to state-of-the-art systems.

Evaluations of the advantages of Probabilistic Inductive Logic Programming (PILP) against ILP have not been conducted from a computational learning theory point of view. We propose a PILP framework, projection-based PILP, in which surjective projection functions are used to produce a “lossy” compression dataset from an ILP dataset. We present sample complexity results, including conditions under which projection-based PILP needs fewer examples than PAC. We experimentally confirm the theoretical bounds for projection-based PILP in the Blackjack domain using Cellist, a system which machine learns Probabilistic Logic Automata. In our experiments, projection-based PILP shows lower predictive error than the theoretical bounds and achieves substantially lower predictive error than ILP. To the authors’ knowledge this is the first paper describing both a computational learning theory and related empirical results on an advantage of PILP against ILP.

CF-induction is a sound and complete procedure for finding hypotheses in full clausal theories. It is based on the principle of Inverse Entailment (IE), and consists of two procedures: construction of a bridge theory and generalization of it. There are two possible ways to realize the generalization task in CF-induction. One uses a single deductive operator, called the γ-operator, and the other uses a recently proposed form of inverse subsumption. Whereas both are known to retain the completeness of CF-induction, their logical relationship and empirical features have not been clarified yet. In this paper, we show their equivalence and clarify the difference in their search strategies, which often leads to significant differences in the hypotheses they obtain.
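
For reference, the principle of Inverse Entailment underlying CF-induction is stated below in its standard formulation; the closing remark about the two generalisation routes is our paraphrase of the abstract.

```latex
% Principle of Inverse Entailment: for background theory B, examples E and
% hypothesis H,
\[
B \wedge H \models E
\quad\Longleftrightarrow\quad
B \wedge \neg E \models \neg H .
\]
% A bridge theory entailed by B \wedge \neg E can therefore be negated and then
% generalised to obtain candidate hypotheses; the gamma-operator and inverse
% subsumption are two ways of carrying out that generalisation step.
```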

A cograph (complement reducible graph) is a graph which can be generated by disjoint union and complement operations on graphs, starting with a single vertex graph. Cographs arise in many areas of computer science and are studied extensively.

With the goal of developing an effective data mining method for graph-structured data, in this paper we introduce a graph pattern expression, called a cograph pattern, which is a special type of cograph having structured variables. Firstly, we present a polynomial time matching algorithm for cograph patterns. Secondly, we give a polynomial time algorithm for obtaining a minimally generalized cograph pattern which explains given positive data. Finally, we show that the class of cograph pattern languages is polynomial time inductively inferable from positive data.
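
A small sketch of the two cograph-forming operations mentioned above (disjoint union and complement) is given below, using a bare-bones graph representation. It is illustrative only and unrelated to the paper's matching or inference algorithms.

```python
# Building cographs from single-vertex graphs by disjoint union and complement.
# A graph is a pair (vertices, edges) with edges stored as frozensets of size 2.

def single(v):
    return ({v}, set())

def disjoint_union(g, h):
    (vg, eg), (vh, eh) = g, h
    assert not vg & vh, "vertex sets must be disjoint"
    return (vg | vh, eg | eh)

def complement(g):
    vs, es = g
    all_pairs = {frozenset((u, v)) for u in vs for v in vs if u != v}
    return (vs, all_pairs - es)

# The join of two cographs is the complement of the union of their complements:
g = complement(disjoint_union(complement(single("a")), complement(single("b"))))
print(g)   # roughly: ({'a', 'b'}, {frozenset({'a', 'b'})}) -- a single edge
```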

In the next article, we will discuss ILP2012.
