Inductive Logic Programming 2008


From Inductive Logic Programming 2008

The 18th International Conference on Inductive Logic Programming was held in Prague, September 10–12, 2008. ILP returned to Prague after 11 years, and it is tempting to look at how the topics of interest have evolved during that time. The ILP community clearly continues to cherish its beloved first-order logic representation framework. This is legitimate, as the work presented at ILP 2008 demonstrated that there is still room for both extending established ILP approaches (such as inverse entailment) and exploring novel logic induction frameworks (such as brave induction). Besides the topics lending ILP research its unique focus, we were glad to see in this year’s proceedings a good number of papers contributing to areas such as statistical relational learning, graph mining, or the semantic web. To help open ILP to more mainstream research areas, the conference featured three excellent invited talks from the domains of the semantic web (Frank van Harmelen), bioinformatics (Mark Craven) and cognitive sciences (Josh Tenenbaum). We deliberately looked for speakers who are not directly involved in ILP research. We further invited a tutorial on statistical relational learning (Kristian Kersting) to meet the strong demand to have the topic presented from the ILP perspective. Lastly, Stefano Bertolo from the European Commission was invited to give a talk on the ideal niches for ILP in the current EU-supported research on intelligent content and semantics.

For the main technical track, we received 46 abstracts followed by 36 full-paper submissions. Eight of them were accepted and sixteen rejected after reviews. The remaining 12 papers were given a conditionally accepted status and their authors were given an additional three weeks to revise them. The revised papers were then re-reviewed by the program chairs and found acceptable for publication. It is our belief that the extra work demands we laid on the authors this year were an effective means of ensuring the quality of the conference. ILP 2008 also received 22 short submissions for the late-breaking papers track, which were reviewed separately. The accepted short papers appear in a separate proceedings book. Extended versions of selected papers from both conference tracks will appear in a special issue of the Machine Learning Journal.

Organizing ILP 2008 was a great experience, thanks to the excellent help we received on several fronts. We are indebted to our generous sponsors who made possible the trips of the invited speakers (sponsored by the US Air Force European Office of Aerospace Research and Development, the PASCAL2 Network of Excellence, the Czech Society for Cybernetics and Informatics and the European Commission) and the best student paper prize (sponsored by the Machine Learning Journal). We are equally grateful to Springer for collaborating so flexibly and pro-actively in preparing these proceedings and the Machine Learning Journal special issue. Thanks go as well to the diligent program committee for their reviews of the submitted papers. Their review activity was supported by the MyReview System, which proved to be powerful yet easy-to-use conference management software.

ILP would be just three letters were it not for the authors of the submitted papers. To them goes our foremost gratitude. Keep up the good work and submit to ILP 2009!

Invited Talks

  • Josh Tenenbaum (cognitive science)

    Human knowledge of the world is often expressed in the form of intuitive theories: systems of abstract concepts that organize, predict and explain our observations of the world. How are these powerful knowledge structures represented and acquired? I will describe computational frameworks for modeling people’s intuitive theories and theory-building processes, and some ways of testing these models experimentally with human learners. Our models of human learning and inference build on core approaches in Bayesian artificial intelligence, statistical relational learning and inductive logic programming, but also suggest new ways to extend these machine learning and reasoning approaches to more human-like capacities.

  • Kristian Kersting (tutorial on statistical relational learning)

    Statistical relational learning addresses one of the central questions of artificial intelligence: the integration of probabilistic reasoning with first-order logic representations and machine learning. A rich variety of different formalisms and learning techniques have been developed. This tutorial provides a gentle introduction to and an overview of the state of the art in statistical relational learning. It starts from classical settings for inductive logic programming and shows how they can be extended with probabilistic methods. It touches upon lifted inference and recent developments in nonparametric approaches to statistical relational learning. While doing so, it reviews state-of-the-art statistical relational learning approaches.

  • Frank van Harmelen (the Semantic Web)

    Even at first glance, ILP and the Semantic Web have much in common. Both are about large volumes of data, both make use of background knowledge, and both use computationally tractable forms of logic. Nevertheless, the actual intersection of the two research areas is very small. In this talk, I will first give a bird’s-eye overview of the Semantic Web programme (its goals, its methods, its achievements to date, and the important open challenges). I will then consider the most obvious use of ILP for the Semantic Web: could ILP be used to learn the ontologies that are such a crucial ingredient in the Semantic Web story? However, as with any result from Machine Learning, such ontologies will not be fully correct and complete. This will require that, from its side, the Semantic Web community learn how to deal with such partially incomplete and incorrect ontologies. I will present the most recent work in this direction: the effort to build LarKC, the Large Knowledge Collider, a platform for infinitely scalable distributed incomplete Semantic Web reasoning. Could the Large Knowledge Collider be the place where ILP and the Semantic Web finally meet?

  • Mark Craven (bioinformatics)

    A central challenge in computational biology is to uncover the mechanisms and cellular circuits that govern how the expression of various genes is controlled in response to a cell’s environment. This challenge presents many interesting opportunities for machine-learning methods, especially those that employ expressive representations. In this talk, I will discuss recent research in using machine-learning approaches to (i) recognize regulatory elements in genomic sequences, (ii) uncover networks of interactions among genes, and (iii) characterize the cellular responses induced by various stimuli. I will highlight tasks that call for models that use expressive representations, and discuss lessons learned about what types of representational attributes are important for these tasks.

  • Stefano Bertolo (European Commission)

    Physical tools like the lever and the loom amplify human strength and dexterity, and are often introduced, as in the case of the agricultural harvester, because a sudden abundance of one quantity of interest (wheat fields) introduces scarcity in other related quantities (the human labor needed to harvest them). The rate at which digital information is becoming available (both to organisations and individuals) is creating a similar scarcity in our ability to interpret it to make decisions that benefit us. In this talk I will describe funding opportunities that the EU will make available through its Framework Programme 7 to tackle this scarcity. I will discuss several trends that make the ILP community ideally placed to contribute to these efforts, together with some general patterns that have proved very effective in the engineering of successful proposals in the recent past.

Research Papers

The feasibility of symbolic learning strongly relies on the efficiency of heuristic search in the hypothesis space. However, recent work in relational learning has claimed that the phase transition phenomenon which may occur in the subsumption test during search acts as a plateau for the heuristic search, strongly hindering its efficiency. We further develop this point by proposing a learning problem generator where it is shown that top-down and bottom-up learning strategies face a plateau during search before reaching a solution. This property is ensured by the underlying CSP generator, the RB model, that we use to exhibit a phase transition of the subsumption test. In this model, the size of the current hypothesis maintained by the learner is an order parameter of the phase transition and, as it is also the control parameter of heuristic search, the learner has to face a plateau during the problem resolution. One advantage of this model is that small relational learning problems with interesting properties can be constructed and can therefore serve as a benchmark for complete search algorithms used in learning. We use the generator to study complete informed and non-informed search algorithms for relational learning and compare their behaviour when facing a phase transition of the subsumption test. We show that this generator exhibits the pathological case where informed learners degenerate into non-informed ones.

(Multi-)relational regression consists of predicting the continuous response of target objects, called reference objects, by taking into account interactions with other objects, called task-relevant objects. In relational databases, reference objects and task-relevant objects are stored in distinct data relations. Interactions between objects are expressed by means of (many-to-one) foreign key constraints, which may allow linking explanatory variables of a task-relevant object to the response variable in several alternative ways. By materializing multiple assignments in distinct attribute-value vectors, a reference object is represented as a bag of multiple instances, although there is only one response value for the entire bag. This work adopts the same assumption as multi-instance learning, namely that a primary instance is responsible for the observed response value of a reference object. We propose a top-down induction multi-relational model tree system which navigates foreign key constraints according to a divide-and-conquer strategy, derives a representation of reference objects as bags of attribute-value vectors and then, for each bag, constructs a primary instance as the main determinant of the response value. Coefficients of the local hyperplane are estimated in an EM implementation of stepwise least-squares regression. Experiments confirm the improved accuracy of our proposal with respect to traditional attribute-value and relational model tree learners.
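To make the primary-instance assumption concrete, here is a minimal EM-style sketch of multi-instance linear regression, not the authors' system: the E-step selects, per bag, the instance whose prediction best matches the bag's response, and the M-step refits a least-squares hyperplane on those instances. The function names and data layout are ours, for illustration only.

```python
import numpy as np

def mi_regression(bags, y, n_iter=20, seed=0):
    """EM-style multi-instance linear regression (illustrative sketch).

    bags : list of (n_i, d) arrays, one bag of instances per reference object
    y    : (n,) array with one response value per bag
    """
    rng = np.random.default_rng(seed)
    d = bags[0].shape[1]
    w = rng.normal(size=d + 1) * 0.01          # weights plus intercept

    def predict(X):
        return X @ w[:-1] + w[-1]

    for _ in range(n_iter):
        # E-step: choose the "primary" instance per bag, i.e. the one whose
        # current prediction is closest to the bag's response value.
        primaries = np.vstack([
            bag[np.argmin((predict(bag) - target) ** 2)]
            for bag, target in zip(bags, y)
        ])
        # M-step: ordinary least squares on the selected instances.
        X1 = np.hstack([primaries, np.ones((len(bags), 1))])
        w, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return w
```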

The problem of determining the Worst-Case Execution Time (WCET) of a piece of code is a fundamental one in the Real-Time Systems community. Existing methods either try to gain this information by analysis of the program code or by running extensive timing analyses. This paper presents a new approach to the problem based on using machine learning, in the form of ILP, to infer program properties based on sample executions of the code. Additionally, significant improvements in the range of functions learnable and the time taken for learning can be made by the application of more advanced ILP techniques.

Markov Logic Networks (MLNs) combine Markov networks and first-order logic by attaching weights to first-order formulas and viewing these as templates for features of Markov networks. State-of-the-art methods learn the structure of MLNs by maximizing the likelihood of a relational database. This can lead to suboptimal results for prediction tasks. On the other hand, better results on prediction problems have been achieved by discriminative learning of MLN weights given a certain structure. In this paper we propose an algorithm for learning the structure of MLNs discriminatively by maximizing the conditional likelihood of the query predicates instead of the joint likelihood of all predicates. The algorithm chooses structures by maximizing conditional likelihood and sets the parameters by maximum likelihood. Experiments in two real-world domains show that the proposed algorithm improves over the state-of-the-art discriminative weight learning algorithm for MLNs in terms of conditional likelihood. We also compare the proposed algorithm with the state-of-the-art generative structure learning algorithm for MLNs and confirm the results in [22], showing that for small datasets the generative algorithm is competitive, while for larger datasets the discriminative algorithm outperforms the generative one.
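For readers new to MLNs, the two objectives being contrasted can be written down explicitly. These are the standard textbook definitions, not notation taken from the paper itself:

```latex
% An MLN defines a distribution over worlds x, where w_i is the weight
% of formula i and n_i(x) the number of its true groundings in x:
P(X = x) \;=\; \frac{1}{Z}\,\exp\!\Big(\sum_i w_i\, n_i(x)\Big)

% Generative structure learning maximizes \log P(X = x) (joint likelihood);
% the discriminative variant instead maximizes the conditional
% log-likelihood of query predicates Y given evidence predicates X:
\mathrm{CLL}(y \mid x) \;=\; \log P(Y = y \mid X = x)
```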

We describe an experiment in the application of ILP to autonomous discovery in a robotic domain. An autonomous robot performs experiments in its world, collecting data and formulating predictive theories about this world. In particular, we are interested in the robot “gaining insights” through predicate invention. In the first experimental scenario, in a pushing-blocks domain, the robot discovers the notion of the movability of objects. The second scenario is about discovering the notion of an obstacle. We describe experiments with a simulated robot, as well as an experiment with a real robot in which the robot’s observations contain noise.

Theory revision systems are designed to improve the accuracy of an initial theory, producing more accurate and comprehensible theories than purely inductive methods. Such systems search for points where examples are misclassified and modify them using revision operators. This includes trying to add antecedents to clauses, usually generated in a top-down approach considering all the literals of the knowledge base. This leads to a huge search space which dominates the cost of the revision process. ILP Mode-Directed Inverse Entailment systems restrict the search for antecedents to the literals of the bottom clause. In this work, the bottom clause and mode declarations are introduced to improve the efficiency of antecedent addition in theory revision. Experimental results compared to the FORTE revision system show that the revision process runs on average three orders of magnitude faster and generates more comprehensible theories without decreasing accuracy. Moreover, the proposed theory revision approach significantly improves predictive accuracy over theories generated by the Aleph system.

In this paper we focus on learning concept descriptions expressed in Description Logics. After stating the learning problem in this context, a FOIL-like algorithm is presented that can be applied to general DL languages, and related theoretical aspects of learning with the inherent incompleteness underlying the semantics of this representation are discussed. Subsequently we present an experimental evaluation of the implementation of this algorithm on some real ontologies in order to empirically assess its performance.
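For context, the gain measure behind a "FOIL-like" algorithm is FOIL's information gain; the DL variant presumably adapts it to open-world semantics, but the classical form is:

```latex
% Gain of refining clause C into C': p, n (resp. p', n') count the positive
% and negative examples (bindings) covered before (resp. after) refinement,
% and t is the number of positive examples still covered by C':
\mathrm{Gain}(C, C') = t \left( \log_2 \frac{p'}{p' + n'} - \log_2 \frac{p}{p + n} \right)
```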

  • Feature Discovery with Type Extension Trees

    We are interested in learning complex combinatorial features from relational data. We rely on an expressive and general representation language whose semantics allows us to express many features that have been used in different statistical relational learning settings. To avoid expensive exhaustive search over the space of relational features, we introduce a heuristic search algorithm guided by a generalized relational notion of information gain and a discriminant function. The algorithm successfully finds interesting and interpretable features on artificial and real-world relational learning problems. The propositional gain that the relational notion generalizes is recalled below.
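The familiar propositional form, with F the attribute test that the paper generalizes to relational type-extension-tree features:

```latex
% Entropy of the class variable Y, and the information gain of a feature F:
H(Y) = -\sum_{c} P(Y = c)\,\log_2 P(Y = c)

\mathrm{IG}(Y; F) = H(Y) - \sum_{v} P(F = v)\; H(Y \mid F = v)
```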

  • Feature Construction Using Theory-Guided Sampling and Randomised Search

It has repeatedly been found that very good predictive models can result from using Boolean features constructed by an Inductive Logic Programming (ILP) system with access to relevant relational information. The process of feature construction by an ILP system, sometimes called “propositionalization”, has largely been done either as a pre-processing step (in which a large set of possibly useful features is constructed first, and then a predictive model is constructed) or by tightly coupling feature construction and model construction (in which a predictive model is constructed with each new feature, and only those that result in a significant improvement in performance are retained). These represent two extremes, similar in spirit to filter- and wrapper-based approaches to feature selection. An interesting third perspective on the problem arises by taking a search-based view of feature construction. In this view, we conceptually treat the task as searching through subsets of all possible features that can be constructed by the ILP system. Clearly, an exhaustive search of such a space will usually be intractable. We resort instead to a randomised local search which repeatedly constructs, randomly (but non-uniformly), a subset of features and then performs a greedy local search starting from this subset. The number of possible features usually prohibits an enumeration of all local moves. Consequently, the next move in the search space is guided by the errors made by the model constructed using the current set of features. This can be seen as sampling non-uniformly from the set of all possible local moves, with a view to selecting only those capable of improving performance. The result is a procedure in which a feature subset is initially generated in the pre-processing style, but further alterations are guided actively by actual model predictions. We test this procedure on the language processing task of word-sense disambiguation. Good models have previously been obtained for this task using an SVM in conjunction with ILP features constructed in the pre-processing style. Our results show an improvement on these previous results: predictive accuracies are usually higher, and substantially fewer features are needed.
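A minimal sketch of the search loop just described, with hypothetical helpers: `evaluate` stands in for building a model on a feature subset and measuring its validation error, and the error-guided, non-uniform sampling of moves is reduced to plain random sampling of swaps.

```python
import random

def randomised_feature_search(candidate_features, evaluate,
                              n_restarts=10, subset_size=50, seed=0):
    """Repeated randomised construction plus greedy local search.

    candidate_features : list of constructed ILP features (e.g. names as strings)
    evaluate           : fn(subset) -> validation error of a model built on it
                         (assumed, supplied by the caller)
    """
    rng = random.Random(seed)
    best, best_err = None, float("inf")

    for _ in range(n_restarts):
        # Random initial subset (non-uniform sampling would go here).
        subset = set(rng.sample(candidate_features, subset_size))
        err = evaluate(subset)
        improved = True
        while improved:
            improved = False
            # Sample a few local moves (swap one feature out, one in)
            # instead of enumerating all of them.
            for _ in range(20):
                outgoing = rng.choice(sorted(subset))
                incoming = rng.choice(candidate_features)
                if incoming in subset:
                    continue
                move = (subset - {outgoing}) | {incoming}
                e = evaluate(move)
                if e < err:
                    subset, err, improved = move, e, True
        if err < best_err:
            best, best_err = subset, err
    return best, best_err
```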

ILP is a major approach to Relational Learning that exploits previous results in concept learning and is characterized by the use of prior conceptual knowledge. An increasing amount of conceptual knowledge is being made available in the form of ontologies, mainly formalized with Description Logics (DLs). In this paper we consider the problem of learning rules from observations that combine relational data and ontologies, and identify the ingredients of an ILP solution to it. Our proposal relies on the expressive and deductive power of the KR framework DL+log, which allows for the tight integration of DLs and disjunctive Datalog with negation. More precisely, we adopt an instantiation of this framework which integrates the DL SHIQ and positive Datalog. We claim that this proposal lays the foundations of an extension of Relational Learning, called Onto-Relational Learning, to account for ontologies.

We identify a shortcoming of a standard positive-only clause evaluation function within the context of learning biological grammars. To overcome this shortcoming we propose L-modification, a modification to this evaluation function such that the lengths of individual examples are considered. We use a set of bio-sequences known as neuropeptide precursor middles (NPP-middles). Using L-modification to learn from these NPP-middles results in induced grammars that perform better than those obtained using the standard positive-only clause evaluation function. We also show that L-modification improves the performance of induced grammars when learning on short, medium or long NPP-middles. A potential disadvantage of L-modification is discussed. Finally, we show that, as the limit on the search space size increases, so does the increase in predictive performance arising from L-modification.

Hidden Markov Models (HMMs) have been successfully used in applications such as speech recognition, activity recognition and bioinformatics. There have been previous attempts, such as Hierarchical HMMs and Abstract HMMs, to elegantly extend HMMs to multiple levels of temporal abstraction (for example, to represent a user’s activities). Similarly, there has been previous work, such as Logical HMMs, on extending HMMs to domains with relational structure. In this work we develop a representation that naturally combines the power of both relational and hierarchical models in the form of Logical Hierarchical Hidden Markov Models (LoHiHMMs). LoHiHMMs inherit the compactness of representation from Logical HMMs and the tractability of inference from Hierarchical HMMs. We outline two inference algorithms: one based on grounding the LoHiHMM to a propositional HMM and the other based on particle filtering adapted for this setting. We present the results of our experiments with the model in two simulated domains.
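The second inference algorithm builds on the generic bootstrap particle filter. Below is a minimal propositional sketch of that building block, not the paper's relational or hierarchical adaptation; all names and signatures are our own illustrative choices.

```python
import numpy as np

def particle_filter(observations, n_states, trans, obs_lik,
                    n_particles=1000, seed=0):
    """Bootstrap particle filter for a discrete HMM (generic sketch).

    trans   : (n_states, n_states) row-stochastic transition matrix
    obs_lik : fn(state_array, observation) -> likelihood per particle
    """
    rng = np.random.default_rng(seed)
    particles = rng.integers(0, n_states, size=n_particles)
    for obs in observations:
        # Propagate each particle through the transition model.
        particles = np.array([
            rng.choice(n_states, p=trans[s]) for s in particles
        ])
        # Weight by the observation likelihood, then resample.
        w = obs_lik(particles, obs)
        w = w / w.sum()
        particles = rng.choice(particles, size=n_particles, p=w)
    return particles   # approximate samples from the filtering posterior
```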

We tackle the problem of statistical learning in the standard knowledge base representations for the Semantic Web, which are ultimately expressed in Description Logics. Specifically, in our method a kernel function for the ALCN logic is integrated with a support vector machine, which enables the use of statistical learning with reference representations. Experiments were performed in which kernel classification is applied to the tasks of resource retrieval and query answering on OWL ontologies.
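One simple way to realize "a kernel function for a DL integrated with a support vector machine" is sketched below, under the assumption of a set-overlap kernel on concept memberships; the paper's actual ALCN kernel is more refined, and all names here are ours.

```python
import numpy as np
from sklearn.svm import SVC

def membership_kernel(profiles):
    """Gram matrix from binary concept-membership profiles.

    profiles : (n_individuals, n_concepts) 0/1 matrix saying whether each
               individual is an instance of each checked concept.
    The kernel counts shared memberships (a plain linear kernel on these
    profiles), used here only as a stand-in for the ALCN kernel.
    """
    return profiles @ profiles.T

# Usage with scikit-learn's precomputed-kernel SVM:
# K_train = membership_kernel(train_profiles)           # (n, n)
# clf = SVC(kernel="precomputed").fit(K_train, labels)
# K_test = test_profiles @ train_profiles.T             # (m, n)
# predictions = clf.predict(K_test)
```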

Integrating heterogeneous data from sources as diverse as web pages, digital libraries, knowledge bases, the Semantic Web and databases is an open problem. The ultimate aim of our work is to be able to query such heterogeneous data sources as if their data were conveniently held in a single relational database. Pursuant to this aim, we propose a generalisation of joins from the relational database model to enable joins on arbitrarily complex structured data in a higher-order representation. By incorporating kernels and distances for structured data, we further extend this model to support approximate joins of heterogeneous data. We demonstrate the flexibility of our approach in the publications domain by evaluating example approximate queries on the CORA data sets, joining on types ranging from sets of co-authors through to entire publications.
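In its simplest form, an approximate join pairs records whose structured keys are close under some distance. A toy sketch assuming Jaccard distance on token sets; the paper supports far richer higher-order types and kernels, and the variable names below are hypothetical.

```python
def jaccard_distance(a, b):
    """Jaccard distance between two token sets."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0

def approximate_join(left, right, key, threshold=0.5):
    """Pair records whose tokenised key fields are within `threshold`.

    left, right : lists of dicts (toy stand-ins for the data sources)
    key         : field to join on, compared as a set of tokens
    """
    pairs = []
    for l in left:
        for r in right:
            lt, rt = set(l[key].split()), set(r[key].split())
            if jaccard_distance(lt, rt) <= threshold:
                pairs.append((l, r))
    return pairs

# e.g. approximate_join(cora_a, cora_b, key="authors", threshold=0.4)
```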

Statistical Relational Learning has received much attention this last decade. In the ILP community, several models have emerged for modelling and learning uncertain knowledge expressed in subsets of first-order logic. Nevertheless, no deep comparisons have been made among them and, given an application, determining which model should be chosen is difficult. In this paper, we compare two of them, namely Markov Logic Networks and Bayesian Programs, especially with respect to their representation ability and inference methods. The comparison shows that the two models are substantially different, from the point of view of the user, so that choosing one really means choosing a different philosophy to look at the problem. In order to make the comparison more concrete, we have used a running example, which shows most of the interesting points of the approaches, yet remains exactly tractable.

  • Brave Induction

    This paper considers the following induction problem: given background knowledge B and an observation O, find a hypothesis H such that the consistent theory B ∧ H has a minimal model satisfying O. We call this type of induction brave induction. Brave induction is different from explanatory induction in ILP, which requires that O be satisfied in every model of B ∧ H. Brave induction is useful for learning disjunctive rules from observations, or for learning from background knowledge containing indefinite or incomplete information. We develop an algorithm for computing brave induction, and extend it to induction in answer set programming. A small illustrative example is given below.
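Here is a small example, ours rather than the paper's, of a hypothesis that brave induction accepts but explanatory induction rejects:

```latex
B = \{\, person(alice) \,\}, \qquad O = \{\, drinks(alice, beer) \,\}

H = \{\, drinks(x, beer) \lor drinks(x, wine) \leftarrow person(x) \,\}

% B \wedge H has two minimal models:
%   M_1 = { person(alice), drinks(alice, beer) }
%   M_2 = { person(alice), drinks(alice, wine) }
% O holds in M_1, so H is acceptable under brave induction, but
% explanatory induction rejects H because O fails in M_2.
```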

  • A Statistical Approach to Incremental Induction of First-Order Hierarchical Knowledge Bases

    Knowledge bases play an important role in many forms of artificial intelligence research. A simple approach to producing such knowledge is as a database of ground literals. However, this method is neither compact nor computationally tractable for learning or performance systems to use. In this paper, we present a statistical method for incremental learning of a hierarchically structured, first-order knowledge base. Our approach uses both rules and ground facts to construct succinct rules that generalize the ground literals. We demonstrate that our approach is computationally efficient and scales well to domains with many relations.

ILP systems which use some form of Inverse Entailment (IE) are based on clause refinement through a hypothesis space bounded by a most specific clause. In this paper we give a new analysis of refinement operators in this setting. In particular, Progol’s refinement operator is revisited and discussed. It is known that Progol’s refinement operator is incomplete with respect to the general subsumption order. We introduce a subsumption order relative to a most specific (bottom) clause. This subsumption order, unlike previously suggested orders, characterises Progol’s refinement space. We study the properties of this subsumption order and show that ideal refinement operators exist for it. It is shown that efficient operators can be implemented for least generalisation and greatest specialisation in the subsumption order relative to a bottom clause. We also study less restricted subsumption orders relative to a bottom clause and show how Progol’s incompleteness can be addressed.
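For reference, the general subsumption order that Progol's operator is incomplete for is plain theta-subsumption; the paper's contribution is a variant of this order constrained by a bottom clause:

```latex
% Clause C theta-subsumes clause D (written C \preceq D) iff some
% substitution \theta maps C into D, viewing clauses as sets of literals:
C \preceq D \;\iff\; \exists \theta .\; C\theta \subseteq D
```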

In various application domains, data can be represented as bags of vectors. Learning functions over such bags is a challenging problem. In this paper, a neural network approach, based on cascade-correlation networks, is proposed to handle this kind of data. By defining special aggregation units that are integrated in the network, a general framework for learning functions over bags is obtained. Results on both artificially created and real-world data sets are reported.
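The core idea of aggregation units can be shown in a few lines: each bag of instance vectors is compressed by fixed aggregation functions before entering an ordinary network. A hedged numpy sketch follows; the paper integrates such units into cascade-correlation training, which is omitted here.

```python
import numpy as np

def aggregate_bag(bag):
    """Fixed aggregation units: compress a (n_i, d) bag into one vector by
    concatenating the element-wise mean, max and min over its instances."""
    return np.concatenate([bag.mean(axis=0), bag.max(axis=0), bag.min(axis=0)])

def forward(bags, W1, b1, w2, b2):
    """One hidden layer over aggregated bags -> one scalar output per bag."""
    X = np.vstack([aggregate_bag(b) for b in bags])   # (n_bags, 3*d)
    H = np.tanh(X @ W1 + b1)                          # hidden activations
    return H @ w2 + b2                                # per-bag predictions
```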

An outerplanar graph is a planar graph which can be embedded in the plane in such a way that all of its vertices lie on the outer boundary. Many chemical compounds are known to be expressible as outerplanar graphs. We proposed a block-preserving outerplanar graph pattern (bpo-graph pattern, for short) as a graph pattern common to a set of outerplanar graphs, such as a dataset of chemical compounds. In this paper, we first give a polynomial-time algorithm for finding a minimally generalized bpo-graph pattern explaining a given set of outerplanar graphs. Secondly, we give a pattern mining algorithm for enumerating all maximal frequent bpo-graph patterns in a set of outerplanar graphs. Finally, in order to show the performance of the pattern mining algorithm, we report experimental results on chemical datasets.

In the next article, we will discuss ILP 2009.
