Uncertainty Reasoning for the Semantic Web 1


This volume contains the proceedings of the first three workshops on Uncertainty Reasoning for the Semantic Web (URSW), held at the International Semantic Web Conferences (ISWC) in 2005, 2006, and 2007. In addition to revised and strongly extended versions of selected workshop papers, we have included invited contributions from leading experts in the field and closely related areas.

With this, the present volume represents the first comprehensive compilation of state-of-the-art research approaches to uncertainty reasoning in the context of the Semantic Web, capturing different models of uncertainty and approaches to deductive as well as inductive reasoning with uncertain formal knowledge.

The World Wide Web community envisions effortless interaction between humans and computers, seamless interoperability and information exchange among Web applications, and rapid and accurate identification and invocation of appropriate Web services. As work with semantics and services grows more ambitious, there is increasing appreciation of the need for principled approaches to the formal representation of and reasoning under uncertainty. The term uncertainty is intended here to encompass a variety of forms of incomplete knowledge, including incompleteness, inconclusiveness, vagueness, ambiguity, and others. The term uncertainty reasoning is meant to denote the full range of methods designed for representing and reasoning with knowledge when Boolean truth values are unknown, unknowable, or inapplicable. Commonly applied approaches to uncertainty reasoning include probability theory, Dempster-Shafer theory, fuzzy logic and possibility theory, and numerous other methodologies.

A few Web-relevant challenges which are addressed by reasoning under uncertainty include:

Uncertainty of available information: Much information on the World Wide Web is uncertain. Examples include weather forecasts and gambling odds. Canonical methods for representing and integrating such information are necessary for communicating it in a seamless fashion.

Information incompleteness: Information extracted from large information networks such as the World Wide Web is typically incomplete. The ability to exploit partial information is very useful for identifying sources of service or information. For example, that an online service deals with greeting cards may be evidence that it also sells stationery. It is clear that search effectiveness could be improved by appropriate use of technologies for handling uncertainty.

Information incorrectness: Web information is also often incorrect or only partially correct, raising issues related to trust or credibility. Uncertainty representation and reasoning helps to resolve tension amongst information sources having different confidence and trust levels, and can facilitate the merging of controversial information obtained from multiple sources.

Uncertain ontology mappings: The Semantic Web vision implies that numerous distinct but conceptually overlapping ontologies will co-exist and interoperate. It is likely that in such scenarios ontology mapping will benefit from the ability to represent degrees of membership and/or likelihoods of membership in categories of a target ontology, given information about class membership in the source ontologies, as illustrated by the sketch following this list.

Indefinite information about Web services: Dynamic composability of Web services will require runtime identification of processing and data resources and resolution of policy objectives. Uncertainty reasoning techniques may be necessary to resolve situations in which existing information is not definitive.
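
To make the ontology-mapping challenge above concrete, here is a minimal sketch (the class names and probabilities are hypothetical, not taken from the volume) that represents an uncertain mapping as a conditional distribution over target-ontology classes, given membership in a source-ontology class, and selects the most likely target.

```python
# Minimal sketch (hypothetical classes and probabilities): an uncertain ontology
# mapping as a conditional distribution P(target class | source class).
mapping = {
    "src:GreetingCardShop": {
        "tgt:StationeryVendor": 0.7,
        "tgt:GiftShop": 0.2,
        "tgt:BookStore": 0.1,
    },
}

def most_likely_target(source_class):
    """Return the most probable target class and its probability."""
    targets = mapping[source_class]
    best = max(targets, key=targets.get)
    return best, targets[best]

print(most_likely_target("src:GreetingCardShop"))   # ('tgt:StationeryVendor', 0.7)
```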

Uncertainty is thus an intrinsic feature of many important tasks on the Web and the Semantic Web, and a full realization of the World Wide Web as a source of processable data and services demands formalisms capable of representing and reasoning under uncertainty. Unfortunately, none of these needs can be addressed in a principled way by current Web standards. Although it is to some degree possible to use semantic markup languages such as OWL or RDF(S) to represent qualitative and quantitative information about uncertainty, there is no established foundation for doing so, and feasible approaches are severely limited. Furthermore, there are ancillary issues such as how to balance representational power vs. simplicity of uncertainty representations, which uncertainty representation techniques address uses such as the examples listed above, how to ensure the consistency of representational formalisms and ontologies, etc.

In response to these pressing demands, in recent years several promising approaches to uncertainty reasoning on the Semantic Web have been proposed. The present volume covers a representative cross section of these approaches, from extensions to existing Web-related logics for the representation of uncertainty to approaches to inductive reasoning under uncertainty on the Web.

In order to reflect the diversity of the presented approaches and to relate them to their underlying models of uncertainty, the contributions to this volume are grouped as follows:

Probabilistic and Dempster-Shafer Models

Probability theory provides a mathematically sound representation language and formal calculus for rational degrees of belief, which gives different agents the freedom to have different beliefs about a given hypothesis. As this provides a compelling framework for representing uncertain, imperfect knowledge that can come from diverse agents, there are many distinct approaches using probability in the context of the Semantic Web. Classes of probabilistic models covered in the present volume are Bayesian Networks, probabilistic extensions to Description and First-Order Logics, and models based on the Dempster-Shafer theory (a generalization of the classical Bayesian approach).
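
As an illustration of the Dempster-Shafer side of this family (a sketch under assumed numbers, not an example from the volume), the following combines the basic mass assignments of two sources about the same hypothesis with Dempster's rule, normalizing away the conflicting mass.

```python
# Minimal sketch (assumed mass values): Dempster's rule of combination for two
# mass functions over the frame {ok, bad}.
from itertools import product

FRAME = frozenset({"ok", "bad"})

def combine(m1, m2):
    """Combine two basic mass assignments with Dempster's rule."""
    combined = {}
    conflict = 0.0
    for (a, w1), (b, w2) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + w1 * w2
        else:
            conflict += w1 * w2          # mass falling on the empty set
    return {s: w / (1.0 - conflict) for s, w in combined.items()}

# Two sources with different degrees of commitment to the same hypothesis.
m1 = {frozenset({"ok"}): 0.6, FRAME: 0.4}
m2 = {frozenset({"ok"}): 0.7, frozenset({"bad"}): 0.1, FRAME: 0.2}
print(combine(m1, m2))
```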

Fuzzy and Possibilistic Models

Fuzzy formalisms allow for representing and processing degrees of truth about vague (or imprecise) pieces of information. In fuzzy description logics and ontology languages, concept assertions, role assertions, concept inclusions, and role inclusions have a degree of truth rather than a binary truth value. The present volume presents various approaches which exploit fuzzy logic and possibility theory in the context of the Semantic Web.
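 
Since the fuzzy approaches in this group differ mainly in how degrees of truth are combined, a small sketch may help (the concept names and degrees are invented for illustration): it evaluates the degree of a conjunctive fuzzy assertion under three common t-norms.

```python
# Minimal sketch (hypothetical degrees): combining degrees of truth of two fuzzy
# concept assertions under the Goedel, product, and Lukasiewicz t-norms.
def goedel(a, b):       return min(a, b)
def product(a, b):      return a * b
def lukasiewicz(a, b):  return max(0.0, a + b - 1.0)

tall = 0.8          # Tall(john) holds to degree 0.8
athletic = 0.6      # Athletic(john) holds to degree 0.6

for name, tnorm in [("Goedel", goedel), ("product", product), ("Lukasiewicz", lukasiewicz)]:
    # Degree of the assertion (Tall AND Athletic)(john) under each t-norm.
    print(name, tnorm(tall, athletic))
```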

Inductive Reasoning and Machine Learning

Machine learning is expected to play an increasingly important role in the context of the Semantic Web by providing solutions for various tasks such as the learning of ontologies from incomplete data or the (semi-)automatic annotation of data on the Web. Results obtained by machine learning approaches are typically uncertain. As a logic-based approach to machine learning, inductive reasoning provides means for inducing general propositions from observations (example facts). Papers in this volume exploit the power of inductive reasoning for the purpose of ontology learning, and project future directions for the use of machine learning on the Semantic Web.
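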

Hybrid Approaches

This volume segment contains papers which either combine approaches from two or more of the previous segments, or which do not rely on any specific classical approach to uncertainty reasoning.

Acknowledgements. We would like to express our gratitude to the authors of this volume for their contributions and to the workshop participants for inspiring discussions, as well as to the members of the workshop Program Committees and the additional reviewers for their reviews and for their overall support.

Probabilistic and Dempster-Shafer Models

In recent years, it has become increasingly clear that the vision of the Semantic Web requires uncertain reasoning over rich, first-order representations. Markov logic brings the power of probabilistic modeling to first-order logic by attaching weights to logical formulas and viewing them as templates for features of Markov networks. This gives natural probabilistic semantics to uncertain or even inconsistent knowledge bases with minimal engineering effort. Inference algorithms for Markov logic draw on ideas from satisfiability, Markov chain Monte Carlo and knowledge-based model construction. Learning algorithms are based on the conjugate gradient algorithm, pseudo-likelihood and inductive logic programming. Markov logic has been successfully applied to problems in entity resolution, link prediction, information extraction and others, and is the basis of the open-source Alchemy system.
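
The core of Markov logic is that the probability of a possible world is proportional to exp of the sum, over formulas, of each formula's weight times its number of true groundings. The sketch below illustrates just that definition on a tiny invented domain (the predicates, weight, and constants are hypothetical); it is not the Alchemy system's inference machinery.

```python
# Minimal sketch (hypothetical domain): a Markov logic network with one weighted
# formula, Smokes(x) => Cancer(x), scored over all possible worlds.
import math
from itertools import product

people = ["anna", "bob"]
weight = 1.5                                   # weight attached to the formula

def n_implication(world):
    """Count true groundings of Smokes(x) => Cancer(x)."""
    return sum(1 for p in people if (not world["Smokes", p]) or world["Cancer", p])

atoms = [(pred, p) for pred in ("Smokes", "Cancer") for p in people]
worlds = [dict(zip(atoms, vals)) for vals in product([False, True], repeat=len(atoms))]

# Unnormalized score of each world: exp(weight * number of true groundings).
scores = [math.exp(weight * n_implication(w)) for w in worlds]
Z = sum(scores)                                # partition function
print("P(first world) =", scores[0] / Z)
```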

This chapter overviews work on semantic science. The idea is that, using rich ontologies, both observational data and theories that make (probabilistic) predictions on data are published for the purposes of improving or comparing the theories, and for making predictions in new cases. This paper concentrates on issues and progress in having machine accessible scientific theories that can be used in this way. This paper presents the grand vision, issues that have arisen in building such systems for the geological domain (minerals exploration and geohazards), and sketches the formal foundations that underlie this vision. The aim is to get to the stage where: any new scientific theory can be tested on all available data; any new data can be used to evaluate all existing theories that make predictions on that data; and when someone has a new case they can use the best theories that make predictions on that case

Agents need to communicate in order to accomplish tasks that they are unable to perform alone. Communication requires agents to share a common ontology, a strong assumption in open environments where agents from different backgrounds meet briefly, making it impossible to map all the ontologies in advance. An agent, when it receives a message, needs to compare the foreign terms in the message with all the terms in its own local ontology, searching for the most similar one. However, the content of a message may be described using an interaction model: the entities to which the terms refer are correlated with other entities in the interaction, and they may also have prior probabilities determined by earlier, similar interactions. Within the context of an interaction it is possible to predict the set of possible entities a received message may contain, and it is possible to sacrifice recall for efficiency by comparing the foreign terms only with the most probable local ones. This allows a novel form of dynamic ontology matching.
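
The recall-for-efficiency trade described above can be shown in a few lines (a sketch with invented terms and priors, and a plain string-similarity stand-in for the chapter's matching machinery): the foreign term is compared only against the k local terms with the highest prior probability from earlier interactions.

```python
# Minimal sketch (hypothetical terms/priors): match a foreign term against only
# the k most probable local terms, trading recall for efficiency.
import difflib

# Prior probabilities for local terms, assumed learned from earlier interactions.
priors = {"Invoice": 0.5, "PurchaseOrder": 0.3, "Receipt": 0.15, "Complaint": 0.05}

def match(foreign_term, priors, k=2):
    # Only the k most probable local terms are compared at all.
    candidates = sorted(priors, key=priors.get, reverse=True)[:k]
    scored = {c: difflib.SequenceMatcher(None, foreign_term.lower(), c.lower()).ratio()
              for c in candidates}
    return max(scored, key=scored.get), scored

print(match("invoice-document", priors))
```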

In previous work, we have introduced probabilistic description logic programs for the Semantic Web, which combine description logics, normal programs under the answer set (resp., well-founded) semantics, and probabilistic uncertainty. In this paper, we continue this line of research. We propose an approach to probabilistic data integration for the Semantic Web that is based on probabilistic description logic programs, where probabilistic uncertainty is used to handle inconsistencies between different data sources. It is inspired by recent works on probabilistic data integration in the database and web community [5,2].

Using mappings between ontologies is a common way of approaching the semantic heterogeneity problem on the Semantic Web. To fit into the landscape of Semantic Web languages, a suitable logic-based representation formalism for mappings is needed, which allows reasoning with ontologies and mappings in an integrated manner and dealing with uncertainty and inconsistencies in automatically created mappings. We analyze the requirements for such a formalism, and propose to use frameworks that integrate description logic ontologies with probabilistic rules. We compare two such frameworks and show the advantages of using the probabilistic extensions of their deterministic counterparts. The two frameworks that we compare are tightly coupled probabilistic dl-programs, which tightly combine the description logics behind OWL DL resp. OWL Lite, disjunctive logic programs under the answer set semantics, and Bayesian probabilities, on the one hand, and generalized Bayesian dl-programs, which tightly combine the DLP-fragment of OWL Lite with Datalog (without negation and equality) based on the semantics of Bayesian networks, on the other hand.

This paper addresses a major weakness of current technologies for the Semantic Web, namely the lack of a principled means to represent and reason about uncertainty. This not only hinders the realization of the original vision for the Semantic Web, but also creates a barrier to the development of new, powerful features for general knowledge applications that require proper treatment of uncertain phenomena. We propose to extend OWL, the ontology language recommended by the World Wide Web Consortium (W3C), to provide the ability to express probabilistic knowledge. The new language, PR-OWL, will allow legacy ontologies to interoperate with newly developed probabilistic ontologies. PR-OWL will move beyond the current limitations of deterministic classical logic to a full first-order probabilistic logic. By providing a principled means of modeling uncertainty in ontologies, PR-OWL will serve as a supporting tool for many applications that can benefit from probabilistic inference within an ontology language, thus representing an important step toward the W3C’s vision for the Semantic Web.

Although Semantic Web service discovery has been extensively studied in the literature ([7], [12], [15] and [10]), we are far from achieving an effective, complete and automated discovery process. Using the incidence calculus [4], a truth-functional probabilistic calculus, and a lightweight brokering mechanism [17], the article explores the suitability of integrating probabilistic reasoning in Semantic Web services environments. We show how the combination of relaxation of the matching process and evaluation of web service capabilities based on a previous historical record of successful executions enables new possibilities in service discovery.
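
Incidence calculus, mentioned above, assigns each proposition a set of possible worlds (its incidence) rather than a number; probabilities then come from set operations, which makes conjunction well defined without a truth-functional approximation. The sketch below illustrates the idea on an invented record of past service executions; it is not the brokering mechanism of the article.

```python
# Minimal sketch (hypothetical execution record): incidence calculus computes
# probabilities from sets of possible worlds, so P(A and B) uses |i(A) & i(B)|.
worlds = set(range(10))                    # ten equiprobable past executions
i_capability = {0, 1, 2, 3, 4, 5, 6}       # worlds where the service advertised the capability
i_success    = {2, 3, 4, 5, 6, 7, 8}       # worlds where the execution succeeded

def p(incidence):
    return len(incidence) / len(worlds)

print("P(capability) =", p(i_capability))
print("P(capability and success) =", p(i_capability & i_success))
print("P(success | capability) =", p(i_capability & i_success) / p(i_capability))
```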

In the (Semantic) Web, the existence or producibility of certain, consensually agreed or authoritative knowledge cannot be assumed, and criteria to judge the trustability and reputation of knowledge sources may not be given. These issues give rise to formalizations of web information which factor in heterogeneous and possibly inconsistent assertions and intentions, and make such heterogeneity explicit and manageable for reasoning mechanisms. Such approaches can provide valuable metaknowledge in contemporary application fields, like open or distributed ontologies, social software, ranking and recommender systems, and domains with a high amount of controversies, such as politics and culture. As an approach to this, we introduce a lean formalism for the Semantic Web which allows for the explicit representation of controversial individual and group opinions and goals by means of so-called social contexts, and optionally for the probabilistic belief merging of uncertain or conflicting statements. Doing so, our approach generalizes concepts such as provenance annotation and voting in the context of ontologies and other kinds of Semantic Web knowledge.

Automated ontology population using information extraction algorithms can produce inconsistent knowledge bases. Confidence values assigned by the extraction algorithms may serve as evidence helping to repair produced inconsistencies. The Dempster-Shafer theory of evidence is a formalism which allows appropriate interpretation of extractors' confidence values. The paper presents an algorithm for translating the subontologies containing conflicts into belief propagation networks and repairing conflicts based on Dempster-Shafer plausibility.
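
To give a feel for how plausibility can arbitrate between conflicting extracted assertions, here is a sketch (with an assumed, already-combined mass function over two disjoint candidate classes; it is not the paper's belief propagation network construction): belief and plausibility are computed for each hypothesis, and the less plausible assertion would be the one to repair.

```python
# Minimal sketch (assumed mass values): belief and plausibility for two
# conflicting extracted class assertions about the same individual.
FRAME = frozenset({"Person", "Organization"})   # disjoint candidate classes

mass = {
    frozenset({"Person"}): 0.45,
    frozenset({"Organization"}): 0.35,
    FRAME: 0.20,                                # mass left uncommitted
}

def belief(h):
    return sum(w for s, w in mass.items() if s <= h)

def plausibility(h):
    return sum(w for s, w in mass.items() if s & h)

for h in (frozenset({"Person"}), frozenset({"Organization"})):
    print(set(h), "belief:", round(belief(h), 2), "plausibility:", round(plausibility(h), 2))
# Keep the assertion whose hypothesis is more plausible; flag the other for repair.
```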

Clinical Practice Guidelines (CPGs) play an important role in improving the quality of care and patient outcomes. Although several machine-readable representations of practice guidelines implemented with semantic web technologies have been presented, there is no implementation to represent uncertainty with respect to activity graphs in clinical practice guidelines. In this paper, we are exploring a Bayesian Network (BN) approach for representing the uncertainty in CPGs based on ontologies. Based on the representation of uncertainty in CPGs, when an activity occurs, we can evaluate its effect on the whole clinical process, which, in turn, can help doctors judge the risk of uncertainty for other activities, and make a decision. A variable elimination algorithm is applied to implement the BN inference and a validation of an aspirin therapy scenario for diabetic patients is proposed.
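
The following sketch shows the kind of query such a guideline network supports, on a deliberately tiny two-node network with invented probabilities, answered by direct application of Bayes' rule rather than the variable elimination algorithm used in the paper.

```python
# Minimal sketch (hypothetical probabilities): a two-node Bayesian network for a
# guideline activity and its outcome, queried with Bayes' rule.
p_aspirin = 0.8                                   # P(AspirinGiven = true)
p_complication = {True: 0.05, False: 0.15}        # P(Complication = true | AspirinGiven)

# P(AspirinGiven = true | Complication = true)
p_comp = p_aspirin * p_complication[True] + (1 - p_aspirin) * p_complication[False]
posterior = p_aspirin * p_complication[True] / p_comp
print("P(aspirin given | complication observed) =", round(posterior, 3))
```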

Fuzzy and Possibilistic Models

Fuzzy Description Logics are a family of logics which allow the representation of (and the reasoning with) structured knowledge affected by uncertainty and vagueness. They were born to overcome the limitations of classical Description Logics when dealing with such kinds of knowledge, but they bring out some new challenges, requiring an appropriate fuzzy language to be agreed upon and needing practical and highly optimized implementations of the reasoning algorithms. In the current paper we face these problems by presenting a reasoning preserving procedure to obtain a crisp representation for a fuzzy extension of SHOIN, which makes it possible to reuse a crisp representation language as well as currently available reasoners, which have demonstrated a very good performance in practice. As an additional contribution, we define the syntax and semantics of a novel fuzzy version of the nominal construct and allow reasoning with fuzzy general concept inclusions.

Classical ontologies are not suitable to represent imprecise or uncertain pieces of information. Fuzzy Description Logics were born to represent the former type of knowledge, but they require an appropriate fuzzy language to be agreed upon and a significant number of available resources to be adapted. This paper faces these problems by presenting a reasoning preserving procedure to obtain a crisp representation for a fuzzy extension of the logic SROIQ which uses Gödel implication in the semantics of fuzzy concept and role subsumption. This reduction makes it possible to reuse a crisp representation language as well as currently available reasoners. Our procedure is optimized with respect to the related work, reducing the size of the resulting knowledge base, and is implemented in DeLorean, the first reasoner supporting fuzzy OWL DL.
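
The basic idea behind such crisp reductions can be sketched with cut concepts: each fuzzy assertion "C(a) to degree at least alpha" becomes a crisp assertion on a new concept C_geq_alpha, and the cut concepts are ordered by subsumption axioms. The code below (with invented concepts and degrees) illustrates only this core idea, not the optimized DeLorean reduction.

```python
# Minimal sketch (hypothetical fuzzy ABox): an alpha-cut style crisp reduction.
fuzzy_abox = [("Tall", "john", 0.8), ("Tall", "mary", 0.5), ("Fast", "car1", 0.75)]

degrees = {}                              # concept -> distinct degrees used
for concept, _, alpha in fuzzy_abox:
    degrees.setdefault(concept, set()).add(alpha)

# Each fuzzy assertion <C(a) >= alpha> becomes a crisp assertion C_geq_alpha(a).
crisp_abox = [(f"{c}_geq_{a}", ind) for c, ind, a in fuzzy_abox]

crisp_tbox = []
for concept, alphas in degrees.items():
    ordered = sorted(alphas, reverse=True)
    for higher, lower in zip(ordered, ordered[1:]):
        # A degree >= higher also implies a degree >= lower.
        crisp_tbox.append((f"{concept}_geq_{higher}", "SubClassOf", f"{concept}_geq_{lower}"))

print(crisp_abox)
print(crisp_tbox)
```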

We focus on replacing human processing of Web resources with automated processing. Using an experimental system, we identify uncertainty issues that make this process difficult to automate and try to minimize human intervention. In particular we focus on uncertainty issues in a Web content mining system and a user preference mining system. We conclude with possible future developments heading toward an extension of OWL with uncertainty features.

The use of hierarchical taxonomies to organise information (or sets of objects) is a common approach for the semantic web and elsewhere, and is based on progressively finer granulations of objects. In many cases, seemingly crisp granulation disguises the fact that categories are based on loosely defined concepts that are better modelled by allowing graded membership. A related problem arises when different taxonomies are used, with different structures, as the integration process may also lead to fuzzy categories. Care is needed when information systems use fuzzy sets to model graded membership in categories – the fuzzy sets are not disjunctive possibility distributions, but must be interpreted conjunctively. We clarify this distinction and show how an extended mass assignment framework can be used to extract relations between fuzzy categories. These relations are association rules and are useful when integrating multiple information sources categorised according to different hierarchies. Our association rules do not suffer from problems associated with use of fuzzy cardinalities. Experimental results on discovering association rules in film databases and terrorism incident databases are demonstrated.
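
For orientation, the sketch below computes the confidence of an association between two fuzzy categories with a simple sigma-count baseline over conjunctive (min) memberships; the chapter's extended mass assignment framework is specifically designed to avoid the problems of this kind of fuzzy cardinality, so this is only a point of comparison with invented film data.

```python
# Minimal sketch (hypothetical memberships): a sigma-count baseline for the
# confidence of an association rule between two fuzzy categories.
films = {
    "film1": {"thriller": 0.9, "violent": 0.8},
    "film2": {"thriller": 0.6, "violent": 0.2},
    "film3": {"thriller": 0.1, "violent": 0.7},
}

def confidence(antecedent, consequent):
    # Conjunctive (min) membership summed over items, divided by antecedent support.
    num = sum(min(m.get(antecedent, 0.0), m.get(consequent, 0.0)) for m in films.values())
    den = sum(m.get(antecedent, 0.0) for m in films.values())
    return num / den

print("conf(thriller -> violent) =", round(confidence("thriller", "violent"), 2))
```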

Semantic Web languages cannot currently represent vague or uncertain information. However, their crisp model-theoretic semantics can be extended to represent uncertainty in much the same way first-order logic was extended to fuzzy logic. We show how the interpretation of an RDF graph (or an RDF Schema ontology) can be a matter of degree, addressing a common problem in real-life knowledge management. While unmodified RDF triples can be interpreted according to the new semantics, an extended syntax is needed in order to store fuzzy membership values within the statements. We give conditions an extended interpretation must meet to be a model of an extended graph. Reasoning in the resulting fuzzy languages can be implemented by current inferencers with minimal adaptations.
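
As a rough illustration of storing membership values with statements (the resource names, degrees, and the min-based propagation rule are assumptions, not the paper's exact semantics), the sketch below attaches a degree to each triple and propagates degrees through rdfs:subClassOf.

```python
# Minimal sketch (hypothetical triples): fuzzy RDF statements carrying a
# membership degree, with a min rule for rdfs:subClassOf propagation.
fuzzy_triples = [
    ("ex:rome", "rdf:type", "ex:HotPlace", 0.7),
    ("ex:HotPlace", "rdfs:subClassOf", "ex:Place", 1.0),
]

def infer_types(triples):
    inferred = list(triples)
    for s, p, o, d1 in triples:
        if p != "rdf:type":
            continue
        for s2, p2, o2, d2 in triples:
            if p2 == "rdfs:subClassOf" and s2 == o:
                # The inferred membership holds at least to the min of the two degrees.
                inferred.append((s, "rdf:type", o2, min(d1, d2)))
    return inferred

for t in infer_types(fuzzy_triples):
    print(t)
```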

In the last couple of years it has become widely acknowledged that uncertainty and fuzzy extensions to ontology languages, like Description Logics (DLs) and OWL, could play a significant role in the improvement of many Semantic Web (SW) applications. Many SW tasks like trust, matching, merging, and ranking usually involve confidence or truth degrees that one needs to represent and reason about. Fuzzy DLs are able to represent vague concepts such as a “Tall” person, a “Hot” place, a “MiddleAged” person, a “near” destination and many more. In the current paper we present a fuzzy extension to the DL SHIN. First, we present the semantics, and later a detailed reasoning algorithm that decides most of the key inference tasks of fuzzy SHIN. Finally, we briefly present the fuzzy reasoning system FiRE, which implements the proposed algorithm, and two use case scenarios where we have applied fuzzy DLs through FiRE.

Inductive Reasoning and Machine Learning

In this paper we explore some of the opportunities and challenges for machine learning on the Semantic Web. The Semantic Web provides standardized formats for the representation of both data and ontological background knowledge. Semantic Web standards are used to describe meta data but also have great potential as a general data format for data communication and data integration. Within a broad range of possible applications machine learning will play an increasingly important role: Machine learning solutions have been developed to support the management of ontologies, for the semi-automatic annotation of unstructured data, and to integrate semantic information into web mining. Machine learning will increasingly be employed to analyze distributed data sources described in Semantic Web formats and to support approximate Semantic Web reasoning and querying. In this paper we discuss existing and future applications of machine learning on the Semantic Web with a strong focus on learning algorithms that are suitable for the relational character of the Semantic Web’s data structure. We discuss some of the particular aspects of learning that we expect will be of relevance for the Semantic Web such as scalability, missing and contradicting data, and the potential to integrate ontological background knowledge. In addition we review some of the work on the learning of ontologies and on the population of ontologies, mostly in the context of textual data.

A logical formalism to support the insertion of uncertain concepts in formal ontologies is presented. It is based on the search of extensions by means of two automated reasoning systems (ARS), and it is driven by what we call cognitive entropy.

This work presents a framework, founded on multi-relational instance-based learning, for inductive (memory-based) reasoning on knowledge bases expressed in Description Logics. The procedure, which exploits a relational dissimilarity measure based on the notion of Information Content, can be employed both to answer class-membership queries and to predict assertions that may not be logically entailed by the knowledge base. These tasks may be the baseline for other inductive methods for ontology construction and evolution. In a preliminary experimentation, we show that the method is sound. Besides, it is actually able to induce new knowledge that might be added to the knowledge base.
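
The memory-based idea can be sketched as a k-nearest-neighbour vote over individuals: the placeholder dissimilarity below (fraction of differing features on invented individuals) stands in for the Information Content based relational measure used in the chapter.

```python
# Minimal sketch (hypothetical individuals; placeholder dissimilarity): k-NN
# prediction of class membership for an individual of a knowledge base.
def dissimilarity(a, b):
    # Placeholder: fraction of features on which the two individuals differ.
    keys = set(a) | set(b)
    return sum(a.get(k) != b.get(k) for k in keys) / len(keys)

training = [
    ({"worksFor": "uni", "hasDegree": "phd"}, "Researcher"),
    ({"worksFor": "uni", "hasDegree": "msc"}, "Researcher"),
    ({"worksFor": "bank", "hasDegree": "bsc"}, "NonResearcher"),
]

def predict(query, k=2):
    ranked = sorted(training, key=lambda ex: dissimilarity(query, ex[0]))[:k]
    labels = [label for _, label in ranked]
    return max(set(labels), key=labels.count)     # majority vote among neighbours

print(predict({"worksFor": "uni", "hasDegree": "phd"}))
```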

We propose semantic distance measures based on the criterion of approximate discernibility and on evidence combination. In the presence of incomplete knowledge, the distance functions measure the degree of belief in the discernibility of two individuals by combining estimates of basic probability masses related to a set of discriminating features. We also suggest ways to extend this distance for comparing individuals to concepts and concepts to other concepts. Integrated within a k-Nearest Neighbor algorithm, the measures have been experimentally tested on a task of inductive concept retrieval demonstrating the effectiveness of their application.

Ontology Learning from text aims at generating domain ontologies from textual resources by applying natural language processing and machine learning techniques. It is inherent in the ontology learning process that the acquired ontologies represent uncertain and possibly contradicting knowledge. From a logical perspective, the learned ontologies are potentially inconsistent knowledge bases that thus do not allow meaningful reasoning directly. In this paper we present an approach to generate consistent OWL ontologies from learned ontology models by taking the uncertainty of the knowledge into account. We further present evaluation results from experiments with ontologies learned from a Digital Library.
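
One simple way to turn confidence-annotated learned axioms into a consistent ontology is to add them greedily in order of confidence, keeping only those that preserve consistency. The sketch below illustrates that strategy with invented axioms and a toy disjointness check in place of a DL reasoner; it is a sketch of the general idea, not the paper's algorithm.

```python
# Minimal sketch (hypothetical axioms; toy consistency check): add learned
# axioms by decreasing confidence, keeping only consistency-preserving ones.
learned = [
    (("Disjoint", "Person", "Organization"), 0.9),
    (("SubClassOf", "University", "Organization"), 0.8),
    (("SubClassOf", "University", "Person"), 0.4),   # conflicts with the axioms above
]

def consistent(axioms):
    disjoint = {frozenset(a[1:]) for a in axioms if a[0] == "Disjoint"}
    supers = {}
    for kind, sub, sup in (a for a in axioms if a[0] == "SubClassOf"):
        supers.setdefault(sub, set()).add(sup)
    # Inconsistent if some class is asserted under two declared-disjoint classes.
    return not any(frozenset((x, y)) in disjoint
                   for s in supers.values()
                   for x in s for y in s if x != y)

ontology = []
for axiom, _ in sorted(learned, key=lambda item: -item[1]):
    if consistent(ontology + [axiom]):
        ontology.append(axiom)
print(ontology)      # the low-confidence conflicting axiom is left out
```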

Hybrid Approaches

We present a reasoning procedure for ontologies with uncertainty described in a Description Logic (DL) that includes General TBoxes, i.e., cycles and General Concept Inclusions (GCIs). For this, we consider the description language ALCU, in which uncertainty parameters are associated with ABoxes and TBoxes, and which allows General TBoxes. Using this language as a basis, we then present a tableau algorithm which encodes the semantics of the input knowledge base as a set of assertions and linear and/or nonlinear arithmetic constraints on certainty variables. By tuning the uncertainty parameters in the knowledge base, different notions of uncertainty can be modeled and reasoned with, within the same framework. Our reasoning procedure is deterministic, and hence avoids possible empirical intractability in standard DL with General TBoxes. We further illustrate the need for blocking when reasoning with General TBoxes in the context of ALCU.
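
To convey what "arithmetic constraints on certainty variables" can look like, here is a sketch that checks a candidate assignment of certainty variables against a handful of hand-written constraints (concept names, bounds, and the particular conjunction bounds are assumptions for illustration; this is not the ALCU tableau algorithm itself).

```python
# Minimal sketch (hypothetical constraints): checking an assignment of certainty
# variables against constraints of the kind a tableau procedure might generate.
def constraints(x):
    c, d, c_and_d = x["C(a)"], x["D(a)"], x["(C and D)(a)"]
    return [
        c_and_d <= c,                    # a conjunction cannot exceed a conjunct
        c_and_d <= d,
        c_and_d >= c + d - 1.0,          # Lukasiewicz-style lower bound
        c >= 0.6,                        # asserted lower bound from the ABox
        d >= 0.7,
    ]

candidate = {"C(a)": 0.8, "D(a)": 0.7, "(C and D)(a)": 0.6}
print("satisfiable under this assignment:", all(constraints(candidate)))
```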

In the next article, we will discuss the collection of papers in Uncertainty Reasoning for the Semantic Web 2.
