Uncertainty Reasoning for the Semantic Web 2


This is the second volume on “Uncertainty Reasoning for the Semantic Web,” containing revised and significantly extended versions of selected workshop papers presented at three workshops on Uncertainty Reasoning for the Semantic Web (URSW), held at the International Semantic Web Conferences (ISWC) in 2008, 2009, and 2010, or presented at the First International Workshop on Uncertainty in Description Logics (UniDL) in 2010. The first volume contained the proceedings of the first three workshops on URSW at ISWC in 2005, 2006, and 2007.

The two volumes together represent a comprehensive compilation of state-of-the-art research approaches to uncertainty reasoning in the context of the Semantic Web, capturing different models of uncertainty and approaches to deductive as well as inductive reasoning with uncertain formal knowledge.

The World Wide Web community envisions effortless interaction between humans and computers, seamless interoperability and information exchange among Web applications, and rapid and accurate identification and invocation of appropriate Web services. As work with semantics and services grows more ambitious, there is increasing appreciation of the need for principled approaches to the formal representation of and reasoning under uncertainty. The term uncertainty is intended here to encompass a variety of forms of incomplete knowledge, including incompleteness, inconclusiveness, vagueness, ambiguity, and others. The term uncertainty reasoning is meant to denote the full range of methods designed for representing and reasoning with knowledge when Boolean truth values are unknown, unknowable, or inapplicable. Commonly applied approaches to uncertainty reasoning include probability theory, Dempster-Shafer theory, fuzzy logic and possibility theory, and numerous other methodologies.

A few Web-relevant challenges that are addressed by reasoning under uncertainty include:

Uncertainty of available information: Much information on the World Wide Web is uncertain. Examples include weather forecasts or gambling odds. Canonical methods for representing and integrating such information are necessary for communicating it in a seamless fashion.

Information incompleteness: Information extracted from large information networks such as the World Wide Web is typically incomplete. The ability to exploit partial information is very useful for identifying sources of service or information. For example, the fact that an online service deals with greeting cards may be evidence that it also sells stationery. It is clear that search effectiveness could be improved by appropriate use of technologies for handling uncertainty.

Information incorrectness: Web information is also often incorrect or only partially correct, raising issues related to trust or credibility. Uncertainty representation and reasoning helps to resolve tension among information sources having different confidence and trust levels, and can facilitate the merging of controversial information obtained from multiple sources.

Uncertain ontology mappings: The Semantic Web vision implies that numerous distinct but conceptually overlapping ontologies will co-exist and interoperate. It is likely that in such scenarios, ontology mapping will benefit from the ability to represent degrees of membership and/or likelihoods of membership in categories of a target ontology, given information about class membership in the source ontologies.

Indefinite information about Web services: Dynamic composability of Web services will require runtime identification of processing and data resources and resolution of policy objectives. Uncertainty reasoning techniques may be necessary to resolve situations in which existing information is not definitive.

Uncertainty is thus an intrinsic feature of many important tasks on the Web and the Semantic Web, and a full realization of the World Wide Web as a source of processable data and services demands formalisms capable of representing and reasoning under uncertainty. Unfortunately, none of these needs can be addressed in a principled way by current Web standards. Although it is to some degree possible to use semantic mark-up languages such as OWL or RDF(S) to represent qualitative and quantitative information about uncertainty, there is no established foundation for doing so, and feasible approaches are severely limited. Furthermore, there are ancillary issues, such as how to balance representational power against the simplicity of uncertainty representations, which uncertainty representation techniques are suited to uses such as the examples listed above, and how to ensure the consistency of representational formalisms and ontologies.

In response to these pressing demands, in recent years, several promising approaches to uncertainty reasoning on the Semantic Web have been proposed. The present volume covers a representative cross section of these approaches, from extensions to existing Web-related logics for the representation of uncertainty to approaches to inductive reasoning under uncertainty on the Web.

In order to reflect the diversity of the presented approaches and to relate them to their underlying models of uncertainty, the contributions to this volume are grouped as follows:

Probabilistic and Dempster-Shafer Models

Probability theory provides a mathematically sound representation language and formal calculus for rational degrees of belief, which gives different agents the freedom to have different beliefs about a given hypothesis. As this provides a compelling framework for representing uncertain, imperfect knowledge that can come from diverse agents, there are many distinct approaches using probability in the context of the Semantic Web. Classes of probabilistic models covered in the present volume are Bayesian networks, probabilistic extensions to description and first-order logics, and models based on the Dempster-Shafer theory (a generalization of the classic Bayesian approach).

Fuzzy and Possibilistic Models

Fuzzy formalisms allow for representing and processing degrees of truth about vague (or imprecise) pieces of information. In fuzzy description logics and ontology languages, concept assertions, role assertions, concept inclusions, and role inclusions have a degree of truth rather than a binary truth value. The present volume presents various approaches that exploit fuzzy logic and possibility theory in the context of the Semantic Web.

Inductive Reasoning and Machine Learning

Machine learning is expected to play an increasingly important role in the context of the Semantic Web by supporting various tasks, such as the learning of ontologies from incomplete data or the (semi-)automatic annotation of data on the Web. Results obtained by machine learning approaches are typically uncertain. As a logic-based approach to machine learning, inductive reasoning provides means for inducing general propositions from observations (example facts). Papers in this volume exploit the power of inductive reasoning for the purpose of ontology learning, and project future directions for the use of machine learning on the Semantic Web.

Hybrid Approaches

This volume segment contains papers that either combine approaches from two or more of the previous segments, or that do not rely on any specific classic approach to uncertainty reasoning.

We would like to express our gratitude to the authors of this volume for their contributions and to the workshop participants for inspiring discussions, as well as to the members of the workshop Program Committees and the additional reviewers for their reviews and for their overall support.

Probabilistic and Dempster-Shafer Models

PR-OWL 2.0 – Bridging the Gap to OWL Semantics

The past few years have witnessed an increasingly mature body of research on the Semantic Web, with new standards being developed and more complex use cases being proposed and explored. As complexity increases in SW applications, so does the need for principled means to cope with the uncertainty inherent in real-world SW applications. Not surprisingly, several approaches addressing uncertainty representation and reasoning on the Semantic Web have emerged [3, 4, 6, 7, 10, 11, 13, 14]. For example, PR-OWL [3] provides OWL constructs for representing Multi-Entity Bayesian Network (MEBN) [8] theories. This paper reviews some shortcomings of PR-OWL 1 [2] and describes how they will be addressed in PR-OWL 2. A method is presented for mapping back and forth from triples into random variables (RV). The method applies to triples representing both predicates and functions. A complex example is given for mapping an n-ary relation using the proposed scheme.
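
To make the predicate/function distinction concrete, here is a minimal sketch of a two-way mapping between triples and random variables. It is not the PR-OWL 2 mapping itself; the rule it encodes is an assumption for illustration: a functional property turns (s, p, o) into the RV p(s) with value o, and any other property turns it into the Boolean RV p(s, o) with value True. The names RandomVariable, triple_to_rv, and rv_to_triple are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RandomVariable:
    """Hypothetical MEBN-style random variable: a name plus ordered arguments."""
    name: str
    args: tuple


def triple_to_rv(subject, predicate, obj, functional):
    """Map a triple to an (RV, value) pair under the assumed rule:
    functional property -> p(s) = o; other property -> p(s, o) = True."""
    if functional:
        return RandomVariable(predicate, (subject,)), obj
    return RandomVariable(predicate, (subject, obj)), True


def rv_to_triple(rv, value):
    """Inverse of triple_to_rv: recover the triple from an (RV, value) pair."""
    if len(rv.args) == 1:
        return (rv.args[0], rv.name, value)
    if len(rv.args) == 2 and value is True:
        return (rv.args[0], rv.name, rv.args[1])
    raise ValueError("no single triple corresponds to this assignment")
```

For example, triple_to_rv("ship1", "hasPosition", "pos7", functional=True) yields the RV hasPosition(ship1) with value "pos7", while a non-functional "knows" triple becomes the Boolean RV knows(alice, bob). Mapping n-ary relations, as the paper does, would need more than this two-case rule.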

Probabilistic Ontology and Knowledge Fusion for Procurement Fraud Detection in Brazil

To cope with society’s demand for transparency and corruption prevention, the Brazilian Office of the Comptroller General (CGU) has carried out a number of actions, including awareness campaigns aimed at the private sector, campaigns to educate the public, research initiatives, and regular inspections and audits of municipalities and states. Although CGU has collected information from hundreds of different sources (Revenue Agency, Federal Police, and others), the process of fusing all this data has not been efficient enough to meet the needs of CGU’s decision makers. Therefore, it is natural to change the focus from data fusion to knowledge fusion. As a consequence, traditional syntactic methods must be augmented with techniques that represent and reason with the semantics of databases. However, commonly used approaches fail to deal with uncertainty, a dominant characteristic in corruption prevention. This paper presents the use of Probabilistic OWL (PR-OWL) to design and test a model that performs information fusion to detect possible frauds in procurements involving Federal money. To design this model, a recently developed tool for creating PR-OWL ontologies was used with support from PR-OWL specialists and careful guidance from a fraud detection specialist from CGU.

Understanding a Probabilistic Description Logic via Connections to First-Order Logic of Probability

This paper analyzes the probabilistic description logic P-SROIQ as a fragment of the well-known first-order probabilistic logic (FOPL). P-SROIQ was suggested as a language that is capable of representing and reasoning about different kinds of uncertainty in ontologies, namely generic probabilistic relationships between concepts and probabilistic facts about individuals. However, some semantic properties of P-SROIQ have been unclear, which raised concerns regarding whether it could be used for representing probabilistic ontologies. In this paper we provide an insight into its semantics by translating P-SROIQ into FOPL with a specific subjective semantics based on possible worlds. We prove faithfulness of the translation and demonstrate the fundamental nature of some limitations of P-SROIQ. Finally, we briefly discuss the implications of the exposed semantic properties of the logic on probabilistic modeling.
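
Generic probabilistic relationships between concepts are commonly written as conditional constraints (D|C)[l,u], read "an individual in C is in D with probability between l and u". The sketch below checks such a constraint against a probability distribution over possible worlds, using the usual conditional-constraint semantics: the constraint holds vacuously when Pr(C) = 0 and otherwise requires Pr(D|C) to lie in [l,u]. Restricting worlds to sets of atomic concept names, and the function name satisfies, are simplifying assumptions for illustration.

```python
from fractions import Fraction


def satisfies(distribution, condition, conclusion, lower, upper):
    """Check Pr |= (conclusion | condition)[lower, upper].

    distribution maps each possible world (a frozenset of the atomic concept
    names a generic individual belongs to) to its probability.
    """
    p_cond = sum(p for w, p in distribution.items() if condition in w)
    if p_cond == 0:
        return True  # vacuously satisfied when the condition has probability 0
    p_both = sum(p for w, p in distribution.items()
                 if condition in w and conclusion in w)
    return lower <= p_both / p_cond <= upper


# Three worlds: flying birds, non-flying birds, non-birds.
worlds = {
    frozenset({"Bird", "Flies"}): Fraction(9, 20),
    frozenset({"Bird"}): Fraction(1, 20),
    frozenset(): Fraction(1, 2),
}
```

Here Pr(Flies | Bird) = (9/20) / (1/2) = 9/10, so (Flies|Bird)[0.9, 1] is satisfied while (Flies|Bird)[0.95, 1] is not. Exact fractions avoid spurious failures at the interval boundaries.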

Pronto: A Practical Probabilistic Description Logic Reasoner

This paper presents a system description of Pronto, the first probabilistic Description Logic reasoner capable of processing knowledge bases containing about a thousand probabilistic axioms. We describe the design and architecture of the reasoner with an emphasis on the components that implement algorithms which are crucial for achieving such a level of scalability. Finally, we present the results of the experimental evaluation of Pronto’s performance on a series of propositional and non-propositional probabilistic knowledge bases.

Instance-Based Non-standard Inferences in EL with Subjective Probabilities

For practical ontology-based applications, representing and reasoning with probabilities is an essential task. For Description Logics with subjective probabilities, reasoning procedures for testing instance relations based on the completion method have been developed. In this paper we extend this technique to devise algorithms for solving non-standard inferences for EL and its probabilistic extension Prob-EL^{01}_c: computing the most specific concept of an individual and finding explanations for instance relations.

Fuzzy and Possibilistic Models

Finite Fuzzy Description Logics and Crisp Representations

Fuzzy Description Logics (DLs) are a formalism for the representation of structured knowledge that is imprecise or vague by nature. In fuzzy DLs, restricting to a finite set of degrees of truth has proved to be useful, both for theoretical and practical reasons. In this paper, we propose finite fuzzy DLs as a generalization of existing approaches. We assume a finite totally ordered set of linguistic terms or labels, which is very useful in practice since expert knowledge is usually expressed using linguistic terms. Then, we consider fuzzy DLs based on any smooth t-norm defined over this set. Initially we focus on the finite fuzzy DL ALCH, studying some logical properties, and showing the decidability of the logic by presenting a reasoning-preserving reduction to the classical case. Finally, we extend our logic in two directions: by considering non-smooth t-norms and by considering additional DL constructors.
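
As an illustration of t-norms over a finite chain of linguistic labels, the sketch below encodes a hypothetical five-label chain by index (0 = "false" up to 4 = "true") and defines two smooth t-norms (finite Łukasiewicz and minimum) plus the drastic t-norm as a non-smooth contrast. The smoothness test uses the standard notion for finite chains: raising one argument by one step raises the result by at most one step (by commutativity, checking the first argument suffices). The label set and function names are assumptions, not taken from the paper.

```python
from itertools import product

LABELS = ["false", "low", "medium", "high", "true"]  # hypothetical chain
N = len(LABELS) - 1  # index of the top label


def lukasiewicz(i, j):
    """Finite Lukasiewicz t-norm on label indices 0..N."""
    return max(0, i + j - N)


def minimum(i, j):
    """Goedel (minimum) t-norm on label indices."""
    return min(i, j)


def drastic(i, j):
    """Drastic t-norm: a valid t-norm on the chain, but not smooth."""
    if i == N:
        return j
    if j == N:
        return i
    return 0


def is_smooth(t):
    """True if every one-step increase of the first argument changes the
    result by 0 or 1 steps (t is assumed commutative)."""
    return all(t(i + 1, j) - t(i, j) in (0, 1)
               for i, j in product(range(N), range(N + 1)))
```

For instance, combining "high" (3) and "medium" (2) gives LABELS[lukasiewicz(3, 2)] == "low" but LABELS[minimum(3, 2)] == "medium", while is_smooth(drastic) is False because drastic(4, 3) jumps three steps above drastic(3, 3).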

Reasoning in Fuzzy OWL 2 with DeLorean

Classical ontologies are not suitable to represent imprecise or vague information, which has led to several extensions using non-classical logics. In particular, several fuzzy extensions have been proposed in the literature. In this paper, we present the fuzzy ontology reasoner DELOREAN, the first to support a fuzzy extension of OWL 2. We discuss how to use it for fuzzy ontology representation and reasoning, and describe some implementation details and optimization techniques. An empirical evaluation demonstrates that these optimizations considerably improve the performance of the reasoner.

Dealing with Contradictory Evidence Using Fuzzy Trust in Semantic Web Data

Term similarity assessment usually leads to situations where contradictory pieces of evidence support different views concerning the meaning of a concept and how similar it is to other concepts. Human experts can resolve their differences through discussion, whereas ontology mapping systems need to be able to eliminate contradictions before similarity combination can achieve high-quality results. In these situations, different similarities represent conflicting ideas about the interpreted meaning of the concepts. Such contradictions can contribute to unreliable mappings, which in turn worsen both mapping precision and recall. In order to avoid including contradictory beliefs in similarities during the combination process, trust in the beliefs needs to be established and untrusted beliefs should be excluded from the combination. In this chapter, we propose a solution for establishing fuzzy trust to manage belief conflicts using a fuzzy voting model.
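
The general idea of excluding untrusted beliefs by voting can be sketched as follows. This is a deliberately simplified illustration, not the chapter's fuzzy voting model: each similarity measure's score votes for the linguistic label (low, medium, high) with the highest membership under hypothetical triangular membership functions, the majority label wins, and only the scores that voted for it are averaged.

```python
from collections import Counter
from statistics import mean

# Hypothetical triangular membership functions (a, peak b, c) on [0, 1].
FUZZY_LABELS = {
    "low": (-0.5, 0.0, 0.5),
    "medium": (0.0, 0.5, 1.0),
    "high": (0.5, 1.0, 1.5),
}


def membership(x, a, b, c):
    """Triangular membership degree of x."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


def vote(score):
    """A similarity score votes for its best-matching linguistic label."""
    return max(FUZZY_LABELS, key=lambda lab: membership(score, *FUZZY_LABELS[lab]))


def trusted_combination(scores):
    """Trust only beliefs that agree with the majority label; average them."""
    votes = {name: vote(s) for name, s in scores.items()}
    winner, _ = Counter(votes.values()).most_common(1)[0]
    trusted = {name: s for name, s in scores.items() if votes[name] == winner}
    return winner, trusted, mean(trusted.values())
```

With scores {"string": 0.9, "wordnet": 0.85, "structural": 0.1}, the first two vote "high" and the structural measure votes "low", so the contradictory structural belief is excluded and the combined similarity is 0.875.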

Storing and Querying Fuzzy Knowledge in the Semantic Web Using FiRE

The great evolution of ontologies during the last decade bred the need for storage and querying for the Semantic Web. For that purpose, many RDF tools capable of storing a knowledge base, and also performing queries on it, were constructed. Recently, fuzzy extensions to description logics have gained considerable attention, especially for the purposes of handling vague information in many applications. In this paper we investigate the issue of using classical RDF storing systems in order to provide persistent storing and querying over large-scale fuzzy information. To accomplish this, we first propose a novel way of serializing fuzzy information into RDF triples, so that classical storing systems can be used without any extensions. Additionally, we extend the existing query languages of RDF stores in order to support expressive fuzzy queries proposed in the literature. These extensions are implemented through the FiRE fuzzy reasoning engine, which is a fuzzy DL reasoner for fuzzy-SHIN. Finally, the proposed architecture is evaluated using an industrial application scenario about casting for TV commercials and spots.
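
One way to store a fuzzy assertion such as "anna is Tall to degree at least 0.8" in an unmodified triple store is to hang the individual, the concept, and the degree off a fresh blank node. The sketch below does this with plain Python tuples and answers a threshold query over them. The fz: vocabulary and the node-naming scheme are hypothetical and may differ from FiRE's actual serialization; in a real store the query would be a SPARQL pattern with a FILTER on the degree.

```python
from itertools import count

_node_ids = count(1)


def serialize(individual, concept, degree):
    """Encode <individual : concept >= degree> as three plain triples
    around a fresh blank node (hypothetical 'fz:' vocabulary)."""
    node = f"_:m{next(_node_ids)}"
    return [
        (node, "fz:individual", individual),
        (node, "fz:concept", concept),
        (node, "fz:degree", degree),
    ]


def query(triples, concept, threshold):
    """Return {individual: degree} for members of concept with degree >= threshold."""
    by_node = {}
    for s, p, o in triples:
        by_node.setdefault(s, {})[p] = o
    return {
        props["fz:individual"]: props["fz:degree"]
        for props in by_node.values()
        if props.get("fz:concept") == concept
        and props.get("fz:degree", 0.0) >= threshold
    }


store = (serialize("anna", "Tall", 0.8)
         + serialize("ben", "Tall", 0.4)
         + serialize("anna", "Young", 0.9))
```

Because every assertion becomes ordinary triples, query(store, "Tall", 0.5) returns {"anna": 0.8} without any change to the storage layer; only the query side needs to know the encoding.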

Transforming Fuzzy Description Logic ALCFL into Classical Description Logic ALCH

In this paper, we present a satisfiability-preserving transformation of the fuzzy Description Logic ALCFL into the classical Description Logic ALCH. By applying this result, already existing DL systems can be used to reason with ALCFL. This work is inspired by Straccia, who has transformed the fuzzy Description Logic fALCH into the classical Description Logic ALCH.
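
The core idea behind Straccia-style reductions can be sketched briefly: for each fuzzy concept A and each relevant degree d, introduce a crisp "cut" concept A_>=d, assert the crisp membership a : A_>=d in place of the fuzzy assertion, and add subsumptions A_>=d2 ⊑ A_>=d1 for consecutive degrees d1 < d2 so that the cuts stay nested. The naming scheme and helper functions below are hypothetical, and the ALCFL-specific parts of the paper's transformation are not modeled.

```python
def cut_name(concept, degree):
    """Name of the crisp concept standing for 'concept to degree >= degree'."""
    return f"{concept}_>={degree}"


def crisp_axioms(concept, degrees):
    """Subsumptions (sub, super) meaning concept_>=d2 is subsumed by concept_>=d1
    for each pair of consecutive degrees d1 < d2."""
    ds = sorted(set(degrees))
    return [(cut_name(concept, hi), cut_name(concept, lo))
            for lo, hi in zip(ds, ds[1:])]


def crisp_assertion(individual, concept, degree):
    """Translate the fuzzy assertion <individual : concept >= degree>."""
    return (individual, cut_name(concept, degree))
```

For the degrees 0.25, 0.5, and 0.75 occurring in a knowledge base, crisp_axioms("Tall", [0.5, 0.25, 0.75]) yields Tall_>=0.5 ⊑ Tall_>=0.25 and Tall_>=0.75 ⊑ Tall_>=0.5, which lets a classical reasoner infer that anyone Tall to degree 0.75 is also Tall to degree 0.25.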

A Fuzzy Logic-Based Approach to Uncertainty Treatment in the Rule Interchange Format: From Encoding to Extension

The Rule Interchange Format (RIF) is a W3C recommendation that allows rules to be exchanged between rule systems. Uncertainty is an intrinsic feature of real world knowledge, hence it is important to take it into account when building logic rule formalisms. However, the set of truth values in the RIF Basic Logic Dialect (RIF-BLD) currently consists of only two values (t and f), although the RIF Framework for Logic Dialects (RIF-FLD) allows for more. In this paper, we first present two techniques of encoding uncertain knowledge and its fuzzy semantics in RIF-BLD presentation syntax. We then propose an extension leading to an Uncertainty Rule Dialect (RIF-URD) to support a direct representation of uncertain knowledge. In addition, rules in Logic Programs (LP) are often used in combination with the other widely-used knowledge representation formalism of the Semantic Web, namely Description Logics (DL), in many application scenarios of the Semantic Web. To prepare DL as well as LP extensions, we present a fuzzy extension to Description Logic Programs (DLP), called Fuzzy DLP, and discuss its mapping to RIF. Such a formalism not only combines DL with LP, as in DLP, but also supports uncertain knowledge representation.
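
To illustrate what a fuzzy semantics for rules can mean operationally, the sketch below forward-chains over ground fuzzy rules, each carrying a degree. It assumes Gödel (minimum) semantics: a rule assigns its head the minimum of the rule degree and its body degrees, and alternative derivations of the same atom are combined with maximum. This t-norm choice and the rule format are assumptions for illustration; the paper's RIF encodings and Fuzzy DLP may use other operators and RIF presentation syntax rather than Python tuples.

```python
def forward_chain(facts, rules):
    """Compute fuzzy degrees of derived atoms.

    facts: {atom: degree}; rules: list of (head, [body atoms], rule_degree).
    Assumed Goedel semantics: min over the rule degree and body degrees,
    max over alternative derivations. Terminates because degrees only grow
    and are always drawn from the finite set of input degrees.
    """
    degrees = dict(facts)
    changed = True
    while changed:
        changed = False
        for head, body, weight in rules:
            if all(atom in degrees for atom in body):
                derived = min([weight] + [degrees[a] for a in body])
                if derived > degrees.get(head, 0.0):
                    degrees[head] = derived
                    changed = True
    return degrees


facts = {"cheap(hotel1)": 0.7, "near(hotel1)": 0.9}
rules = [
    ("good(hotel1)", ["cheap(hotel1)", "near(hotel1)"], 0.8),
    ("recommend(hotel1)", ["good(hotel1)"], 1.0),
]
```

Here good(hotel1) receives min(0.8, 0.7, 0.9) = 0.7, which then propagates to recommend(hotel1). In a crisp two-valued dialect such as RIF-BLD, both conclusions would simply be true.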

Inductive Reasoning and Machine Learning

PrOntoLearn: Unsupervised Lexico-Semantic Ontology Generation Using Probabilistic Methods

Formalizing an ontology for a domain manually is well known to be a tedious and cumbersome process. It is constrained by the knowledge acquisition bottleneck. Therefore, researchers have developed algorithms and systems that can help to automate the process. Among them are systems that use text corpora for the acquisition. Our idea is also based on large text corpora. Here, we provide a novel unsupervised bottom-up ontology generation method. It is based on lexico-semantic structures and Bayesian reasoning to expedite the ontology generation process. We provide one quantitative and two qualitative results illustrating our approach using a high-throughput screening assay corpus and two custom text corpora. This process could also provide evidence for domain experts to build ontologies based on top-down approaches.

Semantic Web Search and Inductive Reasoning

Extensive research activities have recently been directed towards the Semantic Web as a future form of the Web. Consequently, Web search, as the key technology of the Web, is evolving towards some novel form of Semantic Web search. A very promising recent approach of this kind is based on combining standard Web pages and search queries with ontological background knowledge, and using standard Web search engines as the main inference motor of Semantic Web search. In this paper, we further enhance this approach to Semantic Web search by the use of inductive reasoning techniques. This adds, in particular, the important ability to handle inconsistencies, noise, and incompleteness, which are all very likely to occur in distributed and heterogeneous environments such as the Web. We report on a prototype implementation of the new approach and experimental results.

Ontology Enhancement through Inductive Decision Trees

The popularity of ontologies for representing the semantics behind many real-world domains has created a growing pool of ontologies on various topics. While different ontologists, experts, and organizations create the vast majority of ontologies, often for narrow application domains, these ontologies frequently overlap with other ontologies in broader domains, specifically as they pertain to the Semantic Web. These overlapping ontologies sometimes model similar or matching theories that may be inconsistent. To assist in the reuse of these ontologies, this paper describes a technique for enriching manually created ontologies by supplementing them with inductively derived rules, and reducing the number of inconsistencies. The derived rules are translated from decision trees created by executing a tree-based data mining algorithm with probability measures over the data being modelled. These rules can be used to revise the ontology, adding a higher level of granularity, in order to identify possible similarities missed by the original ontologists. We then discuss how this may be applied to ontology matching. We demonstrate the application of our technique by presenting an example, and discuss how various data types may be treated to generalize the semantics of an ontology for a broader application domain.
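
The translation step from decision trees to rules amounts to turning each root-to-leaf path into a rule whose conditions are the tests along the path and whose conclusion is the leaf's class with its probability. The sketch below shows only that step, on an already-built tree in a hypothetical nested-tuple format; it does not learn the tree and does not reproduce the paper's data mining algorithm or rule syntax.

```python
def extract_rules(tree, conditions=()):
    """Turn every root-to-leaf path into (conditions, class, probability).

    Hypothetical tree format:
      leaf:     ("leaf", class_label, probability)
      internal: ("split", attribute, value, subtree_if_equal, subtree_otherwise)
    """
    if tree[0] == "leaf":
        _, label, prob = tree
        return [(list(conditions), label, prob)]
    _, attribute, value, yes, no = tree
    return (extract_rules(yes, conditions + (f"{attribute} = {value}",))
            + extract_rules(no, conditions + (f"{attribute} != {value}",)))


tree = ("split", "habitat", "water",
        ("split", "hasFins", True,
         ("leaf", "Fish", 0.95),
         ("leaf", "Mammal", 0.7)),
        ("leaf", "Bird", 0.6))
```

extract_rules(tree) yields three rules, for example "habitat = water and hasFins = True implies Fish (0.95)"; rules of this form could then be rendered as ontology axioms or rule-language statements to refine the original classes.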

Assertion Prediction with Ontologies through Evidence Combination

Following previous works on inductive methods for ABox reasoning, we propose an alternative method for predicting assertions based on the available evidence and the analogical criterion. Once neighbors of a test individual are selected through some distance measures, a combination rule descending from the Dempster-Shafer theory can join together the evidence provided by the various neighbor individuals in order to predict unknown values in a learning problem. We show how to exploit the procedure in the problems of determining unknown class- and role-memberships or fillers for datatype properties, which may be the basis for many further ABox inductive reasoning algorithms. This work also presents an empirical evaluation of the method on real ontologies.
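
Dempster's rule of combination, the best-known combination rule in Dempster-Shafer theory, can be sketched for class-membership prediction as follows. Each neighbor contributes a mass function over the frame {member, non-member}: its weight goes to the singleton of its own label and the remainder to the whole frame (ignorance). How weights are derived from distances is left open here, and the weights in the example are hypothetical. Masses agreeing on a non-empty intersection are multiplied and accumulated; conflicting mass is discarded and the rest renormalized.

```python
from functools import reduce
from itertools import product

OMEGA = frozenset({"member", "non-member"})  # frame of discernment


def neighbor_mass(label, weight):
    """Evidence from one neighbor: weight on its label, the rest on OMEGA."""
    return {frozenset({label}): weight, OMEGA: 1.0 - weight}


def combine(m1, m2):
    """Dempster's rule of combination for mass functions keyed by frozensets."""
    combined, conflict = {}, 0.0
    for (a, x), (b, y) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + x * y
        else:
            conflict += x * y
    if conflict >= 1.0:
        raise ValueError("total conflict: combination undefined")
    return {s: v / (1.0 - conflict) for s, v in combined.items()}


def predict(neighbors):
    """Combine all neighbor evidence and pick the singleton with most mass."""
    m = reduce(combine, [neighbor_mass(label, w) for label, w in neighbors])
    best = max(OMEGA, key=lambda v: m.get(frozenset({v}), 0.0))
    return best, m
```

With neighbors [("member", 0.6), ("member", 0.5), ("non-member", 0.4)], the first two combine to mass 0.8 on {member}; the third introduces conflict 0.32, and after normalization {member} receives 0.48 / 0.68 ≈ 0.71, so membership is predicted.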

Hybrid Approaches

Representing Uncertain Concepts in Rough Description Logics via Contextual Indiscernibility Relations

We investigate modeling uncertain concepts via rough description logics, which extend traditional description logics with a simple mechanism for handling approximate concept definitions through lower and upper approximations of concepts based on a rough-set semantics. This allows rough description logics to be applied to modeling uncertain knowledge. Since these approximations are ultimately grounded on an indiscernibility relation, the paper explores possible logical and numerical ways of defining such relations based on the considered knowledge. In particular, the notion of context is introduced, allowing for the definition of specific equivalence relations, to be used for approximations as well as for determining similarity measures, which may be exploited for introducing a notion of tolerance in the indiscernibility.
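
The rough-set construction can be sketched as follows. Individuals are indiscernible with respect to a context when they agree on every feature in it; here Boolean features stand in for membership in the context's concepts, which is an assumption for illustration. The lower approximation of a concept collects the equivalence classes entirely inside it, and the upper approximation those that merely overlap it. The data and function names are hypothetical.

```python
from collections import defaultdict


def equivalence_classes(individuals, context):
    """Group individuals that agree on every feature in the context."""
    classes = defaultdict(set)
    for name, features in individuals.items():
        key = tuple(features.get(f) for f in context)
        classes[key].add(name)
    return list(classes.values())


def approximations(individuals, concept, context):
    """Lower and upper rough approximations of a concept (a set of names)."""
    lower, upper = set(), set()
    for cls in equivalence_classes(individuals, context):
        if cls <= concept:   # class lies entirely inside the concept
            lower |= cls
        if cls & concept:    # class overlaps the concept
            upper |= cls
    return lower, upper


patients = {
    "a": {"fever": True, "cough": True},
    "b": {"fever": True, "cough": True},
    "c": {"fever": False, "cough": True},
    "d": {"fever": False, "cough": False},
}
flu = {"a", "c"}
```

The result depends on the context: with ("fever", "cough") the lower approximation of flu is {c} and the upper is {a, b, c}, whereas with the coarser context ("cough",) the lower approximation becomes empty because a, b, and c can no longer be told apart.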

Efficient Trust-Based Approximate SPARQL Querying of the Web of Linked Data

The web of linked data represents a globally distributed dataspace, which can be queried using the SPARQL query language. However, with the growth in size and complexity of the web of linked data, it becomes impractical for the user to know enough about its structure and semantics for the user queries to produce enough answers. Moreover, there is a prevalence of unreliable data which can dominate the query results misleading the users and software agents. These problems are addressed in the paper by making use of ontologies available on the web of linked data to produce approximate results and also by presenting a trust model that associates RDF statements with trust values, which is used to give prominence to trustworthy data. Trustworthy approximate results can be generated by performing the relaxation steps at compile-time leading to the generation of multiple relaxed queries that are sorted in decreasing order of their similarity scores with the original query and executed. During their execution the trust scores of RDF data fetched are computed. However, the relaxed queries generated have conditions in common and we propose that by performing trust-based relaxations on-the-fly at runtime, the shared data between several relaxed queries need not be fetched repeatedly. Thus, the trust-based relaxation steps are integrated with the query execution itself resulting in performance benefits. Further opportunities for optimizations during query execution are identified and are used to prune relaxation steps which do not produce results. The implementation of our approach demonstrates its efficacy.
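
The compile-time relaxation strategy described above, generating relaxed queries and sorting them by decreasing similarity to the original, can be sketched as follows. The class hierarchy, the per-step similarity factor, and the scoring (product of per-pattern similarities) are all hypothetical, and only rdf:type patterns are relaxed, by replacing the class with one of its ancestors. Trust scores and the paper's runtime, trust-based relaxation are not modeled.

```python
from itertools import product
from math import prod

SUPERCLASS = {"Professor": "Academic", "Academic": "Person"}  # hypothetical
STEP_SIMILARITY = 0.8  # hypothetical similarity factor per generalization step


def relaxations(cls):
    """Yield (class, similarity) for cls itself and each of its ancestors."""
    sim = 1.0
    while cls is not None:
        yield cls, sim
        cls, sim = SUPERCLASS.get(cls), sim * STEP_SIMILARITY


def relaxed_queries(patterns):
    """Relax every rdf:type pattern's class; return (score, query) pairs
    sorted by decreasing similarity to the original query."""
    options = []
    for s, p, o in patterns:
        if p == "rdf:type":
            options.append([((s, p, c), sim) for c, sim in relaxations(o)])
        else:
            options.append([((s, p, o), 1.0)])
    results = [(prod(sim for _, sim in combo), [pat for pat, _ in combo])
               for combo in product(*options)]
    return sorted(results, key=lambda r: r[0], reverse=True)


original = [("?x", "rdf:type", "Professor"), ("?x", "worksAt", "?u")]
```

relaxed_queries(original) produces three queries (Professor, Academic, Person) with scores 1.0, 0.8, and about 0.64. Note that all three share the worksAt pattern, which is exactly the repeated fetching that the paper's runtime integration of relaxation steps aims to avoid.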