Uncertainty Reasoning for the Semantic Web 3

Uncertainty Reasoning

What is Uncertainty Reasoning?

Uncertainty Reasoning refers to a set of inference techniques designed to derive reliable conclusions and knowledge from information that is ambiguous, incomplete, noisy, or even contradictory. It plays a central role in many fields such as AI, robotics, the Semantic Web, and decision support systems.

Why is Uncertainty Reasoning Necessary?

In the real world, we often have to reason with incomplete information, such as:

– Data contains noise (e.g., fluctuations in sensor readings)
– Information is fragmented (e.g., only partial attributes are known)
– Judgments are subjective or vague (e.g., “I think it’s probably true”)
– Multiple sources provide conflicting information (e.g., differing diagnoses from two doctors)

Traditional inference assumes all knowledge is complete and consistent, which is insufficient for real-world reasoning. Hence, reasoning that **assumes uncertainty** is essential.

Major Approaches to Uncertainty Reasoning

1. Probability Theory
– The most fundamental framework
– Represents uncertainty as values between 0 and 1
– Example: `P(it will rain) = 0.8`
– Applications: Bayesian inference, Bayesian networks
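
As a minimal illustration, Bayes' rule updates such a probability when new evidence arrives. The sketch below uses made-up numbers:

```python
# Minimal Bayesian updating: P(rain | dark clouds), all numbers illustrative.
p_rain = 0.8                 # prior: P(rain)
p_clouds_given_rain = 0.9    # likelihood: P(dark clouds | rain)
p_clouds_given_dry = 0.3     # likelihood: P(dark clouds | no rain)

# Total probability of the evidence.
p_clouds = p_clouds_given_rain * p_rain + p_clouds_given_dry * (1 - p_rain)

# Bayes' rule: posterior probability of rain given the observation.
posterior = p_clouds_given_rain * p_rain / p_clouds
print(f"P(rain | dark clouds) = {posterior:.3f}")  # ~0.923
```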

2. Fuzzy Logic
– Handles degrees of truth rather than binary true/false
– Example: “John is tall” → degree = 0.7
– Strong in capturing vague concepts like “somewhat hot” or “almost complete”
– Used in control systems like air conditioning
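
A minimal sketch of a fuzzy membership function (the 160–190 cm ramp is an illustrative choice, not a standard definition):

```python
def tall_degree(height_cm: float) -> float:
    """Piecewise-linear membership degree for the fuzzy set 'tall'."""
    if height_cm <= 160:
        return 0.0
    if height_cm >= 190:
        return 1.0
    return (height_cm - 160) / 30.0

# Zadeh operators: min for AND, max for OR.
john_tall = tall_degree(181)     # 0.7, matching the example above
print(min(john_tall, 0.4))       # degree of "John is tall AND it is somewhat hot"
```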

3. Dempster-Shafer Theory
– Separates belief and plausibility, giving lower and upper bounds on support for each hypothesis
– Allows evidence from multiple sources to be combined without requiring exact probabilities
– Works with degrees of belief rather than point probabilities
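
Dempster's rule of combination can be sketched in a few lines; the sensor mass assignments below are illustrative:

```python
from itertools import product

def dempster_combine(m1: dict, m2: dict) -> dict:
    """Combine two mass functions whose focal elements are frozensets."""
    combined, conflict = {}, 0.0
    for (a, p), (b, q) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + p * q
        else:
            conflict += p * q          # mass falling on the empty set
    if conflict >= 1.0:
        raise ValueError("totally conflicting evidence")
    return {k: v / (1.0 - conflict) for k, v in combined.items()}

CAR, TRUCK = frozenset({"car"}), frozenset({"truck"})
EITHER = CAR | TRUCK                   # uncommitted mass
sensor1 = {CAR: 0.6, EITHER: 0.4}
sensor2 = {CAR: 0.5, TRUCK: 0.3, EITHER: 0.2}
print(dempster_combine(sensor1, sensor2))   # CAR ends up with most support
```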

4. Possibility Theory
– Focuses on the range of possible outcomes
– Often used with fuzzy logic
– More flexible than probability theory, especially under data scarcity

5. Non-monotonic Logic
– Allows new information to override previous inferences
– Useful for exception handling
– Example: “Birds usually fly” → but “Penguins don’t fly”
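
A toy sketch of default reasoning with exceptions, where a new fact retracts an earlier conclusion (the rule encoding is ad hoc, for illustration only):

```python
defaults = {"bird": True}        # birds can fly, by default
exceptions = {"penguin": False}  # ...unless they are penguins

def can_fly(kinds: list) -> bool:
    """Exceptions override defaults; adding knowledge may retract a conclusion."""
    verdict = False
    for kind in kinds:
        if kind in defaults:
            verdict = defaults[kind]
    for kind in kinds:
        if kind in exceptions:
            verdict = exceptions[kind]  # the override
    return verdict

print(can_fly(["bird"]))             # True: the default conclusion
print(can_fly(["bird", "penguin"]))  # False: new information overrides it
```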

Applications of Uncertainty Reasoning

Uncertainty reasoning has been widely applied to address the real-world challenges of incomplete or ambiguous information in various domains:

– In medical diagnosis, it helps estimate the likelihood or ranking of multiple potential diseases based on observed symptoms.
– In sensor networks, it accounts for the noise and variability in sensor data to make more reliable decisions.
– In autonomous driving, real-time decisions are made using uncertain environmental data to ensure safety.
– In question-answering AI, user inputs are often vague or ambiguous, so probabilistic reasoning helps select the most appropriate response.
– In knowledge integration, it is used to resolve conflicts and inconsistencies across different knowledge bases or data sources and create coherent knowledge structures.

Modern Developments in Uncertainty Reasoning

Uncertainty reasoning has evolved further with its integration into modern AI systems:

– Integration with Deep Learning:
Probabilistic neural networks and Bayesian deep learning models assign confidence levels to predictions, enabling more trustworthy AI-based decisions.
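
A rough sketch of the idea: run several stochastic forward passes (simulated here with noise around a base prediction) and treat their spread as a confidence signal. All numbers and the threshold are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for T stochastic forward passes (e.g., Monte Carlo dropout).
base_prediction = 0.72
samples = base_prediction + 0.05 * rng.standard_normal(100)

mean, std = samples.mean(), samples.std()
print(f"prediction = {mean:.3f} +/- {std:.3f}")
if std > 0.1:
    print("low confidence: escalate for human review")
```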

– Integration with the Semantic Web:
By extending formal knowledge representation frameworks like OWL with probabilities or fuzzy values, it becomes possible to reason flexibly with uncertain or ambiguous semantic relationships, thus expanding the applicability of complex knowledge bases.

– Integration with Reinforcement Learning:
In environments where observations and rewards are uncertain or incomplete, techniques like Bayesian reinforcement learning allow for optimal policy learning under uncertainty.

Overview of the Semantic Web and Uncertainty Reasoning

What is the Semantic Web?

The Semantic Web, proposed by Tim Berners-Lee, aims to enhance the Web by adding **semantic meaning** to data, enabling machines to understand, interpret, and reason with web content. It marks a shift from a “document-centered Web” to a “knowledge-centered Web.”

Core Technologies Supporting the Semantic Web:

– RDF (Resource Description Framework):
Represents information in subject–predicate–object triples, allowing structured, machine-readable data.

– OWL (Web Ontology Language):
A logic-based language used to formally define concepts, relationships, and constraints using description logic.

– SPARQL:
A query language for retrieving and manipulating RDF-structured data, similar in purpose to SQL.

– Reasoner:
A logical inference engine that deduces implicit knowledge from OWL-based rules.
Example: If “all dogs are mammals” and “Pochi is a dog,” then the reasoner can infer “Pochi is a mammal.”
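
A minimal sketch of this inference using the Python rdflib library, with a SPARQL property path standing in for the reasoner's subclass reasoning (the example.org names are placeholders):

```python
from rdflib import Graph, Namespace, RDF, RDFS

EX = Namespace("http://example.org/")
g = Graph()

# "All dogs are mammals" and "Pochi is a dog" as RDF triples.
g.add((EX.Dog, RDFS.subClassOf, EX.Mammal))
g.add((EX.Pochi, RDF.type, EX.Dog))

# Anything typed with a (transitive) subclass of Mammal is a mammal.
query = """
SELECT ?x WHERE { ?x rdf:type/rdfs:subClassOf* <http://example.org/Mammal> . }
"""
for row in g.query(query):
    print(row.x)   # http://example.org/Pochi
```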

These technologies enable:

– Precise expression of semantic relationships between data
– Advanced knowledge search and reasoning by machines
– Easy integration across heterogeneous data sources, such as through Linked Data

Why Combine the Semantic Web with Uncertainty Reasoning?

While the Semantic Web excels at formal and logical knowledge representation, real-world knowledge is often incomplete or vague. For example:

– Different sources may conflict in their assertions
– Some terms may have ambiguous or fuzzy meanings
– Data may be incomplete or missing

In such cases, strictly logical reasoning is inadequate. Therefore, uncertainty-aware reasoning techniques are essential to making the Semantic Web robust and realistic.

Approaches to Uncertainty in the Semantic Web

– Bayesian Logic / PR-OWL:
Introduces probabilities into OWL, suitable for domains like medical diagnostics.

– Fuzzy OWL:
Uses values between 0 and 1 to represent vague knowledge such as user preferences or sentiment.

– Markov Logic Networks (MLNs):
Assign weights to logical formulas, enabling probabilistic reasoning over complex relationships (e.g., in social networks); a small sketch follows this list.

– Dempster-Shafer Theory:
Aggregates varying degrees of belief from multiple sources, useful in sensor data fusion.

– Possibility Theory:
Represents degrees of plausibility rather than strict probabilities, effective for interpreting vague or overlapping knowledge.
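
A toy sketch of the MLN idea, assuming illustrative formulas and weights: a world's unnormalized weight is the exponential of the summed weights of the formulas it satisfies, and probabilities come from normalizing over all worlds:

```python
import math
from itertools import product

def world_weight(world: dict, weighted_formulas) -> float:
    """exp(sum of weights of satisfied formulas) for one possible world."""
    return math.exp(sum(w for formula, w in weighted_formulas if formula(world)))

weighted_formulas = [
    (lambda w: w["friends_ab"], 0.8),                       # Friends(a, b)
    (lambda w: (not w["smokes_a"]) or w["smokes_b"], 1.5),  # Smokes(a) => Smokes(b)
]

atoms = ("friends_ab", "smokes_a", "smokes_b")
worlds = [dict(zip(atoms, bits)) for bits in product([False, True], repeat=3)]
z = sum(world_weight(w, weighted_formulas) for w in worlds)   # partition function
p = sum(world_weight(w, weighted_formulas) for w in worlds if w["smokes_b"]) / z
print(f"P(Smokes(b)) = {p:.3f}")
```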

Key Use Cases

– Integrating heterogeneous knowledge bases
(e.g., combining medical ontologies from different hospitals with inconsistent assertions)

– Modeling user behavior and preferences under vagueness or uncertainty

– Evaluating the reliability of knowledge from various sources based on trustworthiness

– Supporting semantic search
by handling ambiguity and enabling flexible query interpretation through semantic similarity


Uncertainty Reasoning for the Semantic Web 3

In this issue, we discuss Volume 3 of Uncertainty Reasoning for the Semantic Web. This volume contains revised and significantly extended versions of papers presented at the three Uncertainty Reasoning for the Semantic Web (URSW) workshops held in conjunction with the International Semantic Web Conference (ISWC) in 2011, 2012, and 2013. (For reference, Volume 1 collected papers from the first three URSW workshops, held at ISWC in 2005, 2006, and 2007, while Volume 2 additionally included revised versions of papers presented at the First International Workshop on Uncertainty in Description Logics (UniDL), held in 2010.)

The volume is a comprehensive compilation of state-of-the-art research on uncertainty reasoning in the context of the Semantic Web, covering different models of uncertainty and approaches to deductive and inductive reasoning with uncertain formal knowledge.

The World Wide Web community envisions effortless interaction between humans and computers, seamless interoperability and information exchange between Web applications, and rapid and accurate identification and invocation of appropriate Web services. As research on semantics and services grows more ambitious, the need for a principled approach to the formal representation of, and reasoning under, uncertainty is increasingly recognized. The term uncertainty here encompasses various forms of incomplete knowledge, including incompleteness, inconclusiveness, ambiguity, and vagueness. Uncertainty reasoning denotes the full range of methods designed to represent and reason with knowledge when the truth value of a Boolean expression is unknown, unknowable, or inapplicable. Probability theory, Dempster-Shafer theory, fuzzy logic, possibility theory, and many other methodologies are commonly applied to this end.

Web-related challenges addressed by reasoning under uncertainty include:

Uncertainty of available information: Much information on the World Wide Web is uncertain. Examples include weather forecasts and gambling odds.

Incompleteness of information: Information extracted from large information networks such as the World Wide Web is usually incomplete. The ability to use partial information can be very effective in identifying the source of a service or information. For example, the fact that an online service deals in greeting cards may be evidence that the service also sells stationery. Thus, it is clear that appropriate use of uncertainty-handling techniques can improve the effectiveness of search.

Information inaccuracy: Information on the Web is often inaccurate or only partially correct, which raises issues related to trustworthiness and credibility. Uncertainty representation and inference can help resolve tensions between sources of information with different levels of trust and credibility and facilitate the integration of contentious information from multiple sources.

Uncertain Ontology Mapping: The vision of the Semantic Web suggests that a number of different but conceptually overlapping ontologies will coexist and interoperate. In such a scenario, ontology mapping would benefit from the ability to express the degree or likelihood of belonging to a category in the target ontology, given information about the class membership of the source ontology.

Uncertain information about web services: Dynamic composition of web services requires runtime identification of processing and data resources and resolution of policy goals. Uncertainty reasoning techniques may be needed to resolve situations where the available information is not definitive.

Thus, uncertainty is an essential feature of many important tasks on the Web and the Semantic Web, and fully realizing the World Wide Web as a source of processable data and services requires a framework that can represent and reason under uncertainty. Unfortunately, none of these needs are addressed in a principled way by current Web standards. While it is possible to some extent to represent qualitative and quantitative information about uncertainty using semantic markup languages such as OWL and RDF(S), no foundation for doing so has been established, and feasible approaches are very limited. Furthermore, there are ancillary issues, such as how to strike a balance between expressiveness and simplicity in uncertainty representation, which uncertainty representation techniques suit which of the above applications, and how to ensure consistency between the representation format and the ontology.

Against this background, several promising approaches to uncertainty reasoning on the Semantic Web have been proposed in recent years. These range from extensions of existing Web-related logics with uncertainty representation to approaches to inductive reasoning under uncertainty on the Web.

The approaches to uncertainty in this volume are categorized as follows.

Probabilistic and Dempster-Shafer Models:

Probability theory provides a mathematically sound representation language and a formal computational method for degrees of rational belief, which gives different agents the freedom to hold different beliefs about a given hypothesis. There are many approaches to using probability in the context of the Semantic Web, as it provides a compelling framework for representing uncertain and incomplete knowledge coming from diverse agents. Probabilistic models are discussed here, including Bayesian networks, probabilistic extensions of description logic and first-order logic, and models based on Dempster-Shafer theory (a generalization of classical Bayesian theory).

Fuzzy Models, Possibility Models:

Fuzzy formalisms allow the representation and processing of vague (or imprecise) information. In fuzzy description logics and ontologies, concept assertions, role assertions, concept inclusions, and role inclusions hold not with binary truth values but with degrees of truth. This volume describes various approaches that use fuzzy logic and possibility theory in the context of the Semantic Web.

Inductive Reasoning and Machine Learning:

Machine learning is expected to play an increasingly important role in the context of the Semantic Web by supporting a variety of tasks, such as ontology learning from incomplete data and (semi-)automatic annotation of data on the Web. The results obtained by machine learning approaches are generally uncertain. Inductive inference, a logical approach to machine learning, provides a means of deriving general propositions from observations (example facts). The volume discusses the use of inductive inference for ontology learning and future directions for machine learning in the Semantic Web.

Hybrid Approaches:

In addition, there are approaches that combine two or more of the above, or that do not rely on any single classical approach to uncertainty reasoning.

Contents

Several approaches have been proposed to deal with uncertainty in the Semantic Web (SW). Probabilistic ontologies (POs) are among the most promising approaches to modeling uncertainty in ontologies, but ontology engineers receive little support in creating this more complex type of ontology. The task has proven so difficult and challenging that it motivated the creation of the Uncertainty Modeling Process for Semantic Technologies (UMP-ST), a process that guides users in modeling POs. This paper presents the UMP-ST plug-in, a tool that implements this process, and shows how the plug-in, implemented in the UnBBayes framework, addresses the main problems in modeling probabilistic ontologies: the complexity of creation, the difficulty of maintenance and evolution, and the lack of a central tool for documenting these ontologies. To show how the UMP-ST plug-in overcomes these problems, we use a probabilistic ontology for detecting and preventing procurement fraud in Brazil. This probabilistic ontology is a proof-of-concept use case developed as part of a research project of the Brazilian Office of the Comptroller General (CGU). A short version of this paper was presented at URSW 2013.

Credal ALC combines the well-known description logic ALC with probabilistic assessments, allowing terminologies to express uncertainty about concepts and roles. We present a restricted version of credal ALC that can be viewed as a description language for a class of relational Bayesian networks. The resulting "ALC networks" provide a simplified path into both credal ALC and relational Bayesian networks. We then describe an implementation of approximate variational inference and lifted exact inference algorithms in a freely available package.

The recently introduced Datalog± is a tractable knowledge representation formalism for representing and reasoning over lightweight ontologies. Datalog± extends plain Datalog with negative constraints and with rules that may contain existential quantifiers and equalities in their heads, while restricting the rule syntax by requiring so-called guards in rule bodies, thereby regaining decidability and tractability. This paper examines how a recently proposed probabilistic extension of Datalog± can be used to represent ontology mappings in typical information integration settings such as data exchange, data integration, and peer-to-peer integration. To reconstruct mapping histories, detect cycles, and allow debugging of mappings, we also propose extending it with provenance annotations.

This paper considers the problem of learning the structure and parameters of probabilistic description logics under the DISPONTE semantics. DISPONTE is based on the distribution semantics of probabilistic logic languages and assigns probabilities to assertional and terminological axioms. Given a DISPONTE knowledge base (KB) and a set of positive and negative examples in the form of concept assertions, the system EDGE returns the values of the probabilities associated with the axioms. This paper presents LEAP, a system that uses EDGE to learn both the structure and the parameters of a DISPONTE KB. LEAP is based on the ontology-engineering system CELOE and uses its search strategy in the space of possible axioms. Experiments demonstrating the potential of EDGE and LEAP are presented.

The semantics of the probabilistic description logics considered here is based on the distribution semantics of probabilistic logic programming. This semantics, called DISPONTE, can represent precise probabilistic statements about axioms. We also present two systems for computing the probability of a query against a probabilistic knowledge base, BUNDLE and TRILL: BUNDLE is based on the Pellet reasoner, and TRILL is based on the declarative Prolog language. Both algorithms compute a propositional Boolean formula representing the set of explanations for a query: BUNDLE builds a normal form in which each conjunction of axioms corresponds to one explanation, while TRILL computes a general pinpointing formula. Both then construct a Binary Decision Diagram (BDD) representing the formula and compute the probability from the BDD using a dynamic programming algorithm. Experiments comparing the performance of BUNDLE and TRILL are also presented.
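
A rough sketch of the dynamic program that such a BDD makes explicit: Shannon-expand the explanation formula one axiom at a time and memoize on the simplified formula. The axiom probabilities and explanations below are illustrative, not taken from the chapter:

```python
from functools import lru_cache

PROBS = {"a1": 0.4, "a2": 0.3, "a3": 0.5}   # independent probabilistic axioms

def condition(explanations, axiom, value):
    """Simplify a monotone DNF after fixing `axiom` to True or False."""
    result = set()
    for expl in explanations:
        if axiom in expl:
            if value:
                result.add(expl - {axiom})
            # if False, this explanation is falsified and dropped
        else:
            result.add(expl)
    return frozenset(result)

@lru_cache(maxsize=None)
def prob(explanations: frozenset) -> float:
    if frozenset() in explanations:   # some explanation fully satisfied
        return 1.0
    if not explanations:              # every explanation falsified
        return 0.0
    axiom = min(min(expl) for expl in explanations)   # fixed variable order
    p = PROBS[axiom]
    return (p * prob(condition(explanations, axiom, True))
            + (1 - p) * prob(condition(explanations, axiom, False)))

# Query holds if {a1, a2} or {a1, a3} holds: P = 0.4 * (1 - 0.7 * 0.5) = 0.26
query = frozenset({frozenset({"a1", "a2"}), frozenset({"a1", "a3"})})
print(prob(query))
```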

The emergence of initiatives such as Linked Open Data (LOD) in recent years has led to a significant increase in the amount of structured semantic data on the Web. Ontologies have played a central role in this development: they allow for the explicit and formal representation of real-world domains, producing semantic data that is commonly understood and shareable. However, the sharing and reuse of such data can be hampered by ambiguities that prevent its meaning from being made explicit. In this paper, we introduce and evaluate the Ambiguity Ontology, a meta-ontology that allows ambiguous entities in ontologies, and the properties associated with their ambiguity, to be explicitly identified and described. The rationale is that such descriptions, attached to an ontology, can narrow the range of interpretations that users may assume for its ambiguous elements.

We consider the fuzzy description logic ALCOI with semantics based on finite residuated De Morgan lattices. We show that reasoning in this logic is ExpTime-complete with respect to general TBoxes; in the sublogics ALCI and ALCO, it is PSpace-complete with respect to acyclic TBoxes. This matches the known complexity bounds for reasoning in classical description logics between ALC and ALCOI.

This paper proposes an ontology alignment framework with two core features: the use of background knowledge, and the ability to handle imprecision in the matching process and in the resulting concept alignment. The procedure is based on a generic reference vocabulary used to define an explicit semantic space for the ontologies to be matched; a generic Wikipedia-based background knowledge source such as YAGO seems an appropriate choice of reference vocabulary. The result of this procedure is a combined fuzzy knowledge system that captures what the two source ontologies have in common. The proposed approach makes it possible to discover relationships between concepts, including many-to-many correspondences. An important application of this method lies in cross-language ontology matching.

The number of available domain ontologies has been increasing over time. However, a vast amount of data is still stored and managed in RDBMSs. This complementarity can be exploited both to discover knowledge patterns that are not formalized in the ontology but can be learned from the data, and to enhance reasoning on the ontology by relying on a combination of formal domain models and evidence from the data. We propose a method for learning association rules from ontologies and RDBMSs in an integrated manner. The extracted patterns can be used to enrich the available knowledge (in both formats) or to improve existing ontologies. We also propose a method for automated reasoning on a grounded knowledge base (a knowledge base linked to RDBMS data), based on a standard tableaux algorithm that combines logical and statistical inference to cope with heterogeneous data sources.

Real-world knowledge often contains a variety of uncertainties. In the context of the Semantic Web, it is therefore difficult to model real-world domains using purely logical formalisms alone. Alternative approaches almost always assume that probabilistically enhanced knowledge is available, but this is rarely known a priori. Furthermore, purely deductive exact inference may not be feasible for web-scale ontological knowledge bases, and it cannot exploit statistical regularities in the data. Approximate deductive and inductive reasoning have been proposed to alleviate these problems. In this paper, we propose to view the concept membership prediction problem (predicting whether an individual in a description logic knowledge base is a member of a given concept) as the estimation of a conditional probability distribution: the posterior probability that the individual belongs to the concept, given the knowledge about the individual that can be elicited from the knowledge base. Specifically, we model this posterior distribution with Bayesian networks, structured both generatively and discriminatively, over the individual's membership in a set of feature concepts that represent the available knowledge about the individual.
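
As a simplified stand-in for the Bayesian-network classifier described above, a naive Bayes model over Boolean feature concepts already illustrates the idea of estimating P(C | features). The data, concept names, and counts are invented for the example:

```python
# Training individuals: (feature-concept memberships, member of target concept C?)
data = [
    ({"F1": True,  "F2": True},  True),
    ({"F1": True,  "F2": False}, True),
    ({"F1": False, "F2": True},  False),
    ({"F1": False, "F2": False}, False),
    ({"F1": True,  "F2": True},  True),
]

def posterior(features: dict) -> float:
    """P(C | features) by naive Bayes with Laplace smoothing."""
    scores = {}
    for label in (True, False):
        subset = [f for f, y in data if y == label]
        score = (len(subset) + 1) / (len(data) + 2)          # smoothed prior
        for name, value in features.items():
            match = sum(1 for f in subset if f[name] == value)
            score *= (match + 1) / (len(subset) + 2)         # smoothed likelihood
        scores[label] = score
    return scores[True] / (scores[True] + scores[False])

print(f"P(C | F1, not F2) = {posterior({'F1': True, 'F2': False}):.3f}")
```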

Given the increasing availability of structured, machine-processable knowledge in the context of the Semantic Web, relying solely on pure deductive inference has its limitations. In this study, we propose a novel method for similarity-based class membership prediction in description logic knowledge bases. The method is nonparametric and has interesting complexity properties that make it a potential candidate for large-scale inductive reasoning. We also evaluate its effectiveness by comparing it with other inductive-reasoning approaches from the Semantic Web literature.

In many systems, the determination of trust is reduced to an estimate of reputation. However, reputation is only one way to determine trust; trust estimation can be addressed from a variety of other perspectives. In this chapter, we model trust as dependent on user reputation, user demographics, and provenance, and then explore the effects of combining trust values computed by these different methods. Specifically, the first contribution of this chapter is a study of the correlation between demographics and trust. This research helps us understand which user categories are better candidates for annotation tasks in the cultural heritage domain. The next section details the procedure for calculating reputation-based trust values; user reputation is modeled in subjective logic based on the user's performance in the system under evaluation (in the work presented here, Waisda?). The third contribution is a procedure for computing trust values based on provenance information expressed using the W3C PROV model. We show how merging the results of these procedures improves the reliability of the estimated trust values. The proposed methods and their integration were evaluated by estimating and validating the trustworthiness of tags created in the video tagging game Waisda? from the Netherlands Institute for Sound and Vision. Through a quantitative analysis of the results, we demonstrate that the use of provenance and demographic information improves the accuracy of trust values.

Subjective logic is a powerful probabilistic logic that is effective for dealing with uncertain data. Subjective logic and the Semantic Web can be mutually beneficial: subjective logic is effective at handling the noise inherent in Semantic Web data, while the Semantic Web provides a means of obtaining evidence useful for evidential reasoning based on subjective logic. This chapter describes three extensions and applications of subjective logic in the Semantic Web: the use of deterministic and probabilistic semantic similarity measures for weighting subjective opinions; methods for incorporating partial observations; and "open world opinions", subjective opinions based on Dirichlet processes (see "Overview of the Dirichlet Process (DP), Its Algorithms, and Implementation Examples") that extend multinomial opinions. For each of these extensions, examples and applications are provided to demonstrate their effectiveness.
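
The core mapping from observed evidence to a binomial subjective-logic opinion is compact; the sketch below uses the conventional prior weight W = 2 and invented counts:

```python
def binomial_opinion(r: float, s: float, W: float = 2.0, base_rate: float = 0.5):
    """Opinion (belief, disbelief, uncertainty) from r positive and s negative
    observations, plus the projected probability E."""
    total = r + s + W
    belief, disbelief, uncertainty = r / total, s / total, W / total
    expectation = belief + base_rate * uncertainty
    return belief, disbelief, uncertainty, expectation

# E.g., a tag judged correct 8 times and incorrect 2 times:
b, d, u, e = binomial_opinion(8, 2)
print(f"belief={b:.2f} disbelief={d:.2f} uncertainty={u:.2f} E={e:.2f}")
```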

Web data often exhibit high levels of uncertainty. We focus on categorical Web data and express these levels of uncertainty as first- or second-order uncertainty. As concrete examples, we show how to quantify and handle these uncertainties using beta-binomial and Dirichlet-multinomial models, and how to account for unseen categories in a sample using Dirichlet processes. Finally, we illustrate how these higher-order models can serve as a basis for analyzing data sets once at least some of the uncertainty has been accounted for. We also show how the Bhattacharyya statistical distance can be used to quantify the similarity between Dirichlet distributions, and how the results can be used to visually and automatically analyze a Web data set of piracy attacks.
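
The Bhattacharyya distance between two Dirichlet distributions has a closed form via the multivariate beta function, BC = B((α+β)/2) / √(B(α)B(β)); a sketch with invented category counts:

```python
import numpy as np
from scipy.special import gammaln

def log_multivariate_beta(alpha):
    """log B(alpha) = sum(log Gamma(a_i)) - log Gamma(sum(a_i))."""
    alpha = np.asarray(alpha, dtype=float)
    return gammaln(alpha).sum() - gammaln(alpha.sum())

def bhattacharyya_distance(alpha, beta):
    """D_B = -ln BC for Dir(alpha) and Dir(beta)."""
    alpha, beta = np.asarray(alpha, float), np.asarray(beta, float)
    log_bc = (log_multivariate_beta((alpha + beta) / 2)
              - 0.5 * (log_multivariate_beta(alpha) + log_multivariate_beta(beta)))
    return -log_bc

# Two Dirichlet posteriors over three attack categories (illustrative counts):
print(bhattacharyya_distance([10, 5, 1], [9, 6, 2]))   # small: similar
print(bhattacharyya_distance([10, 5, 1], [1, 5, 10]))  # larger: dissimilar
```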

Preference representation and reasoning is an important issue in many real-world scenarios. There are currently many approaches to assessing preferences qualitatively or quantitatively. The most prominent qualitative approach to expressing preferences is the CP-net, whose clear graphical structure combines a simple representation of user desires with good computational properties for computing optimal outcomes. In this chapter, we introduce ontological CP-nets, which use CP-nets over ontology domains to represent preferences; that is, the variable values are logical expressions constrained with respect to a background domain ontology.

Social media content makes up the majority of all text content appearing on the Internet. These user-generated content (UGC) streams present media analysts with both opportunities and challenges: vast amounts of new data to analyze and to use for inferring new information. The main challenge with natural language is its ambiguity and vagueness. The grammatical structure of sentences can be used to resolve ambiguity automatically. With the informal language widely used on social media, however, text becomes more ambiguous and, consequently, harder to understand automatically.

Information extraction (IE) is a field of research that enables the structured use of unstructured text. Named entity extraction (NEE) is a subtask of IE that aims to locate phrases (mentions) in text that represent the names of entities such as people, organizations, and places, regardless of their type. Named Entity Disambiguation (NED) is the task of correctly determining the person, place, event, etc. referred to by a given mention.

The purpose of this paper is to provide an overview of several approaches that mimic the human way of recognizing and disambiguating named entities, especially in domains without formal sentence structure. The proposed methods open the door to more sophisticated applications built on user contributions on social media. We propose a robust combined framework for NEE and NED in semi-formal and informal text. The results show that robustness is achieved independently of language and domain, and independently of the chosen extraction and disambiguation methods. The framework also proved robust to the informality of the language used. We found a reinforcement effect and developed a technique that improves extraction quality by feeding disambiguation results back into extraction. To improve disambiguation results, we present a method for handling the uncertainty involved in extraction.
