Ontology Matching Technology



Ontology matching technology is a technology for reconciling relationships between different ontologies. An ontology is a formalized framework for expressing the knowledge and concepts of a particular domain, and it enables data integration and retrieval across sources. However, between different ontologies the same concept may be expressed by different terms, or the same term may denote different concepts, and ontology matching techniques are used to resolve such discrepancies.

There are various ontology matching techniques: for example, methods that use existing linguistic resources such as thesauri and dictionaries, and methods based on pattern matching, similarity computation, and semantic analysis. These methods can be used to match terms and concepts across different ontologies, enabling ontology integration and data integration.

Ontology matching techniques are applied to data integration and retrieval across different disciplines and industries. Ontology matching technology can also increase the reusability of ontologies and reduce the cost of ontology maintenance and development. However, ontology matching still faces challenges, notably improving matching accuracy, and research and development will continue. In this section, we describe this technology based on the book “Ontology Matching“.

Ontology Matching

Ontology matching is a technique that aims to find correspondences between semantically related entities in different ontologies.

These correspondences can represent equivalence between ontology entities or other relations such as consequence, subsumption, disjointness, etc. Many different matching solutions have been proposed from various perspectives such as databases, information systems, and artificial intelligence.

Various methods have been proposed for ontology matching, including data interlinking, ontology partitioning and pruning, context-based matching, matcher tuning, alignment debugging, and user participation in matching.

The table of contents is as follows.

<Part 1 Matching Problem>

  • Chapter 1 Applications
  • Chapter 2 The Matching Problem
  • Chapter 3 Methodology

<Part 2 Ontology Matching Techniques>

Similarity in natural language is evaluated and exploited in many tasks that deal with various types of data. From the perspective of ontology matching, we summarize techniques that describe similarity both at the level of individual instances and for collections treated as graph data.

In this article, we will discuss the mathematical definition of similarity/dissimilarity and the definition of distance between entities.
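Concretely, the standard definitions can be stated as follows; the LaTeX fragment below uses the conventional σ and δ notation and lists the usual axioms.

```latex
\documentclass{article}
\usepackage{amsmath,amssymb}
\begin{document}
A \emph{similarity} $\sigma : o \times o \to \mathbb{R}$ is a function such that
\begin{align*}
  &\forall x, y \in o, \quad \sigma(x, x) \ge \sigma(x, y) && \text{(maximality)}\\
  &\forall x, y \in o, \quad \sigma(x, y) = \sigma(y, x)   && \text{(symmetry)}
\end{align*}
A \emph{dissimilarity} $\delta : o \times o \to \mathbb{R}$ satisfies
\begin{align*}
  &\forall x, y \in o, \quad \delta(x, y) \ge 0             && \text{(positiveness)}\\
  &\forall x \in o,    \quad \delta(x, x) = 0               && \text{(minimality)}\\
  &\forall x, y \in o, \quad \delta(x, y) = \delta(y, x)    && \text{(symmetry)}
\end{align*}
A \emph{distance} is a dissimilarity that additionally satisfies definiteness,
$\delta(x, y) = 0 \Rightarrow x = y$, and the triangle inequality,
$\delta(x, y) + \delta(y, z) \ge \delta(x, z)$.
\end{document}
```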

We describe a matching method that considers only strings.

String-based methods use the structure of the string (as a sequence of characters). A string-based method will typically find that the classes Book and Textbook are similar, but that the classes Book and Volume are not.

There are various methods for comparing strings, depending on how the strings are viewed: for example, as an exact sequence of characters, an erroneous sequence of characters, a set of characters, or a set of words. (Cohen et al. 2003b) compares various string matching techniques, ranging from edit-distance-like functions to token-based distance functions. In this section, we discuss the most frequently used techniques.

We distinguish between (1) normalization techniques, which reduce the strings to be compared to a common format; (2) substring or subsequence techniques, which base similarity on the characters the strings have in common; (3) edit distances, which further assess the possibility that one string is an erroneous version of the other; (4) statistical measures, which establish the importance of a word in a string and use it to weight the relation between two strings; and (5) path comparison, which considers the sequence of labels on the path leading to an entity.
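As an illustration of category (3), here is a minimal Python sketch of the classic Levenshtein edit distance together with a normalization into a similarity in [0, 1]; the function names are ours, not from any particular matching system.

```python
def edit_distance(s: str, t: str) -> int:
    """Levenshtein distance: minimum number of insertions,
    deletions, and substitutions turning s into t."""
    m, n = len(s), len(t)
    prev = list(range(n + 1))  # distances for the previous row
    for i in range(1, m + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution
        prev = curr
    return prev[n]

def string_similarity(s: str, t: str) -> float:
    """Normalize edit distance into a similarity in [0, 1]."""
    if not s and not t:
        return 1.0
    return 1.0 - edit_distance(s, t) / max(len(s), len(t))

print(string_similarity("Book", "Textbook"))  # 0.5: fairly similar
print(string_similarity("Book", "Volume"))    # about 0.17: dissimilar
```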

In the previous article, we considered a string as a sequence of characters, but in general use it is important to treat it as text (a sentence) rather than as a single word. For example, the string “theoretical peer-reviewed journal article” can be divided into the words (theoretical, peer, reviewed, journal, article), which are easily identifiable character sequences found in dictionary entries. In this case, the words do not form a bag of words as used in information retrieval, but a sequence with a grammatical structure. A word like peer has a meaning and is associated with a concept, but the more useful concepts that should be properly handled in such a text are terms such as peer-review or peer-reviewed journal.

A term is a phrase that identifies a concept, and terms are often used to label concepts in an ontology. Ontology matching can therefore greatly benefit from recognizing and identifying terms in strings. This amounts, for instance, to recognizing the term Peer-reviewed journal under the label peer-reviewed scientific periodical (and not under journal review paper).

Language-based approaches use natural language processing (NLP) techniques to extract meaningful terms from text. By comparing these terms and their relationships, the similarity of the names and comments of entities in the ontologies can be assessed. Although all of them rely on some linguistic knowledge, we can broadly distinguish between (1) methods that rely only on algorithms (intrinsic methods) and (2) methods that use external resources such as dictionaries (extrinsic methods).
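Below is a minimal Python sketch of an intrinsic method, assuming only the standard library: labels are tokenized (including camelCase splitting), lowercased, and crudely stripped of a few suffixes as a stand-in for a real stemmer, then compared as token sets. An extrinsic variant would consult a resource such as WordNet instead.

```python
import re

def normalize(label: str) -> set[str]:
    """Intrinsic normalization: tokenize a label and apply a
    crude suffix stripper (a stand-in for a real stemmer)."""
    tokens = re.split(r"[\s\-_]+|(?<=[a-z])(?=[A-Z])", label)
    stems = set()
    for tok in tokens:
        tok = tok.lower()
        for suffix in ("ed", "ing", "s"):
            if tok.endswith(suffix) and len(tok) > len(suffix) + 2:
                tok = tok[: -len(suffix)]
                break
        if tok:
            stems.add(tok)
    return stems

def token_similarity(l1: str, l2: str) -> float:
    """Jaccard overlap of the normalized token sets of two labels."""
    t1, t2 = normalize(l1), normalize(l2)
    return len(t1 & t2) / len(t1 | t2) if t1 | t2 else 1.0

print(token_similarity("peer-reviewed journal", "PeerReviewedJournal"))  # 1.0
```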

Instead of, or in addition to, comparing names and identifiers, we can compare information about the structure of the entities in the ontology. This comparison can be subdivided into two kinds: comparison of the internal structure of an entity, that is, beyond its names and annotations, its properties or, in the case of OWL ontologies, the properties that take values in data types; and comparison of the entity with the other entities to which it is related. The former is called the internal structure and the latter the relational structure.

Internal structures are mainly used for database schema matching, while relational structures are more important for formal ontology and Semantic Web network matching.

Methods based on internal structure are sometimes called constraint-based approaches in the literature (Rahm and Bernstein 2001). These methods use criteria such as the set of properties, the range of properties (attributes and relations), their cardinality or multiplicity, and the transitivity or symmetry of properties to calculate the similarity between entities.
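A sketch of this idea in Python, using invented class descriptions (property name mapped to datatype); a real system would read these from an OWL model, and the scoring scheme here is an arbitrary illustration.

```python
# Hypothetical internal structure of two classes: property name -> datatype.
book   = {"title": "string", "year": "integer", "author": "Person"}
volume = {"title": "string", "year": "integer", "writer": "Person"}

def internal_similarity(c1: dict, c2: dict) -> float:
    """Constraint-based comparison: shared property names count fully
    when their datatypes agree, half when they disagree."""
    names1, names2 = set(c1), set(c2)
    shared = names1 & names2
    score = sum(1.0 if c1[p] == c2[p] else 0.5 for p in shared)
    return score / len(names1 | names2)

print(internal_similarity(book, volume))  # 2 agreeing of 4 distinct properties -> 0.5
```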

When individual representations (or instances) are available, this is a very convenient situation for a matching system. Matching becomes very easy when two ontologies share the same set of individuals. For example, if two classes share exactly the same set of individuals, we can strongly presume that these classes are equivalent.

Even if the classes do not share the same set of individuals, matching can be based on tangible indicators that do not change easily. For example, the title of a book has no reason to change; in other words, if a book has a different title, then it is not the same book. Matching is then again based on comparing individuals.
In this way, we can classify the extension methods into three categories: (1) those that apply to ontologies with a common instance set, (2) those that propose individual identification techniques before using conventional methods, and (3) those that do not require identification, i.e., those that apply to heterogeneous instance sets.
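A minimal Python sketch for category (1), where a common instance set is shared: the Jaccard overlap of class extensions serves as the similarity, and identical extensions support an equivalence hypothesis. The instance sets are invented.

```python
def extension_similarity(ext1: set, ext2: set) -> float:
    """Jaccard similarity between the instance sets of two classes."""
    if not ext1 and not ext2:
        return 1.0
    return len(ext1 & ext2) / len(ext1 | ext2)

# Hypothetical shared instance space.
book_instances    = {"b1", "b2", "b3", "b4"}
volume_instances  = {"b1", "b2", "b3", "b4"}
journal_instances = {"j1", "j2"}

print(extension_similarity(book_instances, volume_instances))   # 1.0 -> equivalence hypothesis
print(extension_similarity(book_instances, journal_instances))  # 0.0 -> likely disjoint
```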

The basic similarities introduced in the previous sections can be considered local methods, because they evaluate the similarity or dissimilarity of two entities by considering only the entities' own characteristics (name, internal structure, extension). In this article, we will discuss global methods that use the external structure of an entity, rather than only its internal structure, to compare entities.

If there are cycles in the ontology or nonlinearities in the constraints governing relational similarity, a simple similarity calculation is not possible, and iterative similarity computation techniques are needed. These approaches approximate an optimal solution to the constraints. The first approach to such matching problems is through classical optimization methods. Another global approach is the probabilistic approach, which computes the probability that entities are related based on the structure of the ontologies and alignments. Finally, there are semantic methods that rely on a global interpretation of the ontologies and the alignments.

After the taxonomic structure, the next best-known structure is the mereologic (part-whole) structure, which corresponds to the part-of relationship. If the two classes Book and Volume are found to be equivalent, and they stand in mereologic relationships to the classes InBook and BookChapter respectively, this suggests that those two classes may be related as well. This reasoning also applies in the opposite direction, from part to whole. This is the case when the parts are differentiated, for instance when the parts of a journal Issue are distinguished into editorials, articles, recensions, and letters.
The difficulty in dealing with such a structure is that it is not easy to find a property that bears exactly this structure. For example, the Proceedings class can have a whole-part relationship with the InProceedings class, represented by a communications property. These InProceedings objects may in turn have a mereologic structure represented by a section property.

However, if we can detect a relationship that supports the mereologic structure, we can use it to compute the similarity between classes: if they share similar parts, they can be considered more similar. This is especially useful when comparing class extensions, because objects that share the same set of parts can be inferred to be the same.

The computation of the combined similarity is still local, since it only considers the neighbors of a node when computing its similarity. However, the similarity concerns the ontology as a whole, and the final similarity values may ultimately depend on all the entities of the ontologies. Furthermore, if the ontologies do not reduce to directed acyclic graphs, similarities defined in a local way may be defined circularly. This is in fact the most common case: it occurs, for example, when the similarity between two classes depends on the similarity between their instances, which in turn depends on the similarity between the classes, or when there are cycles in the ontology. The two graphs involved in this circularity are homomorphic in many ways.

For such cases, we need to define a strategy to compute this global similarity. Two such methods will be presented here: the first will be defined as a process of propagating similarity in a graph, and the second will transform the definition of similarity into a set of equations to be solved by numerical techniques.
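The first strategy can be sketched as a fixed-point computation in the spirit of similarity flooding (Melnik et al. 2002): each pair's similarity is repeatedly recomputed from its seed value and the similarities of structurally related neighbor pairs. The toy graph below is invented; a real implementation derives the neighbor relation from the ontology structure and checks convergence rather than using a fixed iteration count.

```python
def propagate(pairs, neighbors, seed, alpha=0.5, iters=20):
    """Iteratively blend each pair's own (seed) similarity with the
    average similarity of its structurally related neighbor pairs."""
    sim = dict(seed)
    for _ in range(iters):
        new = {}
        for p in pairs:
            nbrs = neighbors.get(p, [])
            propagated = sum(sim[q] for q in nbrs) / len(nbrs) if nbrs else 0.0
            new[p] = alpha * seed[p] + (1 - alpha) * propagated
        sim = new
    return sim

# Toy example: pairs of (class1, class2); two pairs are neighbors when
# the classes are related (e.g., by subclass edges) in both ontologies.
pairs = [("Book", "Volume"), ("InBook", "BookChapter")]
neighbors = {("Book", "Volume"): [("InBook", "BookChapter")],
             ("InBook", "BookChapter"): [("Book", "Volume")]}
seed = {("Book", "Volume"): 0.9, ("InBook", "BookChapter"): 0.2}
print(propagate(pairs, neighbors, seed))
```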

In the previous article, we discussed an iterative similarity calculation approach using a set of similarity equations. In this article, we will discuss an optimization approach that extends them further.

Expectation Maximization (EM) is an iterative approach to the maximum likelihood estimation problem (Dempster et al. 1977). It is used to estimate the parameters of a parametric probability distribution, and to perform maximum a posteriori estimation, when the observed data are missing or only partially present, as in a hidden Markov model. The missing data are estimated by their conditional expectation given the observed data; in other words, the missing data are augmented by inferring potentially useful information about them. The likelihood function is then maximized under the assumption that the missing data are known. Each iteration of EM thus consists of two steps: the expectation step, called the E step, and the maximization step, called the M step, which successively produce local improvements of the likelihood function approximated in the E step. It is a fixed-point method, and convergence is guaranteed because the likelihood increases at each iteration.
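To make the two steps concrete, here is a minimal EM sketch in Python for a two-component one-dimensional Gaussian mixture, the textbook setting rather than the ontology-matching application discussed next: the E step computes expected component memberships for the hidden assignments, and the M step re-estimates the parameters from them.

```python
import math
import random

def em_gaussian_mixture(xs, iters=50):
    """EM for a 2-component 1D Gaussian mixture.
    Hidden data: which component generated each point."""
    mu = [min(xs), max(xs)]          # crude initialization
    sigma, pi = [1.0, 1.0], [0.5, 0.5]
    for _ in range(iters):
        # E step: responsibilities = expected component memberships.
        resp = []
        for x in xs:
            dens = [pi[k] * math.exp(-((x - mu[k]) ** 2) / (2 * sigma[k] ** 2))
                    / (sigma[k] * math.sqrt(2 * math.pi)) for k in range(2)]
            z = sum(dens)
            resp.append([d / z for d in dens])
        # M step: maximize likelihood given the expected memberships.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            sigma[k] = math.sqrt(sum(r[k] * (x - mu[k]) ** 2
                                     for r, x in zip(resp, xs)) / nk) or 1e-6
            pi[k] = nk / len(xs)
    return mu, sigma, pi

xs = [random.gauss(0, 1) for _ in range(200)] + [random.gauss(5, 1) for _ in range(200)]
print(em_gaussian_mixture(xs)[0])  # estimated means, near 0 and 5
```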

As an example of expectation maximization applied to matching, the ontology matching problem is treated as maximum likelihood estimation in (Doshi et al. 2009; Thayasivam and Doshi 2011). This method searches for the maximum likelihood estimate of the alignment from the observed data in the presence of missing correspondences, which are treated as hidden variables.

In the previous article, we discussed two methods for calculating similarity: expectation maximization and particle swarm optimization. In this article, we will discuss the extension of these methods to stochastic approaches.

Stochastic methods are used in ontology matching in a variety of ways, for example to increase the number of available matching candidates. In this section, we introduce several methods based on Bayesian networks, Markov networks, and Markov logic networks.

The main characteristic of semantic methods is that they use model-theoretic semantics to justify their results; they are therefore classified as deductive methods. As with other global methods, purely deductive methods do not work very well on their own for tasks that are inherently inductive, such as ontology matching. Therefore, anchors are needed: entities that are declared to be equivalent (based on name identity, external resources, or user input, for example). These anchors constitute the initial alignment to which the deductive methods are applied, and the semantic methods serve to amplify these seed alignments.

The basis of a semantic method is to infer new correspondences or to test the satisfiability of alignments. This can be achieved by using a reasoner that implements the alignment semantics. There are several such systems, but the most commonly used methods rely on reduced semantics (Meilicke et al. 2009; Meilicke and Stuckenschmidt 2009; Meilicke 2011).

In the following, we introduce semantic methods based on propositional and modal satisfiability, as well as methods based on description logics, to infer new correspondences. Methods for detecting and repairing alignment inconsistencies will be presented in the next chapter.
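The propositional technique can be sketched as follows in Python: ontology axioms plus anchor correspondences are encoded as propositional formulas, and a candidate correspondence is inferred if the axioms together with its negation are unsatisfiable. The brute-force truth-table check below only scales to a handful of variables, and the encoding is purely illustrative.

```python
from itertools import product

def satisfiable(formula, variables):
    """Brute-force SAT check: try every truth assignment."""
    return any(formula(dict(zip(variables, vals)))
               for vals in product([False, True], repeat=len(variables)))

def entails(axioms, goal, variables):
    """axioms entail goal  iff  (axioms and not goal) is unsatisfiable."""
    return not satisfiable(lambda v: axioms(v) and not goal(v), variables)

# Toy encoding: propositional variables stand for class memberships.
# Axioms: Textbook -> Book (ontology 1), Book <-> Volume (anchor alignment).
variables = ["Textbook", "Book", "Volume"]
axioms = lambda v: (not v["Textbook"] or v["Book"]) and (v["Book"] == v["Volume"])
# Candidate correspondence: Textbook is subsumed by Volume.
goal = lambda v: not v["Textbook"] or v["Volume"]
print(entails(axioms, goal, variables))  # True: the subsumption is inferred
```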

The basic and global techniques introduced in the previous sections are the building blocks from which matching systems are constructed. Once the similarity or dissimilarity between ontology entities is available, what remains is to compute the alignment, and this requires a more comprehensive process. In this chapter, we consider the following points in particular for building a practical matching system:

      • Prepare to process large ontologies, if necessary.
      • Organize combinations of various similarity and matching algorithms.
      • Use background knowledge sources.
      • Aggregate the results of basic methods to compute compound similarity between entities.
      • Learn a matcher from the data and tune it.
      • Extract alignments from the resulting (dis)similarity: different alignments with different characteristics may in fact be extracted from the same (dis)similarity.
      • Improve alignments by disambiguation, debugging, and repair.

In the previous article, we gave a brief overview of ontology matching strategies. In this article, we will discuss one of those approaches, context-based matching.

When matching two ontologies, there is often no common ground on which to base the comparison, and the goal of ontology matching is to find such common ground. This may be accomplished by comparing the content of the ontologies, or by considering the context of the ontologies, i.e., the relationship of the ontologies to the environment in which they are used.

This common ground is often found by associating the ontology with external resources. These resources may differ in three specific dimensions: breadth, form, and state.

In the previous article, we discussed context-based matching, which is one of the ontology matching strategies. In this article, we will discuss another approach, weighted selection.

By aggregating similarities, we can realize the composition of matchers. This aggregation takes the similarities provided by different matchers and combines them into a single similarity. There are two types of matchers to combine: competitive matchers, which match entities of the same kind but rate their similarity differently, and complementary matchers, which assess the similarity of entities of different kinds. Each of these must be aggregated differently. In this section, three types of aggregation are presented: weighting (which combines similarities arithmetically by giving different weights to the matchers), voting, and argumentation.
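A small Python sketch of the weighting strategy: similarity tables produced by two hypothetical matchers (say, a string-based and a structure-based one) are combined by a weighted linear average. The weights are arbitrary here; choosing them is part of matcher tuning, discussed below.

```python
def aggregate(similarities, weights):
    """Weighted linear aggregation of several matchers' scores
    for the same entity pairs. Weights should sum to 1."""
    combined = {}
    for pair in similarities[0]:
        combined[pair] = sum(w * sim[pair] for sim, w in zip(similarities, weights))
    return combined

string_sim = {("Book", "Volume"): 0.2, ("InBook", "BookChapter"): 0.4}
struct_sim = {("Book", "Volume"): 0.9, ("InBook", "BookChapter"): 0.7}

print(aggregate([string_sim, struct_sim], weights=[0.4, 0.6]))
# {('Book', 'Volume'): 0.62, ('InBook', 'BookChapter'): 0.58}
```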

In the previous article, we discussed weighted selection, which is one of the ontology matching strategies. In this article, we will discuss learning for alignment sorting.

In this section, we describe algorithms that learn how to sort alignments from a number of correct correspondences (positive examples) and incorrect correspondences (negative examples). The main difference from the previous approaches is that the techniques in this section require some sample data for learning. This data can be provided by the algorithm itself, for example as a subset of the correspondences to be determined, or it can be supplied by the user, or brought in from external resources.

Matchers using machine learning usually work in two phases: (1) a learning or training phase and (2) a classification or matching phase. In the first phase, training data for the learning process is created, for example by manually matching two ontologies, and the system learns a matcher from this data. In the second phase, the learned matcher is used for matching new ontologies. Feedback on the obtained alignments may be provided and reflected back into step (1). Learning can be performed online, so that the system learns continuously, or offline, if speed is more important than accuracy.
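A minimal sketch of the two phases using scikit-learn, with invented feature vectors: each candidate correspondence is described by a few basic similarity scores and a manually supplied label, a classifier is trained on them, and the trained model then scores candidates from new ontologies.

```python
from sklearn.linear_model import LogisticRegression

# Phase 1 (training): each row describes one candidate correspondence
# by [string similarity, token similarity, extension similarity],
# with a manual label: 1 = correct correspondence, 0 = incorrect.
X_train = [[0.9, 1.0, 0.8],
           [0.1, 0.0, 0.1],
           [0.5, 1.0, 0.9],
           [0.3, 0.0, 0.2]]
y_train = [1, 0, 1, 0]

matcher = LogisticRegression().fit(X_train, y_train)

# Phase 2 (matching): score candidates from a new pair of ontologies.
X_new = [[0.2, 1.0, 0.7]]                   # e.g., Book vs. Volume
print(matcher.predict_proba(X_new)[0][1])   # probability of being a match
```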

In this article, we will discuss the tuning approach to matching.

Tuning is the process of improving the behavior of a matcher by adjusting it with respect to the following:

      • Improving the quality of the matching results (measured in terms of precision, recall, F-measure, etc.).
      • Improving the performance of the matcher, measured by execution time and by resource consumption such as main memory, CPU, and bandwidth.

Tuning is usually done before matching (as a pre-match effort), after matching (as a post-match effort), or iteratively, involving either or both of the two aforementioned stages. This adjustment process can be carried out automatically, semi-automatically, or manually. The user makes adjustments before or after matching, either through a graphical interface or by directly editing a configuration file.
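As a concrete instance of post-match tuning, the sketch below grid-searches an extraction threshold against a small reference alignment and keeps the value with the best F-measure; the similarity scores and reference pairs are invented.

```python
def f_measure(extracted, reference):
    """Harmonic mean of precision and recall against a reference alignment."""
    if not extracted:
        return 0.0
    tp = len(extracted & reference)
    precision, recall = tp / len(extracted), tp / len(reference)
    return 2 * precision * recall / (precision + recall) if tp else 0.0

def tune_threshold(similarity, reference):
    """Pick the extraction threshold that maximizes F-measure."""
    best_t, best_f = 0.0, -1.0
    for t in [i / 20 for i in range(21)]:   # 0.00, 0.05, ..., 1.00
        extracted = {p for p, s in similarity.items() if s >= t}
        f = f_measure(extracted, reference)
        if f > best_f:
            best_t, best_f = t, f
    return best_t, best_f

similarity = {("Book", "Volume"): 0.8, ("Book", "Journal"): 0.55,
              ("InBook", "BookChapter"): 0.6}
reference = {("Book", "Volume"), ("InBook", "BookChapter")}
print(tune_threshold(similarity, reference))  # a threshold above 0.55 separates them
```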

In this article, we will discuss an alignment extraction approach.

The goal of matching is to identify a sufficient set of correspondences between the ontologies. A (dis)similarity measure between the entities of both ontologies yields a large set of candidate correspondences; the ones that will be part of the resulting alignment must be extracted on the basis of the (dis)similarity. This can be accomplished by a dedicated extraction method acting on the similarity matrix, or on a pre-alignment that has already been extracted. Here we distinguish between extractors, which convert the (dis)similarity matrix into an alignment, and filters, which reduce the set of candidate correspondences in either of these forms.

A similarity filter transforms the (dis)similarity matrix, for example by setting cells below a certain threshold to zero or by setting cells above the threshold to one. An alignment extractor generates an alignment from the similarity matrix; this is the main topic of this section. An alignment filter can further manipulate the alignment using the same kinds of operations as the similarity filter.
The user can also act as an alignment filter: the entity pairs are displayed along with their similarity scores and ranks, and the selection of the appropriate pairs is left to the user. This user input can be taken as definitive in a helper environment, as the definition of anchors to help the system, or as relevance feedback for a learning algorithm.
Going a step further, it is also possible to define algorithms that automate the extraction of alignments from similarity scores. A variety of strategies can be applied to this task, depending on the desired characteristics of the alignment.
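One simple extractor can be sketched as follows: after a threshold filter, a greedy strategy repeatedly takes the highest-scoring remaining pair and discards other pairs involving either entity, producing a one-to-one alignment. More principled strategies (e.g., stable marriage or optimal weighted matching) exist; this is only a baseline.

```python
def extract_alignment(similarity, threshold=0.5):
    """Greedy one-to-one extraction from a similarity table
    {(entity1, entity2): score}, after threshold filtering."""
    candidates = sorted(((s, p) for p, s in similarity.items() if s >= threshold),
                        reverse=True)
    used1, used2, alignment = set(), set(), []
    for score, (e1, e2) in candidates:
        if e1 not in used1 and e2 not in used2:
            alignment.append((e1, e2, score))
            used1.add(e1)
            used2.add(e2)
    return alignment

similarity = {("Book", "Volume"): 0.8, ("Book", "Journal"): 0.6,
              ("Proceedings", "Journal"): 0.7}
print(extract_alignment(similarity))
# [('Book', 'Volume', 0.8), ('Proceedings', 'Journal', 0.7)]
```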

In this article, we will discuss alignment ambiguity improvement.

Alignment improvement means measuring the quality of the generated alignment, reducing the alignment so that its quality improves, and possibly expanding the resulting alignment again, repeatedly. Since there are several possibilities for the initial reduction of the alignment, one of them has to be selected. When the measure reaches a certain threshold, an alignment is selected as the result of this process. If the selected alignment is not fed back to the matcher but provided as a direct output, the improvement acts as an alignment filter.

General alignment improvement framework: the initial alignment (A′′) is generated by the matcher. It is evaluated with intrinsic measures (consistency, agreement, constraint violations) to determine a set of compatible subalignments (C). One of these is selected and fed back to the process as the input alignment (A) so that the matcher can improve it. This process is iterated until the measure reaches a satisfactory value, and the last computed alignment (A′) is returned.

<Part 3 Systems and Evaluation>

  • Chapter 8 Overview of Matching Systems
  • Chapter 9 Evaluation of Matching Systems

<Part 4 Representing, Explaining, and Processing Alignments>

  • Chapter 10 Frameworks and Formats: Representing Alignments
  • Chapter 11 User Involvement
  • Chapter 12 Processing Alignments

<Part 5 Conclusion>

  • Chapter 13 Conclusions

<Appendix>

  • Appendix A Legends of Figures
  • Appendix B Running Example
  • B.1 culture-shop.owl
  • B.2 library.owl
  • B.3 scralign.rdf
  • B.4 Alternative Alignment for Evaluation
  • Appendix C Exercises
  • Appendix D Solutions
