Strategies in similarity matching methods (3) Weighted selection approach


In the previous article, we gave a brief overview of ontology matching strategies. In this article, we will discuss one of these approaches: the weighted selection approach.

Aggregating Similarities and Alignments
As shown in the previous section, matchers can be composed by aggregating similarities. This aggregation takes the similarities provided by different matchers and combines them into a single similarity. Matchers can be of two kinds: competing matchers, which assess the similarity of the same pairs of entities and may rate it differently, and complementary matchers, which assess the similarity of different pairs or aspects of entities. Each kind must be aggregated differently. In this section, we present three types of aggregation: weighting (which combines similarities arithmetically by giving different weights to the matchers), voting, and argumentation.

Weighting
Compound similarity concerns the aggregation of heterogeneous similarities. As explained in the previous section, since structured objects (classes, individuals) are often involved in many different relations, a similarity can be computed between each of the ontology entities to which two objects are related. For example, the similarity between two classes depends on the similarity derived from their names, the similarity of their superclasses, the similarity of their instances, and the similarity of their properties. These similarities must be aggregated into a single similarity measure.

Triangular norms
Triangular norms are used as conjunction operators in uncertainty calculus.

Definition 7.3 (Triangular norm) 
A triangular norm T is a function T : D × D → D 
(where D is a set ordered by ≤ with an upper bound ⊤) 
satisfying the following conditions.

\[\begin{eqnarray} T(x,\top)&=&x \ &(boundary\ condition)&\\ x\leq y \ &\Rightarrow& \ T(x,z)\leq T(y,z) \ &(monotonicity)&\\T(x,y)&=&T(y,x) \ &(commutativity)&\\T(x,T(y,z))&=&T(T(x,y),z) \ &(associativity)& \end{eqnarray} \]

Typical examples of triangular norms are min(x, y), x × y, and max(x + y − 1, 0). All of them are normalized, and min is the only idempotent norm (∀x, min(x, x) = x). Triangular norms are candidates for combinations that require all the aggregated values to be high. By associativity, a triangular norm can be extended to an n-ary measure. Any continuous triangular norm on the unit interval can be expressed as a combination of these three functions (Hájek 1998).
Another operator of this family, used for aggregating several dimensions, is the weighted product.

Definition 7.4 (Weighted product) 
Let o be a set of objects that can be analyzed in n dimensions; 
the weighted product between two such objects x, x′ ∈ o is defined as:

\[\delta(x,x')=\displaystyle \prod_{i=1}^n \delta(x_i,x'_i)^{w_i}\]

δ(xi,xi′) is the dissimilarity of pairs of objects in dimension i, and wi is the weight in dimension i.
These operators have the disadvantage that if the measure in one dimension is zero, the result will also be zero.
Example of triangular norms. We consider two ontologies, the first consisting of the concepts Product, Provider, Creator, and the second consisting of the concepts Book, Translator, Publisher, Writer. The first two tables show the results of applying the edit distance and a WordNet-based distance to these labels. The next table shows the aggregation of these distances with triangular norms, namely the min operation and the weighted product.

The distances in the first two tables are not very similar to each other, yet once aggregated by the triangular norms the results become very close.
Contrary to multidimensional aggregators, triangular norms tend to imply dependencies between the values in different dimensions: a value given in one dimension can override the values in the other dimensions.
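
To make these operators concrete, here is a minimal Python sketch of the min triangular norm and the weighted product of Definition 7.4. The dimension values and weights are made up for illustration and are not the figures from the tables above; note how a single zero dimension drives the weighted product to zero.

```python
# Minimal sketch of min t-norm and weighted product aggregation (hypothetical values).

def t_min(values):
    """min t-norm: the aggregate is high only if every value is high."""
    return min(values)

def weighted_product(values, weights):
    """Weighted product of Definition 7.4: prod_i v_i ** w_i."""
    result = 1.0
    for v, w in zip(values, weights):
        result *= v ** w
    return result

# Dissimilarities of one candidate pair along three dimensions
# (e.g. labels, instances, properties) -- illustrative values only.
dissims = [0.4, 0.7, 0.9]
weights = [0.5, 0.3, 0.2]

print(t_min(dissims))                                 # 0.4
print(round(weighted_product(dissims, weights), 2))   # ~0.56
print(weighted_product([0.0, 0.7, 0.9], weights))     # 0.0: one zero dimension zeroes the result
```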

Multidimensional distances and weighted sums
When there is a need to aggregate the differences between several properties, one of the most common types of distance is the Minkowski distance. Unlike the previous ones, these measures are suitable for independent dimensions and tend to balance the values between dimensions.

Definition 7.6 (Minkowski distance) 
Let o be a set of objects that can be analyzed in n dimensions; 
the Minkowski distance between two such objects x, x′ ∈ o is defined as:

\[\delta(x,x')=\sqrt[p]{\displaystyle\sum_{i=1}^n\delta(x_i,x'_i)^p}\]

δ(xi,xi′) is the dissimilarity of a pair of objects in dimension i.
Examples of Minkowski distances are the Euclidean distance (for p = 2), the Manhattan (aka city block) distance (for p = 1), and the Chebyshev distance (for p = +∞). These are used to aggregate measurements of independent dimensions.
Example of Minkowski distance. We start with the distances computed on the labels by the min aggregation operator described above and the distances obtained from the Hamming distances on the set of instances of the concept. These distances usually take into account independent dimensions.
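
Below is a minimal Python sketch of the Minkowski family of Definition 7.6 (Manhattan, Euclidean, and Chebyshev as the limit case). The two dimensions and their dissimilarity values are illustrative, not the values of this example.

```python
# Minkowski distances over per-dimension dissimilarities (Definition 7.6).
# The two dimensions and their values are illustrative only.

def minkowski(dissims, p):
    """p-th root of the sum of p-th powers: p=1 Manhattan, p=2 Euclidean."""
    return sum(d ** p for d in dissims) ** (1.0 / p)

def chebyshev(dissims):
    """Limit case p -> infinity: the largest per-dimension dissimilarity."""
    return max(dissims)

dissims = [0.3, 0.6]  # e.g. a label-based distance and an instance-based distance

print(round(minkowski(dissims, p=1), 2))  # 0.9  (Manhattan)
print(round(minkowski(dissims, p=2), 2))  # 0.67 (Euclidean)
print(chebyshev(dissims))                 # 0.6
```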

Aggregating these two distances using the (normalized) Euclidean and Manhattan distances, we get the following.

The values of the Euclidean distances are lower than those of the Manhattan distances, but they are very close.
These distances can be weighted in order to give more importance to some dimensions, and they can be normalized by dividing the result by the maximum possible distance (which is not always possible). Their main drawback is that, when p ≠ 1, they are not linear, which causes problems when such distances are defined as functions of one another (Valtchev 1999).
A simple linear aggregation can be further refined by adding weights to the sum. Weighted linear aggregation takes into account that the aggregated values are not of equal importance; for example, the similarity of properties may matter more than the similarity of comments. The aggregation function therefore uses a set of weights w1,…,wn corresponding to categories of entities, such as classes and properties, and can be defined as follows.

Definition 7.8 (Weighted sum) 
Let o be a set of objects that can be analyzed in n dimensions; 
the weighted sum between two such objects x, x′ ∈ o is defined as:

\[\delta(x,x')=\displaystyle\sum_{i=1}^nw_i\times \delta(x_i,x'_i)\]

δ(xi,xi′) is the dissimilarity of pairs of objects in dimension i, and wi is the weight of dimension i.

The weighted sum can be thought of as a generalization of the Manhattan distance in which each dimension is weighted. It is also equivalent to a weighted average with normalized weights. In practice, the weights may vary with the category of the objects being aggregated: the function then uses a set of weights wCP that depends both on the category C of the objects and on the kind of value P being computed.
This kind of measure is normalized when all the aggregated values are normalized and the weights sum to 1 (w1 + … + wn = 1).
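
A minimal Python sketch of the weighted sum of Definition 7.8, assuming two dimensions (a label distance and an instance-based distance) with illustrative values; since the weights sum to 1, the result remains normalized when the inputs are.

```python
# Weighted sum aggregation (Definition 7.8): sum_i w_i * delta(x_i, x'_i).
# Weights and per-dimension dissimilarities are illustrative.

def weighted_sum(dissims, weights):
    return sum(w * d for w, d in zip(weights, dissims))

label_dist, instance_dist = 0.7, 0.2   # two dimensions for one candidate pair
weights = [0.3, 0.7]                   # more importance to the instance-based distance

print(round(weighted_sum([label_dist, instance_dist], weights), 2))  # 0.35
```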

Example of weighted sum. From the previous example, it appears that the measure on instances is more accurate than the measure on labels: the two sets of labels share no common name, and the distance is lower in the latter case. It is therefore reasonable to weight these dimensions differently. Considering the same input as in the previous example, the computed weighted sums are as follows.

The above results show that ⟨Provider, Publisher⟩ and ⟨Creator, Writer⟩ are clear match candidates, whereas ⟨Product, Book⟩ cannot be selected as a match candidate because of the low similarity between Product and Book.

Fuzzy Aggregation and Weighted Average
Fuzzy aggregation operators are used to aggregate homogeneous quantities in a way that preserves the structure of the aggregated domain.

Definition 7.10 (Fuzzy aggregation operator) 
A fuzzy aggregation operator f is a function f : Dn → D 
(where D is a set ordered by ≤ with an upper bound ⊤) 
which, for all x, x1, …, xn, y1, …, yn ∈ D, 
satisfies the following conditions.

\[\begin{eqnarray} f(x,\dots,x)&=&x \ &(idempotency)&\\ \forall x_i,\ y_i,\ x_i\leq y_i&\Rightarrow& f(x_1,\dots,x_n)\leq f(y_1,\dots,y_n)&\ (increasing\ monotonicity)&\\f \ is\ a\ &continuous&\ function\ &(continuity)&\end{eqnarray}\]

min is also a fuzzy aggregation operator. A general result for these measures is that, for any fuzzy aggregation function f, the values are ordered as f(x, y) ≥ min(x, y) ≥ x × y ≥ max(x + y − 1, 0). A typical example of a fuzzy aggregation operator is the weighted average (Gal 2011).

Definition 7.11 (Weighted average) 
Let o be a set of objects that can be analyzed in n dimensions; 
the weighted average between two such objects x, x′ ∈ o is defined as:

\[\delta(x,x')=\frac{\sum_{i=1}^nw_i\times\delta(x_i,x'_i)}{\sum_{i=1}^nw_i}\]

δ(xi,xi′) is the dissimilarity of a pair of objects in dimension i, and wi is the weight in dimension i.
The simple average is the special case in which all weights are equal. If the values are normalized, then the weighted average is also normalized; in fact, a normalized weighted sum is a weighted average.
Fuzzy aggregation functions should be used when you want to aggregate the results of competing algorithms (which are efficient with respect to some aspects, but not with respect to others) and try to take advantage of all of them. These can be very useful when you want to use a learning algorithm to learn the weights of the measurements. (Gal et al. 2005a) argue that these measures are always preferable to the triangular norm for aggregating confidence measures.
Some systems, such as LCS and MoTo, take their cues from linguistic quantifiers such as “most of” (Yager 1988, 1993) to define fuzzy aggregation functions.
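
As a small illustration of Definition 7.11, the sketch below (with made-up values) shows that dividing by the sum of unnormalized weights yields the same result as a weighted sum computed with the corresponding normalized weights.

```python
# Weighted average (Definition 7.11): weighted sum divided by the sum of the weights.
# Values are illustrative.

def weighted_average(dissims, weights):
    return sum(w * d for w, d in zip(weights, dissims)) / sum(weights)

# Unnormalized weights [3, 7] behave like the normalized weights [0.3, 0.7] used earlier.
print(round(weighted_average([0.7, 0.2], [3, 7]), 2))  # 0.35
```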

Harmonic Adaptive Weighted Sum
(Mao et al. 2010) introduced the harmonic adaptive weighted sum in order to weight different matchers. This operator gives higher weights to the more discriminating measures: for each measure, it computes the proportion of cells of the (dis)similarity matrix whose value is lower than that of all the other cells in the same row and column (in the case of a dissimilarity), out of the largest possible number of such cells (the size of the smaller ontology).

Definition 7.12 (Harmonic adaptive weighted sum) 
Let o and o′ be two sets of objects comparable through n measures; 
the harmonic adaptive weighted sum of the measures 
between two such objects x ∈ o, x′ ∈ o′ is defined as:

\[\delta(x,x')=\displaystyle\sum_{i=1}^n h(\delta_i)\times\delta_i(x,x')\]

where δi(x, x′) is the dissimilarity of the pair of objects according to the i-th measure, 
and h is defined as:

\[h(\delta_i)=\frac{|\{\langle e,e'\rangle\in o \times o' \;|\; \forall y \in o'\setminus\{e'\},\ \delta_i(e,y)\gt \delta_i(e,e')\ \wedge\ \forall x \in o\setminus\{e\},\ \delta_i(x,e') \gt \delta_i(e,e')\}|}{\min(|o|,|o'|)}\]

This measure is normalized by dividing the result by the sum of the weights h(δi). It gives preference to the more discriminating similarities or distances.
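
The following Python sketch shows one way the discriminability weight h of Definition 7.12 could be computed for a single dissimilarity matrix; the matrix values are hypothetical, and the resulting h values would then be normalized and used to weight the corresponding measures.

```python
# Sketch of the discriminability weight h (Definition 7.12) for one dissimilarity matrix.
# matrix[a][b] is the dissimilarity, for one measure, between entity a of o and entity b of o'.
# The values below are hypothetical.

def harmonic_weight(matrix):
    """Proportion of cells strictly smaller than every other cell in their
    row and in their column, out of min(|o|, |o'|)."""
    rows, cols = len(matrix), len(matrix[0])
    count = 0
    for a in range(rows):
        for b in range(cols):
            v = matrix[a][b]
            best_in_row = all(matrix[a][y] > v for y in range(cols) if y != b)
            best_in_col = all(matrix[x][b] > v for x in range(rows) if x != a)
            if best_in_row and best_in_col:
                count += 1
    return count / min(rows, cols)

edit_distances = [[0.9, 0.8, 0.7],
                  [0.6, 0.1, 0.9],
                  [0.8, 0.9, 0.2]]

print(round(harmonic_weight(edit_distances), 2))  # 0.67: two cells are minimal in both row and column
```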

Ordered weighted average
Another aggregation operator is the ordered weighted average (Yager 1988). It associates weights with the positions of the values once ordered, rather than with the dimensions themselves, so that more weight can be given to the highest (or lowest) values. This is important when aggregating matcher results, because it makes it possible to ignore which dimension a value comes from and to retain only the best matching results.

Definition 7.13 (Ordered weighted average) 
Let o be a set of objects that can be analyzed in n dimensions. 
An ordered weighted average operator f is a function f : Dn → D 
(where D is a set ordered by ≤ with an upper bound ⊤), 
associated with weights w1, …, wn in [0 1] such that w1 + … + wn = 1, 
and defined for all x1, …, xn ∈ D as:

\[f(x_1,\dots,x_n)=\displaystyle\sum_{i=1}^n w_i\times x'_i \]

where x′i is the i-th largest element among x1, …, xn. The ordered weighted average has the properties of an averaging operator (commutativity, monotonicity, and idempotency). The max, min, and average functions are special cases of the ordered weighted average.
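
A minimal Python sketch of the ordered weighted average; the similarity values and the weight vectors are illustrative and show how particular weight choices recover max, min, and the plain average.

```python
# Ordered weighted average (Definition 7.13): weights attach to rank positions,
# not to the dimensions the values come from. All values here are illustrative.

def ordered_weighted_average(values, weights):
    """weights[i] applies to the i-th largest value; weights are assumed to sum to 1."""
    ordered = sorted(values, reverse=True)
    return sum(w * v for w, v in zip(weights, ordered))

sims = [0.2, 0.9, 0.5]  # similarities for one pair according to three matchers

print(ordered_weighted_average(sims, [1.0, 0.0, 0.0]))            # 0.9  -> behaves like max
print(ordered_weighted_average(sims, [0.0, 0.0, 1.0]))            # 0.2  -> behaves like min
print(round(ordered_weighted_average(sims, [1/3, 1/3, 1/3]), 2))  # 0.53 -> plain average
```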

Voting
The aggregation of alignments can be done by considering each matcher as an independent source of information, and the decision to include a correspondence in an alignment as a vote in favor of this correspondence. This can be decided by a simple majority vote.

Definition 7.14 (Majority voting) 
Let {Ai}i∈I be a set of alignments between the same ontologies o and o′; 
the alignment A elected by majority voting from {Ai}i∈I 
is defined as follows.

\[ A= \{c\in \bigcup_{i\in I}A_i \;|\; |\{i\in I \,|\, c\in A_i\}|\gt\frac{|I|}{2}\} \]

This can be refined by using the confidence associated with each correspondence as a weight.

Definition 7.15 (Weighted majority voting) 
Let {Ai}i∈I be a set of alignments between the same ontologies o and o′; 
the alignment A elected by weighted majority voting from {Ai}i∈I is as follows.

\[ A= \{c\in \bigcup_{i\in I}A_i \;|\; \displaystyle\sum_{i\in I}\kappa_i(c)\gt\frac{|I|}{2}\} \]

Here κi(c) denotes the confidence attached to correspondence c in alignment Ai (taken as 0 when c ∉ Ai). It would also be possible to set thresholds based on the distribution of weights rather than using a strict majority.
In fact, any voting technique (Taylor 2005) can be applied to alignment aggregation, and all the aggregation measures considered previously (weighted sum, weighted average, ordered weighted average) can be turned into voting techniques by adding a threshold.
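
The sketch below illustrates majority voting (Definition 7.14) and its confidence-weighted variant (Definition 7.15) in Python; the alignments, correspondences, and confidence values are assumptions made for illustration.

```python
# Majority voting (Definition 7.14) and weighted voting (Definition 7.15) over
# alignments represented as {correspondence: confidence}. Data is illustrative.
from collections import defaultdict

A1 = {("Provider", "Publisher"): 0.9, ("Creator", "Writer"): 0.8}
A2 = {("Provider", "Publisher"): 0.7, ("Product", "Book"): 0.4}
A3 = {("Provider", "Publisher"): 0.8, ("Creator", "Writer"): 0.8}
alignments = [A1, A2, A3]

def majority_vote(alignments):
    votes = defaultdict(int)
    for A in alignments:
        for c in A:
            votes[c] += 1
    return {c for c, n in votes.items() if n > len(alignments) / 2}

def weighted_vote(alignments):
    conf = defaultdict(float)
    for A in alignments:
        for c, k in A.items():
            conf[c] += k  # kappa_i(c); contributes 0 when c is absent from A_i
    return {c for c, s in conf.items() if s > len(alignments) / 2}

print(majority_vote(alignments))  # {('Provider', 'Publisher'), ('Creator', 'Writer')}
print(weighted_vote(alignments))  # the same two correspondences exceed the 1.5 threshold
```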

Dempster-Shafer Theory
The Dempster-Shafer theory of evidence provides a numerical mechanism for representing and reasoning about uncertain information. Its greatest strength is that it allows evidence from independent sources to be combined through a simple rule known as Dempster's rule of combination (Dempster 1967; Shafer 1976).
In Dempster-Shafer theory, the sample space is called the "frame of discernment", or simply the "frame". It is denoted by Ω and consists of a set of exhaustive and mutually exclusive hypotheses. Evidence is used to select the best hypothesis in Ω; in other words, a piece of evidence supports a hypothesis. Data sources, such as experts and sensors, provide such evidential claims.

Evidence on the frame Ω is represented by a mass function m, also known as a basic probability assignment, m : 2Ω → [0 1], such that the following two conditions hold.

\[\begin{eqnarray} m(\emptyset)&=&0\\ \displaystyle\sum_{A⊆\Omega}m(A)&=&1\end{eqnarray}\]

Also, A is a subset of Ω, and m(A) denotes the strength of the evidence that supports claim A exactly. The first condition above implies that the set Ω must be complete, which corresponds to the closed world assumption. The second condition implies that the expert’s statements must be normalized, meaning that each source of evidence is equally important. Based on the mass function, several measures are defined.

In particular, beliefs about more specific propositions, such as subsets of A, are governed by the belief measure Bel : 2Ω → [0 1], as follows.

\[Bel(A)=\displaystyle\sum_{B⊆A}m(B) \]

Bel(A) measures the total belief, including that committed to any particular subset of A, that is justifiably assigned to A.
Moreover, some belief may be committed to sets B that are consistent with (overlap) A. This is handled by the following plausibility measure Pl : 2Ω → [0 1].

\[ Pl(A)=\displaystyle\sum_{A\cap B\neq\emptyset}m(B)\]

Pl(A) is a measure of the maximum possible support that can be assigned to A, if justified by additional information.

The mass assigned by the mass function can be viewed as a segment of the unit interval. The complete information about the measure of belief for a set A is represented by the belief interval [Bel(A), P l(A)]. On the other hand, Pl(A) – Bel(A) represents the ignorance (missing data) or uncertainty interval about A.

The relationship between belief, plausibility, and uncertainty in Dempster-Shafer theory.

Dempster's rule of combination aggregates two mass functions, weighing the combined evidence against the conflicting evidence.

Definition 7.16 (Dempster's rule of combination) 
Given two mass functions m and m′, 
the combined mass function m⊕m′ for a non-empty hypothesis A is defined 
as follows.

\[m\oplus m'(A)=\frac{\sum_{B\cap B'=A}m(B)\times m'(B')}{1-\sum_{B\cap B'=\emptyset}m(B)\times m'(B')}\]

Because the normalization emphasizes consensus among sources and ignores conflicting evidence, combining highly conflicting evidence can lead to counterintuitive conclusions (Zadeh 1984). A possible solution to this problem is to relax the closed-world assumption (Smets 1990).

Dempster-Shafer theory has been used in (Besana 2006; Wang et al. 2007; Nagy and Vargas-Vera 2010) to combine various matching results. In this setting, the similarity measures provided by different matchers, such as edit distance and WordNet, are treated as subjective expert assessments that provide supporting evidence. Specifically, the normalized similarity values correspond to the mass values assigned to the pairs of entities to be matched, which constitute the elements of the frame. For example, mwordnet(paper, article) = 0.86 means that, according to the WordNet matcher, the mass function supporting the claim that paper corresponds to article has the value 0.86. The frame contains all possible correspondences evaluated by the matchers involved. The masses provided by the matchers are combined by Dempster's rule. Based on the combined evidence, the most probable correspondences must then be selected from the mass distribution. This can be done, for example, with thresholds that reject pairs of entities with very high plausibility but very low belief measures.
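
The following Python sketch illustrates Dempster's rule of combination for two matchers treated as independent sources; the WordNet mass of 0.86 follows the example in the text, while the edit-distance masses and the small two-hypothesis frame are assumptions made for illustration.

```python
# Sketch of Dempster's rule of combination (Definition 7.16) for two matchers
# treated as independent sources of evidence. The frame and the edit-distance
# masses are assumptions; the 0.86 WordNet mass follows the example above.
from itertools import product

OMEGA = frozenset({"paper=article", "paper=review"})  # whole frame: mass left unassigned

m_edit = {frozenset({"paper=article"}): 0.60, OMEGA: 0.40}
m_wordnet = {frozenset({"paper=article"}): 0.86,
             frozenset({"paper=review"}): 0.04,
             OMEGA: 0.10}

def dempster_combine(m1, m2):
    combined, conflict = {}, 0.0
    for (B, v1), (C, v2) in product(m1.items(), m2.items()):
        inter = B & C
        if inter:
            combined[inter] = combined.get(inter, 0.0) + v1 * v2
        else:
            conflict += v1 * v2  # mass falling on the empty set, removed by normalization
    return {A: v / (1.0 - conflict) for A, v in combined.items()}

m = dempster_combine(m_edit, m_wordnet)
print(round(m[frozenset({"paper=article"})], 2))  # ~0.94: combined support for paper=article
```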

Argumentation
Argumentation is a method of reaching agreement between parties by exchanging arguments for or against particular positions. In ontology matching, it plays two roles:

  • negotiating an alignment acceptable to two agents, each committed to its own ontology;
  • aggregating alignments through agreement. In particular, alignment negotiation between agents can be seen as yet another aggregation technique between two alignments; (Silva et al. 2005) present such a system based on quantitative negotiation rather than argumentation.

Argumentation allows agents to exchange counterarguments and to select arguments according to their preferences. It has been used in ontology matching to find agreements between alignments (Trojahn et al. 2011). In this case, correspondences are seen as arguments, which may attack each other depending on whether they contradict one another or on the techniques by which the different agents obtained them. Metadata about correspondences is therefore important, because it provides a basis for preferring or attacking a particular correspondence according to its source, how it was obtained, or the confidence attached to it.

Example of argumentation over an alignment. Consider two agents C and P committed to the ontologies o and o′, expressed in description logic as follows.

\[ \begin{eqnarray} o &=& \{Micro-company ≡ Company ⊓ ≤5 employee\}\\o′ &= &\{SME ≡ Firm ⊓ ≤10 associate\}\end{eqnarray}\]

Assume that they have found the following alignment A.

\[\begin{eqnarray}A=\{&<Company,Firm,=>&, \ &(\gamma_1)&\\&<employee,associate,≤>&,\  &(\gamma_2)&\\&<Micro-company,SME,≤>&\}\ &(\gamma_3)&\end{eqnarray}\]

The three correspondences are denoted by γ1, γ2, and γ3, respectively, and the arguments in favor of γ1 are as follows
a1: Every Company known on one side is a Firm on the other side, and vice versa.
a2: The two names, Company and Firm, are synonymous in WordNet.
Arguments in favor of γ3 are as follows
a3: The rest of the alignment (without γ3) entails that these two classes correspond.
a4: Every Micro-company known on one side is an SME on the other side (but not vice versa).
The counterarguments against γ3 are as follows.
a5: The two names, Micro-company and SME, are not similar by any string distance and are not considered synonymous by WordNet.
a6: The only features they have in common are associate and employee, and they differ in domain and cardinality.
In (Laera et al. 2006), arguments are represented according to a value-based argumentation framework (Bench-Capon 2003). They consist of a flag indicating whether the argument is for (+) or against (−) the correspondence, and the type of (basic) method that supports it. A simple representation of these arguments is as follows.

\[\begin{eqnarray} &a_1:& <Company,Firm,=, &<+,extensional>&>\\&a_2:&<Company,Firm,=, &<+,terminological>&>\\&a_3:&<Micro-company,SME,≤, &<+,semantic>&>\\&a_4:&<Micro-company,SME,≤, &<+,extensional>&>\\&a_5:&<Micro-company,SME,≤, &<-,terminological>&>\\&a_6:&<Micro-company,SME,≤, &<-,structural>&> \end{eqnarray}\]

Such arguments can be provided by existing basic matchers. Another, more elaborate way of defining arguments is to allow the correspondences themselves to be justified. This is expressive enough, for example, to express that the structural similarity between micro-company and SME depends on the terminological similarity between employee and associate.
The rationale behind such arguments is that some agents may prefer, or trust, some types of techniques more than others.

For example, we can imagine that agent C prefers terminological arguments over extensional ones, extensional over semantic, and semantic over structural. Similarly, P may have different priorities, preferring, in that order, structural, semantic, terminological, and extensional arguments.
In logical argumentation theory (Dung 1995; Amgoud et al. 2000), given a set of arguments and the agents' preferences, a consensus alignment between the two parties can be defined. An argumentation framework consists of a set of arguments and an attack relation between them (Dung 1995). A set of arguments is admissible if none of its elements attacks another element of the set and every element of the set is defended by the set against outside attacks. Finally, a preferred extension is a maximal admissible set with respect to inclusion.

The agents' objectives are (i) to exchange arguments so that they reason over a common set of arguments, and (ii) to decide which arguments, and hence which correspondences, they can accept together. For this purpose, they can adopt a cautious approach, which selects only the arguments that belong to all preferred extensions, or a credulous approach, which selects all the arguments that belong to at least one preferred extension of each agent.
For example, C has the preferred extension {a5, a1, a2, a6}, while P has the preferred extension {a6, a5, a2, a1}. The largest common subset of arguments between C and P is therefore {a1, a2, a5, a6}, which selects the agreed alignment consisting of γ1 and γ2.
Different results are obtained with different refinements of argumentation: value-based argumentation, which allows different preferences between arguments (Laera et al. 2006), and strength-value-based and voting-value-based argumentation, which build preferences from confidence values and from votes between agents, respectively (Trojahn et al. 2008).
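
As a rough illustration of these notions, the Python sketch below checks that the arguments shared by the two agents are conflict-free and reads off the supported correspondences; the attack relation used here (arguments for and against the same correspondence attack each other) is a simplifying assumption made for this sketch, not the full value-based framework.

```python
# Conflict-freeness check over the arguments of the example above. The attack
# relation (arguments for and against the same correspondence attack each other)
# is a simplifying assumption made for this sketch.
arguments = {
    "a1": ("gamma1", "+"), "a2": ("gamma1", "+"),
    "a3": ("gamma3", "+"), "a4": ("gamma3", "+"),
    "a5": ("gamma3", "-"), "a6": ("gamma3", "-"),
}

def attacks(x, y):
    """x attacks y if they bear on the same correspondence with opposite polarity."""
    return arguments[x][0] == arguments[y][0] and arguments[x][1] != arguments[y][1]

def conflict_free(args):
    return not any(attacks(x, y) for x in args for y in args)

preferred_C = {"a5", "a1", "a2", "a6"}  # preferred extension of agent C
preferred_P = {"a6", "a5", "a2", "a1"}  # preferred extension of agent P
common = preferred_C & preferred_P

print(conflict_free(common))  # True: the shared arguments do not attack each other
# Correspondences backed by an accepted positive argument (gamma1 here;
# gamma2 has no argument listed in this example).
print({corr for a in common for corr, sign in [arguments[a]] if sign == "+"})
```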

Summary on similarity and alignment aggregation
The simultaneous use of multiple matchers and similarity measures is common. In order to provide an alignment, these measures need to be integrated. Integration may be based on an aggregation of similarity values or ranks. It can also be based on the opinions of stakeholders (matchers or agents) and can be integrated using simple voting or more elaborate argumentation strategies.
Once the integration is performed, the resulting similarities can be used to extract alignments or can undergo further synthesis or aggregation. They can also serve as a basis for training and tuning the matchers.

In the next article, we will discuss learning approaches to alignment selection.
