Similarity in global matching (4): Optimized matching methods, EM and PSO


In the previous article, we discussed an iterative similarity calculation approach based on a set of similarity equations. In this article, we extend that approach with optimization techniques, beginning with two of them: expectation maximization and particle swarm optimization.

Expectation Maximization (EM)
Expectation Maximization (EM) is an iterative approach to maximum likelihood estimation (Dempster et al. 1977). It is used to estimate the parameters of a parametric probability distribution, or to perform maximum a posteriori estimation, when the observed data are missing or only partially present, as in a hidden Markov model. The missing data are estimated by their conditional expectation given the observed data; in other words, the missing data are augmented with whatever information can be inferred about them. The likelihood function is then maximized as if the missing data were known. Each iteration of EM thus consists of two steps: the expectation step, called the E-step, and the maximization step, called the M-step, which produces successive local improvements of the likelihood function approximated in the E-step. EM is a fixed-point method, and convergence is guaranteed because the likelihood increases at each iteration.
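
To make the E-step/M-step cycle concrete, here is a minimal, self-contained sketch of EM for a two-component Gaussian mixture in one dimension. It is a generic illustration of the algorithm, not the ontology-matching formulation discussed below, and all data and parameter values are invented for the example.

```python
# A minimal EM sketch for a two-component 1-D Gaussian mixture (shared variance).
# Purely illustrative: the data, initial values, and iteration count are assumptions.
import math
import random

def em_gaussian_mixture(xs, iters=50):
    # Initial guesses for the parameters (two means, shared variance, mixing weight).
    mu1, mu2, var, pi = min(xs), max(xs), 1.0, 0.5
    for _ in range(iters):
        # E-step: posterior probability that each point belongs to component 1.
        resp = []
        for x in xs:
            p1 = pi * math.exp(-(x - mu1) ** 2 / (2 * var))
            p2 = (1 - pi) * math.exp(-(x - mu2) ** 2 / (2 * var))
            resp.append(p1 / (p1 + p2))
        # M-step: re-estimate the parameters that maximize the expected log-likelihood.
        n1 = sum(resp)
        n2 = len(xs) - n1
        mu1 = sum(r * x for r, x in zip(resp, xs)) / n1
        mu2 = sum((1 - r) * x for r, x in zip(resp, xs)) / n2
        var = sum(r * (x - mu1) ** 2 + (1 - r) * (x - mu2) ** 2
                  for r, x in zip(resp, xs)) / len(xs)
        pi = n1 / len(xs)
    return mu1, mu2, var, pi

# Toy data drawn around two centres; EM recovers the two means.
random.seed(0)
data = [random.gauss(0, 1) for _ in range(100)] + [random.gauss(5, 1) for _ in range(100)]
print(em_gaussian_mixture(data))
```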

Example of expectation maximization. The ontology matching problem is treated as maximum likelihood estimation in (Doshi et al. 2009; Thayasivam and Doshi 2011). The method searches for the maximum likelihood estimate of the alignment from the observed data in the presence of missing correspondences, which are treated as hidden variables. Let M be a binary match matrix: its rows and columns represent the entities of the two ontologies, and a cell is filled with “1” if the corresponding entities match and “0” otherwise. The method iteratively searches for the match matrix M that maximizes the conditional probability P(o|o′,M) of the source ontology o, given the target ontology o′ and the match assignments made through M, starting from an initial seed of correspondences. In EM terminology, each match assignment variable is considered a model, and the match matrix is therefore considered a mixture model, where X is the set of observed data instances, Y is the set of missing values (hidden correspondences), and M is the match matrix, i.e., the mixture model.

The two EM steps are formalized as follows.

E-step: the weighted sum of log-likelihoods is calculated as

\[Q(M^{i+1}|M^i)=\displaystyle\sum_{y\in Y}P(y|X,M^i)L(M^{i+1}|X,y) \]

where i is the iteration number, the weights are the conditional probabilities of the hidden variables (correspondences), and L(M^{i+1}|X,y) is the log-likelihood of the model at iteration i + 1. Since L and log(L) attain their maxima at the same point, logarithms are used to simplify the computation. The initial match seed and the probabilities of the hidden variables can be estimated from term similarity or from simple structural heuristics such as “if two entities match, then their parents are also likely to match”.
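
The following is a hedged sketch of how such an initial seed might be produced: a trivial token-overlap similarity between entity labels, boosted when the parents of the two entities are also similar. The entity labels, parent maps, similarity threshold, and boost value are all illustrative assumptions, not the actual heuristics of (Doshi et al. 2009).

```python
# Hypothetical seed computation: term similarity plus a parent-match structural boost.

def term_sim(a, b):
    """Jaccard overlap of lowercased word tokens, as a stand-in term similarity."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def seed_probabilities(src_entities, tgt_entities, src_parent, tgt_parent, boost=0.2):
    """Return {(e, e'): p} used to initialize the hidden-variable probabilities."""
    seed = {}
    for e in src_entities:
        for f in tgt_entities:
            p = term_sim(e, f)
            # Structural heuristic: reward pairs whose parents are also similar.
            pe, pf = src_parent.get(e), tgt_parent.get(f)
            if pe and pf and term_sim(pe, pf) > 0.5:
                p = min(1.0, p + boost)
            seed[(e, f)] = p
    return seed

# Tiny illustrative ontologies (class labels only).
src = ["Person", "Research Staff", "Conference Paper"]
tgt = ["Human", "Staff", "Paper"]
src_par = {"Research Staff": "Person", "Conference Paper": "Document"}
tgt_par = {"Staff": "Human", "Paper": "Document"}
print(seed_probabilities(src, tgt, src_par, tgt_par))
```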

M-step: the parameters that maximize the expected log-likelihood computed in the E-step are sought, i.e., the match matrix (mixture model) that maximizes the expected value is selected. In practice, the maximization condition is often relaxed by selecting any mixture model that merely improves on the previous one, which is known as the generalized expectation maximization (GEM) method:

\[ M_*^{i+1}\in \{M : Q(M|M^i)\geq Q(M^i|M^i)\}\]

This match matrix then becomes the input for determining the distribution of the hidden variables in the next E step. In this way, the EM method iteratively modifies the match matrix (mixture model) by maximizing the weighted sum of the log-likelihoods of these models.
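
As a structural illustration of this loop, the sketch below iterates over binary match matrices and, in the spirit of generalized EM, accepts any neighbouring matrix (obtained by flipping one cell) that improves the weighted log-likelihood. The likelihood model P(o|o′,M) of (Doshi et al. 2009) is not reproduced; `log_likelihood` and `hidden_weight` are hypothetical placeholders standing in for it.

```python
# A structural sketch of the generalized EM loop over binary match matrices.
import math

def log_likelihood(M, seed):
    """Stand-in for L(M | X, y): reward cells whose seed similarity supports them."""
    return sum(math.log(seed[c] + 1e-9) if M[c] else math.log(1 - seed[c] + 1e-9)
               for c in M)

def hidden_weight(M, seed):
    """Stand-in for P(y | X, M^i): collapsed here to a single scalar weight."""
    return math.exp(log_likelihood(M, seed) / len(M))

def q_value(M_new, M_old, seed):
    """Q(M^{i+1} | M^i): weighted (expected) log-likelihood."""
    return hidden_weight(M_old, seed) * log_likelihood(M_new, seed)

def gem_match(seed, iters=20):
    # Start from the seed: mark a cell 1 if its initial similarity exceeds 0.5.
    M = {c: (1 if s > 0.5 else 0) for c, s in seed.items()}
    for _ in range(iters):
        base = q_value(M, M, seed)
        # M-step (generalized): accept any neighbour that improves Q, if one exists.
        improved = False
        for cell in M:
            cand = dict(M)
            cand[cell] = 1 - cand[cell]
            if q_value(cand, M, seed) > base:
                M, improved = cand, True
                break
        if not improved:
            break
    return M

seed = {("Person", "Human"): 0.4, ("Staff", "Staff"): 0.9, ("Paper", "Document"): 0.3}
print(gem_match(seed))
```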

Particle Swarm Optimization (PSO)
PSO is a non-deterministic optimization method belonging to the field of swarm intelligence (Engelbrecht 2005). PSO is an anytime meta-heuristic: it can be interrupted at any point and returns the best result found so far. It is inspired by the collective intelligence of groups of simple agents, called particles, such as flocks of birds or schools of fish, which adapt to the environment they live in (Kennedy and Eberhart 1995).
Given an initial population of randomly generated solutions, the particles memorize and share the best solutions they have found so far. The swarm of particles moves over the solution space based on this shared information, thereby exploring it in search of promising areas. More precisely, the motion of each particle is governed by a velocity vector. This vector is based on the particle's individual best position (pBest), known as the personal influence component, and on the best position found by the whole swarm (gBest), known as the social influence component. In each iteration, the particles change their velocity towards a better position based on pBest and/or gBest. The evolution thus searches only for the best solution; unlike genetic algorithms, PSO performs no genetic operations such as crossover or mutation. Instead, the particles are updated through the velocity vector.
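
Before turning to the matching application, here is a minimal sketch of continuous PSO showing the velocity update that combines inertia with the personal (pBest) and social (gBest) influence components. The objective (the sphere function) and the parameters w, c1, and c2 are illustrative choices, not values prescribed by the literature cited above.

```python
# A minimal continuous PSO sketch minimising a simple objective function.
import random

def pso(obj, dim=2, swarm_size=20, iters=100, w=0.7, c1=1.5, c2=1.5):
    # Random initial positions and zero velocities.
    pos = [[random.uniform(-5, 5) for _ in range(dim)] for _ in range(swarm_size)]
    vel = [[0.0] * dim for _ in range(swarm_size)]
    pbest = [p[:] for p in pos]                      # personal best positions
    pbest_val = [obj(p) for p in pos]
    g = min(range(swarm_size), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]     # global best so far

    for _ in range(iters):
        for i in range(swarm_size):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                # Velocity: inertia + personal influence (pBest) + social influence (gBest).
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = obj(pos[i])
            if val < pbest_val[i]:                   # update personal best
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:                  # update global best
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

# Minimising the sphere function; the optimum is at the origin.
print(pso(lambda p: sum(x * x for x in p)))
```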

An example of PSO. Ontology matching can be viewed as an optimization problem and tackled with a discrete particle swarm optimization (DPSO) solution (Bock and Hettenhausen 2012). DPSO was introduced in (Kennedy and Eberhart 1997) and further improved in (Correa et al. 2006) by using a variable dimensionality for each particle. The objective function to be optimized is the alignment quality. A particle represents a candidate alignment, and its dimensionality represents the number of correspondences in that alignment. The particles evolve over a predefined number of iterations (e.g., 200) with a predefined population size (e.g., 50 particles), using a velocity vector that establishes the new position of each particle in the solution space based on its memory (pBest) and on communication between particles (gBest). Correspondences are evaluated with a fitness function: the results of various basic matchers (applied to the entities of the two ontologies) are aggregated into a single fitness value. DPSO returns the globally best alignment (gBest) with respect to this fitness function.

This method basically consists of three steps.

  1. Initialization: Assign a random number of correspondences to each particle (since the number of correct correspondences is not known in advance), evaluate them with a fitness function, and calculate the pBest alignment for each particle.
  2. Swarm iteration: update the globally optimal alignment gBest when a new optimal particle appears.
  3. Velocity vector update: random reinitialization of particles is restricted in order to ensure convergence. Specifically, each component of the velocity vector is increased for correspondences that appear in the personal or global best alignment; this is controlled by dedicated parameters, e.g., β when the correspondence is in pBest and γ when it is in gBest. Such correspondences are marked to be preserved, so they are not replaced by random reinitialization in subsequent iterations. Using customizable thresholds, they can be further promoted to correspondences that are never replaced within a particle. The fitness and velocity vector updates of each particle can be computed in parallel. A simplified sketch of this process follows the list.
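
The sketch below illustrates these three steps in a much simplified discrete form: particles are sets of candidate correspondences, and re-initialization is biased towards correspondences present in pBest and gBest via parameters β and γ. The fitness function, parameter values, and candidate pool are illustrative assumptions and do not reproduce the implementation of (Bock and Hettenhausen 2012).

```python
# A simplified DPSO-style sketch for matching; all names and values are hypothetical.
import random

def dpso_match(candidates, fitness, swarm_size=10, iters=50, beta=0.3, gamma=0.5):
    """candidates: list of (src_entity, tgt_entity, similarity) tuples."""
    def sample_particle():
        # Step 1 (initialization): a random number of random correspondences.
        k = random.randint(1, len(candidates))
        return set(random.sample(range(len(candidates)), k))

    swarm = [sample_particle() for _ in range(swarm_size)]
    pbest = [p.copy() for p in swarm]
    pbest_fit = [fitness(p, candidates) for p in swarm]
    g = max(range(swarm_size), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g].copy(), pbest_fit[g]

    for _ in range(iters):                            # Step 2 (swarm iteration)
        for i, particle in enumerate(swarm):
            new = set()
            for c in range(len(candidates)):
                keep = random.random()
                # Step 3 (velocity update): correspondences in gBest/pBest are kept
                # with higher probability, biasing re-initialization towards them.
                if c in gbest and keep < gamma:
                    new.add(c)
                elif c in pbest[i] and keep < beta:
                    new.add(c)
                elif c in particle and keep < 0.1:
                    new.add(c)
            if not new:                               # avoid empty alignments
                new = sample_particle()
            swarm[i] = new
            fit = fitness(new, candidates)
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = new.copy(), fit
                if fit > gbest_fit:
                    gbest, gbest_fit = new.copy(), fit
    return [candidates[c] for c in gbest], gbest_fit

# Toy fitness: average similarity of selected correspondences, with a small penalty
# proportional to the number of candidates left out.
def fitness(particle, candidates):
    if not particle:
        return 0.0
    sims = [candidates[c][2] for c in particle]
    return sum(sims) / len(sims) - 0.01 * (len(candidates) - len(particle))

cands = [("Person", "Human", 0.8), ("Paper", "Document", 0.6), ("Staff", "Car", 0.1)]
print(dpso_match(cands, fitness))
```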

Summary of Optimization Methods
To compute an alignment as the solution of an optimization problem, we have introduced two methods: one optimizes the quality of the similarity measure, the other directly optimizes the quality of the alignment. Beyond these two examples, any optimization method may be used. The use of such methods for ontology matching is still in its infancy, but they become applicable as soon as the matching problem can be reformulated as an optimization problem.
In the following articles, we will consider global methods that give specific interpretations to the computed similarities and alignments: the first gives a probabilistic interpretation to alignments, while the second exploits the semantics of alignments and ontologies.

In the next article, we will discuss probabilistic methods.
