Implementation examples of EM algorithms and various applications

EM Algorithm

The EM algorithm (Expectation-Maximization Algorithm) is an iterative optimization algorithm widely used in statistical estimation and machine learning. In particular, it is often used for parameter estimation in stochastic models with latent variables.

The EM algorithm considers a joint probabilistic model of the observed data and the latent variables. The general objective is to find the model parameters that maximize the log-likelihood of the observed data, obtained by marginalizing out the latent variables.

The steps of the EM algorithm are as follows

  1. Initialization: Set initial values for the parameters.
  2. E-step (Expectation Step): Compute the posterior distribution of the latent variables under the current parameters. In practice, this means computing the expected values (responsibilities) of the latent variables for each data point based on the current parameters.
  3. M-step (Maximization Step): Using the expectations of the latent variables computed in the E-step, update the parameters to the values that maximize the expected complete-data log-likelihood.
  4. Convergence Decision: Steps 2 and 3 are repeated until the amount of parameter updating falls below a set threshold or the specified number of iterations is reached.

The EM algorithm updates the parameters so that the log-likelihood never decreases, and the parameters therefore converge through this iterative optimization. In each iteration, the E step computes the posterior distribution of the latent variables, and the M step maximizes the expected complete-data log-likelihood with respect to the parameters. The generic structure of this loop is sketched below.
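As a minimal illustration of this structure, the following sketch shows the generic E/M loop. The helper functions e_step and m_step are hypothetical, model-specific callbacks (concrete versions for a Gaussian mixture appear later in this article).

import numpy as np

def em(data, init_params, e_step, m_step, max_iterations=100, tol=1e-6):
    # Generic EM loop; e_step and m_step are model-specific callbacks
    params = init_params
    prev_log_likelihood = -np.inf
    for _ in range(max_iterations):
        # E-step: posterior over the latent variables and the current log-likelihood
        posteriors, log_likelihood = e_step(data, params)
        # M-step: parameters maximizing the expected complete-data log-likelihood
        params = m_step(data, posteriors)
        # Convergence check on the improvement of the log-likelihood
        if log_likelihood - prev_log_likelihood < tol:
            break
        prev_log_likelihood = log_likelihood
    return params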

Examples of EM Algorithm Applications

EM algorithms have been applied to a variety of real-world problems. Some typical applications are listed below.

  • Mixture Models: The EM algorithm is widely used for estimating the parameters of mixture models. A mixture model represents the data distribution as a weighted combination of several probability distributions. For example, a Gaussian mixture model assumes that the data are generated from multiple Gaussian distributions; the EM algorithm estimates the parameters (mean, variance) and the mixing coefficient of each Gaussian component from the observed data.
  • Hidden Markov Model (HMM): An HMM is a probabilistic model for modeling sequential data. The EM algorithm (known in this context as the Baum-Welch algorithm) is used to estimate the parameters of the HMM: in the E-step, the posterior distribution of the latent states at each time is estimated based on the current parameters, and in the M-step the parameters are re-estimated so as to maximize the expected log-likelihood.
  • Missing data processing: When a data set contains missing values, the EM algorithm can be used to impute them: in the E-step, the posterior distribution of the missing entries is estimated from the observed data, and in the M-step the parameters are re-estimated. This allows parameter estimation and prediction that take the missing data into account.
  • Rating Prediction: The EM algorithm is also applied to the rating prediction problem used in recommendation systems. For example, in matrix factorization-based methods, the ratings between users and items are modeled through shared latent features, and the EM algorithm is used to estimate the model parameters (the latent feature vectors of users and items).
Extension of EM Algorithm

Various extensions have been proposed for the EM algorithm, including

  • EM with Memoization: Memoized EM is a speed-up technique for the EM algorithm. In the usual EM algorithm, computing the posterior probability of each data point in the E step requires many repeated expectation calculations; memoized EM caches and reuses these intermediate results so that redundant computation can be avoided.
  • SEM (Structured EM): SEM is an extension of the EM algorithm for structured data. While the usual EM algorithm assumes that the data are independent, SEM addresses cases where there is structure or dependencies among the data. This is useful, for example, when dealing with graph structures or series data.
  • Randomized EM: Randomized EM is a method that randomizes the initialization step of the EM algorithm to make it less prone to poor local optima. While the result of the ordinary EM algorithm can depend strongly on the initialization, Randomized EM improves stability by trying multiple initializations and selecting the best result (a short sketch of this multiple-restart idea follows this list).
  • EM with Mixture Proportion Constraints: EM with Mixture Proportion Constraints is a method to improve estimation results by imposing constraints on the mixing proportions of a mixture model. In a mixture model, the mixing coefficients of the components must be non-negative and sum to one; constrained EM enforces this constraint (and any additional restrictions on the mixing proportions) during estimation, which improves model fit and interpretability.
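The multiple-restart idea behind Randomized EM can be sketched with scikit-learn's GaussianMixture (assuming scikit-learn is available): its n_init parameter runs EM from several random initializations and keeps the solution with the highest likelihood. The data below are synthetic and purely illustrative.

import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic 1-D data drawn from two Gaussians (illustrative data only)
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(-2, 1, 200), rng.normal(3, 1, 200)]).reshape(-1, 1)

# n_init=10 runs EM from 10 random initializations and keeps the best solution,
# which is the same idea as Randomized EM for avoiding poor local optima
gmm = GaussianMixture(n_components=2, n_init=10, random_state=0)
gmm.fit(X)

print("means:", gmm.means_.ravel())
print("average log-likelihood per sample:", gmm.score(X))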

The following sections give the processing flow of concrete application examples together with example implementations in Python.

Processing Mixed Models with EM Algorithm

<Outline of processing steps>

As an example of applying the EM algorithm, consider parameter estimation for a Gaussian mixture model (GMM). This is performed in the following steps.

  1. Initialization:
    • Initialize the mean (μ), variance (σ^2), and mixing coefficient of each of the K Gaussian distributions.
    • Prepare the data set.
  2. E-step (Expectation step):
    • For each data point, calculate the posterior probability from each Gaussian distribution. In other words, calculate the probability that each data point is generated from each Gaussian distribution.
    • The posterior probability is calculated using Bayes’ theorem from the parameters (μ, σ^2) and mixing coefficient of each Gaussian distribution (the concrete update formulas are shown after this list).
  3. M-step (Maximization step):
    • The parameters (μ, σ^2) of each Gaussian distribution are re-estimated.
    • The mean (μ) of each Gaussian distribution is re-calculated as a weighted mean of the data, with the posterior probabilities computed in the E step as weights.
    • The variance (σ^2) of each Gaussian distribution is calculated as a weighted variance using the posterior probabilities calculated in the E step.
  4. Convergence Decision:
    • The E and M steps are repeated until the change in parameters is sufficiently small.
    • Usually, the convergence decision is made by monitoring the amount of change in the log-likelihood and the amount of change in the parameters.
  5. Output of results:
    • Use the converged parameters to determine the Gaussian distribution or cluster to which each data point belongs.

By repeating the above steps, the EM algorithm estimates the parameters of the Gaussian mixture model: in the E step, the posterior probabilities of the Gaussian distribution to which each data point belongs are calculated, and in the M step, the posterior probabilities are used to re-estimate the parameters of the Gaussian distribution. This process is iterative, and convergence yields the final parameter estimation results.
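Concretely, for a one-dimensional (or diagonal-covariance) Gaussian mixture, the E-step responsibilities and M-step updates described above take the following standard form, where N(x | μ, σ^2) denotes the Gaussian density, π_k the mixing coefficient of component k, and N the number of data points:

γ_k(x_n) = π_k · N(x_n | μ_k, σ_k^2) / Σ_j π_j · N(x_n | μ_j, σ_j^2)    (E-step: responsibility)
μ_k = Σ_n γ_k(x_n) · x_n / Σ_n γ_k(x_n)    (M-step: mean)
σ_k^2 = Σ_n γ_k(x_n) · (x_n − μ_k)^2 / Σ_n γ_k(x_n)    (M-step: variance)
π_k = (1/N) · Σ_n γ_k(x_n)    (M-step: mixing coefficient)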

<Example implementation in python>

Below is an example implementation of the EM algorithm in Python for parameter estimation in a Gaussian mixture model.

import numpy as np

def expectation_maximization(data, num_components, num_iterations):
    # Get the number of data dimensions and the number of data points
    num_features = data.shape[1]
    num_samples = data.shape[0]

    # Parameter initialization
    # Randomly initialize mean, variance, and mixing coefficients
    means = np.random.randn(num_components, num_features)
    variances = np.random.rand(num_components, num_features)
    variances += 1e-6  # Add small values to prevent dividing by zero
    mixing_coeffs = np.ones(num_components) / num_components

    # Iterative EM Algorithm
    for iteration in range(num_iterations):
        # E-step: Compute posterior probabilities of latent variables for each data point
        posteriors = np.zeros((num_samples, num_components))
        for k in range(num_components):
            # Calculate the posterior probability of the kth Gaussian component for each data point
            prior = mixing_coeffs[k]
            # gaussian_pdf returns per-feature densities; take the product over features
            # (diagonal-covariance assumption) to obtain a single density per sample
            likelihood = np.prod(gaussian_pdf(data, means[k], variances[k]), axis=1)
            posteriors[:, k] = prior * likelihood
        posteriors /= np.sum(posteriors, axis=1, keepdims=True)  # normalization 

        # M-step: Maximize parameters
        for k in range(num_components):
            # Re-estimate parameters for each Gaussian distribution
            posterior_sum = np.sum(posteriors[:, k])
            means[k] = np.sum(posteriors[:, k].reshape(-1, 1) * data, axis=0) / posterior_sum
            variances[k] = np.sum(posteriors[:, k].reshape(-1, 1) * (data - means[k])**2, axis=0) / posterior_sum
            mixing_coeffs[k] = posterior_sum / num_samples

    return means, variances, mixing_coeffs

def gaussian_pdf(x, mean, variance):
    # Probability density function of Gaussian distribution
    exponent = -0.5 * ((x - mean) ** 2) / variance
    coef = 1 / np.sqrt(2 * np.pi * variance)
    return coef * np.exp(exponent)

In this implementation, the EM algorithm is run on a given data set (data) using a Gaussian mixture model with a specified number of components. num_components specifies the number of components in the mixture model, and num_iterations specifies the number of iterations of the EM algorithm.

In the E step, the posterior probabilities of each Gaussian component are computed for each data point, and in the M step, those posterior probabilities are used to re-estimate the parameters of each component (mean, variance, and mixture coefficient). This process is repeated for the specified number of iterations.
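The following is a minimal usage sketch with synthetic one-dimensional data; the data-generating settings are assumptions of this example, not part of the implementation above. Note that the implementation works with raw (not log-space) densities, so a fixed random seed is used for reproducibility.

import numpy as np

# Synthetic 1-D data drawn from two Gaussians (illustrative assumption)
np.random.seed(0)
data = np.concatenate([np.random.normal(0.0, 1.0, 300),
                       np.random.normal(3.0, 1.0, 300)]).reshape(-1, 1)

means, variances, mixing_coeffs = expectation_maximization(data, num_components=2, num_iterations=50)

print("Estimated means:", means.ravel())
print("Estimated variances:", variances.ravel())
print("Estimated mixing coefficients:", mixing_coeffs)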

Processing Hidden Markov Models with EM Algorithm

<Summary of Processing Steps>

The following is the process flow of parameter estimation for the Hidden Markov Model (HMM) using the EM algorithm.

  1. Initialization:
    • Randomly initialize the state transition probability matrix (transition matrix), output probability matrix, and initial state probabilities of the HMM.
    • Prepare the dataset.
  2. E-step (Expectation step):
    • Based on the dataset, calculate the posterior probabilities (forward and backward probabilities) to each state at each time.
    • The forward algorithm is used to compute the forward probability and the backward algorithm is used to compute the backward probability.
  3. M-step (Maximization step):
    • Re-estimate the parameters of the HMM using the posterior probabilities computed in the E step.
    • Transition matrix estimation: Estimate the transition probabilities between each state using the posterior probabilities.
    • Estimation of the output probability matrix: Using the posterior probabilities and the observed data, estimate the output probability of the observed values in each state.
    • Estimate initial state probabilities: Estimate initial state probabilities using posterior probabilities.
  4. Convergence Decision:
    • The E and M steps are repeated until the change in parameters is sufficiently small.
    • Convergence is usually determined by monitoring the change in the log-likelihood and the change in the parameters.
  5. Output of results:
    • Estimate the most plausible series of states at each time using the converged parameters.

In the parameter estimation of the hidden Markov model using the EM algorithm, the posterior probabilities at each time are computed in the E step, and the transition matrix, output probability matrix, and initial state probabilities are re-estimated in the M step. This process is repeated, and when the parameters converge, the final parameter estimation results are obtained. The resulting series of most plausible states at each time can also be obtained.

<Implementation in Python>

To implement Hidden Markov Model (HMM) processing with the EM algorithm in Python, it is common to use existing libraries. Below is an example of estimating the parameters of an HMM with the EM algorithm (the Baum-Welch algorithm) using the hmmlearn library in Python.

First, install the hmmlearn library.

pip install hmmlearn

Next, the following code is used to estimate the parameters of the HMM using the EM algorithm.

from hmmlearn import hmm
import numpy as np

# Data Set Preparation
X = np.array([[0], [1], [0], [1], [0]])

# Initialization of HMM
model = hmm.GaussianHMM(n_components=2, n_iter=100)

# Parameter Estimation by EM Algorithm
model.fit(X)

# Estimated transition matrix
print("Estimated transition matrix:")
print(model.transmat_)

# Estimated emission (output) distribution parameters of each state
print("Estimated means:")
print(model.means_)
print("Estimated covariances:")
print(model.covars_)

In the above code, the hmm.GaussianHMM class from the hmmlearn library is used to initialize an HMM with two states. n_components specifies the number of states and n_iter specifies the number of iterations of the EM algorithm.

The dataset X represents the observed data, defined in this example as a short sequence of one-dimensional values (treated as continuous observations by GaussianHMM). model.fit(X) is called to estimate the parameters of the HMM by the EM algorithm; the estimated transition matrix is stored in model.transmat_, and the estimated emission parameters (the mean and covariance of each state) are stored in model.means_ and model.covars_. For discrete observations, an HMM class with a categorical emission distribution would be used instead, and its output probability matrix would be stored in model.emissionprob_.
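To also obtain the most plausible state sequence at each time, as mentioned in the processing steps above, the fitted model's predict method (Viterbi decoding) can be used; a minimal sketch:

# Most likely hidden state at each time step (Viterbi decoding)
hidden_states = model.predict(X)
print("Most likely state sequence:", hidden_states)

# Log-likelihood of the observed sequence under the fitted model
print("Log-likelihood:", model.score(X))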

Processing missing data with EM algorithm

<Summary of processing steps>

The general flow for processing missing data using the EM algorithm is as follows

  1. Dataset Preparation:
    • Prepare a dataset containing missing values.
  2. Initialization:
    • Initialize the parameters of the model used to impute the missing values. This includes the parameters of the distribution assumed for the observed variables.
  3. E-step (Expectation step):
    • Based on the observed (non-missing) entries and the current parameters, estimate the posterior distribution or expected values of the missing entries.
    • These expected values complete the data set for use in the M step.
  4. M-step (Maximization step):
    • Re-estimate the parameters of the model using the data completed in the E step.
  5. Convergence Decision:
    • The E and M steps are iteratively repeated until the parameter changes satisfy convergence conditions.
    • The amount of change in the parameters and the change in the log-likelihood are used to determine convergence.
  6. Output of results:
    • The final estimated parameters are used to obtain a data set in which the missing values have been imputed.

<Implementation in python>

There are different approaches to implementing EM algorithms for dealing with missing data, depending on the specific missing data pattern and the model used. Below is an example of a common missing data estimation implementation using Python.

First, consider the following data set.

import numpy as np

# Dataset containing missing values
dataset = np.array([[1, 2, np.nan, 4],
                    [5, np.nan, 7, 8],
                    [9, 10, np.nan, 12]])

In this example, we have a two-dimensional data set, with some elements represented as missing values (np.nan).

Next, we show the procedure for estimating the missing data using the EM algorithm.

import numpy as np

def em_algorithm(dataset, max_iterations=100):
    # Get the number of rows and columns in the data set
    n_samples, n_features = dataset.shape

    # Initialize missing values with random values
    estimated_data = np.copy(dataset)
    estimated_data[np.isnan(dataset)] = np.random.rand(np.isnan(dataset).sum())

    # Iterative EM Algorithm
    for _ in range(max_iterations):
        # E-step: Estimate missing values
        for i in range(n_samples):
            for j in range(n_features):
                if np.isnan(dataset[i, j]):
                    # Impute the missing entry with the mean of the current row estimate
                    # (a simple stand-in for a model-based predicted value)
                    estimated_data[i, j] = estimated_data[i].mean()

        # M-step: Model parameter re-estimation

        # Omitted: Implement the process of re-estimating the parameters of the model.

    return estimated_data

# Apply EM algorithm to estimate missing data
estimated_data = em_algorithm(dataset)

# Display estimation results
print("Estimated Data:")
print(estimated_data)

The above code defines the em_algorithm function, which implements a simple EM-style imputation. In this function, the missing values in the data set are first initialized with random values; in the E step, each missing value is replaced with a predicted value (here, the mean of its row); the M step, which is omitted here, would re-estimate the parameters of the model as needed. Finally, the em_algorithm function is called to estimate the missing data and display the estimation results.
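As a practical alternative sketch (assuming scikit-learn is available; this is not part of the implementation above), scikit-learn's IterativeImputer performs a similar iterative estimate-and-refit loop for missing values:

import numpy as np
# IterativeImputer is still marked experimental, so this import is required
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# The same dataset containing missing values as above
dataset = np.array([[1, 2, np.nan, 4],
                    [5, np.nan, 7, 8],
                    [9, 10, np.nan, 12]], dtype=float)

# Each feature with missing values is modeled from the other features and the
# imputations are refined over several rounds, analogous to repeated E/M updates
imputer = IterativeImputer(max_iter=10, random_state=0)
imputed_data = imputer.fit_transform(dataset)

print("Imputed Data:")
print(imputed_data)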

Processing of rating predictions using the EM algorithm

<Summary of Processing Steps>

The general processing flow for rating prediction using the EM algorithm is shown below.

  1. Data Preparation:
    • Prepare the rating data given by the user for the item.
    • The rating data is expressed in a format that includes information such as user ID, item ID, and rating value.
  2. Initialization:
    • Initialize the parameters of the model for rating prediction. These include user-specific latent factors and item-specific latent factors.
  3. E-step (Expectation step):
    • Based on the observed rating data and the current parameters, estimate the posterior distribution of each user's ratings for the items.
    • In practice, the posterior probability or predicted value of each item rating is calculated for each user.
  4. M-step (Maximization step):
    • Re-estimate the parameters of the model using the posterior probabilities and predictions estimated in the E step.
    • The parameters are updated using methods such as maximum likelihood estimation and Bayesian estimation.
  5. Convergence decision:
    • The E and M steps are iteratively repeated until the parameter changes satisfy convergence conditions.
    • The amount of change in the parameters and the amount of change in the log-likelihood are used to determine convergence.
  6. Rating Prediction:
    • The final estimated parameters are used to predict the rating for an unknown item.
    • Prediction results are typically provided as predicted ratings or as a ranked list of items for each user.

<Implementation in python>

Here is an example of a general implementation for rating prediction using the EM algorithm. This example uses Probabilistic Matrix Factorization (PMF), a simple matrix factorization-based model.

The following code is an example of implementing rating prediction with the EM algorithm using the NumPy library in Python.

import numpy as np

def em_algorithm(ratings, num_users, num_items, latent_dim, max_iterations=100, tol=1e-4):
    # Get the number of rows and columns of rating data
    num_ratings = len(ratings)

    # Initialize parameters for matrix factorization
    user_latent = np.random.rand(num_users, latent_dim)
    item_latent = np.random.rand(num_items, latent_dim)

    # Iterative EM Algorithm
    for iteration in range(max_iterations):
        # E-step: Estimate posterior probabilities of ratings
        # Calculate posterior probabilities for each user
        user_posterior = np.zeros((num_users, latent_dim))
        for user in range(num_users):
            relevant_ratings = ratings[ratings[:, 0] == user, :]
            item_indices = relevant_ratings[:, 1].astype(int)
            item_ratings = relevant_ratings[:, 2]
            user_posterior[user] = np.dot(item_latent[item_indices].T, item_ratings)

        # M-step: Re-estimate model parameters
        # Re-estimate user parameters
        for user in range(num_users):
            relevant_ratings = ratings[ratings[:, 0] == user, :]
            item_indices = relevant_ratings[:, 1].astype(int)
            item_ratings = relevant_ratings[:, 2]
            # Normal equations for this user's latent vector, with a small ridge term for numerical stability
            item_latent_sum = np.dot(item_latent[item_indices].T, item_latent[item_indices]) + 1e-6 * np.eye(latent_dim)
            item_rating_vector = np.dot(item_latent[item_indices].T, item_ratings)
            user_latent[user] = np.linalg.solve(item_latent_sum, item_rating_vector)

        # Re-estimate item parameters
        for item in range(num_items):
            relevant_ratings = ratings[ratings[:, 1] == item, :]
            user_indices = relevant_ratings[:, 0].astype(int)
            user_ratings = relevant_ratings[:, 2]
            # Normal equations for this item's latent vector, with a small ridge term for numerical stability
            user_latent_sum = np.dot(user_latent[user_indices].T, user_latent[user_indices]) + 1e-6 * np.eye(latent_dim)
            user_rating_vector = np.dot(user_latent[user_indices].T, user_ratings)
            item_latent[item] = np.linalg.solve(user_latent_sum, user_rating_vector)

        # convergence judgment
        if iteration > 0:
            delta = np.abs(prev_user_latent - user_latent).sum() + np.abs(prev_item_latent - item_latent).sum()
            if delta < tol:
                break

        # Save parameter updates
        prev_user_latent = user_latent.copy()
        prev_item_latent = item_latent.copy()

    return user_latent, item_latent

# Example of rating data
ratings = np.array([[0, 0, 5],
                    [0, 1, 4],
                    [1, 1, 3],
                    [1, 2, 2],
                    [2, 0, 1],
                    [2, 2, 5]])

num_users = 3
num_items = 3
latent_dim = 2

# Rating Prediction by EM Algorithm
user_latent, item_latent = em_algorithm(ratings, num_users, num_items, latent_dim)

# Display user and item parameters
print("user parameter:")
print(user_latent)
print("Item Parameters:")
print(item_latent)

In this example, the EM algorithm estimates the PMF model parameters, user parameters (user_latent) and item parameters (item_latent), based on the given rating data.

In the iterative process of the EM algorithm, the posterior probabilities of the ratings are estimated in the E step, and the user and item parameters are re-estimated in the M step. Iterative repetition of these steps results in convergence of the parameters and the final rating prediction.
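As a minimal sketch of the prediction step itself (not shown in the code above), the predicted rating for any user-item pair is the inner product of the corresponding latent vectors:

# Predicted rating matrix: each entry is the inner product of a user vector and an item vector
predicted_ratings = np.dot(user_latent, item_latent.T)

# Predicted rating of user 0 for item 2 (a pair not observed in the example data)
print("Predicted rating (user 0, item 2):", predicted_ratings[0, 2])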

