Overview and Implementation of Markov Chain Monte Carlo Methods

Machine Learning Artificial Intelligence Digital Transformation Probabilistic Generative Models Navigation of this blog Algorithm Natural Language Processing Deep Learning Topic Model Markov Chain Monte Carlo Method C/C++ and Machine Learning

Overview of Markov Chain Monte Carlo Methods

Markov Chain Monte Carlo (MCMC) is a statistical method for sampling and integration from probability distributions. Markov Chain

A Markov chain is a stochastic process that randomly transitions from one state to another, and probabilistically determines the transition to the next state based on the current state. A Markov chain has the property of converging to a stationary distribution when the transition probabilities satisfy certain conditions.

Monte Carlo method is a general term for methods of integrating and sampling probability distributions. It is a method of analyzing problems probabilistically using random sampling and is widely used in statistical estimation and numerical computation.

MCMC uses Markov chains to generate samples from a certain probability distribution. It starts with a random initial state and updates the state according to the transition probabilities of the Markov chain, where the transition probabilities are computed based on the conditional probabilities of the current state and the next state. When the chain converges, the generated sample is an approximate distribution that follows the desired probability distribution.

The advantages and challenges of the MCMC method are as follows.

Advantages
- Flexibility: MCMC methods are well suited for sampling from complex probability distributions, which can be efficiently sampled using MCMC methods regardless of the shape or dimensionality of the probability distribution.
- Bayesian Statistical Modeling: MCMC methods are widely used in Bayesian statistical modeling. Bayesian statistical modeling requires estimating the posterior distribution using a prior distribution and a likelihood function, and the MCMC method is a powerful method for sampling this posterior distribution.
- Probabilistic interpretation: The MCMC method allows probabilistic interpretation because the sampling process is based on a Markov chain. The resulting sample reflects the true probability distribution and is useful for estimation and uncertainty assessment.
Challenges
- Convergence Problem: The MCMC method requires convergence to obtain a sufficient number of samples. If censored before convergence is achieved, accurate estimation results may not be obtained, and the time to convergence can be very long, especially for high-dimensional problems.
- Dependence on initial values: MCMC methods may depend on initial values. It is important to choose appropriate initial values, and if inappropriate initial values are chosen, convergence may take a long time.
- Effect of autocorrelation: In MCMC methods, autocorrelation may exist between consecutive samples. This autocorrelation can impair sample independence and affect the accuracy and reliability of the estimation results, requiring the introduction of methods to reduce autocorrelation.
- High-dimensional problems: MCMC methods tend to be computationally expensive for high-dimensional problems. Since efficient sampling can be difficult for high-dimensional probability distributions, improvements to MCMC methods and methods for dimensionality reduction of high-dimensional data have been utilized to address this problem.

MCMC is widely used in applications such as Bayesian statistics and statistical machine learning, and is particularly useful for parameter estimation, model selection, and inference in Bayesian networks.

Next, we describe the algorithms used in this MCMC.

About the algorithm used for MCMC

Various algorithms are used in MCMC. Typical MCMC algorithms are described below.

Metropolis-Hastings algorithm: The Metropolis-Hastings algorithm is the basic algorithm for MCMC. It is used to generate samples from a given probability distribution (target distribution).
1. Set initial values: Set random initial values for the variables you wish to sample.
2. Setting the Proposed Distribution: Based on the current value of the variable, a proposed distribution of the next value is set. The proposal distribution is a probability distribution that suggests values near the current variable.
3. Generate Proposed Value: Generate a new value from the Proposed Distribution by sampling independently.
4. Calculate the acceptance rate: Calculate the acceptance rate using the value of the current variable and the proposed value. The acceptance rate depends on the ratio of the probability density function based on the values of the current variable and the proposed value.
5. Sample Accept/Reject: Based on the acceptance rate, a decision is made to accept or reject the new value. A common method would be to generate a uniform random number and compare it to the acceptance rate.
6. Sample Update: If accepted, the new value is updated as the value of the variable. If rejected, keep the current value as the value of the variable.
7. Check for convergence: Set convergence criteria (e.g., sample autocorrelation or parameter variation threshold) and terminate the iteration if they are met.
8. Save samples: When convergence is achieved, save samples for each variable.

The Metropolis-Hastings algorithm is effective even when the state space is high-dimensional, but the appropriate choice of the proposal distribution and the method of calculating the acceptance probability are important.

Gibbs sampling: Gibbs sampling is an MCMC algorithm for generating samples from a multivariate distribution, and the algorithm procedure is as follows
Set initial values: Set random initial values for each variable.
1. Start Iteration: Start the iterations. The following steps are repeated a certain number of times (or until convergence).
2. Select the order of variables: Decide the order of variables to be sampled. The order of the variables may be chosen randomly, but usually a predetermined order is used.
3. Sampling each variable: Sample each variable according to the order chosen. In this case, the values of the other variables use the values in the current iteration.
4. Updating the sample: The values of the sampled variables are updated. This will affect the sampling of the other variables in the next iteration.
5. Check for convergence: Check whether the iterations have converged. Convergence criteria (e.g., sample autocorrelation or parameter variation threshold) are set and the iteration is terminated when they are met.
6. Save samples: If convergence is achieved, save samples for each variable. This will obtain the samples obtained as a result of the MCMC method.

As described above, the principle of operation is to generate samples from a conditional distribution by fixing the values of other variables when sampling each variable in turn; Gibbs sampling is suitable for parameter estimation and Bayesian inference for multivariate distributions because of its ability to handle conditional distributions.

Hamiltonian Monte Carlo (HMC): Hamiltonian Monte Carlo (HMC) is an MCMC algorithm based on the principles of Hamiltonian dynamics. The steps of the HMC algorithm are as follows
1. Set initial values: Set random initial values for the variables you wish to sample. Also, set the parameters of the Hamiltonian dynamics.
2. Momentum sampling: Sample the momentum of the same dimension as the variable you wish to sample, independently from the appropriate distribution. Momentum can be thought of as velocity or momentum.
3. Simulating Hamiltonian Dynamics: Simulate the Hamiltonian dynamics to obtain new variables and momentum values. Hamiltonian dynamics becomes a way to represent continuous motion between variables and momentum. Specifically, the Hamiltonian is defined as a function of variables and momentum, and the equations are solved numerically based on it.
4. Sample acceptance/rejection: The new values of variables and momentum obtained by simulating the Hamiltonian dynamics are used to determine whether to accept or reject the sample. The acceptance rate is calculated using the Metropolis-Hastings step.
5. Sample Update: If the sample is accepted, the values of the variables are updated to the new values.
6. Check for convergence: Set convergence criteria (e.g., sample autocorrelation or parameter variation thresholds) and terminate iterations when they are met.
7. Save samples: When convergence is achieved, save samples for each variable.

HMC is efficient in high-dimensional probability distributions and distributions of complex shapes, which are characterized by generating samples with low autocorrelation.

These algorithms are the basic MCMC methods and are used in a variety of applications. For a detailed theory of MCMC and specific algorithms, see “Markov Chain Monte Carlo (MCMC) Methods and Bayesian Estimation” and others.

Libraries and platforms available for MCMC

Various libraries and platforms are available to implement MCMC. The following describes some of the most representative libraries and platforms among them.

Stan: Stan is an open source platform for Bayesian statistical modeling and MCMC sampling. Stan is implemented in C++, but can also be used from languages such as R, Python, and Julia.
PyMC3: PyMC3 is a Bayesian statistical modeling library written in Python that will support MCMC sampling. and is used for stochastic modeling and parameter estimation.
emcee: emcee will be an MCMC sampling library implemented in Python. emcee uses the Affine Invariant Markov chain Monte Carlo (MCMC) algorithm. This algorithm is invariant to scaling and rotation transformations and thus has high efficiency and convergence.
JAGS: JAGS (Just Another Gibbs Sampler) will be an open source program for Bayesian statistical modeling and MCMC sampling. JAGS supports Gibbs sampling and uses the BUGS language (Bayesian Inference Using Gibbs Sampling), allowing users to control JAGS from R, Python, etc. to perform Bayesian inference.
Anglican: Anglican is a statistical programming language for easy probabilistic modeling and inference and is a probabilistic programming language based on Clojure. using Anglican, you can combine the powerful programming capabilities of Clojure with Anglican’s Anglican makes it possible to combine the powerful programming capabilities of Clojure with the probabilistic programming capabilities of Anglican, facilitating the description of probabilistic models and the implementation of inference algorithms.

Next, we discuss MCMC application examples.

About MCMC Application Examples

MCMC is widely used in a variety of domains, and we discuss some of its applications below.

Bayesian statistical modeling: MCMC is a particularly important method in Bayesian statistical modeling, as it allows estimating posterior distributions of parameters and model selection. Bayesian statistical modeling has been applied in a variety of fields, including medicine, ecology, economics, and machine learning.
Machine Learning: MCMC is also widely used in the area of machine learning. In particular, MCMC is used in Bayesian machine learning for parameter estimation and model selection; MCMC can be used to model uncertainty and generate samples from posterior distributions.
Economics: In economics, MCMC is also used to estimate and forecast economic models. Examples include the estimation of parameters in macroeconomic and financial market models and the application of MCMC to policy evaluation.
Bioinformatics: In the field of bioinformatics, MCMC is used in the analysis of gene expression data and genome analysis; MCMC can be used to identify gene expression patterns and related genes and to estimate evolutionary gene networks.
Physics: In physics, MCMC is used in statistical and quantum mechanical problems. This means, for example, that MCMC is used to sample the state of a system, such as spin or lattice models, or to analyze phase transitions.

In the following, we discuss specific implementations of the MCMC methods described so far.

Implementation Procedure for Markov Chain Monte Carlo Methods

The general implementation procedure for the Markov Chain Monte Carlo (MCMC) method is as follows

Problem definition: Clearly define the problem to which MCMC is to be applied. Specifically, clarify the parameters to be estimated, the structure of the model, and the objective function.
Model design: MCMC requires the construction of a stochastic model. Design the prior and conditional distributions of the parameters and the likelihood function appropriately.
Initial state setting: MCMC requires setting the initial state (initial value). The initial state is usually chosen randomly from an appropriate location in the entire search space.
MCMC Algorithm Selection: There are various MCMC algorithms. Depending on the problem to be applied, choose an algorithm such as Metropolis-Hastings, Gibbs sampling, Hamiltonian Monte Carlo (HMC), etc.
Perform sampling: sample in the search space using the selected MCMC algorithm. Update the states according to the aforementioned algorithm and decide whether to accept or reject the new state based on the acceptance probability.
Collecting samples: By running the MCMC algorithm, a series of states is obtained. This series is used to compute the posterior distribution of the parameters to be estimated, the value of the objective function, etc. It should be noted here that collecting a sufficient number of states will result in low sample autocorrelation.
Evaluation of convergence: evaluate the convergence of the MCMC. Since sampling should continue until convergence is achieved, the analysis of autocorrelation and convergence diagnostic methods (e.g., Germain-Rubin diagnostics) should be used to evaluate convergence.
Interpretation of results: The obtained samples are used to interpret the optimal values of the objective function, the posterior distribution of the parameters, etc. If necessary, further analysis and visualization will be performed.

Next, we describe the specific implementation using python.

On an example implementation of the Markov chain Monte Carlo method in python

Below is an example implementation of MCMC using Python. The example here covers the estimation of the mean of a 1D normal distribution using the Metropolis-Hastings algorithm.

import numpy as np

# Target probability distribution (1D normal distribution)
def target_distribution(x):
    return np.exp(-0.5 * (x - 5) ** 2) / np.sqrt(2 * np.pi)

# Proposal distribution (Gaussian)
def proposal_distribution(x, sigma):
    return np.random.normal(x, sigma)

# MCMC Sampling
def mcmc_sampling(iterations, sigma):
    samples = []
    x = 0  # Initial Settings
    for _ in range(iterations):
        # Sampling new values from the proposal distribution
        x_new = proposal_distribution(x, sigma)

        # Calculation of acceptance probability
        accept_prob = target_distribution(x_new) / target_distribution(x)

        # acceptance decision
        if np.random.uniform(0, 1) < accept_prob:
            x = x_new

        samples.append(x)

    return samples

# Perform sampling
iterations = 10000
sigma = 0.5

samples = mcmc_sampling(iterations, sigma)

# Display Results
print("Estimated Results:")
print("mean value:", np.mean(samples))
print("standard deviation:", np.std(samples))

In the above code, the target probability distribution (1D normal distribution) is defined in the target_distribution() function. In addition, the proposal_distribution() function uses a Gaussian distribution as the proposal distribution.

In the mcmc_sampling() function, MCMC sampling is performed for the specified number of iterations. New values are sampled from the proposal distribution, the acceptance probability is calculated and an acceptance decision is made, the sampled values are stored in the samples list, and finally, the mean and standard deviation are calculated and displayed from the sampled results.

Reference Books and Reference Information

For more detailed information on Bayesian inference, please refer to “Probabilistic Generative Models” “Bayesian Inference and Machine Learning with Graphical Models” and “Nonparametric Bayesian and Gaussian Processes.

A good reference book on Bayesian estimation is “The Theory That Would Not Die: How Bayes’ Rule Cracked the Enigma Code, Hunted Down Russian Submarines, & Emerged Triumphant from Two Centuries of C“

“Think Bayes: Bayesian Statistics in Python“

“Bayesian Modeling and Computation in Python“

“Bayesian Analysis with Python: Introduction to statistical modeling and probabilistic programming using PyMC3 and ArviZ, 2nd Edition“