MCMC and Bayesian estimation



MCMC stands for “Markov Chain Monte Carlo Method,” and is one of the algorithms for extracting samples (generating random numbers) from a multivariate probability distribution.

MCMC is just a calculation method, and in order to solve the task, we need to organize the problem so that it can be solved by MCMC (statistical modeling). This is similar to the way that “what to calculate” is more important than the calculation of addition, subtraction, and multiplication itself in arithmetic problems.

“Matrix decomposition” and “optimization” are necessary to fit a curve with the least-squares method or to estimate the parameters of a probability distribution with the maximum likelihood method. In Bayesian statistics, by contrast, the “answer” is first given in the form of a “distribution” called the “posterior distribution,” which makes it a perfect match for MCMC, which generates samples from a distribution. The final necessary information, such as point estimates, errors, and prediction intervals, is obtained from the many samples generated by MCMC.

The difference between MCMC and optimization is that MCMC keeps moving and generating samples indefinitely, while optimization stops somewhere (at the true optimal solution when it works; elsewhere when it does not). In fact, MCMC may also move close to the optimal solution, but even then it keeps moving around that solution forever, expressing error in the Bayesian sense.

MCMC is a general-purpose method for handling probability distributions, and is also used in statistical physics, described in “Statistical physics and its application to artificial intelligence technology,” and in frequentist statistics. Bayesian statistics, on the other hand, employs various calculation methods, such as the Laplace approximation, the Kalman filter, sequential Monte Carlo methods, and variational Bayesian methods, and MCMC is one of them.

Roughly speaking, MCMC generates uniform random numbers numerically and uses them to change the state (the values of the random variables) randomly and gradually by repeating simple operations. There are various approaches to this, such as the “Gibbs sampler,” the “Metropolis method,” and the “Hamiltonian Monte Carlo method.” By designing such iterations of probabilistic operations well, we can achieve sampling from the target distribution (the posterior distribution in the case of Bayesian statistics).

In using MCMC, it is important to note that it is a method of “generating samples by searching for regions of large probability density through simple repetition using random numbers,” so whether it works well on a given problem can only be judged from the results. With an algorithm such as “sorting numbers in order of size,” the behavior can be completely understood by reading the source code, but MCMC is different. It is therefore necessary to look back and check the relationship between the model, the data, and the behavior of the MCMC.

In this blog, we will discuss the following about MCMC.

Implementation

Markov Chain Monte Carlo (MCMC) is a statistical method for sampling from probability distributions and performing integration calculations. As the name suggests, it combines a Markov chain with the Monte Carlo method. This section describes various algorithms, applications, and implementations of MCMC.

Analytically computing integrals over complex distributions, as in Bayesian estimation, is difficult, so the Markov Chain Monte Carlo (MCMC) method is often used. It is a type of algorithm based on random selection. The simplest Monte Carlo method generates random numbers as candidate parameters and computes the probability (integral) corresponding to those random numbers, but such a brute-force approach is inefficient because the computational cost explodes as the number of parameters increases. The MCMC method, by contrast, is not completely random: it generates the next random number from the previous one while gradually searching for values with larger probability (a search via a Markov chain).

We have previously introduced Rstan; PyStan and PyMC3 are implementations of MCMC in Python. Also, since Rstan has not been updated since 2020 and the Stan team seems to be encouraging CmdStan, I would also like to discuss clj-stan, a simple wrapper for CmdStan, as well as CmdStanR, an R wrapper for CmdStan.

Theory and application

    The MCMC method is a numerical calculation technique applicable to a variety of fields. Historically it has been widely used in physics, where it is used to perform complex integrals and probability calculations. In recent years it has become an important tool in statistics, and it is also used in machine learning, finance, and other fields where statistical methods are important.

    In applied problems such as quantum physics, Bayesian statistics, and combinatorial optimization problems, even though there are differences in the fields, most problems ultimately come down to problems of probability and expectation, and the MCMC method shows its power. Therefore, if you understand the basics of the MCMC method to some extent, you will be able to write code for your own purposes and understand complex algorithms written by others when you find something you want to investigate, regardless of the field.

    In this article, I will first explain why the MCMC method is necessary and the basic theory of the MCMC method (probability, expected value, and Monte Carlo method (random number approach)) as a base for understanding them.

    In this article, we will discuss the application of Markov chains to Monte Carlo methods as described above in the general discussion of Markov chain Monte Carlo methods.

    There are four basic conditions for a Markov chain Monte Carlo method. (1) It is a Markov chain: the probability of obtaining {x(k+1)} from {x(k)} does not depend on the past history {x(0)}, {x(1)}, …, {x(k-1)}, but only on {x(k)}. This probability is denoted T({x(k)}→{x(k+1)}), where T represents the transition probability. (2) Irreducibility: any pair of states {x} and {x′} can be connected in a finite number of steps. (3) Aperiodicity: suppose it is possible to return from {x} to {x} in ns steps. There are many possible values of ns, and their greatest common divisor is called the period. When the period is 1 for all {x}, the Markov chain is said to be aperiodic. (4) The detailed balance condition (detailed equilibrium condition): for all {x} and {x′}, the transition probability T satisfies P({x})⋅T({x}→{x′}) = P({x′})⋅T({x′}→{x}).

    The Monte Carlo (random sampling) method is performed under a probability distribution function that satisfies these conditions.

    In this article, I will give an overview of the Metropolis method, which is a concrete algorithm.

    In the Metropolis method, x(1), x(2), …, x(k), x(k+1), … are constructed from the initial value x(0) by the following procedure. (1) Choose a real number Δx randomly and propose x′ = x(k) + Δx as a candidate for x(k+1). To satisfy the detailed balance condition, Δx and −Δx must appear with the same probability (for example, choose a suitable c > 0 and generate a uniform random number between −c and +c). (2) Metropolis test: generate a uniform random number r between 0 and 1; if r < e^(S(x(k))−S(x′)), accept the proposal and update to x(k+1) = x′; otherwise, reject the proposal and set x(k+1) = x(k).

    Furthermore, we will discuss the concrete code of the Metropolis method using the C language.

    We will apply the Metropolis method to simulations of two target distributions, a semicircular distribution and a Gaussian distribution, and discuss the relationship between the number of steps and the simulation results.

    In this article, I will discuss the multivariate case, where the MCMC method shows its power against the “curse of dimensionality,” and describe the cautions and new algorithms that arise when going multivariate.

    To extend the univariate Metropolis method to the multivariate case, first let the variables be (x1, x2, …, xn). With multiple variables, there are two major options for collecting samples: “update all at once” and “update one by one.”

    When there are many variables, the acceptance probability of the “update all at once” method becomes small unless the step widths ci are small. By comparison, “update one by one” can keep the step width relatively large.

    In this blog, I describe the C code of the collective update algorithm in detail.

    In this article, we will discuss HMC (Hybrid Monte Carlo, or Hamiltonian Monte Carlo), an algorithm other than the Metropolis method.

    The name “HMC” comes from the fact that it is a “hybrid” Monte Carlo (MC) method combining two different methods, the Metropolis method and the molecular dynamics method. It is also called Hamiltonian Monte Carlo because it uses an analogue of the Hamiltonian equations of motion and the Hamiltonian that appear in physics.

    In this blog, I describe the specific C code for the HMC method.

    In this article, I will discuss Gibbs sampling and the MH method. Gibbs sampling (also called the heat-bath method in the physics community) can be applied only in limited situations, but when it can be applied, it is an efficient algorithm. It is often applicable to relatively straightforward distributions, such as those arising in Bayesian statistics.

    First, I will describe the code in C based on the algorithm for a simple Gaussian distribution. In Gibbs sampling, the code is simple because there is no Metropolis test.

    In this article, we will discuss some applications of Markov chain Monte Carlo methods.

    First, we will discuss the application of Markov chain Monte Carlo methods to statistics. Given the action function of a probabilistic model, the likelihood can be computed using Markov chain Monte Carlo methods. In this example, it can easily be calculated using the Metropolis method, the HMC method, or the Gibbs sampling method.

    The Metropolis and HMC methods are not only applicable to more complex distributions, but are also very good in that they do not require much brain power to write the program.

    The Gibbs sampling method is more time-consuming to write, but it does not require parameter tuning, so it is effective when you need to solve similar problems many times, or when you can use a program someone else has already written.

    In this article, I will discuss the Ising model, combinatorial optimization, and applications in particle physics and other fields. I will discuss the Ising model, which considers lattice points carrying spins (small magnets), a standard topic in university physics (statistical mechanics) classes (simulation of phase transitions in physical phenomena).

    Next, I will describe a Markov chain Monte Carlo method applied to the traveling salesman problem, which is a typical problem in combinatorial optimization.

    In the Markov chain Monte Carlo (MCMC) method, a simple algorithm may not give a good answer for problems that become trapped in local solutions. The first countermeasure is to “devise ways to move many elements at the same time.” For example, complex moves have been studied using algebraic geometry for sampling contingency tables, and research has also been conducted on Latin squares.

    The easiest such method to implement is simulated annealing; the replica exchange Monte Carlo method belongs to a group of methods (extended ensemble methods) that improve on it so as to sample correctly from a distribution.
