Nonparametric Bayesian and Gaussian Processes

  1. Nonparametric Bayesian and Gaussian Processes
    1. Overview
    2. Implementation
        1. Dirichlet Process (DP) Overview, Algorithm and Implementation Examples
        2. Hierarchical Dirichlet Process (HDP) Overview, Algorithm and Implementation Examples
        3. Chinese Restaurant Process Overview, Algorithm and Implementation Example
        4. Stick-breaking Process Overview, Algorithm, and Implementation Example
        5. Dirichlet Process Mixture Model (DPMM) Overview, Algorithm and Implementation Examples
        6. Overview of Bayesian Inference and Various Implementations
        7. Overview and Implementation of Markov Chain Monte Carlo Methods
        8. Overview of Kullback-Leibler Variational Estimation and Various Algorithms and Implementations
        9. Overview of Bayesian Structural Time Series Models and Examples of Applications and Implementations
        10. Black-Box Variational Inference (BBVI) Overview, Algorithm, and Implementation Examples
        11. Implementation of Gaussian Processes in Clojure
        12. GPy – A Framework for Gaussian Processes Using Python
        13. Implementation of a Bayesian optimization tool using Clojure
  2. Nonparametric Bayesian theory
      1. Machine Learning Professional Series – Nonparametric Bayesian Point Processes and the Mathematics of Statistical Machine Learning Reading Notes
      2. Mathematical Overview of Nonparametric Bayesian Point Processes and Statistical Machine Learning
      3. On the various probability distributions used in stochastic generative models
      4. Overview of Stochastic Generative Models and Learning
      5. Overview of Bayesian Estimation with Concrete Examples
      6. Comparison of clustering using k-means and Bayesian estimation methods (mixed Gaussian model)
      7. Non-parametric Bayesian and clustering(1) Dirichlet distribution and infinite mixture Gaussian model
      8. Non-parametric Bayesian and clustering(2) Stochastic model of partitioning and Dirichlet processes
      9. Application of Nonparametric Bayesian Structural Change Estimation
      10. Nonparametric Bayesian Applications to Factor Analysis and Sparse Modeling
      11. Fundamentals of Measure Theory for Nonparametric Bayesian Theory
      12. Nonparametric Bayesian from the Viewpoint of Point Processes – Poisson Processes and Gamma Processes
      13. Nonparametric Bayes from the viewpoint of point processes – Gamma and Dirichlet processes
  3. Gaussian process theory
      1. Content describing the relationship between Gaussian processes and deep learning
      2. Machine Learning Professional Series – Gaussian Processes and Machine Learning Reading Notes
      3. Gaussian Processes and Machine Learning – Introduction
      4. Linear Regression Model
      5. Gaussian distribution
      6. Overview of Gaussian Processes (1) Gaussian Processes and Kernel Tricks
      7. Overview of Gaussian Processes (2) Gaussian processes and kernels
      8. Overview of Gaussian Processes(3) Gaussian Process Regression Model
      9. Overview of Gaussian Processes(4) Hyperparameter Estimation of Gaussian Process Regression
      10. Overview of Gaussian Processes(5) Generalization of Gaussian Process Regression
      11. Stochastic Generative Models and Gaussian Processes(1) Basis of Stochastic Models
      12. Stochastic Generative Models and Gaussian Processes(2)Maximum Likelihood Estimation and Bayesian Estimation
      13. Stochastic Generative Models and Gaussian Processes(3) Representation of Probability Distributions
      14. Calculation of Gaussian processes (1) Calculation by the auxiliary variable method
      15. Calculation of Gaussian processes (2)Variational Bayesian Method and Stochastic Gradient Method
      16. Calculation of Gaussian processes (3) Gaussian Process Method Calculation Based on a Grid Arrangement of Auxiliary Points
      17. Spatial statistics of Gaussian processes, with application to Bayesian optimization
      18. Unsupervised Learning with Gaussian Processes (1)Overview and Algorithm of Gaussian Process Latent Variable Models
      19. Unsupervised Learning with Gaussian Processes(2) Extension of Gaussian Process Latent Variable Model
      20. Gaussian Processes Miscellany – The Advantages of Function Clouds and Their Relationship to Regression Models, Kernel Methods, and Physical Models
      21. Equivalence between Neural Networks (Deep Learning) and Gaussian Processes
      22. Probabilistic Programming with Clojure
  Nonparametric Bayesian and Gaussian Processes

    Overview

    Nonparametric Bayes is a branch of Bayesian statistics, an “old and new technique” whose theory was already essentially complete in the 1970s, and it is a statistical method for data analysis and forecasting using flexible, data-dependent probability models. It is called “nonparametric” because the number (dimension) of model parameters is not fixed in advance; instead, the model's complexity is allowed to grow with the data.

    Nonparametric Bayes builds the probability model from the data itself: rather than assuming the true distribution that generated the data, it estimates a probability distribution from the data. This makes the model flexible and lets the estimated distribution adjust automatically to fit the data.

    There are several different nonparametric Bayesian methods, but one of the most common is the Dirichlet Process Mixture Model (DPMM) described in “Overview of the Dirichlet Process Mixture Model (DPMM), its algorithm and examples of implementation“, which uses Dirichlet processes. Dirichlet processes are stochastic processes for defining probability distributions of infinite dimension, which can be efficiently computed using modern search algorithms such as Markov chain Monte Carlo methods. DPMM is one of the leading nonparametric Bayesian methods and has been applied to various tasks such as clustering, structural change estimation using statistical models, density estimation, factor analysis, and sparse modeling.

    A Gaussian Process (GP) is a probabilistic, nonparametric method for regression and classification, and a type of stochastic process used for modeling continuous data. Like nonparametric Bayesian methods, Gaussian processes perform data analysis and forecasting by defining an infinite-dimensional probability distribution and estimating the distribution from the data.

    In the Gaussian process approach, the data-generating process is modeled by a probability distribution, and kernels (kernel functions and the kernel matrices computed from them) are used to express the relationships, and the uncertainty, among data points. The kernel of a Gaussian process can take many forms and can be customized to the characteristics of the data, so a flexible model can be built to fit the data.

    Gaussian processes can estimate the uncertainty (confidence interval) of the forecast results, which allows one to evaluate the reliability of the forecast. It is also a technique that can be applied to a small number of data points.

    In this blog, we will discuss the details of nonparametric Bayesian learning and Gaussian processes.

    Implementation

    Dirichlet Process (DP) Overview, Algorithm and Implementation Examples

    Dirichlet Process (DP) Overview, Algorithm and Implementation Examples. The Dirichlet Process (DP) is a powerful tool for dealing with infinite-dimensional probability distributions and plays a central role in Bayesian nonparametric models, which are applied to clustering and topic modeling.

    Hierarchical Dirichlet Process (HDP) Overview, Algorithm and Implementation Examples

    Hierarchical Dirichlet Process (HDP) Overview, Algorithm and Implementation Examples. The Hierarchical Dirichlet Process (HDP) is a Bayesian nonparametric method for handling infinite mixture models; it ties together several Dirichlet processes through a shared base measure, so that multiple groups of data can share mixture components.

    Chinese Restaurant Process Overview, Algorithm and Implementation Example

    Chinese Restaurant Process Overview, Algorithm and Implementation Example. The Chinese Restaurant Process (CRP) is a probabilistic model used to explain the Dirichlet Process (DP), described in “Dirichlet Process (DP) Overview, Algorithm and Implementation Examples,” in an intuitive way. It is frequently used for clustering problems in particular.
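
    As a rough illustration of the seating rule behind the CRP, the following is a minimal Python sketch (the function name and parameter values are illustrative, not taken from the linked article): each new customer joins an existing table with probability proportional to its occupancy, or opens a new table with probability proportional to α.

        import numpy as np

        def chinese_restaurant_process(n_customers, alpha, seed=0):
            """Sample a random partition of n_customers via the CRP with concentration alpha."""
            rng = np.random.default_rng(seed)
            tables = []        # tables[k] = number of customers seated at table k
            assignments = []   # table index chosen by each customer
            for n in range(n_customers):
                # Existing table k is chosen with probability tables[k] / (n + alpha),
                # a new table with probability alpha / (n + alpha).
                probs = np.array(tables + [alpha], dtype=float)
                probs /= probs.sum()
                k = rng.choice(len(probs), p=probs)
                if k == len(tables):
                    tables.append(1)   # open a new table
                else:
                    tables[k] += 1
                assignments.append(k)
            return assignments, tables

        assignments, tables = chinese_restaurant_process(100, alpha=2.0)
        print(len(tables), "tables occupied with sizes", tables)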

    Stick-breaking Process Overview, Algorithm, and Implementation Example

    Stick-breaking Process Overview, Algorithm, and Implementation Example. The Stick-breaking Process is a typical method for understanding the Dirichlet Process (DP), described in “Dirichlet Process (DP) Overview, Algorithm and Implementation Examples,” in an intuitive way: a stick of length 1 is broken at random over and over again, and the resulting pieces define an infinite-dimensional probability distribution. This makes it a visually and mathematically elegant way to construct the discrete probability measures of Dirichlet processes.
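
    The stick-breaking construction is easy to sketch in code. The minimal Python example below (truncated at a finite number of pieces, with illustrative parameter values) draws DP mixture weights by repeatedly breaking off a Beta(1, α)-distributed fraction of the remaining stick.

        import numpy as np

        def stick_breaking_weights(alpha, truncation, seed=0):
            """Draw mixture weights from a truncated stick-breaking construction of DP(alpha)."""
            rng = np.random.default_rng(seed)
            v = rng.beta(1.0, alpha, size=truncation)                  # break fractions v_k ~ Beta(1, alpha)
            remaining = np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))
            return v * remaining                                       # pi_k = v_k * prod_{j<k} (1 - v_j)

        weights = stick_breaking_weights(alpha=2.0, truncation=50)
        print(weights[:5], weights.sum())   # the weights sum to just under 1 at a finite truncation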

    Dirichlet Process Mixture Model (DPMM) Overview, Algorithm and Implementation Examples

    Dirichlet Process Mixture Model (DPMM) Overview, Algorithm and Implementation Examples. The Dirichlet Process Mixture Model (DPMM) is one of the most important models in clustering and cluster analysis. The DPMM is characterized by its ability to automatically estimate clusters from data without the need to determine the number of clusters in advance.
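
    In practice, a truncated DPMM can be tried without writing a sampler by hand; for example, scikit-learn's BayesianGaussianMixture supports a Dirichlet-process prior on the mixture weights. The sketch below is only one possible way to do this, with illustrative data and settings.

        import numpy as np
        from sklearn.mixture import BayesianGaussianMixture

        rng = np.random.default_rng(0)
        X = np.vstack([rng.normal(-3, 0.7, (150, 2)),
                       rng.normal(2, 1.0, (150, 2))])     # two true clusters

        # Truncated Dirichlet-process mixture: n_components is only an upper bound.
        dpmm = BayesianGaussianMixture(n_components=10,
                                       weight_concentration_prior_type="dirichlet_process",
                                       weight_concentration_prior=1.0,
                                       max_iter=500).fit(X)
        labels = dpmm.predict(X)
        print(np.round(dpmm.weights_, 3))   # weights of unused components shrink toward zero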

    Overview of Bayesian Inference and Various Implementations

    Overview of Bayesian Inference and Various Implementations. Bayesian inference is a method of statistical inference based on a probabilistic framework and is a machine learning technique for dealing with uncertainty. The objective of Bayesian inference is to estimate the probability distribution of unknown parameters by combining data and prior knowledge (prior distribution). This section provides an overview of Bayesian estimation, its applications, and various implementations.
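
    As a minimal concrete example of the prior-to-posterior update, the snippet below works through a beta-Bernoulli coin-flip model with made-up, purely illustrative numbers.

        # Conjugate Beta-Bernoulli update: prior Beta(a, b), data are coin flips (1 = heads).
        a, b = 2.0, 2.0                # prior pseudo-counts (illustrative)
        data = [1, 0, 1, 1, 0, 1, 1]   # observed flips (illustrative)
        heads = sum(data)
        tails = len(data) - heads
        a_post, b_post = a + heads, b + tails          # posterior is Beta(a + heads, b + tails)
        print(f"posterior Beta({a_post}, {b_post}), mean {a_post / (a_post + b_post):.3f}")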

    Overview and Implementation of Markov Chain Monte Carlo Methods

    Overview and Implementation of Markov Chain Monte Carlo Methods. Markov Chain Monte Carlo (MCMC) is a statistical method for sampling from probability distributions and performing integration calculations. The MCMC is a combination of a Markov Chain and a Monte Carlo method. This section describes various algorithms, applications, and implementations of MCMC.
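
    As a minimal sketch of the idea, the following random-walk Metropolis sampler (one of the simplest MCMC algorithms; the target and settings are illustrative) draws samples using only an unnormalized log density.

        import numpy as np

        def metropolis_hastings(log_target, n_samples, step=0.5, x0=0.0, seed=0):
            """Random-walk Metropolis sampler for a 1-D unnormalized log density."""
            rng = np.random.default_rng(seed)
            x, samples = x0, []
            for _ in range(n_samples):
                proposal = x + step * rng.normal()
                # Accept with probability min(1, p(proposal) / p(x)).
                if np.log(rng.uniform()) < log_target(proposal) - log_target(x):
                    x = proposal
                samples.append(x)
            return np.array(samples)

        # Example: sample from a standard normal, known only up to a constant.
        samples = metropolis_hastings(lambda x: -0.5 * x**2, n_samples=5000)
        print(samples.mean(), samples.std())   # should be close to 0 and 1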

    Overview of Kullback-Leibler Variational Estimation and Various Algorithms and Implementations

    Overview of Kullback-Leibler Variational Estimation and Various Algorithms and Implementations. Kullback-Leibler variational estimation is a method for building approximate probabilistic models of data by evaluating and minimizing the difference (the Kullback-Leibler divergence) between probability distributions. It is widely used in the context of variational inference.
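
    For intuition, the KL divergence between two univariate Gaussians has a closed form. The tiny snippet below (values are illustrative) shows the quantity that variational methods evaluate and minimize, and also that it is not symmetric in its two arguments.

        import numpy as np

        def kl_gauss(mu_q, sigma_q, mu_p, sigma_p):
            """KL( N(mu_q, sigma_q^2) || N(mu_p, sigma_p^2) ), closed form for 1-D Gaussians."""
            return (np.log(sigma_p / sigma_q)
                    + (sigma_q**2 + (mu_q - mu_p)**2) / (2.0 * sigma_p**2)
                    - 0.5)

        print(kl_gauss(0.0, 1.0, 0.0, 1.0))   # 0.0 for identical distributions
        print(kl_gauss(1.0, 1.0, 0.0, 2.0))   # positive
        print(kl_gauss(0.0, 2.0, 1.0, 1.0))   # different value: KL is asymmetric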

    Overview of Bayesian Structural Time Series Models and Examples of Applications and Implementations

    Overview of Bayesian Structural Time Series Models and Examples of Applications and Implementations. Bayesian Structural Time Series Model (BSTS) is a type of statistical model that models phenomena that change over time and is used for forecasting and causal inference. This section provides an overview of BSTS and its various applications and implementations.

    Black-Box Variational Inference (BBVI) Overview, Algorithm, and Implementation Examples

    Black-Box Variational Inference (BBVI) Overview, Algorithm, and Implementation Examples. Black-Box Variational Inference (BBVI) is a type of variational inference method for approximating the posterior distribution of complex probabilistic models in probabilistic programming and Bayesian statistical modeling. BBVI is called “black-box” because the probability model to be inferred is treated as a black box: inference can be applied regardless of the internal structure of the model or the form of the likelihood function.
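
    The core of BBVI is the score-function (REINFORCE) gradient of the ELBO, which needs only the ability to evaluate the log joint density. The sketch below fits a Gaussian variational distribution to a toy conjugate model (the model, step sizes, and sample counts are all assumed for illustration), so the result can be checked against the exact posterior.

        import numpy as np

        rng = np.random.default_rng(0)

        # Toy model: theta ~ N(0, 1), y_i ~ N(theta, 1). BBVI only needs log p(y, theta).
        y = np.array([0.8, 1.2, 0.5, 1.0])
        def log_joint(theta):
            return -0.5 * theta**2 - 0.5 * np.sum((y - theta) ** 2)

        # Variational family q(theta) = N(m, exp(log_s)^2), fitted with score-function gradients.
        m, log_s = 0.0, 0.0
        lr, n_mc = 0.02, 200
        for step in range(3000):
            s = np.exp(log_s)
            theta = m + s * rng.normal(size=n_mc)           # samples from q
            log_q = -0.5 * ((theta - m) / s) ** 2 - np.log(s) - 0.5 * np.log(2 * np.pi)
            f = np.array([log_joint(t) for t in theta]) - log_q
            f = f - f.mean()                                # baseline to reduce gradient variance
            d_logq_dm = (theta - m) / s**2                  # score functions of q
            d_logq_dlogs = ((theta - m) ** 2) / s**2 - 1.0
            m += lr * np.mean(d_logq_dm * f)                # stochastic gradient ascent on the ELBO
            log_s += lr * np.mean(d_logq_dlogs * f)

        # Exact posterior for this conjugate model: N( sum(y)/(n+1), 1/(n+1) ).
        n = len(y)
        print(m, np.exp(log_s), y.sum() / (n + 1), (1.0 / (n + 1)) ** 0.5)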

    Implementation of Gaussian Processes in Clojure

    Implementation of Gaussian Processes in Clojure. A Gaussian process is like a box (a stochastic process) that randomly outputs the shape of a function. For example, if the process by which a die produces the numbers 1 through 6 depends on how the die is distorted, then the function describing the probability of each face coming up can be seen as depending on a parameter (here, the die's distortion), and a stochastic process describes a distribution over such functions.

    Gaussian process regression works with the correlations between data points, so algorithms based on kernel methods, as well as algorithms combining MCMC with Bayesian analysis, are applied. The tools for these analyses are available as open source in various languages such as Matlab, Python, R, and Clojure. In this article, we discuss the approach in Clojure.

    GPy – A Framework for Gaussian Processes Using Python

    GPy – A Framework for Gaussian Processes Using Python. In this article, I describe frameworks for Gaussian processes in Python. There are two main options: the general-purpose scikit-learn framework and the dedicated GPy framework. GPy is the more flexible of the two for Gaussian process modeling, so this article focuses on GPy.
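
    A minimal GPy regression example looks roughly like the following (the toy data and kernel settings are illustrative; consult the GPy documentation for the exact current interface).

        import numpy as np
        import GPy

        # Toy 1-D regression data.
        rng = np.random.default_rng(0)
        X = np.linspace(0, 10, 40)[:, None]
        Y = np.sin(X) + 0.1 * rng.normal(size=X.shape)

        kernel = GPy.kern.RBF(input_dim=1, variance=1.0, lengthscale=1.0)
        model = GPy.models.GPRegression(X, Y, kernel)
        model.optimize(messages=False)                 # maximize the marginal likelihood

        mean, var = model.predict(np.array([[5.5]]))   # predictive mean and variance
        print(mean, var)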

    Implementation of a Bayesian optimization tool using Clojure

    Implementation of a Bayesian optimization tool using Clojure. Bayesian optimization is an applied technique that makes full use of the characteristics of Gaussian process regression, which can make probabilistic predictions from only a small number of samples.

    Specific examples include sequentially choosing the next combination of experimental parameters while running experiments in experimental design for medicine, chemistry, materials research, and so on; sequentially optimizing hyperparameters in machine learning while cycling through training and evaluation; and optimizing functions by matching parts in the manufacturing industry. It is a technology that can be used in a wide range of applications.
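
    At the heart of Bayesian optimization is an acquisition function computed from the Gaussian-process posterior; expected improvement (EI) is a common choice. Below is a minimal sketch of EI for minimization (the candidate means and standard deviations are placeholders standing in for a fitted GP's predictions).

        import numpy as np
        from scipy.stats import norm

        def expected_improvement(mu, sigma, best_y):
            """EI acquisition (for minimization) from GP predictive means/stds at candidate points."""
            sigma = np.maximum(sigma, 1e-12)
            z = (best_y - mu) / sigma
            return (best_y - mu) * norm.cdf(z) + sigma * norm.pdf(z)

        mu = np.array([0.50, 0.20, 0.40])     # GP predictive means at 3 candidates (illustrative)
        sigma = np.array([0.10, 0.05, 0.30])  # GP predictive standard deviations (illustrative)
        ei = expected_improvement(mu, sigma, best_y=0.30)
        print(ei, "-> evaluate candidate", ei.argmax(), "next")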

    Nonparametric Bayesian theory

    Machine Learning Professional Series – Nonparametric Bayesian Point Processes and the Mathematics of Statistical Machine Learning Reading Notes

    Machine Learning Professional Series – Nonparametric Bayesian Point Processes and the Mathematics of Statistical Machine Learning Reading Notes. Now, open the door to infinite dimensions!
    The book clearly explains the basics of probability distributions and their application to time series data and sparse modeling. It is also kindly designed to explain the theoretical background of measure theory carefully from the basics. Written by an up-and-coming ace researcher with a lot to offer. A must-have for all Bayesians!

    Mathematical Overview of Nonparametric Bayesian Point Processes and Statistical Machine Learning

    Mathematical Overview of Nonparametric Bayesian Point Processes and Statistical Machine Learning. Nonparametric Bayes is one of the “old and new” techniques whose theory was already essentially complete in the 1970s. The technology is still used in various fields more than 40 years later, and its characteristics are (1) the flexibility and breadth of modeling for representing phenomena, and (2) the development of algorithms for efficiently exploring vast model spaces.

    Nonparametric Bayesian models are, in a word, stochastic models in an “infinite-dimensional” space, combined with modern search algorithms, represented by Markov chain Monte Carlo methods, that can compute them efficiently. Applications include clustering with flexible generative models, structural change estimation with statistical models, and applications in factor analysis and sparse modeling.

    On the various probability distributions used in stochastic generative models

    On the various probability distributions used in stochastic generative models. Overview of various stochastic models used as approximations of stochastic generative models (Student’s t distribution, Wishart distribution, Gaussian distribution, gamma distribution, inverse gamma distribution, Dirichlet distribution, beta distribution, categorical distribution, Poisson distribution, Bernoulli distribution)
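
    Most of these distributions are available off the shelf, for example in scipy.stats; the following snippet simply draws a few samples from some of them (the parameter values are arbitrary illustrations).

        from scipy import stats

        print(stats.t(df=3).rvs(5, random_state=0))                      # Student's t
        print(stats.gamma(a=2.0, scale=1.0).rvs(5, random_state=0))      # gamma
        print(stats.beta(a=2.0, b=5.0).rvs(5, random_state=0))           # beta
        print(stats.dirichlet([1.0, 2.0, 3.0]).rvs(2, random_state=0))   # Dirichlet
        print(stats.poisson(mu=4.0).rvs(5, random_state=0))              # Poisson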

    Overview of Stochastic Generative Models and Learning

    Overview of Stochastic Generative Models and Learning. A stochastic generative model is a mathematical model in which the generative process of data is represented by a stochastic model. In this article, we will describe the representation of the process of data generation used in stochastic generative models and statistical learning as an estimation problem for generative models.

    Overview of Bayesian Estimation with Concrete Examples

    Overview of Bayesian Estimation with Concrete Examples. This article works through the fundamentals of Bayesian estimation used in stochastic generative models (exchangeability, de Finetti's theorem, conjugate prior distributions, posterior distributions, marginal likelihood, etc.) based on concrete examples (the Dirichlet-multinomial model and the gamma-Gaussian model).
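
    For the Dirichlet-multinomial case, the conjugate posterior update amounts to adding the observed counts to the prior concentration parameters; a minimal numerical illustration with made-up counts is shown below.

        import numpy as np

        alpha = np.array([1.0, 1.0, 1.0])     # symmetric Dirichlet prior (illustrative)
        counts = np.array([12, 3, 5])         # observed category counts (illustrative)
        alpha_post = alpha + counts           # posterior is Dirichlet(alpha + counts)
        print("posterior parameters:", alpha_post)
        print("posterior mean:", alpha_post / alpha_post.sum())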

    Comparison of clustering using k-means and Bayesian estimation methods (mixed Gaussian model)

    Comparison of clustering using k-means and Bayesian estimation methods (mixed Gaussian model). In this article, we will discuss clustering by finite mixture models as a preparation for nonparametric Bayesian models. Clustering is a data mining technique that classifies similar data into identical classes. Clustering is the most basic application of nonparametric Bayesian models. We describe the K-means algorithm, which is a representative method for clustering, and describe its Bayesian model, the finite mixture Gaussian model.
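
    To get a rough feel for the comparison, the following sketch clusters the same toy data with k-means and with a finite Gaussian mixture model (the data and settings are illustrative); the mixture model additionally provides soft cluster responsibilities.

        import numpy as np
        from sklearn.cluster import KMeans
        from sklearn.mixture import GaussianMixture

        rng = np.random.default_rng(0)
        X = np.vstack([rng.normal(0, 1, (100, 2)),
                       rng.normal(4, 1, (100, 2))])            # two toy clusters

        km_labels = KMeans(n_clusters=2, n_init=10).fit_predict(X)   # hard assignments
        gmm = GaussianMixture(n_components=2).fit(X)
        gmm_labels = gmm.predict(X)

        print(np.bincount(km_labels), np.bincount(gmm_labels))
        print(gmm.predict_proba(X[:3]))   # soft responsibilities, unavailable in k-means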

    Non-parametric Bayesian and clustering(1) Dirichlet distribution and infinite mixture Gaussian model

    Non-parametric Bayesian and clustering(1) Dirichlet distribution and infinite mixture Gaussian model. In this article, as an introduction to nonparametric Bayesian models, we discuss extending the Dirichlet distribution to infinite dimensions. The emphasis here is on intuitive understanding; more mathematical details will be discussed later.

    The Dirichlet process mixture model plays a central role in nonparametric Bayesian modeling. Dirichlet process mixture models are also called infinite mixture models because they can be viewed as infinite-dimensional extensions of finite mixture models.

    Why is such an infinite mixture model needed in the first place? In general clustering, it is necessary to determine the number of classes K in advance, which is equivalent to determining the number of dimensions of the Dirichlet distribution in advance as a prior distribution.

    However, in real-life problems, it is often difficult to know how to set the dimension of the Dirichlet distribution, and when the number of data changes dynamically, the number of clusters K may also need to change dynamically.

    Non-parametric Bayesian and clustering(2) Stochastic model of partitioning and Dirichlet processes

    Non-parametric Bayesian and clustering(2) Stochastic model of partitioning and Dirichlet processes. This article covers the Chinese Restaurant Process (CRP), a stochastic model of partitioning in nonparametric Bayes, the Dirichlet process behind the CRP, and estimation of the concentration parameter α. Other technical topics include the stick-breaking process (SBP) and sequential Monte Carlo methods.

    Application of Nonparametric Bayesian Structural Change Estimation

    Application of Nonparametric Bayesian Structural Change Estimation. In this article, we discuss structural change estimation for time series data as an application of nonparametric Bayesian models. One of the problems in the analysis of time series data is estimating changes in the structure of the data. Analyzing changes in the properties of data is an important topic that has been studied extensively as change-point detection. Here, we describe a method using a statistical model based on the Dirichlet process.

    The basic idea is to assume that each data point is generated from one of several models with a certain probability, and to estimate structural changes in the data by estimating how the generating process changes over time.

    Nonparametric Bayesian Applications to Factor Analysis and Sparse Modeling

    Nonparametric Bayesian Applications to Factor Analysis and Sparse Modeling. In this article, we will discuss nonparametric Bayes in factor analysis and sparse modeling. Here, we will focus on beta processes, another stochastic process that constitutes a nonparametric Bayesian model.

    The approach is to consider an infinite dimensional binary matrix generating process using the beta-Bernoulli distribution model. The specific algorithm is called the Indian buffet process (IBP). These are computed using Gibbs sampling.
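
    The Indian buffet process itself is easy to simulate directly (the Gibbs sampler for the full beta-Bernoulli model is more involved). A minimal sketch of the IBP generative process, with illustrative parameters, is shown below.

        import numpy as np

        def indian_buffet_process(n_customers, alpha, seed=0):
            """Sample a binary feature-assignment matrix Z from the IBP with parameter alpha."""
            rng = np.random.default_rng(seed)
            dish_counts = []      # how many customers have taken each dish (feature)
            rows = []
            for n in range(1, n_customers + 1):
                # Customer n takes existing dish k with probability m_k / n ...
                row = [rng.uniform() < m / n for m in dish_counts]
                for k, taken in enumerate(row):
                    if taken:
                        dish_counts[k] += 1
                # ... and tries Poisson(alpha / n) brand-new dishes.
                n_new = rng.poisson(alpha / n)
                row.extend([True] * n_new)
                dish_counts.extend([1] * n_new)
                rows.append(row)
            Z = np.zeros((n_customers, len(dish_counts)), dtype=int)
            for i, row in enumerate(rows):
                Z[i, :len(row)] = row
            return Z

        print(indian_buffet_process(8, alpha=2.0))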

    Fundamentals of Measure Theory for Nonparametric Bayesian Theory

    Fundamentals of Measure Theory for Nonparametric Bayesian Theory. This section describes the fundamentals of measure theory as the basic theory behind nonparametric Bayesian models: σ-additive families (σ-algebras), Lebesgue measure, and the Lebesgue integral.

    Nonparametric Bayesian from the Viewpoint of Point Processes – Poisson Processes and Gamma Processes

    Nonparametric Bayesian from the Viewpoint of Point Processes – Poisson Processes and Gamma Processes. The stochastic processes that make up nonparametric Bayesian models can be viewed in a unified framework called point processes. A point process is a statistical model of a set of “points”, abstracting discrete events and some quantity that each point carries. It is useful for analyzing the stochastic mechanisms behind arrangements of “points” on time scales, in the plane, and more generally in space. Here is an overview of point processes (additive processes, Poisson random measures, gamma random measures, discreteness, the Laplace functional, point processes).

    Nonparametric Bayes from the viewpoint of point processes – Gamma and Dirichlet processes

    Nonparametric Bayes from the viewpoint of point processes – Gamma and Dirichlet processes. In this section, we discuss the normalized gamma process, obtained by normalizing the gamma process. The normalized gamma process is closely related to the Dirichlet process.

    Poisson random measures and gamma random measures can be understood in a unified way through the concept of completely random measures.

    Gaussian process theory

      Content describing the relationship between Gaussian processes and deep learning

      Content describing the relationship between Gaussian processes and deep learning. The paper “Deep Neural Networks as Gaussian Processes” (arXiv:1711.00165) by researchers at Google Brain is introduced, as well as reference sites for Gaussian processes in general and Bayesian deep learning in particular.

      Machine Learning Professional Series – Gaussian Processes and Machine Learning Reading Notes

      Gaussian Processes and Machine Learning – Introduction

      Gaussian Processes and Machine Learning – Introduction. In machine learning, the machine computes and estimates a function f(x) that gives the output y (= f(x)) for an input x, from actual pairs of x and y. The simplest approach is the analytical one: treat y − f(x) as an error e and minimize it. However, this approach has the problem that the computation becomes harder as f(x) becomes more complex and has more parameters. An alternative approach treats the parameters of f(x) as random variables with probability distributions, assumes their prior probabilities (the state in which the parameter values are unknown) and posterior probabilities (the state in which the range of plausible values is known from actual data), and estimates those random variables (parameters) by Bayesian estimation; this is called the probabilistic approach.

      The Gaussian process takes this stochastic approach further, making the selection range of f(x) flexible and considering “any function with a certain degree of smoothness (Gaussian process regression)” to obtain the probability distribution of the parameters of those functions by Bayesian estimation. A Gaussian process can be thought of as a “box that pops up a function f() when shaken,” and a cloud of posterior functions can be obtained by fitting this box to real data.

      Linear Regression Model

      Gaussian distribution

      Overview of Gaussian Processes (1) Gaussian Processes and Kernel Tricks

      Overview of Gaussian Processes (1) Gaussian Processes and Kernel Tricks. A Gaussian process can be obtained by integrating out the weights of a linear regression model, and can be thought of as an infinite-dimensional Gaussian distribution. However, since the data are always finite, in practice a Gaussian process reduces to a finite-dimensional multivariate Gaussian distribution. A Gaussian process is also a probability distribution that generates random functions.

      In this article, we will give an overview of Gaussian processes and their relationship to the kernel trick.
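
      The point that a Gaussian process evaluated at finitely many inputs is just a multivariate Gaussian can be seen directly in code; the sketch below (RBF kernel, illustrative settings) draws a few random functions from a GP prior.

        import numpy as np

        def rbf_kernel(x1, x2, variance=1.0, lengthscale=1.0):
            """RBF (squared-exponential) kernel matrix between two sets of 1-D inputs."""
            d = x1[:, None] - x2[None, :]
            return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

        x = np.linspace(-5, 5, 100)
        K = rbf_kernel(x, x) + 1e-8 * np.eye(len(x))   # small jitter for numerical stability
        rng = np.random.default_rng(0)
        f_samples = rng.multivariate_normal(np.zeros(len(x)), K, size=3)   # 3 random functions
        print(f_samples.shape)   # (3, 100): each row is one function drawn from the GP prior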

      Overview of Gaussian Processes (2) Gaussian processes and kernels

      Overview of Gaussian Processes (2) Gaussian processes and kernels. In this article, we discuss the relationship between the kernels of Gaussian processes and the basis functions of linear models, and look at various kernel functions (the Matérn kernel, string kernel, Fisher kernel, marginalized kernel for HMMs, linear kernel, exponential kernel, periodic kernel, and RBF kernel).
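
      For reference, a few of the kernels mentioned above can be written down in a handful of lines; the forms below follow the usual textbook definitions (parameter values are illustrative).

        import numpy as np

        def linear_kernel(x1, x2, variance=1.0):
            return variance * np.outer(x1, x2)

        def matern32_kernel(x1, x2, variance=1.0, lengthscale=1.0):
            r = np.abs(x1[:, None] - x2[None, :]) / lengthscale
            return variance * (1.0 + np.sqrt(3.0) * r) * np.exp(-np.sqrt(3.0) * r)

        def periodic_kernel(x1, x2, variance=1.0, lengthscale=1.0, period=2.0):
            d = np.abs(x1[:, None] - x2[None, :])
            return variance * np.exp(-2.0 * np.sin(np.pi * d / period) ** 2 / lengthscale**2)

        x = np.linspace(0, 5, 50)
        print(matern32_kernel(x, x).shape, periodic_kernel(x, x).shape)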

      Overview of Gaussian Processes(3) Gaussian Process Regression Model

      Overview of Gaussian Processes(3) Gaussian Process Regression Model. Previously, we discussed the derivation of Gaussian processes and their properties. From now on, we will discuss how regression problems can be solved based on Gaussian processes. Specifically, we will discuss the calculation of the predictive distribution of a regression problem in the case of a single predictor and in the case of multiple predictors, and we will also discuss the relationship between the predictive distribution and a neural network.
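
      The predictive distribution of Gaussian process regression has the standard closed form (mean K_*ᵀ(K + σ²I)⁻¹y, covariance K_** − K_*ᵀ(K + σ²I)⁻¹K_*); a minimal self-contained sketch with an RBF kernel and illustrative data is shown below.

        import numpy as np

        def rbf(x1, x2, variance=1.0, lengthscale=1.0):
            d = x1[:, None] - x2[None, :]
            return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

        def gp_predict(x_train, y_train, x_test, noise_var=0.1):
            """Predictive mean and covariance of GP regression (standard closed-form equations)."""
            K = rbf(x_train, x_train) + noise_var * np.eye(len(x_train))
            K_s = rbf(x_train, x_test)
            K_ss = rbf(x_test, x_test)
            mean = K_s.T @ np.linalg.solve(K, y_train)
            cov = K_ss - K_s.T @ np.linalg.solve(K, K_s)
            return mean, cov

        x_train = np.array([-3.0, -1.0, 0.0, 2.0])
        y_train = np.sin(x_train)
        x_test = np.linspace(-4, 4, 5)
        mean, cov = gp_predict(x_train, y_train, x_test)
        print(mean)
        print(np.sqrt(np.diag(cov)))   # predictive standard deviations (uncertainty)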

      Overview of Gaussian Processes(4) Hyperparameter Estimation of Gaussian Process Regression

      Overview of Gaussian Processes(4) Hyperparameter Estimation of Gaussian Process Regression. In the previous examples, the hyperparameters of the kernel were given by hand as θ1=1, θ2=0.4, and θ3=0.1. How can we estimate these parameters? If we collect the hyperparameters into θ=(θ1, θ2, θ3), the kernel depends on θ, so the kernel matrix K computed from the kernel also depends on θ and is written Kθ.

      These can be optimized using gradient descent, the SCG method, L-BFGS, and so on.
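
      Concretely, the hyperparameters are usually chosen by maximizing the (log) marginal likelihood with a gradient-based optimizer. The sketch below evaluates the negative log marginal likelihood of an RBF kernel and hands it to L-BFGS-B (a generic SciPy optimizer, used here only to illustrate the idea; data and parameterization are assumptions).

        import numpy as np
        from scipy.optimize import minimize

        def neg_log_marginal_likelihood(log_params, x, y):
            """-log p(y | x, theta) for an RBF kernel with theta = (variance, lengthscale, noise)."""
            variance, lengthscale, noise = np.exp(log_params)   # optimize in log space for positivity
            d = x[:, None] - x[None, :]
            K = variance * np.exp(-0.5 * (d / lengthscale) ** 2) + (noise + 1e-8) * np.eye(len(x))
            L = np.linalg.cholesky(K)
            alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
            return 0.5 * y @ alpha + np.sum(np.log(np.diag(L))) + 0.5 * len(x) * np.log(2 * np.pi)

        rng = np.random.default_rng(0)
        x = np.linspace(0, 5, 30)
        y = np.sin(x) + 0.1 * rng.normal(size=len(x))
        res = minimize(neg_log_marginal_likelihood, x0=np.zeros(3), args=(x, y), method="L-BFGS-B")
        print(np.exp(res.x))   # estimated (variance, lengthscale, noise variance)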

      Overview of Gaussian Processes(5) Generalization of Gaussian Process Regression

      Overview of Gaussian Processes(5) Generalization of Gaussian Process Regression. In this article, as generalizations of Gaussian process regression, we discuss using the Cauchy distribution to give the model robustness, Gaussian process classification models, and Poisson observation models for events such as machine breakdowns and the decay of elementary particles.

      Stochastic Generative Models and Gaussian Processes(1) Basis of Stochastic Models

      Stochastic Generative Models and Gaussian Processes(1) Basis of Stochastic Models. The hypothesis that an observation Y in the real world is obtained by sampling Y ~ p(Y) from some probability distribution p(Y) is called a stochastic generative model of the observation Y. In this section, we discuss the basics of stochastic models (independence, conditional independence, joint probability, marginalization, and graphical models) as an approach to stochastic generative modeling.

      Stochastic Generative Models and Gaussian Processes(2)Maximum Likelihood Estimation and Bayesian Estimation

      Stochastic Generative Models and Gaussian Processes(2) Maximum Likelihood Estimation and Bayesian Estimation. The hypothesis that “an observation Y in the real world is obtained by sampling Y ~ p(Y) from some probability distribution p(Y)” is called a probabilistic generative model, or probabilistic model, of the observation Y.

      It is important to note that a probabilistic generative model is a hypothesis. Because it is a hypothesis, there is no guarantee that it is true. Probabilistic generative models may be used to explain the win-loss record data not only in mahjong and backgammon games, which include a probabilistic component, but also in fully deterministic games such as Go and Shogi, which should not include a probabilistic component. Since a probabilistic generative model is a hypothesis, it does not matter if there are multiple hypotheses p(Y)=p1(Y), p(Y)=p2(Y), … for the same object.

      Furthermore, the difference between hypotheses may be represented by the parameter θ, and the stochastic generative model may be represented by the conditional probability p(Y|θ). In this case, the parametric conditional probability p(Y|θ) is called a parametric model. In a parametric stochastic generative model, determining the parameter θ based on the observation Y is called estimation of the parameter.

      Maximum likelihood estimation refers to determining the parameter θ of a parametric stochastic generative model p(Y|θ) so that the likelihood function L(θ)=p(Y|θ) of the observed data is maximized. Bayesian estimation takes a different view, described next.

      Bayesian estimation regards what we want to know (the unknown parameter θ) as a random variable. In Bayesian estimation, “estimating” the unknown parameter θ means updating the distribution of the random variable θ once the observation X is obtained. (There are various ways of expressing the distribution of a random variable, including the distribution function, the cumulative distribution function, and the probability density; here, unless otherwise noted, we identify the “distribution of a random variable” with the “probability density function” that represents it.)
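
      The contrast can be made concrete with the simplest possible example: estimating the mean of a Gaussian with known variance. The sketch below (with made-up prior settings) computes the maximum likelihood estimate and the full Gaussian posterior for the same data.

        import numpy as np

        rng = np.random.default_rng(0)
        y = rng.normal(loc=1.5, scale=1.0, size=20)   # observations; variance is known to be 1

        theta_mle = y.mean()                          # maximum likelihood estimate of the mean

        # Bayesian estimation with prior theta ~ N(mu0, tau0^2): the posterior is again Gaussian.
        mu0, tau0 = 0.0, 2.0                          # prior settings (illustrative)
        post_var = 1.0 / (1.0 / tau0**2 + len(y) / 1.0)
        post_mean = post_var * (mu0 / tau0**2 + y.sum() / 1.0)
        print(theta_mle, post_mean, np.sqrt(post_var))   # point estimate vs posterior mean and std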

      Stochastic Generative Models and Gaussian Processes(3) Representation of Probability Distributions

      Stochastic Generative Models and Gaussian Processes(3) Representation of Probability Distributions. The primary goal of Bayesian estimation is to find the posterior probability distribution of unknown values. But what does it mean to “find” a probability distribution, and what exactly do we need to do to feed data into a computer and compute the posterior probability? We consider these questions here.

      There are two main methods for expressing the probability distribution of a random variable numerically on a computer, which are called parametric and nonparametric methods, respectively. These methods include weighted sampling, kernel density estimation, and distribution estimation using neural networks.
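
      As an example of the nonparametric side, a kernel density estimate represents a distribution directly by smoothing the samples themselves; a minimal illustration with SciPy (toy bimodal data) is shown below.

        import numpy as np
        from scipy.stats import gaussian_kde

        rng = np.random.default_rng(0)
        samples = np.concatenate([rng.normal(-2, 0.5, 300),
                                  rng.normal(1, 1.0, 700)])   # samples from a bimodal distribution

        kde = gaussian_kde(samples)          # nonparametric density estimate built from the samples
        grid = np.linspace(-4, 4, 9)
        print(kde(grid))                     # estimated density values on a coarse grid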

      Calculation of Gaussian processes (1) Calculation by the auxiliary variable method

      Calculation of Gaussian processes (1) Calculation by the auxiliary variable method. When the number of data points N is large, the calculation of Gaussian processes is bottlenecked by the computational cost of obtaining the kernel matrix and its inverse. This has led many people to believe that the Gaussian process method is theoretically interesting but not practical. However, there exist approaches that can significantly save computational cost through various innovations. Here, we describe a contrivance called the “auxiliary variable method,” which intersperses hidden variables that are not directly observed, and their developed forms.
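
      One simple idea in the same spirit as the auxiliary-variable (inducing point) methods is a low-rank, Nyström-type approximation of the kernel matrix built from M auxiliary inputs, which avoids forming and inverting the full N×N matrix. A rough sketch with illustrative sizes (not the article's own derivation) is shown below.

        import numpy as np

        def rbf(x1, x2, lengthscale=1.0):
            d = x1[:, None] - x2[None, :]
            return np.exp(-0.5 * (d / lengthscale) ** 2)

        rng = np.random.default_rng(0)
        x = np.sort(rng.uniform(0, 10, 500))          # N = 500 data points
        z = np.linspace(0, 10, 20)                    # M = 20 auxiliary (inducing) inputs, M << N

        K_nm = rbf(x, z)
        K_mm = rbf(z, z) + 1e-8 * np.eye(len(z))
        K_approx = K_nm @ np.linalg.solve(K_mm, K_nm.T)   # rank-M approximation of the N x N kernel

        K_full = rbf(x, x)
        print(np.abs(K_full - K_approx).max())        # approximation error of the low-rank kernel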

      Calculation of Gaussian processes (2)Variational Bayesian Method and Stochastic Gradient Method

      Calculation of Gaussian processes (2) Variational Bayesian Method and Stochastic Gradient Method. In this article, we derive a stochastic gradient algorithm based on the variational Bayesian (VB) method.

      The variational Bayesian method is a method to attribute Bayesian estimation problems with complex hierarchical structures of unknown hidden variables and parameters to numerical optimization problems. The stochastic gradient method is an algorithm for solving numerical optimization problems by sequential updating of parameters, and is an essential method for efficient parameter determination for complex parametric models such as neural networks based on huge data. It can further improve the computational efficiency of the auxiliary variable method, especially when one wants to optimize the hyperparameters θ in the auxiliary variable method of Gaussian process regression models when the number of data N is large.

      Calculation of Gaussian processes (3) Gaussian Process Method Calculation Based on a Grid Arrangement of Auxiliary Points

      Calculation of Gaussian processes (3) Gaussian Process Method Calculation Based on a Grid Arrangement of Auxiliary Points. In this article, we will discuss a method to significantly reduce the computational cost of Gaussian processes by arranging auxiliary input points in a regular grid pattern. In the auxiliary variable method for Gaussian processes, there are cases in which the number of auxiliary input points must be set large, but if the total number of auxiliary input points M is increased, the computational efficiency of the auxiliary variable method, which assumes that M is small, is lost.

      In such cases, the calculation can be made more efficient even when M is large (even when M > N) by placing the auxiliary input points on a lattice. Three ideas, the Kronecker method, the Toeplitz method, and local kernel interpolation, together with the KISS-GP method that combines them, are described below.

      Spatial statistics of Gaussian processes, with application to Bayesian optimization

      Spatial statistics of Gaussian processes, with application to Bayesian optimization. In this article, I will describe an approach to the problems of spatial statistics and Bayesian optimization as an example of a real-world application that effectively uses the characteristics of Gaussian processes.

      Specifically, I describe how to combine ARD (a mechanism that automatically performs dimensionality selection) with Matérn kernels (a family that provides a diversity of covariance functions) to model a function whose shape is unknown (a black-box model) with Gaussian process regression. The combination of ARD and the Matérn kernel deserves consideration not only for Bayesian optimization but for any subject that requires a sufficiently flexible black-box model.
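
      With GPy, for example, the combination described above corresponds roughly to a Matérn kernel with ARD turned on, so that each input dimension gets its own lengthscale (the data here are a toy illustration, not from the article).

        import numpy as np
        import GPy

        rng = np.random.default_rng(0)
        X = rng.uniform(size=(40, 3))                                   # 3 input dimensions
        Y = np.sin(3 * X[:, :1]) + 0.05 * rng.normal(size=(40, 1))      # only dimension 0 matters

        kernel = GPy.kern.Matern52(input_dim=3, ARD=True)   # one lengthscale per input dimension
        model = GPy.models.GPRegression(X, Y, kernel)
        model.optimize(messages=False)
        print(kernel.lengthscale)   # irrelevant dimensions tend to get large lengthscales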

      Unsupervised Learning with Gaussian Processes (1)Overview and Algorithm of Gaussian Process Latent Variable Models

      Unsupervised Learning with Gaussian Processes (1)Overview and Algorithm of Gaussian Process Latent Variable Models. In this article, we will discuss unsupervised learning using Gaussian processes. By using a Gaussian process, the mapping from X to Y can be made nonlinear in a latent model in which the observed value Y is generated from the latent variable X. At the same time, the problem can be defined mathematically and prospectively. This is equivalent to considering the latent variable model with neural networks more mathematically. In addition, sampling of latent variables associated with Gaussian processes is also discussed.

      Specifically, the Gaussian Process Latent Variable Model (GPLVM) and the Bayesian Gaussian Process Latent Variable Model (Bayesian GPLVM) are described.

      Unsupervised Learning with Gaussian Processes(2) Extension of Gaussian Process Latent Variable Model

      Unsupervised Learning with Gaussian Processes(2) Extension of Gaussian Process Latent Variable Model. GPLVM assumed that the latent coordinates X=(x1,…,xN) of the observed data are independent of each other, with the prior distribution of X being \(p(\mathbf{X})=\prod_{n=1}^N p(\mathbf{x}_n)\) as in equation (5). In practice, however, X often has a cluster structure or is related as a time series. In such cases, by designing an appropriate probability model for p(X), we can learn the structure in the latent space hidden behind the observed values.

      Specific extended models discussed are the infinite warp mixture model, the Gaussian process dynamics model, the Poisson point process, the log Gaussian Cox process, the latent Gaussian process, and elliptic slice sampling.

      Gaussian Processes Miscellany – The Advantages of Function Clouds and Their Relationship to Regression Models, Kernel Methods, and Physical Models

      Gaussian Processes Miscellany – The Advantages of Function Clouds and Their Relationship to Regression Models, Kernel Methods, and Physical Models. A Gaussian process is like a box that randomly outputs the shape of a function, and such a box is called a stochastic process. There are three advantages to obtaining a cloud of functions from a Gaussian process as a stochastic process: (1) when a cloud of functions is obtained, the degree of uncertainty is known; (2) when a cloud of functions is obtained, we can tell confident regions from unconfident regions; and (3) when a cloud of functions is obtained, model selection and feature selection can be performed.

      There is a close relationship between regression models and correlation coefficients. In the kernel method, which is closely related to Gaussian processes, methods such as HSIC (the Hilbert-Schmidt Independence Criterion) have been proposed against this background, and it has been shown that nonlinear correlations not captured by the usual product-moment correlation coefficient can be measured with high precision. Another advantage of kernel methods is that they can be applied not only to real values and vectors of real values, but also to general data structures such as strings, graphs, and trees, so that correlations between such objects can be captured. These kernels can also be used for Gaussian processes.

      Stochastic processes, and Gaussian processes in particular, were originally developed as models for describing physical motion that fluctuates stochastically. It is therefore natural to think of Gaussian processes alongside particles in Brownian motion.

      Equivalence between Neural Networks (Deep Learning) and Gaussian Processes

      Equivalence between Neural Networks (Deep Learning) and Gaussian Processes. In “Bayesian Learning for Neural Networks,” Neal showed that a neural network with one hidden layer is equivalent to a Gaussian process in the limit where the number of hidden units goes to infinity. By considering a Gaussian process instead of a neural network, optimizing the many weights of the network becomes unnecessary, and the predictive distribution can be obtained analytically. In addition, Gaussian processes have a natural structure as a probabilistic model: unlike neural networks, where it is hard to predict what will be learned, Gaussian processes can express prior knowledge about the problem through kernel functions, and can handle objects that cannot be trivially vectorized, such as time series and graphs, in a well-founded way. These are significant advantages.

      Probabilistic Programming with Clojure

      Probabilistic Programming with Clojure. Stan, BUGS, and similar tools previously described in the context of probabilistic generative models such as Bayesian models are examples of Probabilistic Programming (PP). PP is a programming paradigm in which probabilistic models are specified in some form and inference on those models is performed automatically. The purpose is to integrate probabilistic modeling with general-purpose programming in order to build systems, combined with various AI techniques, for handling uncertain information such as stock price prediction, movie recommendation, computer diagnostics, cyber intrusion detection, and image detection.

      In this article, we describe our approach to this probabilistic programming in Clojure.
