Probabilistic approaches in machine learning


Implementation

Uncertainty refers to a state in which future events or outcomes are difficult to predict because our knowledge or information is limited, making complete information or certainty unattainable. Mathematical methods and models, such as probability theory and statistics, are used to deal with uncertainty, and these methods are important tools for quantifying it and minimizing risk.

This section describes probability theory and various implementations for handling this uncertainty.

Negative Log-Likelihood (NLL) is a loss function for optimizing the parameters of models in statistics and machine learning, especially models based on probability distributions (such as classification models). It measures a model's performance through the probability the model assigns to the observed data, and its purpose is to optimize the parameters so that the model explains the observed data with high probability.
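As a minimal, library-agnostic sketch of how NLL is computed for a classifier, the following assumes an array of predicted class probabilities and integer labels (the variable names and the small epsilon guard are choices of this example, not part of any particular framework).

import numpy as np

def negative_log_likelihood(probs, labels, eps=1e-12):
    """Mean NLL: probs is (N, K) predicted class probabilities, labels is (N,) true class indices."""
    picked = probs[np.arange(len(labels)), labels]   # probability assigned to the true class
    return -np.mean(np.log(picked + eps))            # eps guards against log(0)

# Example: two samples, three classes
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]])
labels = np.array([0, 1])
print(negative_log_likelihood(probs, labels))        # roughly 0.29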

Contrastive Divergence (CD) is a learning algorithm mainly used for training Restricted Boltzmann Machines (RBM), a generative model for modelling the probability distribution of data, and CD is a method for efficiently learning its parameters.

Noise Contrastive Estimation (NCE) is a method for estimating the parameters of a probabilistic model, and is a particularly effective approach for processing large data sets and high-dimensional data.

Bayesian inference is a method of statistical inference based on a probabilistic framework and is a machine learning technique for dealing with uncertainty. The objective of Bayesian inference is to estimate the probability distribution of unknown parameters by combining data and prior knowledge (the prior distribution). This section provides an overview of Bayesian estimation, its applications, and various implementations.

Bayesian network inference is the process of finding the posterior distribution based on Bayes’ theorem, and there are several types of major inference algorithms. The following is a description of typical Bayesian network inference algorithms.

Forward Inference in Bayesian networks (Forward Inference) is a method for calculating the posterior distribution of variables and nodes in a network based on known information. Bayesian networks are probabilistic graphical models and are used to represent dependencies between variables. Forward Inference calculates the posterior distribution of the variable of interest through the propagation of information in the network.

  • Overview of Bayesian Multivariate Statistical Modeling and Examples of Algorithms and Implementations

Bayesian multivariate statistical modeling is a method of simultaneously modeling multiple variables (multivariate data) within a Bayesian statistical framework, which makes it possible to capture the probabilistic structure of the observed data and to account for its uncertainty. Multivariate statistical modeling is used to address issues such as data correlation, covariance structure, and outlier detection.

Integration of inference and action using Bayesian networks is a method in which agents use probabilistic models to select the most appropriate action while interacting with the environment; Bayesian networks are a useful approach for representing dependencies between events and handling uncertainty. In this section, the Partially Observable Markov Decision Process (POMDP) is described as an example of an algorithm based on the integration of inference and action using Bayesian networks.

Kullback-Leibler variational estimation is a method for estimating approximate probabilistic models of data by evaluating and minimizing the difference between probability distributions. It is widely used in the context of variational inference, and its main applications are as follows.
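At the core of the method is the KL divergence itself; as a hedged, minimal illustration, it can be computed for two discrete distributions with numpy as follows (the probability vectors are invented for the example).

import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions given as probability vectors."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return np.sum(p * (np.log(p + eps) - np.log(q + eps)))

p = [0.5, 0.3, 0.2]          # target distribution
q = [0.4, 0.4, 0.2]          # approximating distribution
print(kl_divergence(p, q))   # small positive value; zero only when p and q coincide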

Bayesian deep learning refers to an attempt to incorporate the principles of Bayesian statistics into deep learning. In ordinary deep learning, model parameters are treated as non-probabilistic values and optimization algorithms are used to find optimal parameters; treating the parameters instead as probability distributions and inferring them from the data is what is called "Bayesian deep learning". For more information on the application of uncertainty to machine learning, please refer to "Uncertainty and Machine Learning Techniques" and "Overview of Statistical Learning Theory (Non-Equationary Explanation)".

Generalized Linear Model (GLM) is one of the statistical modeling and machine learning methods used for stochastic modeling of the relationship between response variables (objective variables) and explanatory variables (features). This section provides an overview of this generalized linear model and its implementation in various languages (python, R, and Clojure).
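As one possible illustration (assuming the statsmodels package is available; the synthetic data and true coefficients are invented for the example), a Poisson regression, a typical GLM with a log link, can be fitted roughly as follows.

import numpy as np
import statsmodels.api as sm

# Synthetic data: counts whose log-mean depends linearly on one feature
rng = np.random.default_rng(0)
x = rng.uniform(0, 2, size=200)
y = rng.poisson(np.exp(0.5 + 1.2 * x))

X = sm.add_constant(x)                               # intercept + slope
model = sm.GLM(y, X, family=sm.families.Poisson())   # Poisson family uses a log link by default
result = model.fit()
print(result.params)                                 # estimates should be near [0.5, 1.2]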

Maximum Likelihood Estimation (MLE) is an estimation method used in statistics. This method is used to estimate the parameters of a model based on given data or observations. Maximum likelihood estimation attempts to maximize the probability that data will be observed when the values of the parameters are changed. This section provides an overview of this maximum likelihood estimation method, its algorithm, and an example implementation in python.
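A minimal sketch of maximum likelihood estimation, assuming scipy is available: the mean and standard deviation of a Gaussian are estimated by numerically minimizing the negative log-likelihood (the data and starting values are arbitrary).

import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(1)
data = rng.normal(loc=3.0, scale=2.0, size=500)

def neg_log_likelihood(params):
    mu, log_sigma = params                     # optimize log(sigma) so sigma stays positive
    sigma = np.exp(log_sigma)
    return -np.sum(norm.logpdf(data, loc=mu, scale=sigma))

res = minimize(neg_log_likelihood, x0=[0.0, 0.0])
mu_hat, sigma_hat = res.x[0], np.exp(res.x[1])
print(mu_hat, sigma_hat)                       # close to the true values 3.0 and 2.0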

The EM algorithm (Expectation-Maximization Algorithm) is an iterative optimization algorithm widely used in statistical estimation and machine learning. In particular, it is often used for parameter estimation of stochastic models with latent variables.

Here, we provide an overview of the EM algorithm, the flow of applying it to mixture models, HMMs, missing-value estimation, and rating prediction, and example implementations in python.
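As a rough illustration of the E and M steps (a sketch only, not the implementations referred to above), the following fits a two-component one-dimensional Gaussian mixture with numpy and scipy; the data and initial values are invented for the example.

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
data = np.concatenate([rng.normal(-2, 1, 300), rng.normal(3, 1, 200)])

# Initial guesses for mixture weights, means, and standard deviations
w, mu, sd = np.array([0.5, 0.5]), np.array([-1.0, 1.0]), np.array([1.0, 1.0])

for _ in range(50):
    # E-step: responsibility of each component for each data point
    dens = w * norm.pdf(data[:, None], mu, sd)        # shape (N, 2)
    resp = dens / dens.sum(axis=1, keepdims=True)
    # M-step: re-estimate the parameters from the responsibilities
    nk = resp.sum(axis=0)
    w = nk / len(data)
    mu = (resp * data[:, None]).sum(axis=0) / nk
    sd = np.sqrt((resp * (data[:, None] - mu) ** 2).sum(axis=0) / nk)

print(w, mu, sd)   # should roughly recover weights (0.6, 0.4), means (-2, 3), and unit variances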

The EM (Expectation Maximization) algorithm can also be used as a method for solving the constraint satisfaction problem. This approach is particularly useful when information is incomplete, such as with missing or incomplete data. This section describes various applications of the EM algorithm to the constraint satisfaction problem and its implementation in python.

HMM is a type of probabilistic model used to represent the process of generating a series of observations, and is widely used for modeling series data and time series data in particular. The hidden state represents the latent state behind the series data, which is not directly observed, while the observation results are the data that can be directly observed and generated from the hidden state.

This section describes various algorithms and practical examples of HMMs, as well as a concrete implementation in python.
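As one small, hedged example of such an algorithm, the forward algorithm below computes the likelihood of an observation sequence for given initial, transition, and emission probabilities (the matrix names and the toy model are assumptions of this sketch).

import numpy as np

def hmm_forward(pi, A, B, obs):
    """Likelihood of an observation sequence under an HMM.
    pi: (S,) initial state probabilities, A: (S, S) transition matrix,
    B: (S, O) emission matrix, obs: sequence of observation symbol indices."""
    alpha = pi * B[:, obs[0]]                 # initialization
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]         # recursion: propagate states, then weight by emission
    return alpha.sum()                        # termination

# Toy two-state, two-symbol model
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
print(hmm_forward(pi, A, B, obs=[0, 1, 0]))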

The Gelman-Rubin statistic (also called the Gelman-Rubin diagnostic or Gelman-Rubin convergence test) is a statistical method for diagnosing the convergence of Markov chain Monte Carlo (MCMC) sampling. It is used particularly when MCMC sampling is run with multiple chains, to evaluate whether the chains are sampling from the same distribution, and it is often applied in the context of Bayesian statistics. Specifically, the Gelman-Rubin statistic compares the variability of samples between multiple MCMC chains with the variability within each chain, and this ratio is close to 1 if convergence has been achieved.
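As a minimal sketch of the computation (following the standard between-chain/within-chain variance comparison; the toy chains below are invented), the potential scale reduction factor can be estimated as follows.

import numpy as np

def gelman_rubin(chains):
    """chains: array of shape (m, n) -- m chains with n samples each.
    Returns the potential scale reduction factor R-hat."""
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    B = n * chains.mean(axis=1).var(ddof=1)      # between-chain variance
    W = chains.var(axis=1, ddof=1).mean()        # within-chain variance
    var_hat = (n - 1) / n * W + B / n            # pooled estimate of the posterior variance
    return np.sqrt(var_hat / W)

# Two toy "chains" drawn from the same distribution -> R-hat close to 1
rng = np.random.default_rng(3)
print(gelman_rubin(rng.normal(size=(2, 1000))))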

The Fisher information matrix is a concept used in statistics and information theory to provide information about probability distributions. This matrix is used to characterize the parameters of a statistical model and to evaluate the accuracy of their estimation. Specifically, it is built from the expected values of products of derivatives of the log probability density function (or probability mass function) with respect to its parameters.

  • Stochastic Gradient Langevin Dynamics (SGLD) Overview, Algorithm and Implementation Examples

Stochastic Gradient Langevin Dynamics (SGLD) is a stochastic optimization algorithm that combines stochastic gradient and Monte Carlo methods. SGLD is widely used in Bayesian machine learning and Bayesian statistical modeling to estimate the posterior distribution.
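A minimal sketch of the SGLD update rule for sampling the posterior of a Gaussian mean (the prior scale, step size, and batch size are arbitrary choices for this illustration): each step takes a rescaled minibatch gradient of the log posterior and adds Gaussian noise whose variance equals the step size.

import numpy as np

rng = np.random.default_rng(4)
data = rng.normal(loc=1.5, scale=1.0, size=1000)   # observations; we sample the posterior of their mean

def grad_log_posterior(theta, minibatch, n_total):
    # N(0, 10^2) prior on theta, Gaussian likelihood with unit variance;
    # the minibatch gradient is rescaled to estimate the full-data gradient
    grad_prior = -theta / 100.0
    grad_lik = n_total / len(minibatch) * np.sum(minibatch - theta)
    return grad_prior + grad_lik

theta, step, samples = 0.0, 1e-4, []
for t in range(5000):
    batch = rng.choice(data, size=50, replace=False)
    grad = grad_log_posterior(theta, batch, len(data))
    theta += 0.5 * step * grad + rng.normal(0.0, np.sqrt(step))   # injected Langevin noise
    samples.append(theta)

print(np.mean(samples[1000:]))   # should be close to the sample mean of the data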

  • Stochastic Gradient Hamiltonian Monte Carlo (SGHMC) Overview, Algorithm, and Implementation Examples

Stochastic Gradient Hamiltonian Monte Carlo (SGHMC) is a variant of Hamiltonian Monte Carlo (HMC) combined with the stochastic gradient method. It is used to estimate posterior distributions over large data sets and high-dimensional parameter spaces, making it suitable for Bayesian statistical inference.

NUTS (No-U-Turn Sampler) is a type of Hamiltonian Monte Carlo (HMC) method, which is an efficient algorithm for sampling from probability distributions, as described in “MCMC Method for Stochastic Integral Calculations: Algorithms other than Metropolis Method (HMC Method)”. HMC is based on the Hamiltonian dynamics of physics and is a type of Markov chain Monte Carlo method. NUTS improves on the HMC method by automatically selecting the appropriate step size and sampling direction to achieve efficient sampling.

  • Overview of Constraint-Based Structural Learning and Examples of Algorithms and Implementations

Constraint-based structural learning is a method of learning models by introducing specific structural constraints in graphical models (e.g., Bayesian networks, Markov random fields, etc.), an approach that allows prior knowledge and domain knowledge to be incorporated into the model.

  • BIC, BDe, and other score-based structural learning

Score-based structural learning methods such as BIC (Bayesian Information Criterion) and BDe (Bayesian Dirichlet equivalent) evaluate the goodness of a model by combining the complexity of the statistical model with its goodness of fit to the data, in order to select the optimal model structure. These methods are mainly based on Bayesian statistics and are widely used as information criteria for model selection.

  • Bayesian Network Sampling (Sampling)

Bayesian network sampling models the stochastic behavior of unknown variables and parameters through the generation of random samples from the posterior distribution. Sampling is an important method in Bayesian statistics and probabilistic programming, and is used to estimate the posterior distribution of a Bayesian network and to evaluate uncertainty.

  • Variational Bayesian Analysis of Dynamic Bayesian Networks

A dynamic Bayesian network (DBN) is a type of Bayesian network for modeling uncertainty that changes over time. The variational Bayesian method is a statistical method for inference of complex probabilistic models, which allows estimating the posterior distribution based on uncertain information.

  • Overview of Variational Autoencoders (Variational Autoencoder, VAE) and Examples of Algorithms and Implementations

Variational Autoencoder (VAE) is a type of generative model and a neural network architecture for learning latent representations of data. The VAE learns latent representations by modeling the probability distribution of the data and sampling from it. An overview of VAE is given below.
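As a small, hedged illustration of the VAE objective (rather than a full model), the negative evidence lower bound for a diagonal-Gaussian latent and a Bernoulli decoder is a reconstruction term plus a closed-form KL term; the function below assumes binary inputs and per-sample latent statistics.

import numpy as np

def vae_negative_elbo(x, x_recon, mu, log_var):
    """Negative ELBO for one sample: Bernoulli reconstruction loss plus
    KL( N(mu, exp(log_var)) || N(0, 1) ) for a diagonal-Gaussian latent."""
    recon = -np.sum(x * np.log(x_recon + 1e-12) + (1 - x) * np.log(1 - x_recon + 1e-12))
    kl = 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var)
    return recon + kl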

  • Black-Box Variational Inference (BBVI) Overview, Algorithm, and Implementation Examples

Black-Box Variational Inference (BBVI) is a type of variational inference method for approximating the posterior distribution of complex probabilistic models in probabilistic programming and Bayesian statistical modeling. BBVI is called "black-box" because the probability model to be inferred is treated as a black box: inference can be applied without relying on the internal structure of the model or the specific form of the likelihood function.

Bayesian Structural Time Series Model (BSTS) is a type of statistical model that models phenomena that change over time and is used for forecasting and causal inference. This section provides an overview of BSTS and its various applications and implementations.

Simulation involves modeling a real-world system or process and executing it virtually on a computer. Simulations are used in a variety of domains, such as physical phenomena, economic models, traffic flows, and climate patterns, and can be built in steps that include defining the model, setting initial conditions, changing parameters, running the simulation, and analyzing the results. Simulation and machine learning are different approaches, but they can interact in various ways depending on their purpose and role.

This section describes examples of adaptations and various implementations of this combination of simulation and machine learning.

  • Differences between Hidden Markov Models and State Space Models

The Hidden Markov Model (HMM) described in "Overview of Hidden Markov Models, Various Applications and Implementation Examples" and the State Space Model (SSM) described in "Overview of State Space Models and Implementation Examples for Analysing Time Series Data Using R and Python" are both statistical models used for modelling temporal changes and series data, but they take different approaches. The main differences between them are described below.

  • Overview of Dynamic Bayesian Networks (DBN) and Examples of Algorithms and Implementations

Dynamic Bayesian Network (DBN) is a type of Bayesian Network (BN), which is a type of probabilistic graphical model used for modeling time-varying and serial data. DBN is a powerful tool for time series and dynamic data and has been applied in various fields.

Dynamic Graph Neural Networks (D-GNN) are a type of Graph Neural Network (GNN) designed to deal with dynamic graph data, in which nodes and edges change over time. (For more information on GNNs, see "Graph Neural Networks: Overview, Applications, and Example Python Implementations".) The approach has been used in a variety of domains including time series data, social network data, traffic network data, and biological network data.

A graph neural network (GNN) is a type of neural network for data with a graph structure, which uses nodes and edges to express relationships between elements. Examples of graph-structured data include social networks, road networks, chemical molecular structures, and knowledge graphs.

This section provides an overview of GNNs and various examples and Python implementations.

Graph Convolutional Neural Networks (GCN) is a type of neural network that enables convolutional operations on data with a graph structure. While regular convolutional neural networks (CNNs) are effective for lattice-like data such as image data, GCNs were developed as a deep learning method for non-lattice-like data with very complex structures, such as graph data and network data.
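As a rough numpy sketch independent of any GNN library, one GCN layer applies the normalized-adjacency propagation rule H' = ReLU(D^-1/2 (A + I) D^-1/2 H W); the toy graph, feature dimensions, and weights below are invented for the example.

import numpy as np

def gcn_layer(A, H, W):
    """One graph convolution step: H' = ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    A_hat = A + np.eye(A.shape[0])                 # add self-loops
    D_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W, 0.0)

# Toy graph: 4 nodes in a chain, 3-dimensional features, 2 output channels
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
H = np.random.default_rng(5).normal(size=(4, 3))
W = np.random.default_rng(6).normal(size=(3, 2))
print(gcn_layer(A, H, W).shape)   # (4, 2)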

ChebNet (Chebyshev network) is a type of Graph Neural Network (GNN), which is one of the main methods for performing convolution operations on graph-structured data. ChebNet is an approximate implementation of convolution operations on graphs using Chebyshev polynomials, which are used in signal processing.

Graph Attention Network (GAT) is a deep learning model that uses an attention mechanism to learn representations of the nodes in a graph structure; the attention mechanism weights the contribution of each neighboring node when aggregating information.

Graph Isomorphism Network (GIN) is a neural network model for learning isomorphism of graph structures. The graph isomorphism problem is the problem of determining whether two graphs have the same structure, and is an important approach in many fields.

GraphSAGE (Graph SAmple and aggreGatE) is a graph embedding algorithm for learning node embeddings (vector representations) from graph data. By sampling and aggregating the local neighborhood information of each node, it learns node embeddings effectively, and this approach makes it possible to obtain high-performance embeddings for large graphs.

Bayesian neural networks (BNNs) are architectures that integrate probabilistic elements into neural networks. Whereas regular neural networks are deterministic, BNNs build probabilistic models based on Bayesian statistics. This allows the model to account for uncertainty and has been applied in a variety of machine learning tasks.

Tools such as Stan and BUGS, previously described in the context of probabilistic generative models such as Bayesian models, are also called Probabilistic Programming (PP). PP is a programming paradigm in which probabilistic models are specified in some form and inference on those models is performed automatically. Its purpose is to integrate probabilistic modeling and general-purpose programming to build systems that combine various AI techniques for handling uncertain information, such as stock price prediction, movie recommendation, computer diagnostics, cyber intrusion detection, and image detection.

In this article, we describe our approach to this probabilistic programming in Clojure.

In this article, I will describe a framework for Gaussian processes using Python. There are two types of Python frameworks: one is based on the general-purpose scikit-learn framework, and the other is a dedicated framework, GPy. GPy is more versatile than scikit-learn, so we will focus on GPy in this article.
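Assuming GPy is installed, a minimal GP regression might look roughly like the following; the sine-wave data and the RBF kernel choice are only illustrative.

import numpy as np
import GPy

# Noisy observations of a sine function
rng = np.random.default_rng(7)
X = np.linspace(0, 10, 30)[:, None]
Y = np.sin(X) + 0.1 * rng.normal(size=X.shape)

kernel = GPy.kern.RBF(input_dim=1)               # squared-exponential covariance
model = GPy.models.GPRegression(X, Y, kernel)
model.optimize()                                  # fit hyperparameters by maximizing the marginal likelihood

X_new = np.linspace(0, 10, 200)[:, None]
mean, var = model.predict(X_new)                  # predictive mean and variance at new inputs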

  • Implementation of Gaussian Processes in Clojure

A Gaussian process is like a box (a stochastic process) that randomly outputs a function. For example, if we consider that the process by which dice generate the natural numbers 1 through 6 depends on the distortion of the dice, we can assume that the shape of the function (the function representing the probability of each face appearing) depends on parameters (in this case, the skewness of the dice). Gaussian process regression is analyzed using correlations between data points, so algorithms based on kernel methods, as well as algorithms combining MCMC with Bayesian analytical methods, are applied. The tools used for these analyses are available as open source in various languages such as Matlab, Python, R, and Clojure. In this article, we will discuss the approach in Clojure.

Bayesian optimization is an applied technique that makes full use of the characteristics of Gaussian process regression, which can make probabilistic predictions from a small number of samples with minimal processing.

Specific examples include sequentially selecting the optimal combination of experimental parameters to try next in experimental design for medicine, chemistry, and materials research; sequentially optimizing hyperparameters in machine learning while running the training/evaluation cycle; and optimizing functions by matching parts in the manufacturing industry. It is a technology that can be used in a wide range of applications.
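As one hedged way to run such a loop in Python (assuming scikit-optimize is available; the objective and search range are placeholders for a genuinely expensive experiment), gp_minimize fits a Gaussian-process surrogate and uses an acquisition function to pick the next point to evaluate.

from skopt import gp_minimize

def objective(params):
    """Stand-in for an expensive black-box function to minimize."""
    x = params[0]
    return (x - 2.0) ** 2 + 0.5

result = gp_minimize(objective, dimensions=[(-5.0, 5.0)], n_calls=20, random_state=0)
print(result.x, result.fun)   # should approach x = 2.0 with objective value 0.5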

A CRP (Chinese restaurant process) is a stochastic process that describes a particular data-generating process. Mathematically, at each step the process samples an integer that has already been seen with probability proportional to the number of times that integer has been sampled so far, and samples a new, previously unseen integer with probability governed by a constant.
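A minimal numpy sketch of this generative process (the article itself uses Anglican in Clojure; this Python version, with concentration parameter alpha, is only for illustration) is shown below.

import numpy as np

def chinese_restaurant_process(n_customers, alpha, seed=0):
    """Return a table index for each customer under a CRP with concentration alpha."""
    rng = np.random.default_rng(seed)
    tables = []                                   # tables[k] = number of customers seated at table k
    assignments = []
    for i in range(n_customers):
        probs = np.array(tables + [alpha], dtype=float)
        probs /= i + alpha                        # existing table: proportional to its size; new table: alpha
        k = rng.choice(len(probs), p=probs)
        if k == len(tables):
            tables.append(1)                      # open a new table
        else:
            tables[k] += 1
        assignments.append(k)
    return assignments

print(chinese_restaurant_process(20, alpha=1.0))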

In this article, we describe the implementation of this CRP using Anglican, a framework for probabilistic programming of Clojure, and its combination with a mixed Gaussian model.

In this article, we describe an implementation of the Kalman filter, one of the applications of the state-space model, in Clojure. The Kalman filter is an infinite impulse response filter used to estimate time-varying quantities (e.g., position and velocity of an object) from discrete observations with errors, and is used in a wide range of engineering fields such as radar and computer vision due to its ease of use. Specific examples of its use include integrating information with errors from device built-in accelerometers and GPS to estimate the ever-changing position of vehicles, as well as in satellite and rocket control.

The Kalman filter is a state-space model with hidden states and observation data generated from them, similar to the hidden Markov model described previously, in which the states are continuous and changes in state variables are statistically described using noise following a Gaussian distribution.
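As a hedged illustration of the filtering recursion (the article's actual implementation is in Clojure), a one-dimensional Kalman filter estimating a constant quantity from noisy measurements can be sketched as follows; the noise variances and initial values are invented for the example.

import numpy as np

rng = np.random.default_rng(8)
true_value = 5.0
observations = true_value + rng.normal(0, 2.0, size=50)   # noisy measurements

x_est, P = 0.0, 1e3        # initial state estimate and its variance
Q, R = 1e-5, 4.0           # process noise and measurement noise variances

for z in observations:
    # Predict: the state is modeled as constant, so only its variance grows
    P = P + Q
    # Update: blend the prediction and the measurement using the Kalman gain
    K = P / (P + R)
    x_est = x_est + K * (z - x_est)
    P = (1 - K) * P

print(x_est)   # converges toward the true value 5.0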

Probabilistic Generative Models Theory

A probabilistic generative model is one that considers that data in the real world is backed by a mechanism (model) that generates the data, and that the data is not generated deterministically and strictly, but is generated with a certain variability and fluctuation.

This can be expressed mathematically in the following simple definition.

<Definition of probabilistic generative model> 
Data X is generated according to 
the probability density distribution p(x).

In contrast, the principle of machine learning for probabilistic generative models is to estimate p(x) using a concrete data set X. To find p(x), consider a feature m: the probability of occurrence of m is p(m), and the probability that the attribute value of a sample is x given m is p(x|m).
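Combining these quantities, p(x) is recovered by marginalizing over the feature m:

p(x) = \sum_{m} p(m)\, p(x \mid m)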

While general machine learning techniques, as represented by deep learning, obtain a single answer from a combination of feature variables, probabilistic generative models are characterized by having multiple probabilistic answers as solutions. Therefore, by using them, we can build more complex artificial intelligence systems. In this blog, we will discuss Bayesian inference as a theoretical base for these probabilistic generative models and their applications as follows.

    Probability Statistics

    It is said that the mathematical consideration of probability began with the correspondence between Pascal and Fermat regarding betting. Classical probability theory, based on the concept of combinations, made a great leap forward into “modern mathematics” based on set theory in the 20th century, thanks to Borel and Kolmogorov. This book is written to bridge the gap between classical and modern probability theory, and explains the meaning of abstract mathematical formulas in a way that is easy for readers to understand, while providing plenty of elementary concrete examples such as playing cards and throwing dice. This is an introductory book that allows students to relearn probability in depth, as they learn it in high school mathematics.

    From Pascal and Fermat to von Neumann and Keynes. The ideas of the pioneers of probability and statistics, who took on the challenge of predicting the uncertain future by measuring “chance,” are introduced in an easy-to-understand manner in the form of a virtual dialogue between them and the author.

    While probability and statistics problems are very familiar, easy to understand, and interesting, they can be quite difficult to solve because of the number of possible correct answers. In fact, there have been cases where the great mathematicians of the time made mistakes even in problems that can be answered correctly by today’s junior high and high school students. On the other hand, there are still elegant solutions to unique problems by great mathematicians with mathematical sense. The author, an actuary and mathematical puzzle designer, presents many such seemingly mysterious problems, matters requiring clever thinking, and interesting historical episodes from his unique perspective.

    Overview of various stochastic models used as approximations of stochastic generative models (Student’s t distribution, Wishart distribution, Gaussian distribution, gamma distribution, inverse gamma distribution, Dirichlet distribution, beta distribution, categorical distribution, Poisson distribution, Bernoulli distribution)

    The Dirichlet distribution is a type of multivariate probability distribution that is mainly used for modeling proportions and probability vectors. It generates a vector of K non-negative real numbers that sum to 1.
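    For instance, sampling from a Dirichlet distribution with numpy makes the sum-to-one property easy to see (the concentration parameters below are arbitrary).

import numpy as np

rng = np.random.default_rng(9)
alpha = [2.0, 3.0, 5.0]                     # concentration parameters for K = 3 categories
samples = rng.dirichlet(alpha, size=4)      # each row is a non-negative vector summing to 1
print(samples)
print(samples.sum(axis=1))                  # [1. 1. 1. 1.]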

    Bayesian Inference Overview

    The history of Bayesian statistics begins with Thomas Bayes in the 1740s. A student of theology and mathematics, Bayes attempted to mathematically clarify the existence of God = order latent in the universe at the root of causality, based on the idea at the time that God is the first cause of all things.

    While Abraham de Moivre, a mathematician who discovered de Moivre’s law, a theorem on complex numbers and trigonometric functions, solved problems related to probability by proceeding from cause to effect, Bayes attempted to solve the problem of inverse probability from effect to cause, which is the opposite direction. He tried to derive the root cause (the order of the universe = God) from what appears to be true.

    The Bayesian concept of collecting information can be expressed by multiplying the probability of each event (the probability of the information to be collected is assumed to be calculable).

    Now that we have connected information and probability, we can further express the concepts of result and cause in terms of probability. These concepts can be rephrased as the occurrence of an effect under a cause. Here, Bayes’ theorem is defined by defining conditional probability as the probability that event B will occur under the condition that event A has occurred.
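    Written out, with A playing the role of the cause and B the observed effect, Bayes' theorem inverts the conditional probability:

    P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}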

    The World of Bayesian Modeling

    In the following pages of this blog, we will take a look at the modern world of Bayesian modeling from the perspective of "modeling individual differences and heterogeneity". Using examples from ecology, medicine, earth science, natural language processing, and other fields, we discuss smoothing, hierarchical models, data assimilation, various language models, and more from a Bayesian modeling perspective.

    Bayesian inference and MCMC open source software

    Bayesian statistics means that not only the data, but also the elements behind the data are generated probabilistically. This is easy to understand if you think of the image of “a device that produces dice (which generate data with a certain probability) produces dice with a certain probability of fluctuation,” as mentioned previously. In other words, it is modeling that applies a meta-probability to a probability distribution.

    In order to calculate this Bayesian modeling, it is necessary to calculate the probabilities. MCMC (Markov Chain Monte Carlo Method) is an approach to this. This is one of the algorithms for extracting samples (generating random numbers) from a multivariate probability distribution.

    When complex probability distributions are used in Bayesian estimation, the number of parameters to be estimated increases, and the computational cost becomes large when they are computed with MCMC or similar methods. To address this problem, the "hierarchical Bayesian" method is used, which limits the effective number of parameters by imposing common constraints on "similar parameters".

    In addition, when applying probability models to time-series data, it is necessary to consider autocorrelation because time-series data examine changes in a single sample, whereas general probability models consider a large number of samples based on the assumption that each sample is independent.

    One approach to Bayesian estimation with a real tool is to use Stan, which performs a kind of random simulation that uses MCMC to generate sample data. While ordinary machine learning mainly optimizes functions to minimize the gap between the model and the actual data, this modeling approach estimates the parameters of the model by simulating from the initially specified model, which is a strong point in cases with little training data.

    In the following pages of this blog, we will discuss the basic theory of Bayesian inference, hierarchical Bayes, modeling of time series/spatial data, and actual computation using STAN.

      Learning by Bayesian inference

      Machine Learning with Bayesian Inference and Graphical Model

      Machine learning using Bayesian inference is a statistical learning method that calculates the posterior probability distribution for an unknown variable given observed data according to Bayes’ theorem, the fundamental law of probability, and then calculates estimators for the unknown variable and predictive distributions for new data to be observed in the future based on the obtained posterior probability distribution.

      In the following pages of this blog, we describe the basic theory, implementation, and graphical model approach to this machine learning technique based on Bayesian inference.

      Markov Chain Monte Carlo (MCMC) method

      Markov Chain Monte Carlo (MCMC) is a method for generating random numbers that follow a probability distribution using a state transition model called a Markov chain. MCMC is particularly effective for high-dimensional problems such as natural language processing, and is applied to the estimation of Bayesian statistical models.

      MCMC methods include (1) the Metropolis-Hastings method, (2) Gibbs sampling, (3) the Hamiltonian Monte Carlo (HMC) method, and (4) slice sampling.
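      As a minimal, hedged illustration of the first of these, a random-walk Metropolis sampler targeting a standard normal distribution can be written in a few lines; the proposal scale and iteration count are arbitrary.

import numpy as np

rng = np.random.default_rng(10)

def log_target(x):
    return -0.5 * x ** 2            # unnormalized log density of N(0, 1)

x, samples = 0.0, []
for _ in range(10000):
    proposal = x + rng.normal(0, 1.0)                       # symmetric random-walk proposal
    if np.log(rng.uniform()) < log_target(proposal) - log_target(x):
        x = proposal                                        # accept; otherwise keep the current state
    samples.append(x)

print(np.mean(samples), np.std(samples))   # roughly 0 and 1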

      In the following pages of this blog, we discuss the basic theory of the Markov chain Monte Carlo method, the specific algorithm and the implementation code.

      Variational Bayesian Learning

      Bayesian learning is a statistical learning method that calculates posterior probability distributions for unknown variables (model parameters, hidden variables, etc.) given observed data according to Bayes’ theorem, the fundamental law of probability, and then calculates estimators for the unknown variables and predictive distributions for new data to be observed in the future based on the obtained posterior probability distributions.

      In order to perform this Bayesian estimation, it is necessary to calculate the expected value of the unknown variables. This calculation cannot be performed analytically except in special cases, and numerical calculation is also difficult when the unknown variables are high-dimensional.

      Variational Bayesian learning is one of the approximation methods for this calculation. It is a method that enables expectation calculation by selecting the posterior probability distribution from a set of functions that satisfy certain constraints, and has a wide range of applications. The key to deriving a variational Bayesian learning algorithm is to find a property called conditional conjugacy for a given probability model and to design constraints according to this property.

      In the following pages of this blog, we describe the basic theory of variational Bayesian learning, as well as specific algorithms and implementation codes.

      Nonparametric Bayesian and Gaussian Processes

      Nonparametric Bayesian models, in a nutshell, are stochastic models in “infinite dimensional” space, as well as modern search algorithms, such as the Markov chain Monte Carlo method, that can efficiently compute them. Its applications include clustering with flexible generative models, structural change estimation with statistical models, and applications to factor analysis and sparse modeling.

      The Gaussian process takes the probabilistic approach a step further: the choice of the probability distribution over functions f(x) is made flexible, allowing "any function with a certain degree of smoothness" to be used (Gaussian process regression), and the probability distribution over the parameters of these functions is obtained by Bayesian estimation. A Gaussian process can be thought of as a box that, when shaken, produces a function f(), and by fitting this box to real data, a cloud of posterior functions can be obtained.

      The following pages of this blog describe the theory and implementation of this nonparametric Bayesian model and Gaussian process.

      Bayesian Model Applications

      Topic Model

      A topic model is a model for extracting which topics each document contains and what is being discussed in a large set of document data. By using this technology, it is possible to find documents with similar topics and to organize documents based on topics, which can be used for search and other solutions.

      This topic model has been applied not only to the analysis of document data, but also to image processing, recommendation systems, social network analysis, bioinformatics, music information processing, and many other fields. This is due to the fact that information such as images, purchase histories, and social networks have a hidden structure similar to that of documents.

      For example, in a political article, the words “parliament”, “bill”, and “prime minister” tend to appear in the same sentence, while in a sports article, the words “stadium”, “player”, and “goal” appear. In the case of images, if there is a picture of a kitchen knife, there is a high probability that there is also a picture of a cutting board, and in the case of purchase history, people with similar interests buy similar products, and in social networks, people with similar interests tend to become friends.

      In the topic model, such tendencies are expressed using a model of probability. By using a probability model, uncertainty can be handled, and essential information can be extracted from data that contains noise. In addition, since various types of information can be handled within the framework of probability, many extensions of topic models that integrate various types of information have been proposed, and their usefulness has been confirmed.
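      As a small, hedged illustration of the idea (scikit-learn's LatentDirichletAllocation is assumed to be available, and the four toy documents are invented), fitting a two-topic model separates the political and sports vocabularies.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "parliament passed the bill after the prime minister spoke",
    "the prime minister defended the bill in parliament",
    "the player scored a goal in the crowded stadium",
    "fans filled the stadium as the player chased the goal",
]

counts = CountVectorizer(stop_words="english").fit_transform(docs)   # bag-of-words counts
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)
print(lda.transform(counts))   # per-document topic proportions; the two groups of documents separate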

      In the following pages of this blog, I discuss the basic theory and various applications of this topic model.

      Other Technical Topics

      Deep learning, which excels at analyzing large-scale, complex models, and probabilistic generative models based on probabilistic calculations, which actively introduce knowledge and structure that can be assumed in the data through the process of modeling and demonstrate their strength in cases where “all necessary data is not available,” such as missing data or undetermined values, have each developed independently.

      In the process, deep learning has mainly focused on the development of scalable models that can learn large amounts of data and on the improvement of prediction accuracy, while the evaluation of the interpretability and reliability of the basis for the prediction results has taken a back seat.

      It is quite natural for the two to complement each other's weaknesses: in deep learning, complex models have been devised for target data such as images and natural language, while in Bayesian estimation, efforts are being made to improve scalability by introducing techniques such as stochastic gradients, which were developed in deep learning. These are now being connected within the larger theme of "designing large and complex models and efficient probability computation".

      Bayesian networks are a modeling technique that represents causal relationships (strictly speaking, probabilistic dependencies) between various events in a graph structure. They are used in failure diagnosis, weather prediction, medical decision support, marketing, recommendation systems, and the real-time processing of events such as train congestion, disasters such as tsunamis and earthquakes, and stock prices. There are also naive Bayes models, used in spam e-mail classification, recommendation systems, sentiment analysis, speech recognition, bioinformatics, morphological analysis (natural language processing), music score tracking, partial discharge detection, and so on, as well as hidden Markov models applied to the recognition of time-series patterns.

      Bayesian nets cannot be merged into a single description even when they are similar; different sets of variables require the creation of separate Bayesian nets, making it difficult to describe complex and huge models.

      In order to solve this problem, research on the automatic generation of Bayesian nets, called knowledge-based model construction (KBMC), was conducted. KBMC uses predicate logic and a Prolog-like declarative programming language to perform backward inference with reference to a knowledge base: a Bayesian net is dynamically generated for each question and used to calculate probabilities and produce the answer. This avoids the problem of having to create huge Bayesian nets covering all possible questions.

      Matching ontology attributes by naïve Bayes for ontology similarity assessment.

      A map in which each section is colored in a tone that matches the corresponding statistical value is called a choropleth map. In some cases, the statistical values used for the coloring are based on data obtained in advance, while in other cases they are based on some kind of estimation from observed data. The choropleth map is useful for visualizing the numerical values for each region in combination with geographic (spatial) information.

      In this article, we describe a method of estimation using a hierarchical Bayesian model that takes spatial correlation into account, using geographic adjacency information expressed on a map, and visualizing the results on the map as a choropleth map.

      To protect privacy in location information, we describe a statistical method based on a Markov model for evaluating the security of anonymization methods.

      One of the factors that makes anomaly detection difficult is the large number of variables to deal with. The naive Bayes method addresses this with the simple idea of separating the problem by variable (that is, by dimension).

      The anomaly score of an observed value x', treated as an M-dimensional vector, is the sum of the anomaly scores calculated for each of the M variables. This calculation requires no complicated matrix inversions, so if the assumption that the variables are mutually independent holds, it is effective in practice. Even when that assumption is not strictly correct, it is a useful formula for estimating the magnitude of an anomaly.
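      Written as a formula (assuming, as one common choice, that the per-variable anomaly is the negative log-likelihood of that variable under a model estimated from training data \mathcal{D}), the score decomposes as

      a(x') = \sum_{m=1}^{M} \left( -\ln p(x'_m \mid \mathcal{D}) \right)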
