The World of Bayesian Modeling

Overview

This article provides an overview of the contemporary world of Bayesian modeling from the perspective of "modeling individual differences and heterogeneity." Using examples from ecology, medicine, earth science, and natural language processing, it discusses smoothing, hierarchical models, data assimilation, and various language models from a Bayesian modeling perspective.

Individual Technical Topics

Statistics describes what is representative of a population. In contrast, what does it mean to seek out individuality? Let us first discuss why we need to "average" at all.

As an example, suppose you went to a hospital and were advised to undergo surgery. You felt somewhat relieved because your research showed that 90% of such surgeries are successful, but further research showed that the success rate varies with age: for people over 40 (say you are now 47), the success rate is only 70%.

It is natural to feel that simply applying the average to oneself is not very reassuring. But if one keeps pursuing data ever closer to "oneself," one eventually arrives at "there is only one person exactly like you: you yourself." And since you are the only you, there is then no data left to learn from.

Each piece of data is like a star shining alone in the night sky, with infinite darkness in between. To extract something from data, we must fill in the gaps by grouping together things that are similar in some way. That is what we call modeling here. Without modeling, it is impossible to extract laws or make predictions. Statistical science begins with this acknowledgment.

In this article, we will discuss the hierarchical Bayesian model. When we consider a multi-stage data generation process by replacing constants with random variables, how can we make inferences using such a hierarchical model?
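As a minimal illustration, the following R sketch simulates such a two-stage (hierarchical) generative process, in which each group's mean is itself drawn from a population-level distribution; all names and parameter values here are assumptions for the sketch:

```r
set.seed(1)
n.group <- 10                               # number of groups
n.obs   <- 20                               # observations per group
mu0 <- 0; tau <- 2                          # hyperparameters: population mean and sd of group means
sigma <- 1                                  # observation noise sd
group.mean <- rnorm(n.group, mu0, tau)      # stage 1: group-level parameters are random variables
y <- matrix(rnorm(n.group * n.obs,
                  mean = rep(group.mean, each = n.obs),
                  sd = sigma),
            nrow = n.group, byrow = TRUE)   # stage 2: observations within each group
```

Inference then proceeds in the opposite direction: from the observed y back to the group means and to the hyperparameters mu0 and tau.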

First, artificial data with unequal time intervals (or measurement points in space) are generated: s.size time points are randomly sampled between 1 and tmax without replacement, and an observed value is generated at each selected time. Specifically, the observation times (observation positions) are stored in test.t, the "true values" are computed as test.c.0, and the observations test.c are generated by adding Gaussian noise (normal random numbers).
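A minimal sketch of this data-generating step in R; the smooth "true" curve below is an assumption, since the source does not specify its functional form:

```r
set.seed(123)
tmax   <- 200                                  # length of the full time axis
s.size <- 40                                   # number of observed time points
test.t <- sort(sample(1:tmax, s.size))         # observation times, sampled without replacement
test.c.0 <- 10 * sin(2 * pi * test.t / tmax)   # assumed "true" smooth signal
test.c <- test.c.0 + rnorm(s.size, sd = 2)     # observations = truth + Gaussian noise
```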

The KFAS package is originally designed to handle time series with the Kalman filter, but it can also be applied to one-dimensional spatial data.
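Continuing the sketch above, the irregularly spaced observations can be smoothed with KFAS by treating unobserved times as missing values; the second-order trend specification below is one reasonable choice, not the only one:

```r
library(KFAS)
y <- rep(NA_real_, tmax)
y[test.t] <- test.c                        # unobserved times stay NA; the Kalman filter handles them
model <- SSModel(y ~ SSMtrend(degree = 2, Q = list(matrix(NA), matrix(NA))),
                 H = matrix(NA))           # NA variances are left to be estimated
fit <- fitSSM(model, inits = rep(0, 3), method = "BFGS")   # maximum-likelihood estimation
smoothed <- KFS(fit$model, smoothing = "state")            # smoothed estimate of the latent curve
```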

In science, we deal with data (collections of numbers and symbols with structure) obtained from observations and experiments. Statistical methods are used to construct a good statistical model that can explain the patterns seen in the observed data. This allows us to combine the data and the model, and to estimate the parameters that characterize the model.

Such statistical modeling is the essence of data analysis, but in many cases data processing is treated as a prescribed procedure rather than as creative modeling. In contrast, this article considers modeling through data examples such as those used in biology.

Ecology studies the behavior of individual organisms and populations based on observational data obtained in the field where organisms live. Real ecological research is too complex for a first example, so suppose we choose one individual plant and want to know how many seeds it produces. Suppose every plant has 10 ovules, the organs from which seeds develop; the minimum number of seeds observed is then 0 and the maximum is 10. The probability that an ovule becomes a seed is called the fruiting probability. Various biological factors determine the size of this fruiting probability, but we assume here that they are not known.
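Under these assumptions, the natural statistical model is a binomial one: each of the 10 ovules fruits independently with some probability q. A minimal R sketch with simulated data (the value q = 0.8 and the sample size are arbitrary assumptions):

```r
set.seed(42)
n.plant <- 50
q.true  <- 0.8                                   # assumed true fruiting probability
y <- rbinom(n.plant, size = 10, prob = q.true)   # observed seed counts per plant
q.hat <- sum(y) / (10 * n.plant)                 # maximum-likelihood estimate of q
```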

Here is a simple example of how parameter estimation can go wrong when "individual differences" (differences between individuals or between groups) in the data are ignored. If we look at the data group by group, x and y appear correlated within each group; but because each group contains only a few observations, fitting a separate straight line to each group makes the estimated slopes vary widely.
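A sketch of the problem in R (all numbers are assumptions for illustration): with few observations per group, separately fitted slopes scatter widely around the common true value of 0.5, which is exactly the variability a hierarchical model reduces by sharing information across groups:

```r
set.seed(7)
n.group <- 8; n.per <- 5                     # few observations per group
group <- rep(1:n.group, each = n.per)
a <- rnorm(n.group)                          # group-specific intercepts
x <- rnorm(n.group * n.per)
y <- a[group] + 0.5 * x + rnorm(n.group * n.per)
slopes <- sapply(1:n.group,
                 function(g) coef(lm(y[group == g] ~ x[group == g]))[2])
slopes                                       # per-group slopes scatter widely around 0.5
```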

The widely reported relationship between abnormal behavior during influenza and the use of Tamiflu, which generated a great deal of public interest, is still fresh in our minds. In the first place, "the drug works" does not necessarily mean "the disease is cured," nor does a drug work uniformly for all patients. It is extremely rare for all patients given the same drug at the same dosage to respond in the same way: some show early improvement while others, unfortunately, deteriorate. Moreover, it is difficult to predict in advance which patients will respond in which direction; only observation after administration can tell. Thus, a problem encountered in the medical field is that treating the effect of a drug as a constant to be estimated can be unnatural; it is sometimes more natural to treat it as a quantity, or random variable, that varies from person to person.

To deal with such problems appropriately, one uses a mixed model (mixed-effects model), which classifies effects into two categories: fixed effects, for which it is natural to assume the effect is constant across individuals (such as sex and age), and random effects, for which the effect varies between individuals (such as the effect of a drug); or alternatively a Bayesian model that assigns random variables (prior distributions) to all factor effects. In this section, we discuss the necessity of incorporating individual differences, and methods for doing so, through three specific examples from the field of medicine.
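A minimal sketch of such a mixed model using the lme4 package; the simulated repeated-measures data and all names here are hypothetical assumptions:

```r
library(lme4)
set.seed(1)
n.pat <- 30; n.rep <- 4                         # hypothetical repeated-measures design
d <- data.frame(patient = factor(rep(1:n.pat, each = n.rep)),
                age  = rep(rnorm(n.pat, 50, 10), each = n.rep),
                drug = rep(0:1, n.pat * n.rep / 2))
b.pat <- rnorm(n.pat)                           # per-patient deviation of the drug effect
d$outcome <- 0.02 * d$age + (1 + b.pat[d$patient]) * d$drug + rnorm(nrow(d))
fit <- lmer(outcome ~ age + drug + (drug | patient), data = d)
summary(fit)   # fixed effects: population averages; random effects: per-patient deviations
```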

In this article, we will discuss how to construct a model that describes the whole on the basis of local models. Specifically, we first take temperature data for Tokyo as an example and extract meaning from the data using a statistical model with a small number of fixed parameters, starting, so to speak, with analysis using an overall model. Next, we introduce a local linear model to extract temporally localized information, and then extend it to a nonlinear model to see how the expressive power of the model can be enriched. This local nonlinear model is a "soft" model, expressed by stochastic difference equations, that allows probabilistic deviations from constraints that would ordinarily be imposed as exact equations. Furthermore, by generalizing the distribution of the noise term that generates the stochastic fluctuations to a non-Gaussian distribution, we can better handle rarely occurring stochastic events such as jumps and outliers. This characteristic, which cannot be expressed by a noise term following a Gaussian distribution, is called non-Gaussianity.
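As a sketch of the kind of local model meant here, a common choice (an assumption for illustration, not necessarily the exact model used in the source) is the second-order trend model, written as a stochastic difference equation:

\[
y_t = \mu_t + w_t, \qquad \mu_t = 2\mu_{t-1} - \mu_{t-2} + v_t
\]

The system noise \(v_t\) "softens" the exact smoothness constraint \(\mu_t - 2\mu_{t-1} + \mu_{t-2} = 0\), and \(w_t\) is observation noise. Replacing the Gaussian assumption on \(v_t\) or \(w_t\) with a heavy-tailed distribution, such as the Cauchy, is what allows the model to absorb jumps and outliers: this is the non-Gaussianity discussed above.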

Viewed objectively, language can be thought of as a sequence of symbols. Although at a finer level it is composed of letters, here we will treat a language such as English as being composed of words.

One thing that is immediately noticeable when looking at a language's word sequence is that word frequencies are highly skewed. The inverse relationship between a word's frequency rank and its frequency is known as Zipf's law, one of the basic facts about language discovered in the 1930s. In recent years, this law has become known as a power law common to many discrete phenomena in nature beyond language.
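A quick way to see this empirically in R; corpus.txt is a hypothetical plain-text file, and under Zipf's law the log-log rank-frequency plot is roughly a straight line with slope near -1:

```r
words <- scan("corpus.txt", what = character(), quote = "")   # hypothetical corpus file
freq <- sort(table(tolower(words)), decreasing = TRUE)        # word frequencies, highest first
rank <- seq_along(freq)
plot(log(rank), log(as.numeric(freq)),
     xlab = "log rank", ylab = "log frequency")               # roughly linear under Zipf's law
```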

To express such indeterminacy, we need a probability distribution over p itself. The simplest such distribution is the Dirichlet distribution.
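In its standard form, the Dirichlet density over a probability vector \(p = (p_1, \dots, p_K)\) with parameters \(\alpha_k > 0\) is:

\[
p(p \mid \alpha) = \frac{\Gamma(\alpha_1 + \cdots + \alpha_K)}{\Gamma(\alpha_1) \cdots \Gamma(\alpha_K)} \, p_1^{\alpha_1 - 1} \cdots p_K^{\alpha_K - 1}
\]

The parameters \(\alpha_k\) control where on the probability simplex the mass of p is likely to lie.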

We often hear first-time students say, "I don't understand statistics." There are even mathematical scientists with deep knowledge of statistics who say similar things. Such opinions of statistics cannot be dismissed as uninformed or unfair. Statistics is concerned with "methods of inference" and is based on the "concept of probability." But "what is rational inference?" is not an easy question to answer.

Furthermore, the concept of "probability" is itself difficult to grasp in a straightforward manner and has been a matter of constant controversy. In view of this, those who feel that they do not understand statistics or probability may be the ones with sound common sense; anyone who thinks they understand these ideas easily may need to recheck their level of understanding. It is not without reason that statistics is difficult to understand.

In this article, we will discuss the methodological basis of statistical science as modern statistics, the significance and certainty of the inferential methods it offers, its historical background, the place it occupies in scientific research, and the direction in which it is developing.

Looking back on the development of methods for solving ill-posed problems, we can see that the underlying model has progressed from an "equation model" to an "optimization (variational) model" and then to a "probabilistic/statistical model." This is a direction of methodological development that applies universally to modeling in scientific research.

Today, the usefulness of "Bayesian statistics," which assumes a model with more parameters than data points and places an appropriate "prior distribution" on those parameters, is widely recognized. Its utility is also recognized in the fields of information processing, pattern recognition, and data mining. In recent years, "machine learning" methods such as support vector machines and neural networks have been attracting attention in industry.

  • Bayes, Hierarchical Bayes, and Empirical Bayes
  • The Two Faces of Hierarchical Bayes
  • Prior distributions to represent correlations
  • Outliers, clustering, missing data
  • About Bayesian Statistics and Machine Learning

In recent years, AI has become widely known for its ability to beat Go masters, but not many people are aware of the machine-learning methodology that underlies such AI. Today, learning machines based on "deep learning" are performing well, but even the engineers who operate them as useful software tools are often unaware of the methodological implications of what these machines capture. Here we discuss the impact of machine learning on scientific methodology and engineering, and review the suitability of probabilistic approaches, especially Bayesian models, for machine-learning design.
