Uncertainty and Machine Learning Techniques


What is uncertainty?

Uncertainty refers to a state in which future events or outcomes are difficult to predict. It arises from the limits of our knowledge and information, and describes situations in which complete information or full confidence is unavailable. Uncertainty manifests itself in a variety of ways, including the following.

  • Outcome uncertainty: When future events or outcomes are unpredictable, uncertainty exists about those outcomes. For example, weather forecasts and stock price movements are influenced by many factors and are difficult to predict with complete accuracy.
  • Data uncertainty: When available information is limited or the data are unreliable, the information itself carries uncertainty. Data uncertainty can be especially high for unknown or novel events.
  • Decision uncertainty: Uncertainty exists with respect to a decision when the outcome cannot be predicted with certainty. This is often the case when assessing risk or considering future returns or losses.

Mathematical methods and models, such as probability theory and statistics, are used to deal with uncertainty. These methods are important tools for quantifying uncertainty and minimizing risk. It is also important to recognize uncertainty and consider appropriate measures to deal with it. While it can be difficult to eliminate uncertainty entirely, efforts can be made to minimize risk.

Probability Theory and Statistics

Probability theory and statistics are branches of mathematics that study concepts related to uncertainty and provide theories and methods for analyzing and predicting data. The two disciplines are closely related and in many ways complementary, but each focuses on different aspects.

  • Probability Theory: Probability theory constructs probabilistic models of uncertain phenomena and studies their mathematical properties. It covers methods for quantitatively evaluating the probability that an event will occur, as well as concepts such as operations on probabilities, probability distributions, conditional probability, and independence. Probability theory is a branch of mathematics that focuses primarily on the theoretical side.
  • Statistics: Statistics is the study of methods for collecting, organizing, analyzing, and interpreting data, applying the concepts of probability theory to data analysis. Statistics uses data to infer information about a population (the whole set) or to draw conclusions from a sample, and includes descriptive statistics (summarizing and visualizing data), inferential statistics (making inferences about a population from a sample), and hypothesis testing (testing statistical hypotheses).

Statistics focuses on analyzing actual data and evaluating the results obtained from that data, and the concepts of probability theory are often used in statistics as the basis for probability distributions and inferences. In general, probability theory provides mathematical theories related to probability, while statistics is a practical discipline that provides methods for analyzing data and making inferences. These disciplines have much in common, and statistics can be viewed as an application of probability theory.

Next, we will delve deeper into probability.

Probability and its philosophical meaning

<About probability>

Probability is used in dealing with uncertainty and is an important concept applied in a variety of fields, including science, statistics, economics, engineering, game theory, and decision theory. Understanding probability allows us to predict the future and support our decisions.

Probability is a concept that expresses the likelihood or certainty of an event occurring as a numerical value between 0 and 1, where 0 means the event will not occur and 1 means the event is certain to occur. Probability is treated in mathematics and statistics and is used to make probabilistic estimates and predictions about various phenomena and events.

Here are some specific examples of probability. When tossing a coin, there are two possible outcomes: heads or tails. For a fair coin, the probability of heads is 1/2 (0.5) and the probability of tails is likewise 1/2. Over a large number of trials, heads and tails would be expected to appear with approximately the same frequency.

As another example, consider rolling a die. A die has six faces, and the probability of each face coming up is 1/6 (approximately 0.1667). For a fair die, the faces 1 through 6 are expected to appear with equal probability.

<The Philosophical Meaning of the Concept of Probability>

The concept of probability involves a reflection on the limits of human knowledge and decision-making beyond the framework of mathematics and statistics. Although probability is used as a means of expressing confidence in future events and outcomes, there are a variety of philosophical perspectives behind it. The following are the various perspectives relevant to the philosophical interpretation of probability.

<Subjective Probability>

Subjective probability refers to probability estimated on the basis of an individual's own judgment and beliefs. Because information is limited and imperfect, humans cannot have complete confidence in future events, so probabilities that individuals estimate from their own knowledge and experience are treated as subjective probabilities. It is a subjective evaluation of how likely a particular event is to occur, based on one's own experience, knowledge, intuition, or feelings rather than on objective data or statistical evidence. A typical expression of subjective probability is something like, "There is probably a 50% chance that it will rain tomorrow."

Since subjective probability depends on an individual’s subjective judgment, different people may have different probabilities for the same event. For example, one person may estimate a high probability that a particular product will sell while another person may estimate it lower. This occurs because people have different experiences and access to information.

Subjective probability may be used in the following situations

  • Decision-making: When making decisions in situations of uncertainty, individuals make choices based on their subjective probability assessment. For example, this is the case when evaluating the probability of success of a new business project and deciding whether to invest in it.
  • Rare event evaluation: When historical statistical data is limited or for an unknown event, an individual may subjectively evaluate the probability of its occurrence. This is especially true when events are so rare that statistical data are scarce.
  • Predictions and Forecasts: When objective data are not available for a future event, individuals may make predictions or forecasts based on their subjective views.

Subjective probabilities are different from objective probabilities, but can play an important role under certain circumstances. However, due to their highly subjective nature, it is often desirable to consider them in combination with objective data or statistical evidence.

<Frequency-based probability>

Frequency-based probability is one interpretation of probability: an approach that defines probability in terms of the statistical frequency of an outcome over many trials. In this approach, the long-run frequency with which a particular event occurs is taken to represent its probability. For example, under the frequency-based interpretation, a tossed coin has a 1/2 probability of coming up heads.

The characteristics of frequentist probability are as follows

  • Based on long-term observations: In frequency-based probability, statistical observations are made over many trials to evaluate the probability of a particular event. For example, the probability of heads is estimated by tossing a fair coin many times and observing the proportion of tosses that come up heads.
  • Objectivity of probability: Because frequency-based probability evaluates probabilities based on objective data and statistical observations, it eliminates the influence of personal subjectivity and emotions.
  • Assumption of repeatability: Frequency-based probability applies when a trial can be repeated, that is, when the same experiment or trial can be performed again under the same conditions.
  • Stability assumption: Frequency-based probability assumes that the observed frequency converges to a stable value as the number of trials increases. For example, as the number of coin tosses grows, the observed frequency of heads is expected to converge to 0.5.

The frequency-based approach is the foundation of classical statistics and provides the basic theory for a mathematically rigorous treatment of probability. Its advantage is that it allows an objective and quantitative evaluation of probability through repeated observation, but it can be difficult to apply to events that cannot be repeated or observed many times. It is therefore often considered in combination with other interpretations of probability, such as subjective or Bayesian probability.
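As a simple illustration of this convergence, the following short simulation (a sketch using NumPy; the trial counts are arbitrary) tosses a fair coin repeatedly and shows the observed frequency of heads approaching 0.5 as the number of tosses grows.

import numpy as np

rng = np.random.default_rng(seed=0)

# Simulate fair coin tosses and print the observed frequency of heads
for n in [10, 100, 1000, 10000, 100000]:
    tosses = rng.integers(0, 2, size=n)  # 1 = heads, 0 = tails
    print(f"n = {n:6d}, observed frequency of heads = {tosses.mean():.4f}")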

<Bayesian Probability>

Bayesian probability is an interpretation of probability that combines statistical data with subjective prior beliefs to evaluate the probability of an event. In Bayesian probability, an initial prior probability is assumed, and the probability is updated each time new information (e.g., observational data) is obtained. Updating the probabilities sequentially in this way is said to yield a more reasonable evaluation. The approach is based on the updating rule of probability proposed by Thomas Bayes.

The characteristics of Bayesian probability are as follows

  • Prior and posterior probabilities: In Bayesian probability, an initial probability, called the prior probability, is set before the probability of an event is evaluated. After new data or information is obtained, an updated probability, called the posterior probability, is calculated. In other words, the probabilities are updated sequentially each time data arrive.
  • Bayes’ Theorem: The mathematical theorem at the core of Bayesian probability is Bayes’ Theorem. Bayes’ Theorem is a formula for calculating posterior probabilities using conditional probabilities, and is expressed as follows

P(A|B) = P(B|A) * P(A) / P(B)

where P(A|B) is the posterior probability, P(B|A) is the conditional probability of event B given event A, P(A) is the prior probability, and P(B) is the probability of event B.

  • Incorporation of Subjective Factors: Bayesian probability can incorporate subjective factors. Prior probabilities are set based on an individual’s subjective beliefs and knowledge, allowing for different probability evaluations for each individual.
  • Statistical inference: Bayesian probability is often applied to statistical inference, allowing for optimal probability evaluation while accounting for missing or uncertain data.

Bayesian probability is especially useful when data is limited or uncertain, and is also widely used when sequential information is required or in the fields of machine learning and artificial intelligence. However, caution must be exercised in dealing with subjective factors and fluctuations in results due to the setting of prior probabilities.
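To make the update rule concrete, the following small sketch applies Bayes' theorem to a hypothetical diagnostic-test scenario; the prevalence and test accuracy values are made-up numbers used only for illustration.

# Hypothetical values: P(A) = prior probability of disease,
# P(B|A) = probability of a positive test given disease,
# P(B|not A) = false positive rate
p_a = 0.01
p_b_given_a = 0.95
p_b_given_not_a = 0.05

# Total probability of a positive test: P(B)
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Bayes' theorem: posterior probability P(A|B)
p_a_given_b = p_b_given_a * p_a / p_b
print(f"Posterior probability of disease given a positive test: {p_a_given_b:.3f}")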

<Principle of Indifference>

The Principle of Indifference is one of the key concepts in the philosophical approach to probability. It is the principle that considers probabilities to be equal when no specific information is available or when there is no reason to make a distinction between alternatives.

The principle of indifference is applied in the following situations.

  • Multiple equal possibilities: When there are multiple equally plausible possibilities for an event, the principle of indifference assigns them the same probability. For example, when rolling a die, the principle of indifference can be applied to treat each of the faces 1 through 6 as having probability 1/6.
  • Lack of information: When information related to a particular event is lacking, or when the possibilities are assumed to be equally likely, the principle of indifference treats them equally.

However, there are criticisms of the indifference principle, and in particular, it is pointed out that there are cases in which it is not appropriate to distribute probabilities equally. For example, this is the case when probabilities need to be considered in specific contexts or under specific conditions, or when the mechanisms or characteristics behind events differ.

Whether or not to apply the principle of indifference when assessing probabilities depends on the situation and the nature of the problem. What matters is that probability evaluations are based on appropriate information and a logical rationale. The principle of indifference can be a useful tool in simple situations, but it should be applied with caution to complex real-world problems.

<Summary>

These philosophical approaches influence how probability is defined and interpreted, and they provide an important perspective when applying probability in science, statistics, economics, decision theory, and many other fields.

Next, we discuss probability distributions, which are mathematical tools for dealing with probability and statistics in machine learning and other applications.

What is probability distribution?

In statistics and probability theory, a probability distribution is a function that indicates the probability of an event occurring. Probability distributions are used to express the probability that an individual event or random variable will take on a particular value.

There are two main types of probability distributions: discrete probability distributions and continuous probability distributions.

<Discrete Probability Distribution>

A discrete probability distribution is used when a random variable takes a finite or countably infinite number of discrete values. It assigns a probability of occurrence to each individual value the variable may take. Examples include the two sides of a coin, the roll of a die, and the number of successes in a fixed number of trials.

In a discrete probability distribution, probability is expressed using a Probability Mass Function (PMF). The probability mass function is a function that indicates, for each discrete value, the probability of that value occurring. The probability mass function is non-negative, and the sum of the probabilities for all values is 1.

Examples of typical discrete probability distributions include the following

  • Bernoulli distribution: the Bernoulli distribution is a probability distribution in which one of two possible outcomes (e.g., success and failure) will occur. If the probability of success is p and the probability of failure is q = 1 – p, the probability mass function of the Bernoulli distribution is expressed as

P(X = x) = p^x * q^(1-x)

where x can take the value 0 or 1.

  • Binomial distribution: The binomial distribution is the probability distribution that the number of successful trials follows when n independent Bernoulli trials are made. If the number of trials is n and the probability of success is p, the probability mass function of the binomial distribution is expressed as

P(X = k) = (n choose k) * p^k * (1-p)^(n-k)

where k is the number of successes.

  • Poisson distribution: The Poisson distribution is a probability distribution that is followed by the number of occurrences of rare events in time or space. If the average number of occurrences per unit time or unit space is λ, the probability mass function of the Poisson distribution is expressed as

P(X = k) = (e^(-λ) * λ^k) / k!

where k is the number of occurrences.

These are examples of the main discrete probability distributions, but there are various other probability distributions such as geometric, hypergeometric, negative binomial, and multinomial distributions.
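As a small sketch (using SciPy, which is also introduced later in this article), the probability mass functions above can be evaluated directly; the parameter values below are arbitrary illustrative choices.

from scipy import stats

# Bernoulli: P(X = 1) with success probability p = 0.3
print(stats.bernoulli.pmf(1, p=0.3))

# Binomial: P(X = 3) with n = 10 trials and success probability p = 0.5
print(stats.binom.pmf(3, n=10, p=0.5))

# Poisson: P(X = 2) with average rate lambda = 4
print(stats.poisson.pmf(2, mu=4))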

<Continuous Probability Distribution>

A continuous probability distribution is used when a random variable takes on continuous values; height, weight, and time are examples of quantities treated as continuous random variables. In a continuous probability distribution, probability is expressed using a Probability Density Function (PDF). The probability density function indicates the density of probability at each value; probabilities are assigned to ranges of values rather than to individual points.

In a continuous probability distribution, the probability of a range of values corresponds to the area under the density curve over that interval. The probability density function is non-negative and its integral (area) over the entire domain is 1, and the probability for an interval is obtained by integrating the probability density function over that interval.

Examples of typical continuous probability distributions include the following

  • Normal (Gaussian) distribution: The normal distribution is the probability distribution commonly observed for many natural phenomena and measurements. It has a bell-shaped curve whose shape is determined by the mean and standard deviation. The probability density function of the normal distribution is expressed as

 f(x) = (1 / (σ * √(2π))) * e^(-((x-μ)^2) / (2 * σ^2))

where μ is the mean and σ is the standard deviation.

  • Uniform distribution: A uniform distribution is a probability distribution where all values within a certain range have the same probability of occurring. The probability density function of the uniform distribution is expressed as

f(x) = 1 / (b - a) (a ≤ x ≤ b)

where a and b represent the minimum and maximum values in the range.

  • Exponential distribution: The exponential distribution is a probability distribution applied to waiting times or intervals between occurrences of events, and is often used for events that occur continuously in time. The probability density function of the exponential distribution is expressed as

f(x) = λ * e^(-λx) (x ≥ 0)

where λ represents the rate parameter.

These are examples of continuous probability distributions, but there are various other probability distributions such as gamma, beta, Cauchy, Weibull, and logistic distributions.
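Similarly, the probability density functions above can be evaluated with SciPy; again, the parameter values are arbitrary examples.

from scipy import stats

# Normal distribution with mean 0 and standard deviation 1
print(stats.norm.pdf(0.0, loc=0.0, scale=1.0))      # density at x = 0
print(stats.norm.cdf(1.0) - stats.norm.cdf(-1.0))   # P(-1 <= X <= 1), roughly 0.68

# Uniform distribution on [a, b] = [0, 2]
print(stats.uniform.pdf(1.0, loc=0.0, scale=2.0))   # 1 / (b - a) = 0.5

# Exponential distribution with rate lambda = 2 (scale = 1 / lambda)
print(stats.expon.pdf(0.5, scale=0.5))              # 2 * e^(-1)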

<Summary>

It is important to choose the appropriate probability distribution according to the nature of the data and the requirements of the problem. These probability distributions can be used to model the characteristics of the data, which allows for statistical inference, prediction, or machine learning.

Probability Distributions and Machine Learning

Probability distributions are a very important concept in machine learning. Machine learning is a method for learning patterns and relationships from data and making predictions and decisions on unknown data, and probability distributions are used to model the probabilistic structure behind the data in this learning and prediction process.

The role and use of probability distributions in machine learning include the following

  • Modeling data: In machine learning, models are trained using training data. Given data, it is possible to capture the probabilistic structure behind the data by assuming a probability distribution that adequately represents it. For example, in the case of image recognition, it is important to estimate the probability distribution that represents each image class (dog, cat, car, etc.).
  • Probabilistic Prediction: Probability distributions can be used to make predictions that account for uncertainty in the results. For example, instead of presenting only the class to which the data most likely belongs, a model can also report the probability of belonging to each of the other classes.
  • Bayesian inference: Bayesian inference is a method of updating probability distributions by combining prior knowledge (prior information) with new data. In machine learning, Bayesian inference is used to estimate the posterior distribution of parameters.
  • Probabilistic Generative Models: Generative models using probability distributions are used to generate new data. Generative models generate similar samples of data by learning probability distributions of training data and sampling the new data. This is a form of unsupervised learning and is used for data generation and data completion.

On algorithms for dealing with probability distributions

There are two main approaches to algorithms for dealing with probability distributions: (1) generative models and (2) inferential models. These algorithms are used to estimate the probability distribution of data or to generate new data.

  1. Generative Models: Generative models are used to learn the probability distribution of data and to sample new data. Generative models are a form of unsupervised learning and are used for data generation, data completion, and anomaly detection. Typical generative models include the following
    • Gaussian Mixture Model (GMM): A model that assumes that data are generated from multiple Gaussian distributions.
    • Autoencoders: Models that learn latent representations of the data and reconstruct the data using them.
    • Bayesian Networks: Graphical models that represent conditional probability distributions for multivariate data.
    • Bayesian Deep Learning: A method for modeling neural networks in a Bayesian manner.
  2. Inference Models: Inference models are used to estimate the parameters of a model from given data or to make predictions about unknown data. Inference models are used for various machine learning tasks, including supervised and unsupervised learning. Typical inference models include
    • Logistic Regression: A linear model for binary classification.
    • Support Vector Machines (SVM): Powerful algorithms for classification and regression.
    • Random Forest: An ensemble learning method that combines multiple decision trees, as described in "Overview of Ensemble Learning and Examples of Algorithms and Implementations".
    • Neural Networks: Deep learning to learn sophisticated feature representations, applied to a variety of tasks.

These algorithms are selected according to the characteristics of the probability distribution and the nature of the data, and are used in a variety of machine learning applications. By properly modeling probability distributions, it is possible to understand the probabilistic structure behind the data and make more accurate predictions and analyses.
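As one concrete illustration of the generative models listed above, the following sketch fits a Gaussian mixture model (GMM) with scikit-learn and then samples new data from the learned distribution; the synthetic data and parameter choices are for illustration only.

import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic one-dimensional data drawn from two Gaussian clusters
rng = np.random.default_rng(seed=0)
data = np.concatenate([rng.normal(-2.0, 0.5, size=200),
                       rng.normal(3.0, 1.0, size=300)]).reshape(-1, 1)

# Fit a two-component Gaussian mixture model
gmm = GaussianMixture(n_components=2, random_state=0).fit(data)
print("Estimated means:", gmm.means_.ravel())
print("Estimated variances:", gmm.covariances_.ravel())

# Sample new data points from the learned distribution
new_samples, _ = gmm.sample(10)
print("Generated samples:", new_samples.ravel())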

Libraries and platforms that deal with probability distributions

Various libraries and platforms exist for working with probability distributions. These tools are used for a wide range of applications, including building probability models, estimating probability distributions, statistical analysis, and performing machine learning tasks. The following describes some of the most commonly used libraries and platforms.

<Python Libraries>

  • NumPy: A basic Python library that supports numerical computation and array manipulation, and is also used for manipulating probability distributions.
  • SciPy: A library based on NumPy that provides advanced scientific and technical computing, enabling statistical analysis of probability distributions, evaluation of probability density functions, and random number generation.
  • pandas: A library specialized for data analysis and manipulation, useful for organizing and visualizing data including probability distributions.
  • scikit-learn: A library that provides a rich set of algorithms for machine learning, enabling the implementation of probabilistic classifiers and regression models.
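For instance, a scikit-learn classifier can return class-membership probabilities rather than only hard labels; the tiny dataset below is made up purely for illustration.

import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy binary classification data
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = LogisticRegression().fit(X, y)

# predict_proba returns the probability of each class for new inputs
print(clf.predict_proba([[1.5], [2.5], [4.5]]))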

<R language packages>

  • stats: A package containing basic statistical functions in the R language, used for statistical analysis of probability distributions.
  • dplyr: A package for concise data manipulation, facilitating the processing of data containing probability distributions.
  • ggplot2: A package that excels in data visualization and is used to create graphs of probability distributions.

<TensorFlow / PyTorch>

TensorFlow and PyTorch are frameworks for deep learning that also support building probabilistic models and handling probability distributions. In particular, they can be used to implement probabilistic neural networks and probabilistic graphical models.
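As a small sketch of this, PyTorch's torch.distributions module can evaluate log-probabilities and draw reparameterized samples whose gradients flow back to the distribution parameters; the parameter values here are arbitrary.

import torch
from torch.distributions import Normal

# A Gaussian whose mean and (log) standard deviation are learnable
mean = torch.tensor(0.0, requires_grad=True)
log_std = torch.tensor(0.0, requires_grad=True)
dist = Normal(mean, log_std.exp())

# Negative log-likelihood of observed data under the current parameters
data = torch.tensor([0.2, -0.5, 1.0])
nll = -dist.log_prob(data).sum()
nll.backward()  # gradients flow to mean and log_std
print(mean.grad, log_std.grad)

# Reparameterized sampling (differentiable with respect to the parameters)
samples = dist.rsample((5,))
print(samples)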

<Stan>

Stan is a probabilistic programming language and platform for Bayesian statistical modeling that integrates probabilistic model building and inference.

For examples of probability distribution applications

Probability distributions have been widely applied in various fields. The following are examples of probability distribution applications.

<Finance>

  • Prediction of stock prices: Stock prices fluctuate over time, and this fluctuation has a stochastic component. Statistical models using probability distributions (e.g., normal distribution and t-distribution) are used to model financial market data.
  • Risk Management: Financial institutions use different probability distributions to assess the likelihood of loss in order to minimize risk.

<Natural Sciences>

  • Physics: Particle motion and energy distributions are modeled using probability distributions. For example, the Boltzmann and Maxwell-Boltzmann distributions are used.
  • Ecology: The number of organisms, frequency of occurrence, individual body size, etc. may be modeled using probability distributions.

<Technology>

  • Pattern Recognition: Machine learning algorithms and deep learning models perform classification and regression based on the assumption of probability distributions that data features follow.
  • Natural language processing: In the generation and analysis of natural language data, word frequencies and sentence structure are treated as probability distributions.

<Medical>

  • Disease prediction: Probability distributions may be used to model disease progression and treatment success rates in order to assess a patient’s condition and risk.
  • Drug effects: Individual differences among patients may be considered using probability distributions to evaluate the effects and side effects of drugs.

Example implementation in python using probabilistic modeling for image recognition

A common approach to using probabilistic modeling for image recognition is to use Bayesian inference, which is a statistical method that estimates the posterior distribution of unknown parameters based on prior knowledge (prior distribution) and observed data (likelihood).

Below is a common example of image recognition using probabilistic modeling in Python. Here we use the PyTorch library and, for simplicity, target the MNIST dataset (a dataset of handwritten digits).

import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

# Data Preprocessing
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5,), (0.5,))])
train_dataset = datasets.MNIST(root='./data', train=True, transform=transform, download=True)
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)

# Definition of a Bayesian neural network model
class BayesianNetwork(nn.Module):
    def __init__(self):
        super(BayesianNetwork, self).__init__()
        self.fc1 = nn.Linear(784, 128)  # Input image size: 28x28 = 784
        self.fc2 = nn.Linear(128, 64)
        self.fc3 = nn.Linear(64, 10)  # 10 classes to classify numbers 0-9

    def forward(self, x):
        x = x.view(-1, 784)  # Conversion of 2D image data to 1D
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

# training function
def train(model, device, train_loader, optimizer, epoch):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = F.cross_entropy(output, target)
        loss.backward()
        optimizer.step()
        if batch_idx % 100 == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.item()))

# main function
def main():
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = BayesianNetwork().to(device)
    optimizer = optim.Adam(model.parameters(), lr=0.001)

    for epoch in range(1, 11):  # Training at 10 epochs
        train(model, device, train_loader, optimizer, epoch)

if __name__ == '__main__':
    main()

In this example, a simple neural network (named BayesianNetwork) is defined and the MNIST dataset is loaded using PyTorch. The network consists of three fully connected layers (two hidden layers followed by an output layer), applies the ReLU activation function, and uses Adam for optimization.

A Bayesian approach would normally require inferring the posterior distribution over the network parameters. Here the posterior is not computed rigorously; instead, approximations such as stochastic gradient descent (SGD) or Monte Carlo dropout, described in "Overview of Monte Carlo Dropout and Examples of Algorithms and Implementations", may be used to approximate the posterior distribution. With such approximations, the network parameters are trained under random noise, which yields an approximate probabilistic distribution over the parameters.
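As a hedged sketch of the Monte Carlo dropout idea (assuming a model that contains nn.Dropout layers, which the network defined above does not), predictions can be repeated with dropout kept active at test time, and the spread of the outputs can be used as a rough uncertainty estimate.

import torch
import torch.nn.functional as F

def mc_dropout_predict(model, data, num_samples=30):
    # Keep nn.Dropout layers active at prediction time (assumes the model has them)
    model.train()
    with torch.no_grad():
        probs = torch.stack([F.softmax(model(data), dim=1) for _ in range(num_samples)])
    mean_probs = probs.mean(dim=0)  # averaged class probabilities
    std_probs = probs.std(dim=0)    # spread across stochastic forward passes
    return mean_probs, std_probs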

For an example implementation in python of probabilistic forecasting

In order to make probabilistic predictions, the output of the classifier must be converted to probability values. This is usually done with a softmax function, as described in "Overview of softmax functions and related algorithms and implementation examples", which maps the classifier outputs to values that can be read as class probabilities. Normalizing the outputs in this way also makes the predicted probabilities easier to interpret.

Below is an example of probability prediction using PyTorch. Here, the same neural network is used as in the MNIST dataset described earlier, but a softmax function is applied to the output of the model to compute probabilities.

import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

# Data Preprocessing
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5,), (0.5,))])
train_dataset = datasets.MNIST(root='./data', train=True, transform=transform, download=True)
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)

# Definition of Neural Network Model
class NeuralNetwork(nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        self.fc1 = nn.Linear(784, 128)  # Input image size: 28x28 = 784
        self.fc2 = nn.Linear(128, 64)
        self.fc3 = nn.Linear(64, 10)   # 10 classes to classify numbers 0-9

    def forward(self, x):
        x = x.view(-1, 784)  # Conversion of 2D image data to 1D
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

# Probability Prediction Function
def predict_probabilities(model, data):
    model.eval()
    with torch.no_grad():
        output = model(data)
        probabilities = F.softmax(output, dim=1)
    return probabilities

# main function
def main():
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = NeuralNetwork().to(device)
    optimizer = optim.Adam(model.parameters(), lr=0.001)

    for epoch in range(1, 11):  # Training at 10 epochs
        model.train()
        for batch_idx, (data, target) in enumerate(train_loader):
            data, target = data.to(device), target.to(device)
            optimizer.zero_grad()
            output = model(data)
            loss = F.cross_entropy(output, target)
            loss.backward()
            optimizer.step()

    # Example of probability prediction with test data
    test_dataset = datasets.MNIST(root='./data', train=False, transform=transform, download=True)
    test_loader = DataLoader(test_dataset, batch_size=1, shuffle=False)
    model.eval()
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            probabilities = predict_probabilities(model, data)
            predicted_class = torch.argmax(probabilities).item()
            print(f"Predicted probabilities: {probabilities}")
            print(f"Predicted class: {predicted_class}")
            print(f"True class: {target.item()}")

if __name__ == '__main__':
    main()

In this example, the test data is used to make probability predictions, and the predict_probabilities function applies a softmax function to the model output to calculate probabilities. It also uses torch.argmax to indicate the class with the highest probability as a prediction. In this way, the probability of belonging to each class can be shown, and if a classifier has a high probability of belonging to a particular class, its probability for the other classes will be low.

For an example implementation of Bayesian inference in python

Bayesian inference is a statistical method that combines a prior distribution and a likelihood to estimate the posterior distribution. Here we show an example implementation of Bayesian inference using PyMC3, a Python library for Bayesian statistical modeling that supports model construction and inference and uses Markov chain Monte Carlo (MCMC) sampling to estimate the posterior distribution. A brief example with PyMC3 follows.

First, install PyMC3.

pip install pymc3

Next, consider a simple dice example using Bayesian inference: the probabilities of the faces of a possibly unfair die are estimated from observed data obtained by rolling it repeatedly.

import pymc3 as pm
import numpy as np

# Observed dice data (number of times each face 1 through 6 was rolled)
observed_data = np.array([10, 15, 5, 12, 18, 8])

# Model Building
with pm.Model() as model:
    # Prior over the probabilities of the six faces (uniform Dirichlet prior)
    p = pm.Dirichlet("p", a=np.ones(6))
    
    # Observed face counts modeled with a multinomial likelihood
    likelihood = pm.Multinomial("likelihood", n=np.sum(observed_data), p=p, observed=observed_data)
    
    # MCMC Sampling
    trace = pm.sample(10000, tune=2000, chains=2, cores=2)

# Visualization of sampling results
pm.plot_posterior(trace)

In the above example, the vector p of probabilities for faces 1 through 6 is estimated, and observed_data contains the observed face counts from rolling the die. Bayesian inference uses MCMC sampling to approximate the posterior distribution; sampling is performed with the pm.sample function, and trace holds the sampling results. Finally, pm.plot_posterior(trace) visualizes the estimated posterior distribution.

For an example implementation in python of a stochastic generative model

A probabilistic generative model is a method that models the data generation process as a probabilistic model. Here, we show a simple example implementation of a probabilistic generative model using PyTorch, a library in Python.

As an example, consider a simple generative model that uses a Gaussian distribution to generate one-dimensional data.

import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
import matplotlib.pyplot as plt

# True Parameters
true_mean = 2.0
true_std = 0.5

# Creation of generated data
def generate_data(num_samples):
    return torch.normal(mean=true_mean, std=true_std, size=(num_samples,))

# Definition of stochastic generative model
class Generator(nn.Module):
    def __init__(self):
        super(Generator, self).__init__()
        self.mean = nn.Parameter(torch.randn(1))
        self.std = nn.Parameter(torch.randn(1))

    def forward(self, num_samples):
        # Reparameterized sampling (mean + |std| * noise) so that gradients flow to the parameters
        eps = torch.randn(num_samples)
        return self.mean + torch.abs(self.std) * eps

# training function
def train(generator, data, optimizer, num_epochs):
    for epoch in range(num_epochs):
        generated_data = generator(len(data))
        loss = F.mse_loss(generated_data, data)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if epoch % 100 == 0:
            print(f"Epoch [{epoch}/{num_epochs}], Loss: {loss.item()}")

# main function
def main():
    num_samples = 100
    num_epochs = 1000
    data = generate_data(num_samples)

    generator = Generator()
    optimizer = optim.Adam(generator.parameters(), lr=0.1)

    train(generator, data, optimizer, num_epochs)

    # Visualization of true distribution and generated data
    plt.hist(data, bins=20, alpha=0.5, label='True Data')
    generated_data = generator(num_samples).detach().numpy()
    plt.hist(generated_data, bins=20, alpha=0.5, label='Generated Data')
    plt.legend()
    plt.xlabel('Value')
    plt.ylabel('Frequency')
    plt.show()

if __name__ == '__main__':
    main()

In the above example, the true distribution is Gaussian (specified by true_mean and true_std), and data are generated from it using the generate_data function. A Generator model with learnable mean and standard deviation parameters is defined, and the mean squared error (MSE) between the generated samples and the observed data is used to train it. The train function performs this learning and uses Adam for optimization, although other optimization methods could also be used. Finally, histograms of the true data and the generated data are visualized.

Reference Information and Reference Books

For more information on probabilistic approaches, see “Mathematics in Machine Learning” “Probabilistic Generative Models” and “Machine Learning with Bayesian Inference and Graphical Models,” among others.

For reference books on the theory and history of probability and statistics, see "Probability Theory for Beginners: A Reading Memo", "Introduction to Probability Theory: A Reading Memo", "Nine Stories of Probability and Statistics that Changed Humans and Society: A Reading Memo", and "134 Stories of Probability and Statistics that Changed the World: A Reading Memo". For specific implementations and applications, see "Statistical Modeling with Python", "Statistical Analysis and Correlation Evaluation Using Clojure/Incanter", "Probability Distributions Used in Probabilistic Generative Models", etc.

A good reference book on Bayesian estimation is "The Theory That Would Not Die: How Bayes' Rule Cracked the Enigma Code, Hunted Down Russian Submarines, & Emerged Triumphant from Two Centuries of Controversy". Other useful references include the following.

Think Bayes: Bayesian Statistics in Python

Bayesian Modeling and Computation in Python

Bayesian Analysis with Python: Introduction to statistical modeling and probabilistic programming using PyMC3 and ArviZ, 2nd Edition
