134 Stories of Probability and Statistical Magic that Changed the World Reading Notes

Reading notes from “134 Stories of Probability and Statistical Magic that Changed the World” by Hirokazu Iwasawa.

Probability theory began with a correspondence between two great mathematicians, and statistics began as “political arithmetic”. How did they develop and reach the present day? This is a popular science book that lightly describes the unique ideas of these geniuses and the great achievements that emerged from them, along with abundant episodes.

“Probability and statistics” is a very familiar, accessible, and interesting subject, yet it can be surprisingly difficult to reach the right answer. In fact, there are cases where great mathematicians of the past made mistakes on problems that today’s junior high and high school students can answer correctly. On the other hand, there are also elegant solutions to unusual problems by great mathematicians, full of mathematical sense. The book presents many such seemingly strange problems, matters requiring clever thinking, and interesting historical episodes, all from the unique perspective of the author, who is also an actuary and mathematical puzzle designer.

Chapter 1: A Geometric Spirit in Betting – The Beginning of Probability Theory

001 The Spaghetti Wheel
002 Weather Forecasting and Probability
003 The Birth Year of Probability Theory
004 The word “probability”
005 A Winning Strategy for Casinos?
006 Cardano, the Pioneer
007 Cardano’s unsolvable problem – the distribution problem
008 Galileo’s dice problem
009 De Méré – the man who created the opportunity
010 Solving the Distribution Problem
011 The genius of Pascal
012 Fermat’s Magic – The Digestion Argument
013 Unsolved for more than 300 years
014 The Fearsome Gambler de Méré
015 The technical term of probability
016 What is an event?
017 Roulette bias
018 Division of events
019 Greek letters
020 Todhunter, “History of Probability Theory”
021 Huygens’ activity
022 Gambler’s bankruptcy problem
023 Huygens’ expected value
024 Chuck-a-luck
025 How to calculate expected value
026 Additivity of expected value
027 Spaghetti wheel answer
028 The beginnings of statistics
029 Political arithmetic in England
030 Insurance Mathematics Begins in the Netherlands
031 The heyday of the Dutch

Chapter 2: Toward the Birth of Mother Nature – The Completion of Classical Probability Theory

032 The Misfortunes of Probability Theory
033 The “Year of Wonders”
034 Newton’s Contact with Probability
035 The General Binomial Theorem
036 Leibniz’s Failure
037 The Middle Fathers of Classical Probability Theory
038 Jacob Bernoulli, “The Art of Conjecturing”
039 Bernoulli Trial, Binomial Distribution
040 What is a probability distribution?
041 Weak law of large numbers
042 Tribulations of the genius de Moivre
043 The de Moivre trick
044 The trick continues
045 de Moivre’s “The Doctrine of Chances”
046 Independence
047 Fifty-two cards vs.
048 Discovering the normal distribution
049 Equation for Normal Distribution
050 Mean, variance and standard deviation
051 Logarithm
052 Napier’s own logarithm
053 Stirling’s formula
054 The technical term “probability”
055 Attendance number and back order
056 Montmort, the aristocrat
057 Treize
058 Euler and probability theory

Harmonic series

059 Mathematicians during the French Revolution
060 Laplace, the Completer of Classical Probability Theory
061 Laplace, “Analytic Theory of Probability”
062 Theory of generating functions

063 Generating function

A function from which, by applying a certain operation, one can read off quantities of interest in probability and statistics.
In Japanese a generating function is called 母関数 (“mother function”), which is why “mother functions” appear in these notes.
The probability generating function generates probabilities.
The moment generating function generates moments.
The cumulant generating function generates cumulants.
A generating function carries all the information of the corresponding probability distribution.
If the first through fourth order moments are known, the mean, variance, skewness, kurtosis, and other summary measures can all be calculated.
For continuous distributions, the moment generating function is essentially a Laplace transform.
For continuous distributions, the characteristic function is essentially a Fourier transform.
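
As a small illustration of how a generating function “generates” these quantities, here is a minimal sketch (the die example and all numbers are mine, not the book’s) that recovers moments of a fair six-sided die from its moment generating function:

```python
# Minimal sketch (assumed example): moments of a fair die from its MGF M(t) = E[e^{tX}].
import sympy as sp

t = sp.symbols('t')
M = sum(sp.exp(t * x) for x in range(1, 7)) / 6     # moment generating function of one die

# The k-th moment is the k-th derivative of M(t) evaluated at t = 0.
moments = [sp.diff(M, t, k).subs(t, 0) for k in range(1, 5)]
print(moments)                                      # first four moments; E[X] = 7/2, E[X^2] = 91/6

mean = moments[0]
variance = sp.simplify(moments[1] - mean**2)
print(mean, variance)                               # 7/2 and 35/12
```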

063 A familiar example of the use of a generating function – Sicherman dice
064 Typical Uses of Generating Functions
065 Uses of characteristic functions

Chapter 3: Detecting Baker’s Fraud, Too – The Age of Normal Distributions

066 Unevenness of the Normal Distribution
067 Called the “Gaussian Distribution”
068 Stigler’s law
069 The three great mathematicians
070 Prince of Mathematics
071 How to remember the year of birth
072 Gauss at 24
073 “Few, but ripe”
074 Normal distribution as error distribution
075 Central limit theorem
076 Gaussian integrals and pi
077 Who was the first to accomplish the Gaussian integral
078 Gauss and probability
079 Gauss-Kuzmin distribution
080 Anecdotes of Poincaré
081 The true story of Quetelet
083 The Patriarch of Statistics – Quetelet
084 Maxwell distribution
085 Galton’s Normal Distribution for Everything
086 The story of population
087 Correlation and regression
088 Rank correlation coefficient

Chapter 4: Historical Afternoon Tea – The People Who Made Mathematical Statistics

089 Skewed Distributions and Karl Pearson

Four quantities are defined to characterize probability distributions
Mean (location)
First-order moment
Variance (spread)
Second-order moment
Skewness (asymmetry)
Imbalance between the left and right sides of the distribution
The skewness of the normal distribution is 0
Third-order moment
Kurtosis (peakedness)
Sharpness of the center of the distribution
Degree to which both tails are heavy
The kurtosis of the normal distribution is 3
Fourth-order moment
The normal distribution has two parameters: mean and variance
There exist distributions other than the normal distribution
Binomial distribution
Poisson distribution
Gamma distribution
The k-th order moment of a sample x1, x2, …, xn is the mean of each observation raised to the k-th power (see the sketch below): x̄k = (x1^k + x2^k + … + xn^k)/n
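
A minimal sketch (data and helper names are my own illustration) of computing sample moments and the Pearson-style skewness and kurtosis from them:

```python
# Minimal sketch: k-th order sample moments and the derived skewness and kurtosis.
import numpy as np

x = np.array([2.0, 3.0, 5.0, 7.0, 11.0, 13.0])     # hypothetical observations

def raw_moment(x, k):
    """k-th order sample moment: mean of each observation raised to the k-th power."""
    return np.mean(x ** k)

mean = raw_moment(x, 1)
var = raw_moment(x, 2) - mean ** 2                  # second central moment
m3 = np.mean((x - mean) ** 3)                       # third central moment
m4 = np.mean((x - mean) ** 4)                       # fourth central moment

skewness = m3 / var ** 1.5                          # 0 for a normal distribution
kurtosis = m4 / var ** 2                            # 3 for a normal distribution
print(mean, var, skewness, kurtosis)
```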

090 Karl Pearson Chronology
091 Pioneer of Mathematical Statistics – Thiele

Introduced the concept of the cumulant and obtained the unknown coefficients of the expansion by a recursive method
Found the density function of a “skewed” distribution based on the Gram-Charlier expansion, using the method of moments

092 Speaking of Thiele

Thiele’s differential equation

093 Edgeworth

Edgeworth Expansion

094 Cumulant

A set of fundamental numbers determined for a probability distribution
Defined as the cumulant of order k, for k = 1, 2, …
The k-th cumulant of the distribution of X is written κk[X]

095 Cumulants and the Central Limit Theorem
096 Inferential Statistics

Inferring characteristics or properties of a population from a sample drawn from the population
Used in product quality control

097 Postwar Japanese Reconstruction and Inferential Statistics
098 The History of Statistics in the 20th Century, Full of Fights
099 Pseudonyms
100 Student’s t Distribution

Gosset’s discovery of the t distribution
Estimating characteristics of the population from only a few dozen data points
Prepared many small samples following the same normal distribution, calculated the sample mean x̄ and the standard deviation s for each, plotted the ratio of the two, and found that the resulting distribution is the t distribution
With the t distribution, the probable error can be estimated even for a small sample (see the sketch below)
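
A minimal simulation sketch (all parameters are my own choices, and I use the modern standardized form (x̄ − μ)/(s/√n) rather than Gosset’s original ratio) comparing many small normal samples with the t distribution:

```python
# Minimal sketch: simulate many small samples from one normal distribution and
# compare the standardized means with the t distribution with n-1 degrees of freedom.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
mu, sigma, n, trials = 50.0, 10.0, 5, 100_000

samples = rng.normal(mu, sigma, size=(trials, n))
xbar = samples.mean(axis=1)
s = samples.std(axis=1, ddof=1)                     # sample standard deviation
t_values = (xbar - mu) / (s / np.sqrt(n))

# The simulated 97.5% quantile should be close to the t quantile with n-1 df.
print(np.quantile(t_values, 0.975), stats.t.ppf(0.975, df=n - 1))
```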

101 Sample Distribution Theory

Sampling distribution theory studies the distribution of a statistic (mean, standard deviation, etc.) computed from a sample, regarded as a random variable
Chi-square test: tests a statistic that follows a chi-square distribution
t-test, F-test

102 The Father of Statistics – Fisher

Fisher’s z-transform
Design of Experiments
Analysis of variance method
F-test

103 Most famous experiments

Design of Experiments
Statistical Tests
Null hypothesis
Reject the null hypothesis if, assuming the null hypothesis is correct, “the probability of a result at least as extreme as the one actually observed is quite low (e.g., less than 5%)”
If the null hypothesis “cannot tell the difference” is rejected, the judgment is “can tell the difference”
If the null hypothesis is not rejected, “can tell the difference” is not supported, but neither is “cannot tell the difference” established
A result is judged statistically significant only when the null hypothesis is rejected
The upper bound on the probability (e.g., 5%) used as the criterion for “low probability” in a statistical test is called the significance level

104 Book of Random Numbers

Randomness is necessary for statistical experiments

105 Create Random Numbers.

A chi-square test can be used to judge whether a sequence is random or not (see the sketch below)
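
A minimal sketch (the counts are hypothetical, not from the book) of a chi-square goodness-of-fit test for whether 600 die rolls look uniform:

```python
# Minimal sketch: chi-square goodness-of-fit test against the uniform distribution.
import numpy as np
from scipy import stats

observed = np.array([95, 107, 102, 99, 88, 109])    # hypothetical counts of faces 1-6
expected = np.full(6, observed.sum() / 6)            # 100 each if the die is fair

chi2, p_value = stats.chisquare(observed, expected)
print(f"chi-square = {chi2:.2f}, p-value = {p_value:.3f}")   # large p: no evidence against uniformity
```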

106 Neyman-Pearson Style Test Theory

Developed Fisher’s theory of testing further
Fisher’s significance test
Hypothesis tests
Deal with an alternative hypothesis
The alternative hypothesis is the hypothesis that is supported when the null hypothesis is rejected
Example: null hypothesis “equal to the standard value”, alternative hypothesis “greater than the standard value”
If the observed value is smaller than the standard, the null hypothesis is not rejected
The alternative hypothesis gives the direction in which the null hypothesis is rejected
Errors of statistical tests
Error of the first kind
The error of rejecting the null hypothesis when it is actually correct
Error of the second kind
The error of failing to reject the null hypothesis even though the alternative hypothesis is correct
Example
When the null hypothesis is “the coin is equally likely to come up heads or tails” and the alternative hypothesis is “heads are more likely than tails”, five heads in a row has probability (1/2)^5 = 1/32 = 3.125% under the null hypothesis; rejecting the null hypothesis on that outcome therefore carries a 3.125% chance of an error of the first kind.
If the coin actually comes up heads with probability 75%, five heads in a row occurs with probability (3/4)^5 ≈ 23.7%, so the null hypothesis is rejected only about 23.7% of the time even though the alternative hypothesis is correct; the remaining ≈76.3% is the probability of an error of the second kind.
Both error probabilities can be reduced by increasing the number of tosses.
Power (statistical power)
One minus the probability of an error of the second kind
The Neyman-Pearson lemma
Shows how to maximize the power at a given significance level when both the null and alternative hypotheses are simple (see the sketch below)
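
A minimal sketch reproducing the arithmetic of the coin example above (the decision rule “reject only if all five tosses are heads” is taken from the example):

```python
# Minimal sketch: type I error and power for the five-toss coin example.
from scipy import stats

n = 5                                           # number of tosses
# Decision rule: reject the null hypothesis only if all n tosses are heads.

# Type I error: probability of rejecting when the null hypothesis (p = 0.5) is true.
alpha = stats.binom.pmf(n, n, 0.5)              # (1/2)^5 = 3.125%

# Power: probability of rejecting when the alternative (p = 0.75) is true.
power = stats.binom.pmf(n, n, 0.75)             # (3/4)^5 ≈ 23.7%
type_ii_error = 1 - power                       # error of the second kind ≈ 76.3%

print(f"type I error = {alpha:.4f}, power = {power:.4f}, type II error = {type_ii_error:.4f}")
```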

107 Confidence Intervals

Neyman and Pearson’s method of “interval estimation”
The result of the estimation is expressed as an interval with a specified “confidence coefficient”

108 Theory of Point Estimation

Pinpoint Estimation
The estimated value obtained as the result of point estimation is called the “point estimate”
A point estimate is the value of a function of the observed values
The random variable corresponding to a point estimate is called a point estimator

109 Maximum Likelihood Method

A method for determining the most plausible population value from the observed values
Plausibility is measured by a quantity called the “likelihood”
Assuming the population is a discrete distribution with unknown parameter θ and probability function f(x, θ), when a sample x1, …, xn is obtained, the likelihood Ln(θ) is Ln(θ) = f(x1, θ) f(x2, θ) ⋯ f(xn, θ)
The maximum likelihood method finds the θ that maximizes the likelihood and uses it as the point estimate
Log likelihood = log Ln(θ) = log f(x1, θ) + … + log f(xn, θ) (see the sketch below)
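
A minimal sketch (the Poisson model and the data are my own illustration, not the book’s) of maximizing the log likelihood numerically; for the Poisson distribution this recovers the well-known closed-form MLE, the sample mean:

```python
# Minimal sketch: maximum likelihood estimation for a Poisson parameter.
import numpy as np
from scipy import stats, optimize

x = np.array([2, 0, 3, 1, 4, 2, 1, 3, 2, 2])        # hypothetical observed counts

def neg_log_likelihood(theta):
    # -log Ln(theta) = -sum of log f(x_i, theta)
    return -np.sum(stats.poisson.logpmf(x, theta))

result = optimize.minimize_scalar(neg_log_likelihood, bounds=(1e-6, 20), method="bounded")
print("MLE of θ:", result.x)           # ≈ 2.0
print("sample mean:", x.mean())        # the closed-form Poisson MLE, for comparison
```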

110 Year of birth of the maximum likelihood method
111 Properties of the point estimator

When a biased die is rolled 1000 times and a one comes up 159 times, the point estimate of the probability p is 159/1000 = 0.159
Point estimator
A random variable whose realization is used as a point estimate
Desirable properties
Unbiasedness
The expected value of the point estimator agrees (theoretically) with the true value of the population parameter
A point estimator satisfying unbiasedness is called an “unbiased estimator”
Sufficiency
When the sample size is fixed at n, the value of the estimator (theoretically) contains all the information in the sample that is relevant to the estimation
An estimator satisfying sufficiency is called a “sufficient estimator”
Consistency
As the sample size increases, the point estimate converges to the true value
A point estimator satisfying consistency is called a “consistent estimator”
Minimum variance
Among unbiased point estimators, the one whose variance is smallest
The smaller the variance, the less the point estimator is expected to stray from the true value
A point estimator attaining this minimum is called a “minimum variance unbiased estimator”
An efficient estimator is a minimum variance unbiased estimator
This follows from the Cramér-Rao inequality
General-purpose point estimation methods
Method of moments
Karl Pearson
Point estimates are obtained by equating as many sample moments as there are unknown parameters to the corresponding population moments and solving
Simple to use, but unbiasedness and minimum variance cannot be expected
Maximum likelihood method
Fisher
For large samples, the maximum likelihood estimator is a good general-purpose estimator

112 Data Censoring

Problem
The number of days until failure of a certain mechanical product follows an exponential distribution whose population mean μ is unknown
Ten of these products are observed in order to estimate μ
After 1000 days of observation, 6 products have failed, with days to failure of 409, 544, 590, 725, 782, 948
The remaining 4 have not yet failed after 1000 days
Estimate the expected number of days until failure
Estimate
Estimated mean days until failure = total observed days / number of failed units
= (409+544+590+725+782+948+1000×4)/6 = 1333 (see the sketch below)
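
A minimal sketch reproducing the arithmetic of this censored-data estimate (variable names are mine):

```python
# Minimal sketch: mean-lifetime estimate with right-censoring at 1000 days.
failure_days = [409, 544, 590, 725, 782, 948]   # units that failed during the study
n_censored = 4                                  # units still working at day 1000
censor_time = 1000

total_exposure = sum(failure_days) + n_censored * censor_time
mu_hat = total_exposure / len(failure_days)     # total observed days / number of failures
print(mu_hat)                                   # (3998 + 4000) / 6 = 1333
```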

113 Cramér-Rao Inequality

The unbiased estimator with the smallest variance is the ideal point estimator
A law that answers, in theory, how small the variance of an unbiased estimator can be made

114 Harald Cramér

Chapter 5: No Model is Correct – Statistics in the Computer Age

115 John Tukey

The ham-sandwich theorem
Multiple comparison tests
Tukey’s lemma
Coined the term “bit”

116 Tukey time
117 Fast Fourier Transform

Proposed by Tukey and James Cooley
Fast way to compute the Discrete Fourier Transform
Discrete Fourier Transform
A kind of generating function
Convolution
A binary operation that superposes one function g on another function f while sliding it along
The discrete Fourier transform of the distribution of a sum of independent random variables is the product of the individual discrete Fourier transforms (see the sketch below)
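
A minimal sketch (the two-dice example is my own) showing that the distribution of a sum of independent variables can be obtained either by direct convolution or by multiplying discrete Fourier transforms:

```python
# Minimal sketch: distribution of the sum of two independent dice via convolution and via the FFT.
import numpy as np

die = np.zeros(13)
die[1:7] = 1 / 6                                  # P(X = 1..6) for one fair die

direct = np.convolve(die, die)[:13]               # distribution of X + Y by direct convolution

# Same result with the FFT: transform, multiply pointwise, transform back.
via_fft = np.fft.irfft(np.fft.rfft(die) ** 2, n=13)

print(np.allclose(direct, via_fft))               # True
print(np.round(direct[2:13], 4))                  # P(sum = 2), ..., P(sum = 12)
```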

118 Exploratory Data Analysis

Diagrams play an important role in preliminary data analysis
Tukey’s “exploratory data analysis”
Captures the information that data hold from multiple perspectives

119 Robust Statistics

George Box’s notion of robustness
A statistical criterion should match the needs of the experimenter:
High sensitivity to changes in the particular factor being tested
Low sensitivity to changes, of the magnitude that actually occurs, in extraneous factors

120 Nonparametric

Makes no assumptions about the population distribution
Does not reduce to estimating the parameters of an assumed distribution
Pearson’s chi-square test is nonparametric
The Kolmogorov-Smirnov test is also nonparametric
The Cramér-von Mises test is also nonparametric

121 Jackknife Method

A method for approximate correction of bias
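
A minimal sketch (example mine, not the book’s) of the jackknife: the bias of an estimator is approximated from leave-one-out recomputations and subtracted off. For the plug-in variance (dividing by n), the correction recovers the usual unbiased variance estimate:

```python
# Minimal sketch: jackknife bias correction for the plug-in variance estimator.
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=30)
n = len(x)

theta_hat = np.var(x)                               # biased plug-in estimate (divides by n)

# Leave-one-out estimates and the jackknife bias approximation.
loo = np.array([np.var(np.delete(x, i)) for i in range(n)])
bias = (n - 1) * (loo.mean() - theta_hat)

theta_jack = theta_hat - bias                       # jackknife-corrected estimate
print(theta_hat, theta_jack, np.var(x, ddof=1))     # corrected value matches the unbiased estimate
```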

122 Bootstrap method
123 Efron’s dice
124 Prehistory of Bayesian statistics
125 Actuaries and Bayesian Statistics
126 Bayesian Statistics and Computers

BUGS
Bayesian inference Using Gibbs Sampling
MCMC
Markov chain Monte Carlo

127 Model Correctness

Essentially, no model is correct, but some models are useful
In choosing a model, being useful matters more than being true

128 Akaike Information Criterion

AIC
Criteria for selecting a statistical model
Let L be the likelihood calculated using the estimates from the model, and k the number of parameters of the model
AIC = -2 log L + 2k
The smaller the AIC, the better
Intended basically for large samples
Variants for small samples also exist
The amount of information (Kullback-Leibler information) plays a central role
Besides AIC, there are BIC, CIC, EIC, GIC, and PIC (see the sketch below)
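
A minimal sketch (the count data and the two candidate models are my own illustration) of model selection by AIC = -2 log L + 2k:

```python
# Minimal sketch: comparing two one-parameter models for count data by AIC.
import numpy as np
from scipy import stats

x = np.array([0, 1, 2, 1, 0, 3, 2, 1, 1, 0, 2, 4])   # hypothetical counts

# Model 1: Poisson(λ), MLE λ = sample mean.
lam = x.mean()
aic_poisson = -2 * stats.poisson.logpmf(x, lam).sum() + 2 * 1

# Model 2: geometric distribution on {0, 1, 2, ...}, MLE p = 1 / (1 + sample mean).
p = 1 / (1 + x.mean())
aic_geom = -2 * stats.geom.logpmf(x, p, loc=-1).sum() + 2 * 1

print(f"AIC(Poisson) = {aic_poisson:.2f}, AIC(geometric) = {aic_geom:.2f}")  # smaller is better
```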

129 Cross-validation method

Cross-validation
Split the available data into two groups: one for estimating the model, one for testing it
A method for checking goodness of fit and related properties
Look at the mean of the squared differences between predicted and actual values
Used to validate the adequacy of a model
LOOCV (leave-one-out cross validation)
Given n data points, hold out one for testing and fit the model on the rest, repeating for each point (see the sketch below)
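
A minimal sketch (data and the straight-line model are my own example) of leave-one-out cross validation scored by the mean squared prediction error:

```python
# Minimal sketch: LOOCV for a simple straight-line model y ≈ a + b·x.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.9, 8.2, 8.8])   # hypothetical observations

errors = []
for i in range(len(x)):
    x_train, y_train = np.delete(x, i), np.delete(y, i)  # leave one observation out
    b, a = np.polyfit(x_train, y_train, deg=1)           # fit slope b and intercept a on the rest
    y_pred = a + b * x[i]                                 # predict the held-out point
    errors.append((y[i] - y_pred) ** 2)

print("LOOCV mean squared error:", np.mean(errors))
```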

130 Generalized Linear Model

Generalized Linear Model
One of the statistical models
In a linear model, the outcome of interest (expressed as a quantity) is expressed as the sum of the effects of several factors.
Some factors are quantitative and some are qualitative.
Link function

131 Generalized Linear Models and Statistical Tools

To fit a linear model, one only needs to solve a certain kind of system of simultaneous linear equations (see the sketch after this list)
GLMM : Generalized Linear Mixed Model
Generalized Linear Mixed Model
GAM : Generalized Additive Model
Generalized Additive Model
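
A minimal sketch (synthetic data, variable names mine) of the system of simultaneous linear equations behind an ordinary linear model: the least-squares coefficients solve the normal equations (XᵀX)β = Xᵀy:

```python
# Minimal sketch: fitting a linear model by solving the normal equations.
import numpy as np

rng = np.random.default_rng(2)
n = 50
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(scale=0.1, size=n)   # true coefficients 1.0, 2.0, -0.5

X = np.column_stack([np.ones(n), x1, x2])        # design matrix with an intercept column
beta = np.linalg.solve(X.T @ X, X.T @ y)         # solve (X^T X) β = X^T y
print(beta)                                      # ≈ [1.0, 2.0, -0.5]
```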

132 Class Child Accident Rates and Generalized Linear Models
133 A Living Legend – Rao
134 Every Decision is a Statistic
