Introduction to Probability Theory Reading Notes


From Introduction to Probability Theory

「It is said that the mathematical treatment of probability began with the correspondence between Pascal and Fermat about gambling. Classical probability theory, based on the idea of counting combinations, made a great leap into "modern mathematics" founded on set theory in the 20th century, thanks to Borel and Kolmogorov. This book is written to bridge the gap between classical and modern probability theory; it explains the meaning of abstract mathematical formalism in an accessible way, with plenty of elementary concrete examples such as playing cards and dice. It is an introduction that lets readers relearn, in depth, the probability they first met in high school mathematics.」

Starting point

There are experiments in this world that, even when repeated under nearly identical conditions, do not always give the same result
Drawing a card and seeing which card comes up
Rolling a die
Experiments whose results are governed by chance and vary from run to run
The individual things that can conceivably occur as the result of such a trial
The "degree of likelihood" that a given matter happens is the issue surrounding a trial
How should the probability of a matter be determined?
Laplace’s approach
There are N root events, all having exactly the same degree of likelihood of happening
If the number of root events favorable to a matter (root events such that, if one of them occurs, the matter occurs) is R
then the probability of the matter is R/N
What is the structure of the theory that can be applied more widely in various fields?
Using "relative frequency"
How does one test whether a given die is fair?
Actually repeat the roll many times and see whether each face appears about equally often
In general trials
Repeat it over and over, and let the resulting sequence of root events be a1, a2, …, an
If a root event favorable to a matter E appears r times among them, the number r/n is called the relative frequency of the matter E in the above sequence
Now take relative frequencies r1/n1, r2/n2, …, rk/nk, … for longer and longer runs and observe the sequence
If those numbers tend to cluster near a certain value α
we say that the probability of the matter E is α, and write p(E) = α
"The probability of occurrence of a matter E is α" means:
over many trials, E occurs at a rate of about α
in one trial, E has a likelihood of about α out of a total likelihood of 1
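
This clustering of relative frequencies is easy to observe numerically. A minimal sketch in Python, assuming a fair six-sided die and the matter E = "an even number comes up" (both choices are illustrative):

```python
import random

# Relative frequency r/n of the matter E = "an even number comes up"
# over n rolls of a fair die; it clusters near alpha = 1/2 as n grows.
def relative_frequency(n, rng):
    r = sum(1 for _ in range(n) if rng.randint(1, 6) % 2 == 0)
    return r / n

rng = random.Random(0)
for n in (10, 100, 1000, 10000, 100000):
    print(n, relative_frequency(n, rng))
```
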
Applications of Probability Theory
Statistical Mathematics
The study of inferring the properties of a large population of subjects from a very small sample

1 Concept of Probability

1 Events

A trial is a kind of experiment
Even if an experiment is repeated under the same conditions, the results can vary
Such an experiment is called a trial.
The individual things that can be the result of a trial are called
The “root event” of the trial.
How many "matters" are conceivable for one trial?
What is a “matter” in the first place?
Any “matter” can be reworded as a “group”.
Example
The matter "a 2 comes out" means:
the result of the trial belongs to the group {2 of spades, 2 of clubs, 2 of diamonds, 2 of hearts}
The matter "not a club" means:
the result of the trial belongs to the group of all cards of spades, diamonds, and hearts
Every matter can be rewritten as "the root event resulting from the trial belongs to this or that group"
A group here is a collection of root events of a single trial
A "matter" can thus be expressed in the form "the root event resulting from the trial belongs to such-and-such a group"
There are as many matters as there are groups formed by collecting root events
Such a group is called an "event"; for simplicity, instead of "the result of the trial belongs to the event E", we say "the event E occurs"
Special cases: events consisting of a single root event, the whole event (all root events), and the empty event

2 Probability

To organize the concept of probability in relation to the concept of events
Consider a trial and let E be an arbitrary event in the trial.
Repeat the trial many times, record the resulting root events, and denote them a1, a2, …, an
If a root event favorable to E appears r times in this sequence, the number r/n is called the "relative frequency" of the event E in the above sequence
Repeat the procedure "run the trial, record the resulting root events, and compute the relative frequency" many times
If the resulting sequence of relative frequencies r1/n1, r2/n2, …, rk/nk, … tends to cluster near a certain number α, we set p(E) = α
As a special case
The total number of root events is N
all of which have an equal chance of appearing
Assume that there are R root events favorable to the event E in total
The relative frequency r/n of E is then usually close to R/N
Moreover, such frequencies tend to cluster near R/N
Therefore it is permissible to set p(E) = R/N in this case
For the whole event, every root event is favorable
If we create a sequence a1, a2, …, an of the results of the trial
the relative frequency of the whole event in it is n/n = 1
This gives probability 1 for the whole event (Ω)
An empty event never occurs
No root event is favorable to it
If we create a sequence of trial outcomes, its relative frequency is always 0/n = 0
Therefore the probability of the empty event (∅) equals 0

3 Concept of a set

An event is a group of several root events
In developing probability theory, the properties of such groups matter in a number of ways
A group of things is called a "set"
An empty event is an "empty set"
Members of a set are called the "elements" of the set
A root event favorable to an event is an element of that event
That a is an element of the set M is written a ∈ M
That a is not an element of M is written a ∉ M
The set consisting of the elements a, b, c, … is written {a, b, c, …}
The set of all x such that a condition C(x) on the thing x is satisfied is written {x | C(x)}
For example, the set consisting of all real numbers x such that (x-3)² < 5 is {x | (x-3)² < 5}
A is a subset of B if every element of the set A also belongs to B; written A ⊆ B
We say B contains A, and A is contained in B
The empty set is a subset of any set (including itself)
Sets A and B are said to be identical if their elements coincide completely

4 Set Operations

The set of all elements belonging to at least one of the sets A, B, … is called the union of A, B, …, written A ⋃ B ⋃ …
The set of all elements belonging to every one of the sets A, B, … is called the intersection (common part) of A, B, …, written A ⋂ B ⋂ …
When all the elements of set A that belong to set B are removed from A, the remaining set is called the difference A-B
Sometimes we fix a set and consider only its subsets
When we fix a set Ω and consider only its subsets, the set of the form Ω-A is called the complement of A, written Ac
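
In code these are exactly the set operations of Python; a minimal sketch with a 52-card deck as Ω (the (rank, suit) representation is an assumption for illustration):

```python
# Events as sets of root events; Omega is the 52-card deck.
suits = ["spade", "club", "diamond", "heart"]
omega = {(r, s) for r in range(1, 14) for s in suits}

hearts = {(r, s) for (r, s) in omega if s == "heart"}
twos = {(r, s) for (r, s) in omega if r == 2}

union = hearts | twos         # sum event: a heart or a 2 comes out
common = hearts & twos        # product event: the 2 of hearts
difference = hearts - twos    # hearts that are not 2s
complement = omega - hearts   # the complement of the hearts

print(len(union), len(common), len(difference), len(complement))
# 16 1 12 39
```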

5 Stochastic events

Even when the same trial is performed, the aspect of the resulting root event that is of interest may differ.
Example
In the trial of drawing a playing card:
only the suit, not the number, of the resulting card may be of interest
Conversely, only the number, not the suit, may be of interest
When a trial is performed, the aspect of the root event to which the interest is directed
is called an "indicator" of the trial
Example: when the indicator of the card-drawing trial is the suit
"the whole of the hearts", "the whole of the spades", "the whole of the red cards" are stochastic events
"the whole of the 2s", "the whole of the J, Q, K", etc. are not
The whole event is "the whole of the red and black cards"
The empty event is "the whole of the cards that are neither red nor black"
The phrase “an event that can be described by a concept related only to the indicator” is ambiguous.
A little more clarification is needed.
What does it mean to describe an event?
Whether or not an event can be described by a concept related only to an indicator
If the indicators are I, J, …, we write I(a) for the value of the root event a according to the indicator I
Example
Whole of the hearts = {x | I(x) = heart}
Whole of the red cards = {x | (I(x) = heart) or (I(x) = diamond)}
Example
When I is the suit in the card-drawing trial,
among the conditions C(x) on the root event x, the following four are the basic conditions for I:
I(x) = spade, I(x) = club, I(x) = diamond, I(x) = heart
If I is one of the indicators of a trial, a condition on the root event x of the form I(x) = α is called a basic condition for I
Definition
In one trial
Let its indicators be I, J,… .
Then, if the condition C(x) on the root event x can be obtained from a finite number of basic conditions for I, J, …
by combining them with the three words "… or …", "… and …", "not …",
we say that C(x) can be expressed in terms of concepts related only to I, J, …
In general, the following formula holds
(1) {x|C(x) or D(x)}={x|C(x)}⋃{x|D(x)} (2) {x|C(x) and D(x)}={x|C(x)}⋂{x|D(x)} (3) {x|not C(x)}={x|C(x)}c
Using (1), (2), and (3), we find that
(1)' If A and B are stochastic events, then A⋃B is also a stochastic event (2)' If A and B are stochastic events, then A⋂B is also a stochastic event (3)' If A is a stochastic event, then Ac is also a stochastic event
Using the above, given the indicators of a trial,
we can find all the stochastic events for them.
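
A sketch of the same idea in code: starting from the basic conditions I(x) = α for a suit indicator and combining them with or/and/not (the card representation and the function I are assumptions for illustration):

```python
# Root events are cards (rank, suit); the indicator I observes only the suit.
suits = ["spade", "club", "diamond", "heart"]
omega = {(r, s) for r in range(1, 14) for s in suits}

def I(x):
    return x[1]

# Basic conditions have the form I(x) == alpha; combining them with
# "or", "and", "not" yields exactly the stochastic events for I.
red = {x for x in omega if I(x) == "heart" or I(x) == "diamond"}
not_club = {x for x in omega if not I(x) == "club"}
print(len(red), len(not_club))  # 26 39
```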

6 Basic Properties of Probability

Once the indicators of a trial are determined
there is no need to consider events that are not stochastic events
It is sufficient to consider probability only for stochastic events
0≤p(E)≤1, p(Ω)=1, p(∅)=0
If the product event E⋂F of two stochastic events E, F is the empty event, E and F are said to be mutually exclusive
Let E and F be two mutually exclusive stochastic events
Repeat the trial over and over, and consider the resulting sequence of root events a1, a2, …, an
Let r1 and r2 be the numbers of terms favorable to E and to F, respectively
Then the relative frequency of E⋃F is (r1 + r2)/n = r1/n + r2/n
Thus the relative frequency of E⋃F clusters toward p(E) + p(F)
If E⋂F=∅ then p(E⋃F)=p(E)+p(F)

7 Probability Space

To summarize what has been said so far
(1) Each possible outcome of a trial is called the root event of the trial.
This whole set is denoted by Ω.
(2) A subset of Ω is said to be an event
(3) Some of the events are designated as stochastic events, and they satisfy the following properties
(1) Ω (all events) and ∅ (empty events) are stochastic events.
(2) If E and F are stochastic events, then E⋃F and E⋂F are also stochastic events.
(3) If E is a stochastic event, then Ec is also a stochastic event
(4) Each stochastic event E corresponds to a probability p(E), satisfying the following conditions
(a) 0≤p(E)≤1
(b) p(Ω)=1 and p(∅)=0
(c) p(E⋃F)=p(E)+p(F) if E⋂F=∅
Basis of Argument
(1) Given one set Ω, its elements are called root events
(2) A subset of Ω is said to be an event
(3) Certain designated events are called stochastic events
(4) Each stochastic event E is assigned a number p(E), called its probability, which satisfies (a), (b), and (c).
In general, a set Ω equipped with the structure of the above "basis of argument" is called a probability space
The six conditions above are the axioms of a probability space

8 Examples of probability spaces

Example 1: Drawing a card from a deck
Let Ω denote the 52 cards, E1 the whole of the spades, E2 the whole of the clubs, E3 the whole of the diamonds, and E4 the whole of the hearts.
Discrete type probability space
Finite discrete type
Infinite discrete type
A probability space in which every event is a stochastic event and is given an appropriate probability
How to assign its probabilities:
(A)
Take a finite or infinite sequence of distinct real numbers α1, α2, …
and associate with each a real number pn ≥ 0 such that ∑pn = 1
For any event E, set p(E) = pi + pj + … + pk + …, where αi, αj, …, αk, … are the numbers of the sequence belonging to E
Such a probability distribution is called discrete type
For example.
0 → e^(-α), 1 → αe^(-α), …, n → (α^n/n!)e^(-α), …
e^(-α) + αe^(-α) + … + (α^n/n!)e^(-α) + … = e^(-α)(1 + α + … + α^n/n! + …) = e^(-α)·e^α = 1
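
A quick numerical check of this normalization (the distribution n → (α^n/n!)e^(-α) is the Poisson distribution; α = 2 is an arbitrary choice):

```python
import math

alpha = 2.0
# p_n = alpha^n / n! * e^(-alpha); the sum over n is e^(-alpha) * e^alpha = 1
p = [alpha**n / math.factorial(n) * math.exp(-alpha) for n in range(200)]
print(sum(p))  # ~1.0
```
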
(B)
Take a continuous function f(x) ≥ 0 that is integrable with ∫f(x)dx = 1
and set p(E) = ∫E f(x)dx
Such a distribution is called a real probability distribution of continuous type with probability density f(x)
Example 1
Figure 1
Example 2: (m,σ)-Gaussian distribution
Figure 2: (m,σ)-Gaussian distribution

2 Properties of Probability

1 Event Relationships

In preparation, we will discuss various relationships that can be established between general events.
Fixing an arbitrary probability space Ω
Theorem 1 Let E, F, and G be arbitrary events, then the following formula holds (1) E⋃F=F⋃E (2) (E⋃F)⋃G=E⋃(F⋃G) (3) E⊆E⋃F and F⊆E⋃F (4) If E⊆F then E⋃F=F (5) If E⋃F=F then E⊆F (6) E⋃∅=E (7) E⋃Ω=Ω
Theorem 2. If E, F and G are arbitrary events, then the following formula holds (1) E⋂F=F⋂E (2) (E⋂F)⋂G=E⋂(F⋂G) (3) E⋂F⊆E, E⋂F⊆F (4) If E⊆F then E⋂F=E (5) If E⋂F=E then E⊆F (6) E⋂∅=∅ (7) E⋂Ω=E
Theorem 3 The following formulas hold (1) (E1⋃E2⋃…)⋂G=(E1⋂G)⋃(E2⋂G)⋃… (2) (E1⋂E2⋂…)⋃G=(E1⋃G)⋂(E2⋃G)⋂…
Theorem 4 (1) If E⊆F then Ec⊇Fc (2) (Ec)c=E (3) Ωc=∅ (4) ∅c=Ω (5) E⋃Ec=Ω (6) E⋂Ec=∅
Theorem 5 (1) (E1⋃E2⋃…)c=E1c⋂E2c⋂… (2) (E1⋂E2⋂…)c=E1c⋃E2c⋃…

2 Basic Properties of Probability

State the basic properties of probability
Definition 1 A finite collection {E1,E2,…,En} of stochastic events is generally referred to as a family of stochastic events, or simply a family.
Definition 2 If any two of the stochastic events belonging to a family {E1,E2,…,En} are mutually exclusive, the family itself is said to be exclusive.
Theorem 1 If the family {E1,E2,…,En} of stochastic events is exclusive, then p(E1⋃E2⋃…⋃En)=p(E1)+p(E2)+…+p(En)
Definition 3 Let E be a stochastic event and let the family {E1,E2,…,En} be exclusive. If E=E1⋃E2⋃…⋃En holds, the family {E1,E2,…,En} is said to be a partition of E. In this case we also say that E is divided into E1,E2,…,En.
Definition 4 For one event E and two partitions {E1,E2,…,Em} and {F1,F2,…,Fn}, if every Ei is a subset of some Fj, the former partition is called a "subdivision" of the latter.
Figure 1.
Theorem 2 (Additivity Theorem) p(E⋃F)+p(E⋂F)=p(E)+p(F)

3 Independence of two events

Explain the concept of independence of two events
Example
From a set of playing cards, exclude the following cards
(1) Any K
(2) A through 6 of spades
(3) 7 to Q of hearts
The remaining 36 cards form the deck; it is shuffled well and one card is drawn from it at random.
The probability of the event F consisting of the whole of the red cards:
The total number of cards in the deck is 36
The number of cards belonging to F is 18
p(F)=18/36=1/2
Repeat this trial over and over, and let the resulting sequence of root events be a1, a2, …, an
Let b1, b2, …, br be all the aces (A cards) appearing in that sequence
All of these bi are aces
There are three possibilities for their suits: hearts, diamonds, and clubs
No one of these suits appears preferentially
So red cards appear approximately 2 out of every 3 times among them, and black cards 1 out of every 3 times
This points to the following fact
Suppose I do not know the result:
"it was a red card" and "it was not" then have exactly the same degree of likelihood
If I am informed that the resulting card is an A
the likelihood that it was a red card becomes twice the likelihood that it was not
If I am informed that it was a Q
it could be a spade, a diamond, or a club,
so the likelihood that it was a red card becomes half the likelihood that it was not
Thus the notification that the resulting card is an A or a Q
has an effect on the likelihood concerning the card's color
In contrast, in the ordinary card-drawing trial, where (1), (2), and (3) are not removed,
such prior notification has no effect
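
The effect of these notifications can be checked by direct enumeration over the 36-card deck; a sketch (the (rank, suit) card representation is an assumption):

```python
from fractions import Fraction

# Build the 36-card deck: remove every K, A-6 of spades, 7-Q of hearts.
suits = ["spade", "club", "diamond", "heart"]
deck = [(r, s) for r in range(1, 14) for s in suits
        if r != 13
        and not (s == "spade" and 1 <= r <= 6)
        and not (s == "heart" and 7 <= r <= 12)]
assert len(deck) == 36

def prob(event):
    return Fraction(sum(1 for c in deck if event(c)), len(deck))

red = lambda c: c[1] in ("diamond", "heart")
ace = lambda c: c[0] == 1
queen = lambda c: c[0] == 12

print(prob(red))                                          # 1/2 before any news
print(prob(lambda c: ace(c) and red(c)) / prob(ace))      # 2/3 given "A"
print(prob(lambda c: queen(c) and red(c)) / prob(queen))  # 1/3 given "Q"
```
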
Given a single trial and its indicators
We focus on two random events E and F
E and F correspond to the whole A card and the whole red card in the above example, respectively
p(E) is assumed to be non-zero
(A) Repeat the trial many times, and let the resulting sequence of root events be a1, a2, …, an
(B) Extract the terms belonging to E from (A) and let them be b1, b2, …, br
r/n is the relative frequency of E in (A), which tends to cluster near the probability p(E) of E
(C) Extract the terms belonging to F from (B) and denote them c1, c2, …, cs
(C) is the sequence created by extracting from (A) the root events belonging to E⋂F
Thus s/n is the relative frequency of E⋂F in (A)
which clusters near p(E⋂F)
Now call the number s/r the relative frequency of F with respect to E in (A)
Since s/r = (s/n)/(r/n), it clusters near p(E⋂F)/p(E)
From the above
If someone else performed this trial
and the result is not known at all,
the likelihood that the result belongs to F is
p(F)
If we are informed that the result belongs to E,
the likelihood that it belongs to F is
p(E⋂F)/p(E)
If p(F) ≠ p(E⋂F)/p(E), the likelihood that the result belongs to F changes between knowing nothing and being told that it belongs to E
If p(F) = p(E⋂F)/p(E), there is no such change:
F is independent of E
Definitions for random events in general probability space
Definition 1 When E and F are stochastic events and p(E)>0, the number p(E⋂F)/p(E) is called the "relative probability" of F with respect to E, or the conditional probability of F given that E occurs; it is written pE(F) or p(F|E)
Definition 2.1 When E and F are stochastic events and p(E)>0, F is said to be independent of E if p(F)=pE(F).
Definition 2.2 If p(E)=0, then any F is counted as independent of E. However, the relative probability pE(F) is not given a definition in this case.
Theorem 1 A necessary and sufficient condition for the event F to be independent of E is that p(E⋂F)=p(E)p(F) holds.
If F is independent of E, then E is independent of F; E and F are mutually independent

4 Independence of many events

Not only the family {E,F} consisting of two events
We extend it to a general family {E1,E2,…,En} consisting of a finite number of stochastic events
Definition 1 A family {E1,E2,…,En} is said to be independent if each of its events is independent of every product of the remaining events: E1 is independent of every product event formed from E2,…,En; E2 is independent of every product event formed from E1,E3,…,En; and in general Ei is independent of every product event formed from E1,…,Ei-1,Ei+1,…,En.
Theorem 1 The necessary and sufficient condition for the family {E1,E2,…,En} to be independent is that for any choice of distinct events Ei(1),Ei(2),…,Ei(r) from it, p(Ei(1)⋂Ei(2)⋂…⋂Ei(r))=p(Ei(1))p(Ei(2))…p(Ei(r)) holds.
In general, for a family {E1,E2,…,En} of stochastic events,
a family {F1,F2,…,Fn} is said to be "of the same type" as it if
every Fi (i=1,2,…,n) is equal to either Ei or Eic
For example, {E1c,E2,E3c} and {E1,E2c,E3c} are of the same type as {E1,E2,E3}
Theorem 2 The necessary and sufficient condition for a family {E1,E2,…,En} to be independent is that for every family {F1,F2,…,Fn} of the same type as it, p(F1⋂F2⋂…⋂Fn)=p(F1)p(F2)…p(Fn) holds.

5 Properties of relative probabilities

Useful in the case where F is not necessarily independent of E
Review so far:
Given a single trial and its indices
Repeat that trial over and over again
Let the sequence of root events of the outcomes be a1, a2, …, an
(A)
From (A), extract the terms belonging to the event E and denote them b1, b2, …, br
(B) From (B), extract the terms belonging to the event F and denote them c1, c2, …, cs
At this point, let the number s/r be the relative frequency of F with respect to E in (A)
The relative probability pE(F)
is precisely the value near which these relative frequencies s/r cluster
The above considerations can be practically applied to determining the relative probability of a stochastic event,
given a concrete trial and an indicator
Examples
There are three urns: A, B, and C.
They contain red and white balls in the ratios m1:n1, m2:n2, and m3:n3, respectively.
Select one of the urns at random, then draw one ball at random from the selected urn and examine its color.
The root event is the pair (X,x) of the chosen urn X and the color x of the ball drawn from it
There are the following 6 combinations: (A,red), (A,white), (B,red), (B,white), (C,red), (C,white)
The root event itself is used as the indicator.
The event {(B,red)} is the product E⋂F of the event E that the chosen urn X is B (i.e. {(B,red), (B,white)}) and the event F that the color of the drawn ball is red (i.e. {(A,red), (B,red), (C,red)})
p({(B,red)})=p(E⋂F)=p(E)pE(F)
Each urn is chosen with exactly equal probability, so p(E)=1/3
The sequence created by extracting the terms belonging to E from the results of repeated trials consists of pairs (B,xi);
if s/r is formed by taking s as the number of terms whose xi is red, these values tend to cluster near m2/(m2+n2)
Therefore it is reasonable to set pE(F)=m2/(m2+n2)
Therefore p({(B,red)})=(1/3)·m2/(m2+n2)
The most basic property of relative probability, p(E⋂F)=p(E)pE(F), extends as follows
Theorem 1 (Multiplication Theorem) If p(E1⋂E2⋂…⋂En-1)>0, then p(E1⋂E2⋂…⋂En)=p(E1)pE1(E2)pE1⋂E2(E3)…pE1⋂E2⋂…⋂En-1(En)
Example 2
Consider a trial in which a single person is drawn at random from a certain group of people and asked about gender, home prefecture, and year of birth.
The root event is a combination of three items (a,b,c)
Example: (male, Tokyo, Shōwa 10)
Consider the probability of the root event (male, Tokyo, Shōwa 10)
Let E1 be the whole of the root events of the form (male, b, c)
Let E2 be the whole of the root events of the form (a, Tokyo, c)
Let E3 be the whole of the root events of the form (a, b, Shōwa 10)
p({(male, Tokyo, Shōwa 10)})=p(E1⋂E2⋂E3)=p(E1)pE1(E2)pE1⋂E2(E3)
Another important property of relative probability
Theorem 2 (Theorem of Total Probability) Suppose the family {E1,E2,…,En} is a partition of Ω, that is, (1) Ei⋂Ej=∅ for i≠j and E1⋃E2⋃…⋃En=Ω, and (2) p(Ei)>0 (i=1,2,…,n). Then for any event E, p(E)=p(E1)pE1(E)+p(E2)pE2(E)+…+p(En)pEn(E) holds.
Example 3
Let F be the totality of root events (X,x) such that the color x of the drawn ball is red
Let E1, E2, and E3 be the totalities of root events (X,x) such that the chosen urn is A, B, and C, respectively
p(E1)=p(E2)=p(E3)=1/3
{E1,E2,E3} is a partition of Ω
The following theorem is derived from the theorem of total probability
Theorem 3 (Bayes' Theorem) If the family {E1,E2,…,En} satisfies the same conditions as in Theorem 2, then for any stochastic event E such that p(E)>0, pE(Ei)=p(Ei)pEi(E)/[p(E1)pE1(E)+…+p(En)pEn(E)]=p(Ei)pEi(E)/∑j p(Ej)pEj(E)
Example 4
Example 3 with Bayes’ Theorem
When someone else performs the trial and the chosen urn is not known, but the color of the ball becomes known: a calculation of the degree of likelihood of which urn it was
Bayes' theorem is used to infer the unknown from what is actually known
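
Examples 3 and 4 can be made concrete in code; a sketch of the total-probability and Bayes computations (the ball ratios mi:ni are arbitrary illustrative values):

```python
from fractions import Fraction

# Urns A, B, C chosen with equal probability 1/3; each holds red and
# white balls in the ratio m_i : n_i (illustrative values).
ratios = {"A": (1, 3), "B": (2, 2), "C": (3, 1)}
p_urn = {X: Fraction(1, 3) for X in ratios}
p_red_given = {X: Fraction(m, m + n) for X, (m, n) in ratios.items()}

# Theorem of total probability: p(F) = sum_i p(Ei) p_Ei(F)
p_red = sum(p_urn[X] * p_red_given[X] for X in ratios)

# Bayes' theorem: p_F(Ei) = p(Ei) p_Ei(F) / p(F)
posterior = {X: p_urn[X] * p_red_given[X] / p_red for X in ratios}
print(p_red)       # 1/2 with these ratios
print(posterior)   # A: 1/6, B: 1/3, C: 1/2; guessing the urn from the color
```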

3 Multiple Trials

1 Double trial

Repeat one trial twice and record the two results
Root event
The combination of the result a of the first trial and the result b of the second trial (a,b)
Treat (a,b) and (b,a) as different root events because the order in which the results appeared is different
Example
Repeat the card-drawing trial twice.
(A of spades, A of spades), (A of spades, 2 of spades), …, (9 of diamonds, J of clubs), …
Two dice throws.
(1,1),(1,2),(2,1)…
Consider two repetitions of one trial together as one trial.
Any combination of the root events of the original trial (a,b)
The whole of them is Ω(2)
The superscript (2) indicates a double trial
In (a,b), a is the first component and b is the second component
Example
Toss a coin and measure the angle by which it ends up tilted from north
The root event is the pair (a,b) of the two angle measurements a, b
The whole of these pairs is Ω(2)
In a double trial, it is natural to adopt the same indicators as in the original trial
Example
The card-drawing trial
Assume the indicator is the suit of the root event
A root event (a,b) of the double trial also has suits, but two of them, not one
Let I be the suit of a root event of the original trial
and let I1 and I2 be the suits of the first and second components of a root event of the double trial, respectively
I1((a,b))=I(first component of (a,b))=I(a), I2((a,b))=I(second component of (a,b))=I(b)
Example
In general, when the indicators of a single trial are I, J, …, the indicators I1, I2, J1, J2, … of the double trial are obtained in the same way
What events are stochastic events in a double trial (Ω(2))?
The simplest events in a double trial are the rectangular events (E,F) defined below
Strict definition
Consider two events in a single trial and let them be E and F, respectively
Of the root events (a,b) in a double trial
the whole of those such that the first component a belongs to E and the second component b belongs to F is denoted
(E,F)={(a,b)|a∈E, b∈F}
Example
The card-drawing trial:
let E be the whole of the hearts
and F the whole of the 2s.
(E,F) is then the whole of the pairs (heart card, 2 card)
If the indicator is the suit, this event (E,F) is not a stochastic event,
because (E,F) involves both a suit and a number
If E' is the whole of the clubs and F' the whole of the hearts, (E',F') is a stochastic event
If E and F are stochastic events for the indicators I, J, … of the single trial,
then the event (E,F) with them as sides is a stochastic event of the double trial
A stochastic event of the double trial is
one formed by combining a finite number of such rectangular stochastic events with the three operations ⋃, ⋂, and c

2 Standard form of stochastic events

A convenient way to represent probability events in a double trial in an easy-to-understand manner.
In a double trial, the sum of a family {(E1,F1),(E2,F2),…,(En,Fn)} of rectangular stochastic events is called a stochastic event of type U
In particular, considering the case n=1, every rectangular stochastic event (E,F) is of type U
Furthermore, the whole event Ω(2) and the empty event ∅ are both of type U, since they can be expressed as Ω(2)=(Ω,Ω) and ∅=(∅,∅), respectively
Theorem 1(2). Every stochastic event in Ω(2) is of type U
Definition 1(2). When a stochastic event in Ω(2) is written as a sum of rectangular stochastic events (E1,F1)⋃(E2,F2)⋃…⋃(En,Fn), this expression is called a "standard form" of the stochastic event
Note 1: The standard form is not unique.
Lemma 1(2). If E1, E2, F1, F2 are arbitrary events (not necessarily stochastic events) in Ω, then (E1,F1)⋂(E2,F2)=(E1⋂E2, F1⋂F2) holds
Lemma 2(2). Let E and F be arbitrary events in Ω. Then (E,F)c=(Ec,Ω)⋃(Ω,Fc)

3 Multiple trials

Extend to n-fold trials
A single trial is performed n times in a row, and the n-tuple (a1,a2,…,an) of the n results is recorded
Denote the entirety of the root event by Ω(n)
ai is called the i-th component of the n-fold trial sequence
Let E1,E2,…,En be any n events of the original trial
The whole of the root events (a1,a2,…,an) such that a1∈E1, a2∈E2, …, an∈En is denoted (E1,E2,…,En)
If E1,E2,…,En are all stochastic events, then (E1,E2,…,En) is called a rectangular stochastic event of the n-fold trial
An event that is the sum of a finite number of rectangular stochastic events is called a stochastic event "of type U"
Lemma 1(n). Let E1,E2,…,En; F1,F2,…,Fn be arbitrary events in Ω. Then (E1,E2,…,En)⋂(F1,F2,…,Fn)=(E1⋂F1, E2⋂F2, …, En⋂Fn) holds.
Lemma 2(n). Let E1,E2,…,En be arbitrary events in Ω. Then (E1,E2,…,En)c=(E1c,Ω,…,Ω)⋃(Ω,E2c,Ω,…,Ω)⋃…⋃(Ω,…,Ω,Enc)
Theorem 1(n). Every stochastic event in Ω(n) is of type U
Definition 1(n). When a stochastic event in Ω(n) is written as a sum of rectangular stochastic events (E1,E2,…,En)⋃(F1,F2,…,Fn)⋃…⋃(G1,G2,…,Gn), this is called a "standard form" of the stochastic event
Note: A rectangular stochastic event (E1,E2,…,En) means, of course, "E1 happens the first time, E2 happens the second time, …, En happens the n-th time"
Example 1
(E,Ω,…,Ω) represents "E happens the first time, Ω happens the second time, and so on"; since Ω always occurs, this simply means that E happens the first time
Similarly, (Ω,E,Ω,…,Ω) indicates that E happens the second time
Example 2
In the card-drawing trial with the suit as indicator,
if E is the whole of the hearts, then
the (type U) stochastic event (E,Ω,…,Ω)⋃(Ω,E,Ω,…,Ω)⋃…⋃(Ω,…,Ω,E)
corresponds to the matter that at least one heart appears between the 1st and the n-th time

4 Calculation of probabilities

Consider what the above probabilities will be.
First, consider a double trial
First, find the probability of a single rectangular stochastic event (E,F)
We will proceed by dividing the steps into 4 parts
(1)
By Lemma 1(2), (E,F)=(E,Ω)⋂(Ω,F)
(E,Ω) is the whole of (a,b) such that the first result a belongs to E
(Ω,F) is the whole of (a,b) such that the second result b belongs to F
(2)
Repeat the double trial m times, and let the results be
(i) (a1,b1),(a2,b2),…,(am,bm); then
(ii) a1,a2,…,am can be regarded as the sequence of results of m repetitions of the original trial
If r is the number of terms of (i) belonging to (E,Ω), then r is also the number of terms of (ii) belonging to E
Therefore the relative frequency r/m of (E,Ω) in (i) is equal to the relative frequency of E in (ii)
The relative frequency of E tends to cluster around p(E)
Therefore the relative frequency of (E,Ω) must show the same tendency
Hence p((E,Ω))=p(E)
(3)
Similarly to (2), p((Ω,F))=p(F)
(4)
Since (E,Ω) and (Ω,F) concern different repetitions, we accept p((E,Ω)⋂(Ω,F))=p((E,Ω))p((Ω,F))
Principle (2): p((E,F))=p((E,Ω)⋂(Ω,F))=p((E,Ω))p((Ω,F))=p(E)p(F); i.e., the probability of (E,F) is the product of the probabilities of E and F
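
Principle (2) can be checked by enumeration for a small Ω; a sketch taking a die throw as the original trial (the choice of E and F is illustrative):

```python
from fractions import Fraction
from itertools import product

omega = range(1, 7)                    # the original trial: one die throw
E = {2, 4, 6}                          # "even" on the first throw
F = {5, 6}                             # "5 or 6" on the second throw

omega2 = list(product(omega, omega))   # the double trial Omega^(2)
EF = [(a, b) for (a, b) in omega2 if a in E and b in F]

print(Fraction(len(EF), len(omega2)))             # 1/6
print(Fraction(len(E), 6) * Fraction(len(F), 6))  # 1/6 = p(E)p(F)
```
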
How can we compute the probabilities of general stochastic events?
Take any stochastic event A, and let one of its standard forms be (E1,F1)⋃(E2,F2)⋃…⋃(Em,Fm)
If the family {(E1,F1),(E2,F2),…,(Em,Fm)} is exclusive, then p(A)=p((E1,F1))+…+p((Em,Fm))=∑p(Ei)p(Fi)
Theorem 2(2). Every stochastic event in Ω(2) can be partitioned into a finite number of rectangular stochastic events; more precisely, among its standard forms there exists one whose family {(E1,F1),(E2,F2),…,(Em,Fm)} is exclusive
This makes it possible to compute the probability of every stochastic event
This applies to the general Ω(n) as well
Principle (n):
p((E1,E2,…,En))=p(E1)p(E2)…p(En) is adopted, and the following theorem is then proved
Theorem 2(n)
Every stochastic event in Ω(n) can be partitioned into rectangular stochastic events; more precisely, among its standard forms (E1,E2,…,En)⋃(F1,F2,…,Fn)⋃…⋃(G1,G2,…,Gn) there exists one whose family {(E1,E2,…,En),(F1,F2,…,Fn),…,(G1,G2,…,Gn)} is exclusive
Thus Ω(n) becomes a probability space
Example 3

5 Multiple Probability Spaces

For a general probability space with no concrete trial in the background, we can likewise
define the n-fold probability space
For any probability space Ω,
denote by Ω(n) the whole of the n-tuples (a1,a2,…,an) of its elements
Its elements are called root events
Its subsets are called events
Let E1,E2,…,En be events of Ω
Consider the event {(a1,a2,…,an)|a1∈E1, a2∈E2, …, an∈En} of Ω(n)
If the Ei (i=1,2,…,n) are all stochastic events, this is called a rectangular stochastic event
Definition 1 An event of Ω(n) that is the sum of a finite number of rectangular stochastic events is called a stochastic event of Ω(n).
Theorem 1 The stochastic events so defined satisfy axioms (1), (2), and (3).
Theorem 2 Every stochastic event can be partitioned into a finite number of rectangular stochastic events; i.e., it can be expressed as (E1(1),E2(1),…,En(1))⋃(E1(2),E2(2),…,En(2))⋃…⋃(E1(r),E2(r),…,En(r)) in such a way that the family {(E1(1),E2(1),…,En(1)),(E1(2),E2(2),…,En(2)),…,(E1(r),E2(r),…,En(r))} is exclusive.
Definition 2 When a stochastic event A is written in the form of Theorem 2, p(A)=p(E1(1))p(E2(1))…p(En(1))+…+p(E1(r))p(E2(r))…p(En(r))=∑i p(E1(i))p(E2(i))…p(En(i)) is called the probability of A.
When an event A is represented as in Theorem 2, the representation is not unique
Do all ways of representing A give the same value of p(A)?
When there is a concrete trial in the background, the answer is clearly yes
but not obviously so when there is no concrete trial in the background
Theorem 3
Definition 3 The probability space Ω(n) obtained above is called the n-fold probability space of Ω

6 Infinite multiple trials

The concept of an "infinite multiple trial" treats infinitely many repetitions of a single trial as a whole
Suppose we take a single trial and fix an indicator
The sequence of results (a1,a2,…,an,…) of infinitely many repetitions of that trial is recorded
Denote the whole of the root event by Ω(∞)
The indicators are taken to be the same as in the original trial:
if I, J, … are the indicators of the original trial, then I1,I2,…; J1,J2,…; … are indicators of the infinite multiple trial
Note: An infinite number of dice rolls is practically impossible.
What are the probability events in this trial?
As in the case of n-fold trials, for each sequence of events E1,E2,…,En,… of the original trial,
consider the event consisting of the whole of the elements (a1,a2,…,an,…) of Ω(∞) such that an∈En (n=1,2,…)
If E1,E2,…,En,… are all stochastic events, one might expect this to be a stochastic event
It is not.
This is a point where Ω(∞) differs from Ω(n)
What, then, are the stochastic events of Ω(∞)?
Consider events (E1,E2,…,En,…) such that all Ei from a certain index onward are equal to Ω
In this case, the following things hold
(1)
If E1,E2,…,En are stochastic events of Ω, then
(E1,E2,…,En,Ω,Ω,…)
is taken to be a stochastic event of Ω(∞)
A stochastic event obtained in this way is called a "cylinder stochastic event" with E1,E2,…,En as sides
Cylinder stochastic events substitute for the rectangular stochastic events of Ω(n)
(2)
Every stochastic event of Ω(∞) is
obtained by combining a finite number of cylinder stochastic events with the three operations ⋃, ⋂, c
Definition A stochastic event of type U is a sum of a finite number of cylinder stochastic events
Lemma 1(∞) (E1,E2,…,En,…)⋂(F1,F2,…,Fn,…)=(E1⋂F1, E2⋂F2, …, En⋂Fn, …)
Lemma 2(∞) (E1,E2,…,En,Ω,Ω,…)c=(E1c,Ω,Ω,…)⋃(Ω,E2c,Ω,Ω,…)⋃…⋃(Ω,…,Ω,Enc,Ω,Ω,…)
Theorem 1(∞) Every stochastic event of Ω(∞) is of type U
Definition 1(∞) When a stochastic event of Ω(∞) is written as the sum of a finite number of cylinder stochastic events, that expression is called a "standard form" of the stochastic event
Principle (∞) p((E1,E2,…,En,Ω,Ω,…))=p(E1)p(E2)…p(En)
Theorem 2(∞) Every stochastic event of Ω(∞) can be partitioned into a finite number of cylinder stochastic events; that is, it can be expressed as (E1(1),…,En(1)(1),Ω,…)⋃(E1(2),…,En(2)(2),Ω,…)⋃…⋃(E1(r),…,En(r)(r),Ω,…) in such a way that the family {(E1(1),…,En(1)(1),Ω,…),(E1(2),…,En(2)(2),Ω,…),…,(E1(r),…,En(r)(r),Ω,…)} is exclusive
Then, given any stochastic event A of Ω(∞), we express it in the form of Theorem 2(∞) and apply Principle (∞) to obtain p(A)=∑i p(E1(i))…p(En(i)(i))
From the above, Ω(∞) becomes a probability space.
Theorem 3

4 Random Variables

1 Random variables

In a nutshell:
a kind of rule that gives one value to each root event of a trial
Example
Trial: a die throw
Rule: 1 point for an even face, -1 point for an odd face
This rule is a single random variable
A rule assigning values is a random variable if the following conditions are satisfied
when we divide Ω into a suitable finite number of stochastic events:
(1)
Root events belonging to the same stochastic event are given the same value
(2)
Root events belonging to different stochastic events are given different values from each other
In other words
A random variable on Ω is
a rule that gives one value to each root event,
namely one that satisfies the following conditions:
(1)
The total number of distinct values is finite.
(2)
The whole of the root events to which any one value α is given constitutes
a single stochastic event
A random variable is represented by a Latin capital letter as above
The value given to the root event a by the random variable X is called
the value of X at a, written X(a)
If X(a)=α, we say that X takes the value α at a
Example 1
Consider the card-drawing trial with the suit as indicator
Let Ω be the corresponding probability space.
If we make the rule X as above, this is one random variable in Ω
Y as above is not a random variable
because, for example, the whole of the cards with numbers 5 or below is not a stochastic event in this probability space
Example 2
A lottery box contains one first prize, two second prizes, four third prizes, and eight fourth prizes.
Consider the trial "rummage through this box, draw one lot, and check its grade"
The root event is the grade of the drawn lots.
The indicator is also the grade
The probabilities are p(1st)=1/15, p(2nd)=2/15, p(3rd)=4/15, p(4th)=8/15
Assume that the prizes are: 1st prize 10,000 yen, 2nd prize 5,000 yen, 3rd prize 1,000 yen, and 4th prize 500 yen.
Example 3
A single person is selected from a certain group of people in a random draw.
The root event is all the members of the group.
The index is the root event itself.
Let X be the rule that assigns to each root event that person's height
Example 4
If we decide to assign the same value α to every root event of the probability space Ω,
such a random variable is called a "constant random variable"
Example 5
Let E be a stochastic event on the probability space Ω, and set X(a)=1 if a∈E and X(a)=0 otherwise
The random variable X thus obtained is called the "defining random variable" (indicator) of the event E, denoted XE
Let X be a random variable on the probability space Ω
In this case, for any real number α
consider the events {a|X(a)=α}, {a|X(a)<α}, and {a|X(a)>α}; these are written {X=α}, {X<α}, and {X>α}, respectively
The following theorem holds for these
Theorem 1 {X=α}, {X<α}, and {X>α} are all stochastic events.
Note 3 If the possible distinct values of X are α1,α2,…,αn, then the family {{X=α1},{X=α2},…,{X=αn}} is a partition of Ω.
Theorem 2 Let X be a random variable on the probability space Ω, and let α1,α2,…,αn be its possible distinct values. Setting p({X=αi})=pi (i=1,2,…,n), the following two things hold: (1) 0≤pi≤1 (i=1,2,…,n) (2) ∑pi=1
Definition Let Ω be a probability space and X a random variable on it. The random variable Xi(n) on Ω(n) defined by Xi(n)(a1,a2,…,ai,…,an)=X(ai) (i=1,2,…,n) is called the "i-th copy" of X on Ω(n). The following theorems hold for this concept
Theorem 3 The copy Xi(n) of X is a random variable
Theorem 4 The distribution of Xi(n) is the same as that of X
Example 9

2 Operations on random variables

We now describe a method for creating new random variables from already known random variables
Theorem 1 Let Ω be a probability space and X,Y be two random variables on it. Given the rule that for each element a of Ω we have X(a)+Y(a), this is one random variable.
Definition 1 The random variable obtained by the method of Theorem 1 from the random variables X and Y is called the sum of X and Y and written X+Y; i.e., (X+Y)(a)=X(a)+Y(a)
Example 1
Consider a dice throwing trial.
Let Ω be the entirety of the root event
Adopt the root event itself as an indicator
Let X and Y be random variables as follows
The sum of X and Y, X+Y, is a random variable as follows
Example 2
Let E be a random event on an arbitrary probability space Ω.
Let X1(n),X2(n),…,Xn(n) be the copies of X=XE on Ω(n).
Then (X1(n)+X2(n)+…+Xn(n))(a1,a2,…,an)
is equal to the number of the terms among a1,a2,…,an belonging to E
Hence,
if there is a concrete trial in the background, this is the number of occurrences of E in the results (a1,a2,…,an) of n repetitions of the trial
Consider the distribution of S(n)=X1(n)+X2(n)+…+Xn(n) (X=XE)
The possible values of S(n) are 0,1,2,…,n: (n+1) values in all.
If
S(n)(a1,a2,…,an)=k (0≤k≤n), then
the number of ai belonging to E is equal to k
In this case
(a1,a2,…,an) clearly belongs to a rectangular stochastic event (F1,F2,…,Fn) such that
(1) Fi(1)=Fi(2)=…=Fi(k)=E for some indices i(1)<i(2)<…<i(k)
(2) Fj=Ec if j≠i(1), j≠i(2), …, j≠i(k)
The total number of rectangular stochastic events of this kind is
the number of combinations that pick k indices i(1),i(2),…,i(k)
out of the n numbers 1,2,…,n, namely nCk
Collected together, these rectangular stochastic events form an exclusive family
Example: Assuming n=3,k=2
Also
Therefore
If p(E)=p and 1-p=q, the distribution of S(n) is k → nCk p^k q^(n-k)
Such a real probability distribution is called the (n,p)-binomial distribution
Example 3
In a given human population
Suppose that exactly 1% of the population is left-handed.
In this case, consider the trial of selecting one person at random and observing whether that person is left-handed
The two root events are "left-handed" and "right-handed"
Represent left-handedness by 1 and right-handedness by 0
Let E={1} and X=XE
Now, suppose that the above trial is repeated 200 times.
Among the results a1,a2,…,a200,
the number of left-handers, i.e. of 1s, is equal to the value of the random variable X1(200)+X2(200)+…+X200(200) at (a1,a2,…,a200)
From this, using the results of this section, we can calculate the probability that it takes various values
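
A sketch of that computation: S(200) follows the (200, 0.01)-binomial distribution, so p(S(200)=k) = 200Ck (0.01)^k (0.99)^(200-k):

```python
from math import comb

n, p = 200, 0.01
q = 1 - p
for k in range(6):
    # p(S(n) = k) = nCk p^k q^(n-k)
    print(k, comb(n, k) * p**k * q**(n - k))
# k = 0, 1, 2 already carry most of the probability mass.
```
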
Theorem 2 Let Ω be a probability space and X and Y be two random variables on Ω. Then, for each element a of Ω, if we consider the rule that gives the values X(a)-Y(a) and X(a)Y(a), then these are also random variables on Ω.
Definition 2 Given random variables X and Y, the random variables obtained by the method of Theorem 2 are called the difference and product of X and Y, denoted by X-Y and XY, respectively, i.e., (X-Y)(a)=X(a)-Y(a) and (XY)(a)=X(a)Y(a)
Example 4
For X and Y in Example 1
Example 5
Consider the product of the random variable S(n)=X1(n)+X2(n)+…+Xn(n) of Example 2 and the constant 1/n: S0(n)=S(n)/n.
In this case, S0(n)(a1,a2,…,an), i.e. S(n)(a1,a2,…,an)/n,
is equal to the number r of the terms among a1,a2,…,an belonging to E, divided by n
If p(E)=p and 1-p=q, the distribution of S0(n) is k/n → nCk p^k q^(n-k)

3 Independence of random variables

Stochastic events E and F are independent when being informed that E occurred as a result of the trial causes no change in the degree of likelihood of the occurrence of F
Random variables X and Y are independent when being informed of the value of X at the root event that occurred as a result of the trial causes no change in the degree of likelihood of each possible value of Y
Definition 1 Let Ω be a probability space and X, Y random variables on it, with possible values αi (i=1,2,…,m) and βj (j=1,2,…,n), respectively. If the families {{X=αi},{Y=βj}} are independent for every i, j (i=1,2,…,m; j=1,2,…,n), then X and Y are said to be independent.
Definition 2 Let Ω be a probability space and X1,X2,…,Xn random variables on it, with possible values αi(1) (i=1,2,…,m(1)), αi(2) (i=1,2,…,m(2)), …, αi(n) (i=1,2,…,m(n)), respectively. If the families {{X1=αi(1)(1)},{X2=αi(2)(2)},…,{Xn=αi(n)(n)}} are independent regardless of the choice of i(1),i(2),…,i(n), then X1,X2,…,Xn are said to be independent.
Theorem 1 The necessary and sufficient condition for X1,X2,…,Xn to be independent is that, regardless of the choice of i(1),i(2),…,i(n), p({X1=αi(1)(1)}⋂{X2=αi(2)(2)}⋂…⋂{Xn=αi(n)(n)})=p({X1=αi(1)(1)})p({X2=αi(2)(2)})…p({Xn=αi(n)(n)}) holds.
Example 1
If X is a random variable on a probability space Ω,
then the copies X1(n),X2(n),…,Xn(n) on Ω(n) are independent
In fact, let αi(1),αi(2),…,αi(n) be any possible values of X
Theorem 2 The necessary and sufficient condition for the defining random variables XE1,XE2,…,XEn of stochastic events E1,E2,…,En to be independent is that the family {E1,E2,…,En} is independent.

4 Mean value

Consider a single trial and let Ω be the corresponding probability space.
Suppose a random variable X is defined on that Ω
Let α1,α2,…,αs be the possible distinct values of X
The events Ei={X=αi} (i=1,2,…,s) are all stochastic events
and the family {E1,E2,…,Es} is a partition of Ω
Repeat the trial over and over, and let the resulting sequence of root events be a1,a2,…,an
Each ak is sent to the value X(ak) by the random variable X; the arithmetic mean of these values clusters near ∑αi p(Ei)
Definition 1 Let X be a random variable on any probability space Ω with X(a)=αi (a∈Ei) (i=1,2,…,s), where α1,α2,…,αs are its distinct values. The number ∑αi p(Ei)=∑αi p({X=αi}) is called the "mean value (expected value)" of X and written M(X) or E(X).
Example 1
Let E1, E2, E3, and E4 denote the whole spades, whole clubs, whole diamonds, and whole hearts, respectively.
p(E1)=p(E2)=p(E3)=p(E4)=1/4
Therefore M(X)=1·(1/4)+2·(1/4)+3·(1/4)+4·(1/4)=10/4=5/2=2.5
Example 2
Example 3
Example 4
Example 5
Suppose that two random variables X and Y are given on a probability space Ω. At this point, the following theorem holds
Theorem 1 M(X+Y)=M(X)+M(Y)
Theorem 2 Let X be any random variable and α any constant; then M(αX)=αM(X)
Example 6
Theorem 3 If the random variables X and Y are independent, then M(XY)=M(X)M(Y)
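
Theorems 1-3 can be verified by enumeration on a small space; a sketch with two independent dice, X the first face and Y the second:

```python
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), range(1, 7)))   # two dice
w = Fraction(1, len(omega))                       # uniform probability

def M(f):
    # mean value M(f) = sum of f(a) p({a}) over the root events a
    return sum(w * f(a) for a in omega)

X = lambda a: a[0]
Y = lambda a: a[1]
print(M(lambda a: X(a) + Y(a)), M(X) + M(Y))   # 7 and 7 (Theorem 1)
print(M(lambda a: X(a) * Y(a)), M(X) * M(Y))   # 49/4 and 49/4 (Theorem 3)
```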

5 Variance, standard deviation, and covariance

After many trials, the arithmetic mean of the observed values of the random variable X is predicted to be roughly close to M(X)
A measure is needed to distinguish distributions that have the same mean but spread differently around it
Definition 1 For a random variable X, the mean value M((X-M(X))²) of the random variable (X-M(X))² is called the "variance of X" and is denoted by V(X).
Definition 2 The positive square root √V(X) of the variance is called the "standard deviation of X" and is denoted by σ(X).
Random variables with equal distributions have equal variances and standard deviations
so these may also be called the variance and standard deviation of the distribution of X
Example 1
Let E be a random event and X = XE.
In this case, since M(X)=p(E)
V(X)=M((X-M(X))²)=(1-p(E))²p(E)+(0-p(E))²p(Ec)=(1-p(E))²p(E)+p(E)²(1-p(E))=p(E)(1-p(E))((1-p(E))+p(E))=p(E)(1-p(E))
Theorem 1 V(X)=M(X²)-M(X)²
Theorem 2 If α and β are constants (constant random variables), then V(αX+β)=α²V(X)
Definition 3 If X and Y are random variables on the probability space Ω, M((X-M(X))(Y-M(Y))) is called the "covariance" of X and Y, denoted C(X,Y)
This satisfies the following property
Theorem 3 (1) C(X,Y)=C(Y,X) (2) C(X,X)=V(X) (3) C(X,Y)=M(XY)-M(X)M(Y)
Theorem 4 If X and Y are random variables on the probability space Ω, then V(X+Y)=V(X)+V(Y)+2C(X,Y). In particular, V(X+Y)=V(X)+V(Y) if X and Y are independent
Example 2
Theorem 5 If X is any random variable and ε is any positive number, then p({|X-M(X)|<ε})≥1-V(X)/ε²
The inequality of this theorem is called "Chebyshev's inequality"
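
A sketch checking Chebyshev's inequality for one die throw, X being the face shown (the choice ε = 2 is illustrative):

```python
from fractions import Fraction

values = [Fraction(k) for k in range(1, 7)]       # X = face of a fair die
p = Fraction(1, 6)

m = sum(v * p for v in values)                    # M(X) = 7/2
var = sum((v - m) ** 2 * p for v in values)       # V(X) = 35/12

eps = Fraction(2)
lhs = sum(p for v in values if abs(v - m) < eps)  # p(|X - M(X)| < eps)
rhs = 1 - var / eps**2                            # Chebyshev's lower bound
print(lhs, rhs, lhs >= rhs)                       # 2/3 13/48 True
```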

6 Law of large numbers

The intuitive meaning of the mean value M(X) of a random variable X is
Let X be a random variable on the probability space Ω for a given trial
Let the trial be repeated n times and let the sequence of outcomes be a1,a2,…,an.
If n is large enough, the arithmetic mean (X(a1)+X(a2)+…+X(an))/n of the values X(a1),X(a2),…,X(an) tends to cluster around a certain value
That value is M(X).
The law of large numbers makes this common-sense statement precise.
Theorem 1 If X is a random variable on any probability space Ω and ε is any positive number, then limn→∞ p({|(X1(n)+X2(n)+…+Xn(n))/n - M(X)|<ε})=1 holds.
This is called the law of large numbers.
Theorem 2 Let E be a stochastic event on a probability space Ω and X=XE; then for any ε>0, limn→∞ p({|S(n)/n - p(E)|<ε})=1
This is called Bernoulli’s theorem

7 Poisson’s law of small numbers

Let E be a random event in a probability space Ω, X=XE, and p=p(E).
However, the binomial formula p(S(n)=k)=nCk p^k q^(n-k) is quite troublesome to compute when n is large
An approximation is used when p is sufficiently small
Theorem 1 The mean value of the (n,p)-binomial distribution is np; if we let n→∞ considering only binomial distributions such that np is equal to a constant value α, then nCk p^k q^(n-k) → (α^k/k!)e^(-α)
This is called Poisson's law of small numbers.
Example
Bortkiewicz's data: the number r of soldiers kicked to death by horses in a corps of the Prussian army in one year
Probability of one soldier dying from being kicked by a horse in one year: p=p(E), probability event E={1}, X=XE
One corps consists of n soldiers
Observing one corps for a year amounts to performing the n-fold trial
The number r is the value of S(n)=X1(n)+X2(n)+…+Xn(n)
If we examine many corps, the arithmetic mean of r is expected to be close to the mean of X1(n)+X2(n)+…+Xn(n), i.e. np
The arithmetic mean of r examined for the 200 corps gives
np ≈ 0.61
Consider the event that, in one corps, r soldiers are kicked to death by horses in one year
Examining 200 corps amounts to doing that n-fold trial 200 times
For any one soldier, being kicked to death by a horse within a year is extremely rare, so p is very small
Hence, by Poisson's law of small numbers, p(S(n)=r) ≈ (0.61^r/r!)e^(-0.61)
Calculating the right-hand side of this formula for r=0,1,2,… gives the expected counts
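
A sketch comparing the binomial probabilities with their Poisson approximation for α = np ≈ 0.61 (the corps size n is not given in the notes, so a large illustrative value is assumed):

```python
from math import comb, exp, factorial

alpha = 0.61            # np, estimated from the 200 corps
n = 10000               # illustrative corps size: n large, p small
p = alpha / n

for r in range(5):
    binom = comb(n, r) * p**r * (1 - p) ** (n - r)
    poisson = alpha**r / factorial(r) * exp(-alpha)
    print(r, round(binom, 5), round(poisson, 5))
# The two columns agree closely: the law of small numbers at work.
```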

8 Laplace’s theorem

The standard deviation σ(X) (or variance V(X)) of the random variable X is
a measure of how much the value of X deviates from M(X) on average.
The random variable Ẋ=(X-M(X))/σ(X) represents the deviation of X from M(X), measured in units of σ(X)
Ẋ is useful for examining various properties of X
From the defining random variable X=XE of a stochastic event E,
form S(n)=X1(n)+X2(n)+…+Xn(n)
and its standardization Ṡ(n)=(S(n)-np)/√(npq); what happens to the distribution of Ṡ(n) for large n?
Theorem 1 For any real numbers α and β (α≤β), p({α≤Ṡ(n)≤β}) → (1/√(2π))∫αβ e^(-x²/2)dx as n→∞.
This is called "Laplace's theorem".
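
Laplace's theorem can be observed numerically; a sketch comparing the exact binomial value of p({α ≤ Ṡ(n) ≤ β}) with the Gaussian integral (n, p, α, β are illustrative choices):

```python
from math import comb, erf, sqrt

n, p = 1000, 0.3
q = 1 - p
mu, sigma = n * p, sqrt(n * p * q)

alpha, beta = -1.0, 1.0
exact = sum(comb(n, k) * p**k * q**(n - k)
            for k in range(n + 1)
            if alpha <= (k - mu) / sigma <= beta)
gauss = 0.5 * (erf(beta / sqrt(2)) - erf(alpha / sqrt(2)))
print(round(exact, 4), round(gauss, 4))  # both close to 0.68
```
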
Stirling’s formula
Extension of Laplace’s theorem to general random variables
Example
Consideration by Crofton
When we make a measurement
If it is done in a totally ideal way
We would naturally obtain the true value m as the measured value
In reality, this is not the case
because so many errors intervene in measurement
A measurement has many "sources" of error
Each source produces, at each measurement, a very small error of either ε or -ε (ε sufficiently small)
The minute errors a1,a2,…,an produced by these sources are added to the true value m
Upon measurement,
without knowing it, a kind of n-fold trial is automatically performed
The root event is an n-tuple (a1,a2,…,an) of values ε and -ε (n sufficiently large)
Omitted hereafter.
Probability of the interval [a,b]

5 Markov Chain

Introduction

Consider a series of experiments 𝔈1, 𝔈2, …
The degrees of likelihood of the various outcomes of each experiment
vary according to the results of the experiments that preceded it
Example
There are two urns, A and B.
Each urn contains several black and white balls of the same size.
From each urn, one ball is taken out at random; the one from A is put into B and the one from B into A, and then the number of black balls in A is counted.

1 Simple Markov Chain

Consider the series of experiments on the urns described above.
For simplicity, let the total number of black balls, the total number of white balls, and the number of balls in each of A and B all be equal to a constant α (>0)
The result of each 𝔈n is the number of black balls in urn A: 0,1,2,…,α
These α+1 possibilities are called the states of the series (*) and are denoted ω1,ω2,…,ωα+1.
At this point, we can proceed with the following analysis
(1)
If 𝕰k-1 (k>1) results in ωi, i.e. the number of black balls in A is i-1,
then in 𝕰k the probability that, for example, a white ball is drawn from A and a black ball from B is ((α-i+1)/α)·((α-i+1)/α)
Therefore the probability that the number of black balls in A as a result of 𝕰k is i, i.e. that ωi+1 occurs, is equal to (**)
Similarly, when ωi is determined as a result of 𝕰k-1,
for any ωj (j=1,2,…,α+1),
we can compute the probability that ωj occurs as a result of 𝕰k
We denote this by p(i→j)
Thus, each 𝕰k can be thought of as a trial such that, once the outcome ωi of 𝕰k-1 is determined
(1) The root events are ω1,ω2,… ,ωα+1.
(2) The indicator is the root event itself
(3) p({ωj})=p(i→j)
(2)
Each experiment from 𝕰2 onward is performed directly on the urns as the previous experiment left them
There is no experiment before 𝕰1, so before starting 𝕰1 it is necessary to decide how the balls are initially placed in the urns
Example
(1) Put all the black balls in A and all the white balls in B
(2) Divide the 2α balls into halves at random and place them in A and B
(3) Put just 2 black balls together with α-2 white balls in A
We may assume that 𝕰1 is a trial such that the following conditions are satisfied
(1) The root events are ω1,ω2,… ,ωα+1.
(2) The indicator is the root event itself.
In general, suppose that the possible outcomes of each experiment are common to all of them, and that there are finitely many.
Call them ω1,ω2,…,ωN: the states of (*).
Denote by Ω the totality of the states: Ω={ω1,ω2,…,ωN}
For such a series (*), we say that (*) is a "simple Markov series" if there exist numbers p(i) and p(i→j) such that the following (A) and (B) are satisfied
(A)
𝕰1 is a trial such that
(1) The root events are ω1,ω2,…,ωN ωN.
(2) The indicator is the root event itself
(3) p({ωi})=p(i)
p(i) is called the "initial probability" of ωi in the series (*)
(B)
Each 𝕰k (k>1), once the result ωi of the previous experiment 𝕰k-1 is determined,
can be thought of as the following trial
(1) The root events are ω1,ω2,… ,ωN.
(2) The index is the root event itself
(3) p({ωj})=p(i→j)
p(i→j) is called the "transition probability" from ωi to ωj in (*)
Transition matrix: the N×N matrix whose (i,j) entry is p(i→j)
Example 1
The series of experiments described in the previous section 1 is a simple Markov series
The transition probabilities are as above
Example 2
Two players, A and B, play a certain game.
The probability that A and B win is 1/2 each
The two players have 3 points each at the beginning
The loser pays 1 point to the winner
When the number of points is zero, the loser does not have to pay.
Two players play this game and record the score of A
Simple Markov Sequence
(1) The states are 0,1,2,…,6 (these are ω1,ω2,…,ω7)
(2)p(3)=p(5)=1/2, p(1)=p(2)=p(4)=p(6)=p(7)=0
The transition probabilities are as follows (see the sketch below)
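
A sketch of this chain in code: states are A's score 0 to 6, the score moves up or down by 1 with probability 1/2 each, and 0 and 6 are treated as absorbing since a player with no points no longer pays:

```python
import numpy as np

N = 7                                   # states: A's score 0..6
P = np.zeros((N, N))                    # transition matrix p(i -> j)
P[0, 0] = P[N - 1, N - 1] = 1.0         # absorbing ends of the game
for i in range(1, N - 1):
    P[i, i - 1] = P[i, i + 1] = 0.5     # win or lose one point

p0 = np.zeros(N)                        # initial probabilities p(i):
p0[2] = p0[4] = 0.5                     # score 2 or 4 after game 1

print(np.round(p0 @ np.linalg.matrix_power(P, 50), 4))
# The mass accumulates on the absorbing scores 0 and 6, 1/2 each by symmetry.
```
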
Example 3
The series of operations of cutting (shuffling) a deck of playing cards
The states are the possible permutations of the 52 cards of the deck
Given a deck in one permutation ωi, the degree of likelihood that each permutation ωj is obtained by cutting it once is roughly constant for a given person
Example 4
Example 5

2 Multiple Probability Space

Let (*) be a simple Markov series
and let 𝔐={Ω, p(i), p(i→j)} be the corresponding simple Markov chain
Let Ω={ω1,ω2,…,ωN}
It is clear that 𝕰1 is the following trial
(1) The root events are the elements ω1,ω2,…,ωN of Ω
(2) Every event is a stochastic event
(3) p({ωi,ωj,…,ωk})=p(i)+p(j)+…+p(k)
Thus, Ω can be thought of as a single probability space
The probability space Ω(n) built from this is called the n-fold probability space of 𝔐
Theorem 1
Theorem 2
Theorem 3
Theorem 4.
For a simple Markov chain {Ω, p(i), p(i→j)}, we can define an infinite multiple probability space Ω(∞)

3 Limit distributions

Consider the simple Markov chain of "card cutting" shown in Example 3.
The purpose of this series:
to repeat the operation of cutting the cards so that the deck is not biased toward any particular permutation
That is, it is expected that p(r)(j) tends to 1/N as r grows
For a simple Markov chain 𝔐={Ω, p(i), p(i→j)} (Ω={ω1,ω2,…,ωN}),
the map i → p(r)(i) (i=1,2,…,N), the distribution of the states after r steps, is a real probability distribution
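
A sketch of how the r-step distributions settle toward a limit, on a small illustrative 3-state chain (the 52!-state card chain is far too large to enumerate):

```python
import numpy as np

# An illustrative 3-state simple Markov chain; row i of P holds p(i -> j).
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])
p = np.array([1.0, 0.0, 0.0])     # initial probabilities p(i)

for r in (1, 2, 5, 20):
    print(r, np.round(p @ np.linalg.matrix_power(P, r), 4))
# The distribution p(r) converges to the same limit whatever p(0) is.
```
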
Example 1
Example 2
Theorem 1
Theorem 2
Example 3
Example 4

4 Markov Chains

In real situations, the sequence of experiments 𝕰1,𝕰2,…,𝕰n,… may depend on results further in the past.
Markov series of memory length l
Example 1
Tasks like reading each character of a text one by one and recording it in some form are
all Markov series
Example 2
Reading off the notes of a musical score one by one is also a Markov series
corresponding to a Markov chain of memory length l.
Markov chain of memory length l

5 Entropy

In the series of experiments of cutting playing cards
Let the initial probabilities be p(1)=p(2)=…=p(N)=1/N (N=52!); then
For any r, p(r)(j)=p(j)
If, for a Markov chain of memory length l, p(r)(i1, i2, …, il) = p(i1, i2, …, il) regardless of r,
the chain is said to be stationary
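A worked check (my own) of the claim above that the uniform initial distribution p(j) = 1/N makes the card-cutting chain stationary. It uses the fact that each cut is a bijection on orderings, so the columns of the transition matrix also sum to 1, i.e. Σi p(i→j) = 1 for each j:

p^{(r+1)}(j) \;=\; \sum_{i=1}^{N} p^{(r)}(i)\, p(i \to j) \;=\; \frac{1}{N} \sum_{i=1}^{N} p(i \to j) \;=\; \frac{1}{N} \;=\; p(j)

Hence, by induction, p(r)(j) = p(j) for every r.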
Theorem 1
Theorem 2
Example 1
Example 2
Theorem 3
Definition 1
Entropy
Example 3
Example 4
Definition 2
Theorem 4
Shannon’s theorem
Entropy of language
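As a toy illustration of estimating the entropy of language (my sketch, not the book's method): the zeroth-order estimate from single-character frequencies is H = -Σ p log2 p bits per character; conditioning on longer contexts, as in a Markov chain of memory length l, can only lower this.

import math
from collections import Counter

def char_entropy(text):
    # H = -sum p log2 p over single-character frequencies
    counts = Counter(text)
    n = len(text)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

print(char_entropy("the quick brown fox jumps over the lazy dog"))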

6 Remarks on terminology

Equal Markov chains

7 Stochastic Processes

Introduction to the concept of stochastic processes
Theorem 1
Theorem 2
Theorem 3
Definition 1
Definition 2
Ergodic part of a Markov chain

6 Borel-type probability spaces

Introduction

The previous chapters have dealt with “classical probability theory”
In this chapter we go as far as possible into the beginnings of modern probability theory

1 The necessity of extending stochastic events

For finitely many stochastic events E1, E2, …, En in a probability space, the sum and product events are also stochastic events
For an infinite sequence of stochastic events E1, E2, …, En, …, the sum and product events are not necessarily stochastic events
Example 1
In the probability space of Chapter 1, Section 8, Example 4, the above are not stochastic events
Example 2
In any probability space Ω, if we take a probability event E such that E≠∅ and E≠Ω
Even for cylinder stochastic events in the infinite-fold probability space Ω(∞),
the sum or product event is not necessarily a stochastic event
Example
In the infinite-fold trial of repeatedly drawing a card,
neither the event “a face card appears at least once” nor the event “face cards keep appearing forever” is a stochastic event
However
It is useful to extend the range of random events, i.e., events that can be given a probability, so that all sum and product events of an infinite sequence {En} of random events in the new sense are also random events.
Example
The question sometimes arises of what the probability of an event like the above should be
Let {Xn} be a stochastic process on a probability space Ω
We sometimes want to consider the probability of the event consisting of all a such that the limit lim Xn(a) is equal to some constant α
I want the sum and product events of an infinite sequence of stochastic events to also be stochastic events
The above conditions can be rewritten as follows
(1)
For any ε > 0
If we take an appropriate natural number n0
|Xn(a) - α| < ε holds for all n with n ≥ n0
(2)
Paraphrasing (1)
For any natural number m
If we take an appropriate natural number n0
|Xn(a) - α| < 1/m holds for all n with n ≥ n0
It suffices to take ε of the form 1/m, where m is a natural number
The event consisting of all a such that |Xn(a) - α| < 1/m for all n with n ≥ n0 is ∩n≥n0 {|Xn - α| < 1/m}
Therefore
Hence the event consisting of all a such that, for some n0, |Xn(a) - α| < 1/m holds for all n ≥ n0 is ∪n0 ∩n≥n0 {|Xn - α| < 1/m}
From this, the event consisting of all a satisfying (2), i.e. {limXn = α}, is expressed by the formula below
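Written out (my reconstruction of the omitted display):

\{\lim_{n \to \infty} X_n = \alpha\} \;=\; \bigcap_{m=1}^{\infty} \bigcup_{n_0=1}^{\infty} \bigcap_{n=n_0}^{\infty} \{\, a : |X_n(a) - \alpha| < 1/m \,\}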
Therefore
In order to consider the probability of an event of type {limXn=α}, the sum or product event of an infinite sequence of random events must be a random event.

2 Axiom of continuity

Extend the range of events that can be given a probability
The events that come into this new range are called “stochastic events in the broad sense”
The totality of the original random events in the probability space is called 𝔄
Let 𝔅 be the totality of probabilistic events in the broad sense
they must satisfy the following conditions
(1)
𝔄⊆𝔅
i.e., all original stochastic events are stochastic events in the broad sense
(2)
If En ∈ 𝔅 (n = 1, 2, …), then the sum event ∪nEn ∈ 𝔅
(3)
If E ∈ 𝔅, then Ec ∈ 𝔅
Given 𝔄, there exists at least one range 𝔅 such that (1), (2), and (3) are satisfied
Example
The totality 𝔛 of all events in Ω satisfies (1), (2), and (3)
We take the smallest possible σ-extension
Theorem 1 The common part 𝔅0 of all σ-extensions 𝔅, 𝔅’, 𝔅”, … of 𝔄 is itself a σ-extension of 𝔄
𝔅0 is called the “Borel extension” of 𝔄, written B(𝔄)
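On a small finite Ω, this smallest extension can be computed by brute force. Here is a Python sketch (Ω and the seed events are my toy choices) that closes a family under complement and union; on a finite Ω, finite unions suffice for countable ones:

from pprint import pprint

Omega = frozenset({1, 2, 3, 4})
A = {frozenset({1}), frozenset({2, 3})}   # the original events (toy choice)

B = set(A) | {Omega, frozenset()}
changed = True
while changed:                            # close under complement and union
    changed = False
    for E in list(B):
        if Omega - E not in B:
            B.add(Omega - E); changed = True
    for E in list(B):
        for F in list(B):
            if E | F not in B:
                B.add(E | F); changed = True

pprint(sorted(tuple(sorted(E)) for E in B))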
The probability p(E) to be given to each element E of B(𝔄) must satisfy the following
(A)
If E ∈ 𝔄, then p(E) is equal to the original probability
(B)
0 ≤ p(E) ≤ 1 is satisfied
(C)
If En ∈ B(𝔄) (n = 1, 2, …) are mutually exclusive, then p(∪nEn) = Σn p(En)
Example 1
Consider the following probability space Ω
(1) Ω = {1, 2, 3, …, n, …}
(2) The stochastic events are the following:
(a) Finite sets (including empty sets)
(b) Sets of the form {n|n≥n0} (n0 is arbitrary)
(c) The union set of events of the form (a) and (b)
(3) The probability is determined as follows
(a) p(E) = 0 if E is a finite set
(b) p(E) = 1 if E is an infinite set
Let A = {n1, n2, …, ni, …} be any infinite subset of Ω; then
(the following is omitted)
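Presumably the omitted step is the following observation (my reconstruction); taking, say, A = Ω, each singleton {n} is a finite set, so

p(\Omega) = 1, \qquad \Omega = \bigcup_{n=1}^{\infty} \{n\}, \qquad \sum_{n=1}^{\infty} p(\{n\}) = 0 \neq 1 = p(\Omega)

so countable additivity, condition (C), cannot hold in this space.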
In general, in order to define probabilities for stochastic events in the broad sense, i.e. probabilities satisfying (A), (B), and (C) on B(𝔄),
the following condition must first be satisfied:
if En ∈ 𝔄 (n = 1, 2, …) are mutually exclusive and their sum event ∪nEn also belongs to 𝔄, then p(∪nEn) = Σn p(En)
Axiom of Continuity
Theorem 2 If the probability space Ω satisfies the axiom of continuity, then a probability satisfying (A), (B), and (C) can be defined for the elements of B(𝔄)
Here are some examples of spaces satisfying the axiom of continuity
Example 2
A finite random phenomenon, i.e. a probability space Ω in which 𝔄 has only finitely many elements, satisfies the axiom of continuity
Example 3
Example 4

3 Borel-type probability spaces

From now on, we will only consider probability spaces that satisfy the axiom of continuity
When we speak of stochastic events, we agree that this means stochastic events in the broad sense
From now on, we assume that a probability space always satisfies the following conditions
(1) If En (n = 1, 2, …) are stochastic events, then the sum event ∪nEn and the product event ∩nEn are also stochastic events
(2) If the stochastic events En (n = 1, 2, …) are mutually exclusive, then p(∪nEn) = Σn p(En)
Borel-type probability space
In contrast to Borel-type probability spaces, probability spaces in the old sense are called Jordan-type probability spaces
Theorem 1
Theorem 2
Theorem 3

4 Strong law of large numbers

Theorem 1
Auxiliary Theorem 1
Kolmogorov’s inequality
Auxiliary Theorem 2
Borel-Cantelli’s theorem
Theorem 2
Borel’s Law

5 Extension of random variables

Extending random variables from those taking finitely many values to those taking infinitely many values
Definition
Theorem 1
Theorem 2
Theorem 3
Theorem 4
Theorem 5

6 From the theory of measurable functions

To develop a theory of general random variables
To adopt some knowledge from the theory of measurable functions and their Lebesgue integrals
For this purpose, we will extract the necessary items from those theories.
Theorem 1
Theorem 2
Theorem 3
Theorem 4
Theorem 5
Theorem 6

7 Properties of random variables

The sum, difference, and product of general random variables X and Y (X+Y, X-Y, XY) are
defined as the sum, difference, and product of X and Y regarded as measurable functions
This is a direct extension of the concept of sum, difference, and product of random variables taking finitely many values
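A toy Python sketch (mine) of this pointwise definition on a finite Ω:

Omega = ["HH", "HT", "TH", "TT"]                 # two coin flips
X = {"HH": 2, "HT": 1, "TH": 1, "TT": 0}         # number of heads
Y = {"HH": 1, "HT": 1, "TH": 0, "TT": 0}         # 1 if first flip is heads
X_plus_Y  = {w: X[w] + Y[w] for w in Omega}      # (X+Y)(w) = X(w) + Y(w)
X_times_Y = {w: X[w] * Y[w] for w in Omega}      # (XY)(w)  = X(w) * Y(w)
print(X_plus_Y, X_times_Y)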
Theorem 1
Definition 1
Theorem 2
Example 2
Example 3
Example 4
Definition 2
Example 6
Example 7
Theorem 4
Chebyshev’s inequality
Theorem 5
Law of Large Numbers
Theorem 6
Laplace’s theorem and the strong law of large numbers
Theorem 7
Gaussian distribution
Example 8

Appendix

1 Gaussian Distribution and Stirling’s Formula
2 Multiple Probability Space
3 Carathéodory's Theorem
