Fundamentals of Computer Mathematics

Mathematics is at the root of computer science. For example, machine learning, such as deep learning, starts from functions and uses optimization calculations using differential/integral calculus, and symbolic approaches use set theory as the basis for equation evaluation. It is an important task to organize the knowledge of the basic elements of each before considering their “applications”.

“Introduction to Mathematics” by Hiroyuki Kojima is, literally, an introduction to mathematics. It starts with the Pythagorean theorem and goes on to describe geometry, the functions that often appear in machine learning, differentiation, algebra, integration, and finally sets, which are the foundation of mathematics.

Chapter 1: “An Adventure Beginning with the Pythagorean Theorem”

This chapter covers theorems about trigonometric functions, the fusion of geometry and algebra, negative numbers (which become easy to picture thanks to that fusion), and finally the concept of vectors and how to calculate with them.

A more detailed theory of the geometric approach to data is described in “A Geometric Approach to Data”. If you are interested, please refer to that article as well.

Chapter 2: “An Adventure Starting with Functions”

This is a chapter on functions, which are familiar in the world of computers. First of all, a function is defined as “something that describes the laws of the world”. There are many “rules” and “laws” in our world, and they can be stated clearly using mathematics.

A function links one quantity x to another quantity y. These links can be cause and effect, a conversion from one unit to another, or a change related to the passage of time.

The relationship between two quantities can be roughly divided into “deterministic laws” (once one quantity is fixed, the other is determined with certainty) and “statistical laws” (only a rough tendency is expressed), and functions correspond to the former.

A function itself is not a “number” but a “mechanism” (an algorithm, a whole system). The English word “function” means exactly this: a working or mechanism. When the term was imported to China, it was rendered by following the pronunciation as “hanshu” (函数), and this was in turn imported to Japan, where it became the Japanese word for function, “kansu” (関数).

To express this mechanism, algebraic expressions written with letters (e.g. 4x+2y) came to be used. The expression is the algorithm itself, so when looking at one you need to read off what kind of procedure it describes.
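As a small illustration of reading an expression as a procedure, the example expression 4x+2y can be written out step by step. This is only a sketch; the function name `f` and the intermediate variable names are chosen here for illustration.

```python
def f(x, y):
    """Carry out the procedure described by the expression 4x + 2y."""
    four_x = 4 * x          # step 1: multiply x by 4
    two_y = 2 * y           # step 2: multiply y by 2
    return four_x + two_y   # step 3: add the two results


print(f(1, 3))  # 4*1 + 2*3 = 10
```

The same inputs always give the same output, which is what makes this a deterministic law in the sense described above.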

In the second half of Chapter 2, the concept of functions is extended to the concept of vectors and matrices that appeared in Chapter 1.

Further details on these functions are given in the “About Functions” section.

Chapter 3: “Adventures in the World of Infinitesimals”

In this chapter, an infinitesimal is taken to be “a length that is smaller than any positive number but not zero,” and a curve is approximated by connecting infinitesimal line segments.

As a simple example, consider a linear function (e.g. y=3x+2). Since 3×1+2=5, the point O’(1,5) lies on y=3x+2. Take O’ as the origin of a new coordinate system (p,q), with p=x-1 and q=y-5. In these coordinates the relationship between p and q is q=3p; that is, viewed from coordinates in which O’ is (0,0), y=3x+2 becomes a proportional relationship. These p and q are called local coordinates and are written Δx and Δy.

Let’s apply this simple example to curves. For example, take y=x^2 and a point P(1,1) on it; the local coordinates are Δx=x-1 and Δy=y-1. Substituting these into the original equation gives (Δy+1)=(Δx+1)^2, that is, Δy=2Δx+(Δx)^2. In the immediate neighborhood of P, (Δx)^2 is very small compared to Δx (e.g., (Δx)^2=0.01 when Δx=0.1, and (Δx)^2=0.0001 when Δx=0.01), so it can be ignored, and we can approximate Δy=2Δx there.
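The claim that the squared term becomes negligible can be checked numerically. The following is a minimal sketch for the point P(1,1) on y=x^2; the function name `local_delta_y` is introduced here only for illustration.

```python
def local_delta_y(dx):
    """Exact change in y = x**2 when moving dx away from the point P(1, 1)."""
    return (1 + dx) ** 2 - 1


for dx in [0.1, 0.01, 0.001]:
    exact = local_delta_y(dx)   # equals 2*dx + dx**2
    approx = 2 * dx             # local linear approximation
    # The gap between the two is dx**2, which shrinks much faster than dx.
    print(f"dx={dx}: exact={exact:.6f}, approx={approx:.6f}, gap={exact - approx:.6f}")
```

Each time Δx shrinks by a factor of 10, the gap shrinks by a factor of about 100.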

Now, to generalize a bit more, consider a local linear approximation of the function f(x) near the point P(a,f(a)). Since it is a linear function in local coordinates, we can write it as Δy=mΔx, where Δx=x-a. The actual change in the function is f(x)-f(a)=f(a+Δx)-f(a), and the difference between the actual change and the approximation is the error, so error=f(a+Δx)-f(a)-mΔx. Dividing this by Δx gives the error rate (a measure of how good the approximation is): (f(a+Δx)-f(a)-mΔx)/Δx=(f(a+Δx)-f(a))/Δx-m. If we require the error rate to approach zero as Δx approaches zero, m must be the following limit:

\[ m=\lim_{\Delta x \rightarrow 0}\frac{f(a+\Delta x)-f(a)}{\Delta x}\]

We have now derived the expression for the derivative. This m is called the differential coefficient (the derivative of f at a), and it tells us the “local properties” of the function (properties that can be determined by looking only at a very small neighborhood on the graph). Specifically, it tells us whether the curve is increasing or decreasing at a given point. Using this to find the extreme values of a function, and thereby solving function optimization problems, is the starting point of “machine learning technology”.
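The limit above can be approximated on a computer by evaluating the difference quotient for a small Δx. The sketch below does this and then uses the sign of the slope to walk downhill toward a minimum (gradient descent, one common optimization technique; it is given here as an illustration, not as the book's method). The test function `g`, the step size 0.1, and the iteration count are assumptions chosen for the example.

```python
def difference_quotient(f, a, dx):
    """(f(a + dx) - f(a)) / dx, which approaches the derivative m as dx -> 0."""
    return (f(a + dx) - f(a)) / dx


def square(x):
    return x ** 2


# For y = x**2 at a = 1 the quotient approaches m = 2.
for dx in [0.1, 0.01, 0.001]:
    print(dx, difference_quotient(square, 1.0, dx))


def minimize(f, x0, step=0.1, iterations=200, dx=1e-6):
    """Move against the slope: left where f is increasing, right where decreasing."""
    x = x0
    for _ in range(iterations):
        x -= step * difference_quotient(f, x, dx)
    return x


def g(x):
    """Hypothetical function with its minimum at x = 3."""
    return (x - 3) ** 2 + 1


print(minimize(g, 0.0))  # a value close to 3
```

A positive slope means the function is increasing, so the update moves x to the left; a negative slope moves it to the right. The walk settles where the slope is approximately zero.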

Chapter 4: “Adventures in Linear Equations”

First of all, let’s start with the classic crane-and-tortoise arithmetic: “There are 5 cranes and tortoises altogether, with 14 legs in total. How many of each are there?” In the actual calculation, we first assume that all 5 animals are cranes, which gives 2×5=10 legs. That is 14-10=4 legs short, and since a tortoise has 2 more legs than a crane, there must be 4÷2=2 tortoises and therefore 3 cranes.
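The arithmetic procedure can be written as a short function. This is a sketch that assumes the puzzle states 5 animals and 14 legs in total (consistent with the equations x+y=5 and 2x+4y=14 used in this chapter); the function name `crane_tortoise` is made up for the example.

```python
def crane_tortoise(heads, legs):
    """Solve the crane-and-tortoise puzzle by the 'assume all are cranes' method."""
    legs_if_all_cranes = 2 * heads          # every animal counted as a 2-legged crane
    shortfall = legs - legs_if_all_cranes   # legs still unaccounted for
    tortoises = shortfall // 2              # each tortoise adds 2 extra legs
    cranes = heads - tortoises
    return cranes, tortoises


print(crane_tortoise(5, 14))  # (3, 2)
```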

If we solve this as a system of equations with x cranes and y tortoises, we have x+y=5 and 2x+4y=14. The answer can be calculated using Cramer’s rule for simultaneous equations in two unknowns: when ax+cy=p and bx+dy=q, x=(dp-cq)/(ad-bc) and y=(aq-bp)/(ad-bc). Here a=1, b=2, c=1, d=4, p=5, q=14, so x=(20-14)/2=3 and y=(14-10)/2=2. The rule can be understood by writing the simultaneous equations in vector form as follows:

\[ x\begin{pmatrix} a\\b\end{pmatrix}+y\begin{pmatrix} c\\d\end{pmatrix}=\begin{pmatrix} p\\q\end{pmatrix} \]

The ad-bc appearing in Cramer’s rule is the (signed) area of the parallelogram spanned by the two vectors (a,b) and (c,d). It is called the determinant and is written with det as follows:

\[ det(\vec{a},\vec{b})=det\left(\begin{pmatrix} a\\b\end{pmatrix},\begin{pmatrix} c\\d\end{pmatrix} \right)=ad-bc \]
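Cramer's rule for the vector form x(a,b)+y(c,d)=(p,q) can be sketched directly in terms of this determinant. The function names `det` and `cramer_2x2` are illustrative, and exact fractions are used so that no rounding occurs.

```python
from fractions import Fraction


def det(u, v):
    """Determinant of the 2-D vectors u = (a, b) and v = (c, d): ad - bc."""
    return u[0] * v[1] - u[1] * v[0]


def cramer_2x2(u, v, w):
    """Solve x*u + y*v = w for (x, y) using Cramer's rule."""
    d = det(u, v)
    if d == 0:
        raise ValueError("determinant is zero: no unique solution")
    x = Fraction(det(w, v), d)   # (dp - cq) / (ad - bc)
    y = Fraction(det(u, w), d)   # (aq - bp) / (ad - bc)
    return x, y


# Crane-and-tortoise system: x + y = 5 (heads), 2x + 4y = 14 (legs)
x, y = cramer_2x2((1, 2), (1, 4), (5, 14))
print(x, y)  # 3 2
```

When the determinant is zero the two vectors are parallel, the parallelogram has no area, and the system has no unique solution, which is why the sketch raises an error in that case.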

By connecting geometry and mathematics, we can handle vectors and trigonometric functions, and use them to calculate matrices and linear equations. Furthermore, they lead to optimization problems such as spectral clustering and the world of machine learning.

For “linear algebra,” which is mathematics in the area of matrices and linear equations, please refer to “Outline of Linear Algebra and Reference Books”.

Chapter 5: “Adventures in Sets”

A set is a collection of things. The objects can be numbers, shapes, or coordinates. Sets were originally conceived as a way to deal with infinity. Infinite means more than can ever be counted out, and sets were used to examine whether the infinite is something real that we can actually grasp.

To think about this, we need to compare infinities with each other. The natural numbers and the (positive) even numbers are both infinite. Normally it seems obvious that there are more natural numbers than even numbers, because the natural numbers are made up of the even and the odd numbers. Cantor, on the other hand, concluded that the natural numbers and the even numbers are “the same size”, because they can be paired one-to-one with nothing left over. He called the quantity being compared the “cardinality” of a set, as opposed to the usual count of elements, and said that the “cardinality of the set of natural numbers” and the “cardinality of the set of even numbers” are the same.
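The one-to-one pairing between natural numbers and even numbers is simply n ↔ 2n. A computer can only show a finite part of it, so the following is an illustrative sketch; the function names are chosen for this example.

```python
def to_even(n):
    """Partner of the natural number n among the positive even numbers."""
    return 2 * n


def to_natural(m):
    """Partner of the positive even number m among the natural numbers."""
    return m // 2


# First ten pairs: every natural number gets exactly one even partner.
pairs = [(n, to_even(n)) for n in range(1, 11)]
print(pairs)

# Going there and back returns the starting number, so nothing is left over.
print(all(to_natural(to_even(n)) == n for n in range(1, 1001)))  # True
```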

Next, suppose in the same way that there is a one-to-one pairing between the natural numbers and the real numbers. From any such pairing we can construct a real number that does not appear in it, so the assumption is contradictory. This proves that the “cardinality of the real numbers” is greater than the “cardinality of the natural numbers”. Cantor also proved that the set of all subsets of the real numbers (a set of sets) has a larger cardinality than the set of real numbers itself.
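The construction behind this argument (Cantor's diagonal argument) can be illustrated on a finite list. In the sketch below, rows of 0s and 1s stand in for the digit expansions of the listed real numbers; this representation, the sample `listing`, and the function name `diagonal_escape` are assumptions made for the example.

```python
def diagonal_escape(rows):
    """Build a sequence that differs from row i at position i.

    Expects n rows, each with at least n binary digits.
    """
    return [1 - rows[i][i] for i in range(len(rows))]


# A hypothetical attempt to list "all" sequences.
listing = [
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 1, 0, 1],
    [1, 0, 1, 0],
]

new_sequence = diagonal_escape(listing)
print(new_sequence)             # [1, 0, 1, 1]
print(new_sequence in listing)  # False: it differs from every row
```

The same construction works however long the list is, which is why no list indexed by the natural numbers can contain every sequence.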

Going further, Cantor tried to prove the hypothesis that there is no set with a cardinality greater than that of the natural numbers and less than that of the real numbers (the continuum hypothesis), but he did not succeed. It was later shown, by the work of Gödel and Cohen, that the hypothesis can be neither proved nor disproved from the standard axioms of set theory. Building on set theory, it became possible to define what numbers are (real numbers and natural numbers), to introduce the concept of “topology” for examining the continuity of functions in various spaces, and to define probability.

Probability theory can be built from set theory and integration theory. Probabilistic phenomena are defined as “phenomena that are uncertain at the moment because they will happen in the future” or “phenomena whose results already exist but are not clearly known because of a lack of information”. When the possible outcomes are represented as a set, that set is called the “sample space,” and a subset of the sample space is called an “event”. The degree of likelihood of an event is then called the “probability of the event,” and this is how probability is introduced.
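As a minimal sketch of these definitions, take one roll of a die. The assumption that all outcomes are equally likely (a fair die) is added here for the example and is not part of the definitions above; the function name `probability` is also illustrative.

```python
from fractions import Fraction

# Sample space: all possible outcomes of one roll of a die.
sample_space = {1, 2, 3, 4, 5, 6}


def probability(event, space):
    """Probability of an event (a subset of the sample space),
    assuming every outcome is equally likely."""
    if not event <= space:
        raise ValueError("an event must be a subset of the sample space")
    return Fraction(len(event), len(space))


even_roll = {2, 4, 6}  # the event "the roll is even"
print(probability(even_roll, sample_space))  # 1/2
```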

We can also use sets to define relationships among things. For example, on the set {Taro, Jiro, Hanako, Sachiko} we can define a~b to mean “a likes b” or “a and b are of the same sex”, and on the set of natural numbers we can define a~b to mean a+b=5.

A binary relation is called an “equivalence relation” when it has the following three properties: (1) a~a (reflexivity), (2) if a~b then b~a (symmetry), and (3) if a~b and b~c then a~c (transitivity). Applying this to the examples above: for the “likes” relation, (1) holds, but (2) and (3) do not necessarily hold, so it is not an equivalence relation. The relation a+b=5 is also not an equivalence relation, because although (2) holds, (1) and (3) do not.
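The three properties can be checked mechanically on a finite set. The sketch below tests the relation a+b=5 on the numbers 1 to 5, a finite slice of the natural numbers chosen for illustration; the helper names are not from the original text.

```python
from itertools import product


def is_reflexive(elements, rel):
    return all(rel(a, a) for a in elements)


def is_symmetric(elements, rel):
    return all(rel(b, a) for a, b in product(elements, repeat=2) if rel(a, b))


def is_transitive(elements, rel):
    return all(
        rel(a, c)
        for a, b, c in product(elements, repeat=3)
        if rel(a, b) and rel(b, c)
    )


def sum_is_five(a, b):
    return a + b == 5


numbers = range(1, 6)
print(is_reflexive(numbers, sum_is_five))   # False: 1 + 1 is not 5
print(is_symmetric(numbers, sum_is_five))   # True: a + b == b + a
print(is_transitive(numbers, sum_is_five))  # False: 1~4 and 4~1, but not 1~1
```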

When the binary relation “~” is an equivalence relation, it can be thought of as expressing some kind of similarity. Using an equivalence relation, we can divide a set into groups; elements that fall into the same subset are said to be “equivalent under the equivalence relation ~”, and each such subset is called an equivalence class. For example, if we create subsets based on the same-sex relation, we obtain a “set of men” and a “set of women”. Two different equivalence classes created in this way have no elements in common. Binary relations are also a concept that appears in algebra, as described in “Structures, Algorithms, and Functions”.
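Grouping a set by an equivalence relation can be sketched as follows. The assignment of a sex to each name in the `sex` dictionary is an assumption made for this example, and `equivalence_classes` is an illustrative helper that relies on the relation really being an equivalence relation.

```python
def equivalence_classes(elements, rel):
    """Partition elements into classes of mutually equivalent members."""
    classes = []
    for x in elements:
        for cls in classes:
            # Comparing with one representative suffices for an equivalence relation.
            if rel(x, cls[0]):
                cls.append(x)
                break
        else:
            classes.append([x])  # x starts a new class
    return classes


# Assumed data for illustration only.
sex = {"Taro": "male", "Jiro": "male", "Hanako": "female", "Sachiko": "female"}
people = ["Taro", "Jiro", "Hanako", "Sachiko"]


def same_sex(a, b):
    return sex[a] == sex[b]


classes = equivalence_classes(people, same_sex)
print(classes)  # [['Taro', 'Jiro'], ['Hanako', 'Sachiko']]
```

Every person lands in exactly one class, and the two classes share no members.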

In set theory, the concept of equivalence is used to define numbers and further mathematical concepts, and there are many other topics as well. For an overview of set theory and reference books, please refer to “Overview of Set Theory and Reference Books”.
