Mathematics as a basis for programming and algorithms for artificial intelligence and machine learning

Artificial Intelligence Digital Transformation Machine Learning Programming Navigation of this blog

Mathematics in Machine Learning

Mathematics in Machine Learning

Overview

Machine learning is a technology built on mathematical theories and methods, and machine learning algorithms are those that use mathematical models to analyze data and discover patterns. Mathematics is the study of concepts such as numbers, quantities, structure, space, change, and uncertainty that are handled, logically reasoned about, and modeled. The basic fields of mathematics include arithmetic, which deals with basic calculations; algebra, which deals with mathematical expressions and equations; geometry, which deals with shapes, sizes, and positions in space; and analysis, which deals with limits and calculus. In addition to these fields, there are various application areas such as probability theory, statistics, mathematical logic, mathematical optimization, and topology.

Mathematical topics related to machine learning include probability theory, statistics, linear algebra, calculus, and optimization. This is the case, for example, in machine learning algorithms such as linear regression, logistic regression, decision trees, random forests, and neural networks, where concepts of linear algebra and calculus are used and built upon, and tasks such as data preprocessing, model evaluation, and hyperparameter tuning knowledge of statistics and optimization is required.

In order to master machine learning, it is important to acquire a basic knowledge of mathematics; moreover, the field of machine learning is constantly evolving, so it is necessary to constantly learn about the latest techniques and topics.

This blog discusses a variety of topics for these mathematics. In particular, this blog focuses on information that is considered necessary for artificial intelligence, machine learning, and other computer applications.

General Theory

Fundamentals of Computer Mathematics

Fundamentals of Computer Mathematics. Mathematics is at the root of computer science. For example, machine learning, which is used in deep learning and natural language processing, starts with functions and uses differential/integral optimization calculations, while symbolic approaches used in artificial intelligence are based on set theory to evaluate expressions. Before considering their digital transformation applications and IT system applications, it is important to organize our knowledge of the basic elements of each.

Introduction to Mathematics” by Hiroyuki Kojima is a literal introduction to mathematics, starting with the Pythagorean Theorem and covering geometry, functions that often appear in the world of machine learning, differentiation, algebra, integration, and finally sets, which are the foundation of basic mathematics.

Structures, Algorithms, and Functions

Structures, Algorithms, and Functions. According to the wiki, structure is “the way in which the parts of a thing are put together to make it. It is a general term for the relations of conflict, contradiction, and dependence among the elements that make up a whole [2]. In the world of mathematics, the term “complex” refers to the arrangement and relationship of parts and elements of a complex thing. In the world of mathematics, the basic approach is to abstract the “parts that make up a thing” as much as possible and find the relationship between them.

For example, geometry is the study of space and shapes, not necessarily of numbers. For example, geometry is the study of space and shapes, not necessarily of numbers. In response to this, Toyama cites mankind’s ability to recognize patterns as the origin of mathematics. According to this book, it is this ability to recognize patterns that has enabled us to cut the complex world into understandable forms, sublimate it as a discipline, and make it into mathematics. This means, for example, that the simple concept of numbers, such as 1, 2, 3,…. For example, the simple concept of numbers, 1, 2, 3, …, is not simply a number, but an abstract concept of 2, derived from the abstraction of common forms or patterns between different things, such as two apples, two oranges, two people, and two dogs.

Abstract Algebra and AI Technology.

Abstract Algebra and AI Technology. Abstract Algebra (Abstract Algebra) is a field of mathematics that studies abstract “structures” about numbers and operations, generalizing properties of concrete numbers such as integers and real numbers to construct universal theories that can be applied to a wider range of objects. The goal of abstract algebra is to classify and study mathematical structures based on what axioms (rules) are satisfied by combinations of sets and operations, and in this field, “properties of operations” play a central role and various algebraic structures appear.

Linear and Nonlinear Models and Their Uses.

Linear and nonlinear models and their use. Linear and nonlinear models are to be distinguished on the basis of the difference in the relationship between inputs and outputs. Linear models have a simple structure in which the relationship between inputs and outputs can be represented by a “straight line” or “plane,” specifically, y = a₁x₁ + a₂x₂ + … + aₙxₙ + b, where the variables consist of linear expressions (addition and multiplication only). This makes calculations relatively simple and fast, and the results are easy to interpret. Typical examples include linear regression, logistic regression, SVM (linear kernel), etc. These models have the advantage of being difficult to overtrain, but have the disadvantage of being difficult to represent complex relationships and nonlinear data structures.

What is a function – its history, programming and machine learning

What is a function – its history, programming and machine learning. A function will generally be mathematically defined as a rule that assigns to each element in one set the only element in another set. This concept of a function was invented by mathematicians Gottfried Leibniz and Isaac Newton in the 17th century.

Furthermore, a function is a block of code that can be reused in a program and called to perform a specific operation. Functions divide the structure of a program and can be combined with multiple functions to achieve complex processing.

And functions play an important role in machine learning optimization.

Set Theory Overview and Reference Books

Set Theory Overview and Reference Books. Set theory is one of the fundamental fields of mathematics, which deals with the concept of sets. A set is a collection of elements, or mathematically, a collection of objects that satisfy certain conditions.

A set may be expressed in various forms. For example, one can list a sequence of elements enclosed in {}, or list elements that satisfy certain conditions. For example, the set {1, 2, 3, 4} represents a collection consisting of the elements 1, 2, 3, 4, and the set {x | x is a natural number and x is even} represents a collection consisting of elements that are natural and even.

What is information geometry?

What is information geometry?. Information geometry is a field that studies the geometrical structure of probability distributions and statistical models used in statistics, information theory, machine learning, etc. Its essential idea is to regard probability distributions and statistical models as geometric spaces and to analyse the properties of these models by introducing geometric structures (distance, curvature, connection etc.) are introduced to analyse the properties of these models.

Riemannian Optimisation algorithms and implementation examples

Riemannian Optimisation algorithms and implementation examples. Riemannian Optimisation (Riemannian Optimisation) is an approach where the usual optimisation methods are performed on Riemannian manifolds. A manifold here is a mathematical tool that represents ‘a space that is locally simple but overall complex’, such as a circumference that looks like a straight line but is closed overall, or a sphere that looks like a plane but has no edges and a closed structure, which locally is a simple structure but overall It will represent a complex structure. A Riemannian manifold is a space with a smooth geometric structure in which each point of this manifold has a defined inner product, which makes it possible to define measures such as distances and angles.

Bayesian inference and Information Theory On Bayes and Shannon’s Theory

Bayesian inference and Information Theory On Bayes and Shannon’s Theory. How did the idea of Bayesian statistics, which differs from classical statistics, emerge? How do they relate to Shannon’s information theory?

The difference between causation and correlation

The difference between causation and correlation. There seems to be a clear correlation between the amount of chocolate consumed and the number of Nobel Prize winners, and one could conclude that eating chocolate is the only way to win a Nobel Prize.

In fact, Switzerland is home to many famous international research institutes that have produced many Nobel laureates, and this has almost nothing to do with the fact that it is the home of chocolate and has a high per capita consumption. If you perform machine learning and only look at the results without considering the original data, you may end up assuming that the results are “causal” or “correlative”.

To avoid such mistakes, in general natural science experiments, the preconditions (experimental conditions) are clearly defined and fixed, and then the causal relationship is determined by looking at the results when the conditions predicted as causes are changed. In the natural sciences, it is relatively easy to fix the preconditions in this way.

The Art of Mathematical Decision Making: A Probabilistic Approach to Decision Making

The Art of Mathematical Decision Making: A Probabilistic Approach to Decision Making. There are four criteria for decision making that are also used in reinforcement learning algorithms: the Max-Min criterion, the Max-Max criterion, the Expected Value criterion, and the Multiple Prior belief.

Gendai Shiso July 2020 Special Feature: The World of Category Theory — The Forefront of Modern Mathematics Reading Memo

Gendai Shiso July 2020 Special Feature: The World of Category Theory — The Forefront of Modern Mathematics Reading Memo. Sphere theory is one of the most important fields in modern mathematics. Its high level of abstraction and generality is now being applied to a wide range of fields, including physics, computer science, biology, linguistics, and aesthetics. In this special issue, we introduce and discuss the basics of sphere theory, its various applications, and its philosophical implications, in order to get to the heart of this mathematical way of thinking that is attracting so much attention today.

Linear Algebra and Tensor

Linear Algebra Overview, Library and Reference Books

Linear Algebra Overview, Library and Reference Books. Linear algebra is a field of mathematics that uses vectors and matrices to analyze linear relationships, and is one of the most important basic mathematical tools in machine learning. In order to perform such linear algebra on computers, libraries that handle linear algebra have been developed in various programming languages. The following is a representative list. In addition, reference books on linear algebra are described.

Overview of Singular Value Decomposition (SVD) and examples of algorithms and implementations

Overview of Singular Value Decomposition (SVD) and examples of algorithms and implementations. Singular Value Decomposition (SVD) is a method for decomposing a matrix into a product of three matrices.

Overview of non-negative-valued matrix factorisation (NMF) and examples of algorithms and implementations

Overview of non-negative-valued matrix factorisation (NMF) and examples of algorithms and implementations. Non-negative matrix factorisation (NMF) is a method for decomposing a given non-negative matrix into the product of two non-negative matrices. Specifically, the non-negative matrix \(V\) of a given\(m \times n\) is decomposed as follows.

Overview of Alternating Least Squares for Matrix Factorisation (ALS-MF) and examples of algorithms and implementations

Overview of Alternating Least Squares for Matrix Factorisation (ALS-MF) and examples of algorithms and implementations. Alternating Least Squares for Matrix Factorization (ALS-MF) is a matrix factorisation technique that extracts the latent structure of a given matrix by decomposing it into a product of several submatrices. Specifically, the given matrix \(R\) (usually a user-item valuation matrix) is decomposed as follows.

Overview of the Gauss-Zeidel method and examples of algorithms and implementations

Overview of the Gauss-Zeidel method and examples of algorithms and implementations. The Gauss-Zeidel method is one of the iterative methods for finding solutions to a system of linear equations, and is particularly effective when the coefficient matrix has non-zero diagonal elements and diagonal dominance. In this method, each variable in the equation is assumed in turn, the solution is calculated with the other variables known, then the next variable is updated using the calculated solution, and so on until all variables converge.

CP (CANDECOMP/PARAFAC) Decomposition Overview, Algorithm and Implementation Example

CP (CANDECOMP/PARAFAC) Decomposition Overview, Algorithm and Implementation Example. CP decomposition (CANDECOMP/PARAFAC) is a type of tensor decomposition and is one of the decomposition methods for multidimensional data. CP decomposition approximates a tensor as the sum of multiple rank-1 tensors. It is usually applied to tensors of three or more dimensions, and we will use a three-dimensional tensor as an example here.

Overview of Non-Negative Tensor Factorization (NTF) and Examples of Algorithms and Implementations

Overview of Non-Negative Tensor Factorization (NTF) and Examples of Algorithms and Implementations. Non-Negative Tensor Factorization (NTF) is a method for obtaining a representation of multidimensional data by decomposing a tensor (multidimensional array) into non-negative elements. and signal analysis, feature extraction, and dimensionality reduction.

Tucker Decomposition Overview, Algorithm and Implementation Example

Tucker Decomposition Overview, Algorithm and Implementation Example. Tucker decomposition is a method for decomposing multidimensional data and is a type of tensor decomposition; Tucker decomposition approximates a tensor as a product of several low-rank tensors.

Mode-based Tensor Decomposition Overview, Algorithm, and Implementation Example

Mode-based Tensor Decomposition Overview, Algorithm, and Implementation Example. Mode-based tensor decomposition is a method for decomposing a multidimensional data tensor into a product of lower-rank tensors, which are specifically used to decompose the tensor and extract latent structures and patterns in the data set. Tensor decomposition can also be viewed as a multidimensional extension of matrix decomposition (e.g., SVD).

PARAFAC2 (Parallel Factor 2) Decomposition Overview, Algorithm, and Implementation Example

PARAFAC2 (Parallel Factor 2) Decomposition Overview, Algorithm, and Implementation Example. PARAFAC2 (Parallel Factor 2) decomposition is one of the tensor decomposition methods, and is a type of mode-based tensor decomposition described in “Overview, Algorithm, and Implementation Examples of Mode-based Tensor Decomposition”. The usual PARAFAC (canonical decomposition) approximates tensors of three or more dimensions as a sum of lower-ranked tensors, but PARAFAC2 can be applied to tensors of more general geometry.

Overview of Tensor Power Method and Examples of Algorithms and Implementations

Overview of Tensor Power Method and Examples of Algorithms and Implementations. The Tensor Power Method is a type of iterative method for solving tensor singular value decomposition and eigenvalue problems, and is useful for finding approximate solutions to tensor singular values and eigenvalues. The following is a basic overview of the Tensor Power Method.

Overview of Alternating Least Squares (ALS) and Related Algorithms and Examples of Implementations

Overview of Alternating Least Squares (ALS) and Related Algorithms and Examples of Implementations. Alternating Least Squares (ALS) is a method for solving optimization problems using the Least Squares method, which is often used in the context of matrix and tensor decomposition. An overview of ALS is given below.

Overview of Alternating Least Squares for Tensor Factorization (ALS-TF) and Examples of Algorithms and Implementations

Overview of Alternating Least Squares for Tensor Factorization (ALS-TF) and Examples of Algorithms and Implementations. Alternating Least Squares for Tensor Factorization (ALS-TF) is a method for tensor factorization. ALS-TF is especially applied to recommendation systems and tensor data analysis.

Overview of Alternating Least Squares for Non-Negative Matrix Factorization (ALS-NMF), Algorithm and Example Implementation

Overview of Alternating Least Squares for Non-Negative Matrix Factorization (ALS-NMF), Algorithm and Example Implementation. Alternating Least Squares for Non-Negative Matrix Factorization (ALS-NMF) is a type of Non-Negative Matrix Factorization (NMF). NMF is a method for decomposing a matrix \(V \) with non-negativity constraints into a product of a non-negative matrix \(W \) and \(H \), and ALS-NMF optimizes it while keeping the non-negativity constraints.

Overview of Block Term Decomposition (BTD) and Examples of Algorithms and Implementations

Overview of Block Term Decomposition (BTD) and Examples of Algorithms and Implementations. Block Term Decomposition (BTD) is one of the methods for tensor data analysis. Tensor data is a multi-dimensional data structure similar to a two-dimensional matrix, and BTD aims to decompose the tensor data into low-rank block structures.

Overview of Random Algorithms for Tensor Decomposition and Examples of Implementations

Overview of Random Algorithms for Tensor Decomposition and Examples of Implementations. The random algorithm for tensor decomposition is a method for decomposing a large tensor into a product of smaller tensors, where the tensor is a multidimensional array and the tensor decomposition will aim to decompose that tensor into a product of multiple rank 1 tensors (or tensors of smaller rank). The random algorithm begins by approximating the tensor with a random matrix, and this approximation matrix is used as an initial estimate for finding a low-rank approximation of the tensor

Overview of Higher Order Singular Value Decomposition (HOSVD) and examples of algorithms and implementations

Overview of Higher Order Singular Value Decomposition (HOSVD) and examples of algorithms and implementations. Higher Order Singular Value Decomposition (HOSVD) is a method for dimensionality reduction and data compression of tensors (multidimensional arrays of three or more dimensions). HOSVD captures the structure of the original tensor by decomposing it into many smaller tensors and compressing the information in each tensor. Specifically, HOSVD decomposes a tensor into multiple dimensions using singular value decomposition (SVD) and, in each mode (dimension), decomposes the tensor using the left and right singular matrices obtained by the singular value decomposition.

Overview of Tensor Train Decomposition and examples of algorithms and implementations

Overview of Tensor Train Decomposition and examples of algorithms and implementations. Tensor Train Decomposition (TT decomposition) is one of the methods for dimensionality reduction and data compression of multidimensional tensors and is an approach that provides an efficient data representation by approximating a tensor as a product of multiple low-rank tensors. vector and reconstructing the column vectors into specific products (tensor columns), allowing each element of the tensor to be represented as an inner product of the tensor columns.

Overview of HOOI (High-Order Orthogonal Iteration) and examples of algorithms and implementations

Overview of HOOI (High-Order Orthogonal Iteration) and examples of algorithms and implementations. High-Order Orthogonal Iteration (HOOI) is one of the methods based on the high-dimensional singular value decomposition (SVD) of a tensor; HOOI iteratively applies the singular value decomposition in each mode of the tensor to obtain a low-rank approximation of the tensor.

Overview of the TTM (Tensor-Train Matrix) and examples of algorithms and implementations

Overview of the TTM (Tensor-Train Matrix) and examples of algorithms and implementations. Tensor-Train Matrix (TTM) is a unique representation form of tensor, an approach that allows the representation of a matrix in tensor form through the tensorisation of the matrix TTM uses the technique of matrixisation of tensors to approximate a high-dimensional matrix as a product of low-rank tensors. TTM is the application of the Tensor Train (TT) decomposition to matrices, where the TT decomposition is a technique for approximating a tensor as the product of several low-rank tensors.

Information Geometry of Positive Definite Matrices (1) Introduction of Dual Geometric Structures

Information Geometry of Positive Definite Matrices (1) Introduction of Dual Geometric Structures. Information geometry first appeared in statistics. By geometrically considering the space of probability distributions, it added new views and insights to conventional mathematical statistics. For example, the way the space of probability distributions in question bends (curvature) is related to the performance of the parameter estimator. This is a beautiful result that is unique to geometry.

These results are derived from some dual structure of the space of probability distributions. Differences and horses are used in a variety of ways, and here “dual structure” is the image of a structure in which something can be turned over twice to get back to its original state, or two things cooperate to support something. Such “information geometrical structures” equipped with duality are not unique to the space of probability distributions. For example, “duality” in optimization, or “free energy” and “entropy” in thermodynamics can be regarded as examples. Thus, information geometry extends beyond the framework of statistical science and finds applications in various fields.

Information Geometry of Positive Definite Matrices (2) From Gaussian Graphical Model to Convex Optimization

Information Geometry of Positive Definite Matrices (2) From Gaussian Graphical Model to Convex Optimization. In this article, I will introduce the Gaussian graphical model, a basic model in causal inference, and describe the information geometry of positive definite symmetric matrices that appears in the model.

Information Geometry of Positive Definite Matrices (3) Computation Procedure and Curvature

Information Geometry of Positive Definite Matrices (3) Computation Procedure and Curvature. Statistics is well known as one of the fields where information geometry plays an active role. In this article, I will discuss semi-positive definite programming problems and their information geometry as a typical example of information geometry appearing in another field. In particular, I will focus on the fact that the number of iterations of the interior point method, which is the main solution method of semi-positive definite programming problems, can be expressed as an integral quantity of curvature in terms of information geometry.

Probability and Statistics

Probability, uncertainty and randomness

Probability, uncertainty and randomness. Probability is a quantification of the degree to which an event is likely to occur; specifically, the probability of an event occurring is calculated by dividing the number of cases in which the event occurs (the preferred outcome) by the total number of cases (all possible outcomes). Uncertainty, which is closely related to this probability, is often recognised as similar, but has a strictly different concept from probability. Randomness and probability are also closely related.

Modeling Gaussian and non-Gaussian worlds.

Modeling Gaussian and non-Gaussian worlds.” The probabilistic approach to machine learning, also discussed in “Probabilistic Approaches in Machine Learning,” uses a variety of probability distributions starting from the Gaussian distribution to perform calculations. This Gaussian distribution (normal distribution) is named after Carl Friedrich Gauss, but in fact it is said that it was not Gauss who “first discovered” it, as follows.

Uncertainty and Machine Learning Technology

Uncertainty and Machine Learning Technology. Uncertainty (Uncertainty) refers to a state of uncertainty or information in which future events or outcomes are difficult to predict, caused by the limitations of our knowledge or information, and represents a state in which it is difficult to have complete information or certainty. Mathematical methods and models, such as probability theory and statistics, are used to deal with uncertainty. These methods are important tools for quantifying uncertainty and minimizing risk.

This section describes probability theory and various implementations for handling this uncertainty.

Philosophical perspectives on probability and AI solutions to uncertainty

Philosophical perspectives on probability and AI solutions to uncertainty. When considering solving real-world problems from probability, the perspectives of prediction and uncertainty are of paramount importance. This uncertainty and AI technology are closely related, and how AI handles decision-making in uncertain environments is an important topic in AI design and application. Uncertainty means that future outcomes are not entirely predictable, and how AI handles that uncertainty will influence different technological approaches.

The Concept of Nothingness, Its Mathematical Representation and Application to AI Technology.

The concept of nothingness, its mathematical expression, and its application to AI technology. The Big Bang Theory is a widely accepted model of the origin of the universe, which states that the universe began its rapid expansion from an extremely hot and dense state approximately 13.8 billion years ago. This theory is based on Einstein’s general theory of relativity, in which time and space are explained as “something that is entailed by the existence of matter and energy. In other words, space and time themselves are said to have originated with the expansion of the universe.

Fluctuations and their applications.

Fluctuations and their applications. Fluctuation” is a word that generally refers to a state of slight movement that is not settled, or an unstable state with repeated small changes. For example, it is used to describe phenomena involving minute physical movement, such as “flickering flames” and “fluctuating sound,” as well as psychological and social instability and fluctuations, such as “fluctuations of the mind” and “fluctuating prices. In the field of physics, fluctuation refers to temporary and stochastic minute fluctuations in thermal motion and quantum mechanics, and is also considered an important concept to describe disturbances and fluctuations that naturally occur in regular patterns.

Quantum entanglement and covariance

Quantum entanglement and covariance. Quantum entanglement is a state in which two or more particles are so strongly bound together that their states instantly correlate with each other, no matter how far apart they are in space.

KL divergence constraint

KL divergence constraint. KL divergence (Kullback-Leibler Divergence) is an asymmetric measure of similarity between probability distributions \(P \) and \(Q \), which is mainly used in information theory and machine learning. When treated as a constraint, it is mainly applied in optimisation problems and generative modelling.

Epistemic uncertainty and AI complementation

Epistemic uncertainty and AI complementation. Epistemic Uncertainty refers to uncertainty arising from a lack of, or incomplete, knowledge or information, and is caused by an inadequate understanding of an event or system, which can be reduced by acquiring more information or deepening existing knowledge This will be something that can be reduced by acquiring more information or deepening existing knowledge.

Aleatory uncertainty and its resolution by AI

Aleatory uncertainty and its resolution by AI. Aleatory Uncertainty refers to uncertainty that is mainly caused by natural phenomena and stochastic fluctuations. This type of uncertainty is inherently random and uncontrollable and is often expressed using probabilistic models. This applies, for example, to weather conditions or the roll of the dice.

Overview of Cross Entropy and Related Algorithms and Implementations

Overview of Cross Entropy and Related Algorithms and Implementations. Cross Entropy is a concept commonly used in information theory and machine learning, especially in classification problems to quantify the difference between model predictions and actual data. Cross-entropy is derived from information theory, which uses the concept of “entropy” as a measure of the amount of information. Entropy is a measure of the uncertainty or difficulty of predicting information. It is greatest when the probability distribution is even and decreases as the probability concentrates on a particular value.

Overview of Probability and Statistics, its philosophy and libraries in various languages for specific use

Overview of Probability and Statistics, its philosophy and libraries in various languages for specific use. Probability and statistics is one of the fields of mathematics, and is a theory that deals with the probability of uncertain events and the analysis and modeling of data. This section describes the fundamental ideas of probability and statistics, which are indispensable for data analysis and machine learning, as well as libraries and reference information in various languages for putting these ideas into practice.

Overview of Kullback-Leibler Variational Estimation and Various Algorithms and Implementations

Overview of Kullback-Leibler Variational Estimation and Various Algorithms and Implementations. Kullback-Leibler Variational Estimation is a method for estimating approximate probabilistic models of data by evaluating and minimizing differences between probability distributions. It is widely used in the context of Its main applications are as follows.

Overview of the Dirichlet distribution and related algorithms and implementation examples

Overview of the Dirichlet distribution and related algorithms and implementation examples. The Dirichlet distribution is a type of multivariate probability distribution that is mainly used for modeling the probability distribution of random variables. The Dirichlet distribution is a probability distribution that generates a vector (multidimensional vector) consisting of K non-negative real numbers.

Overview of Softmax Functions and Related Algorithms and Examples of Implementations

Overview of Softmax Functions and Related Algorithms and Examples of Implementations. A softmax function is a function used to convert a vector of real numbers into a probability distribution, which is usually used to interpret the output of a model as probabilities in machine learning classification problems. The softmax function calculates the exponential function of the input elements, which can then be normalized to obtain a probability distribution.

Statistical Hypothesis Testing and Machine Learning Techniques

Statistical Hypothesis Testing and Machine Learning Techniques. Statistical Hypothesis Testing is a method in statistics that probabilistically evaluates whether a hypothesis is true or not, and is used not only to evaluate statistical methods, but also to evaluate the reliability of predictions and to select and evaluate models in machine learning. It is also used in the evaluation of feature selection as described in “Explainable Machine Learning,” and in the verification of the discrimination performance between normal and abnormal as described in “Anomaly Detection and Change Detection Technology,” and is a fundamental technology. This section describes various statistical hypothesis testing methods and their specific implementations.

Statistical Analysis and Correlation Evaluation Using Clojure/Incanter

Statistical Analysis and Correlation Evaluation Using Clojure/Incanter. Statistical analysis in general considers how a sample is described in terms of summary statistics and how population parameters can be inferred from it. Such an analysis tells us something about the population in general and the sample in particular, but does not provide a very precise description of the individual elements. This is because much information is lost by reducing the data to two statistics, the mean and the standard deviation.

We often want to go further and establish a relationship between two or more variables or to predict another variable from one variable. This is where correlation and regression studies come in. Correlation concerns the strength and direction of the relationship between two or more variables. Regression determines the nature of this relationship from which predictions can be made.

Linear regression is an elementary machine learning algorithm. Given a sample of data, the model learns a linear equation and is able to make predictions about the new unknown data. To this end, we will use Incanter, a statistical library in Clojure, to describe how matrices can be manipulated using Incanter to determine the relationship between height and weight of Olympic athletes.

Various probability distributions

Various probability distributions. Overview of various stochastic models used as approximations of stochastic generative models (Student’s t distribution, Wishart distribution, Gaussian distribution, gamma distribution, inverse gamma distribution, Dirichlet distribution, beta distribution, categorical distribution, Poisson distribution, Bernoulli distribution)

Intuitive Methods in Economic Mathematics: Probability and Statistics Reading Notes

Intuitive Methods in Economic Mathematics: Probability and Statistics Reading Notes. This book provides an intuitive understanding of the essence of highly developed economic mathematics, with a focus on 70 figures and graphs. In this book, “Probability and Statistics Edition,” you will learn the mechanism of the formation of the normal distribution curve, touch the wonder of the central limit theorem, which is considered the most important principle in probability and statistics theory, and learn Black-Scholes theory as an educational tool!

Introduction to Probability Theory Reading Memo

Introduction to Probability Theory Reading Memo. It is said that the mathematical consideration of probability began with the correspondence between Pascal and Fermat regarding betting. Classical probability theory, based on the concept of combinations, made a great leap forward into “modern mathematics” based on set theory in the 20th century, thanks to Borel and Kolmogorov. This book is written to bridge the gap between classical and modern probability theory, and explains the meaning of abstract mathematical formulas in a way that is easy for readers to understand, while providing plenty of elementary concrete examples such as playing cards and throwing dice. This is an introductory book that allows students to relearn probability in depth, as they learn it in high school mathematics.

Nine Stories of Probability and Statistics that Changed Humans and Society Reading Memo

Nine Stories of Probability and Statistics that Changed Humans and Society Reading Memo. From Pascal and Fermat to von Neumann and Keynes. The ideas of the pioneers of probability and statistics, who took on the challenge of predicting the uncertain future by measuring “chance,” are introduced in an easy-to-understand manner in the form of a virtual dialogue between them and the author.

134 Stories of Probability and Statistics that Changed the World Reading Memo

134 Stories of Probability and Statistics that Changed the World Reading Memo. While probability and statistics problems are very familiar, easy to understand, and interesting, they can be quite difficult to solve because of the number of possible correct answers. In fact, there have been cases where the great mathematicians of the time made mistakes even in problems that can be answered correctly by today’s junior high and high school students. On the other hand, there are still elegant solutions to unique problems by great mathematicians with mathematical sense. The author, an actuary and mathematical puzzle designer, presents many such seemingly mysterious problems, matters requiring clever thinking, and interesting historical episodes from his unique perspective.

On the various probability distributions used in stochastic generative models

On the various probability distributions used in stochastic generative models. Overview of various stochastic models used as approximations of stochastic generative models (Student’s t distribution, Wishart distribution, Gaussian distribution, gamma distribution, inverse gamma distribution, Dirichlet distribution, beta distribution, categorical distribution, Poisson distribution, Bernoulli distribution)

Statistical modeling with python

Statistical modeling with python

Definition and properties of the Cauchy distribution and its proof (external link)

Definition and properties of the Cauchy distribution and its proof (external link). The Cauchy distribution is a well-known thick-hemmed (heavy-hemmed) distribution with undefined expected value and slower decay than the normal distribution. The probability density function is \(p(x) = \frac{1}{\pi(x^2+1)}\). The definition and proof of its properties will be given in detail.

Geometric approach to data

There are various types of affroaches that treat information geometrically. One is called soft geometry, which deals with the phase of information and includes topological data analysis. The other, called hard information geometry, deals with differential geometrical studies of statistical models with probability distributions as elements, and includes approaches such as Riemannian geometry, symplectic geometry, and complex geometry.

Applications of information geometry extend not only to statistical inference such as EM algorithms, but also to statistical physics, learning theory, and information thermodynamics, and further development of quantum information geometry, Wasserstein geometry, and Rupiner geometry is also expected. In addition, information geometry will be something that is beginning to be applied in the field of artificial intelligence, in the interpretation of information in neural nets and neural firing patterns, and in the academic area connecting superstring theory and quantum information.

In the following pages of this blog, we will discuss this geometrical approach to data.

Mathematics of Optimization

First Time Optimization Reading Notes

First Time Optimization Reading Notes. This book is explained in detail in an easy-to-understand manner, making extensive use of graphic explanations so that the reader can understand concrete problems. The goal is to promote intuitive understanding of the problem and solution method, and to enable the reader to solve specific problems. We have endeavored to present the text in a concise manner, and have included side notes for additional explanations or other things that we would like to mention. Practice problems are provided at the end of each chapter, and answers to all problems are included at the end of the book. For the beginning student learning optimization problems for the first time, this book is a good book because it starts with a mathematical review, making it very easy to understand and learn.

Hesse Matrices and Regularity

Hesse Matrices and Regularity. A Hesse matrix is a matrix representation of the second-order partial derivatives of a multivariate function, in which the second-order partial derivatives for each variable of the multivariate function are stored in the Hesse matrix, just as the second-order derivatives of a single variable function are considered as second-order derivatives. Hesse matrices play an important role in many mathematical and scientific applications, such as nonlinear optimization and numerical analysis.

Cross-Entropy Loss

Cross-Entropy Loss. Cross-Entropy Loss is one of the common loss functions used in machine learning and deep learning to evaluate and optimize the performance of models in classification tasks, especially in binary classification (selecting one of two classes) and multi-class classification (selecting one of three or more It is a widely used method for binary classification (selecting one of two classes) and multiclass classification (selecting one of three or more classes) problems, among others.

Overview of the Gelman-Rubin Statistic and Related Algorithms and Examples of Implementations

Overview of the Gelman-Rubin Statistic and Related Algorithms and Examples of Implementations. The Gelman-Rubin statistic (or Gelman-Rubin diagnostic, Gelman-Rubin statistical test) is a statistical method for diagnosing convergence of Markov chain Monte Carlo (MCMC) sampling methods, particularly when MCMC sampling is done with multiple chains, where each chain will be used to evaluate whether they are sampled from the same distribution. This technique is often used in the context of Bayesian statistics. Specifically, the Gelman-Rubin statistic evaluates the ratio between the variability of samples from multiple MCMC chains and the variability within each chain, and this ratio will be close to 1 if statistical convergence is achieved.

Overview of Kronecker-factored Approximate Curvature (K-FAC) matrix and related algorithms and implementation examples

Overview of Kronecker-factored Approximate Curvature (K-FAC) matrix and related algorithms and implementation examples. Kronecker-factored Approximate Curvature (K-FAC) is a method for efficiently approximating the inverse of the Hessian matrix in machine learning optimization problems, as described in “Hesse Matrix and Regularity”. This method has attracted attention as an efficient and scalable optimization method, especially in the training of neural networks. K-FAC was developed to efficiently approximate the Fisher information matrix or the inverse of the Hesse matrix in neural network optimization problems, as described in “Overview of the Fisher Information Matrix and Related Algorithms and Examples of Implementations. This makes it possible to train neural networks with high efficiency even at large scales.

Overview of the Fisher Information Matrix and Related Algorithms and Examples of Implementations

Overview of the Fisher Information Matrix and Related Algorithms and Examples of Implementations. The Fisher information matrix is a concept used in statistics and information theory to provide information about probability distributions. This matrix is used to provide information about the parameters of a statistical model and to evaluate its accuracy. Specifically, it contains information about the expected value of the derivative of the probability density function (or probability mass function) with respect to its parameters.

Overview of Classification Problems Using Fisher’s Computational Method and Examples of Algorithms and Implementations

Overview of Classification Problems Using Fisher’s Computational Method and Examples of Algorithms and Implementations. Fisher’s Linear Discriminant is a method for constructing a linear discriminant model to distinguish between two classes, which aims to find a projection that maximizes the variance between classes and minimizes the variance within classes. Specifically, the following steps are used to construct the model.

Block K-FAC Overview, Algorithm, and Implementation Examples

Block K-FAC Overview, Algorithm, and Implementation Examples. Block K-FAC (Block Kronecker-factored Approximate Curvature) is a kind of curve chart (curvature information) approximation method used in deep learning model optimization.

Derivation of the Cramér-Rao Lower Bound (CRLB)

Derivation of the Cramér-Rao Lower Bound (CRLB). The Cramér-Rao lower bound provides a lower bound in statistics to measure how much uncertainty an estimator has. Information Matrix” described in “Overview of the Fisher Information Matrix and Related Algorithms and Examples of Implementations. The procedure for deriving the CRLB is described below.

Overview of Monte Carlo Dropout and Examples of Algorithms and Implementations

Overview of Monte Carlo Dropout and Examples of Algorithms and Implementations. Monte Carlo Dropout is a method for estimating uncertainty in neural network inference using dropout. Usually, dropout is a method to promote network generalization by randomly disabling nodes during training, but Monte Carlo Dropout uses this method during inference.

Overview of NUTS and Examples of Algorithms and Implementations

Overview of NUTS and Examples of Algorithms and Implementations. NUTS (No-U-Turn Sampler) is a type of Hamiltonian Monte Carlo (HMC) method, which is an efficient algorithm for sampling from probability distributions, as described in “MCMC Method for Stochastic Integral Calculations: Algorithms other than Metropolis Method (HMC Method)”. HMC is based on the Hamiltonian dynamics of physics and is a type of Markov chain Monte Carlo method. NUTS improves on the HMC method by automatically selecting the appropriate step size and sampling direction to achieve efficient sampling.

Overview of Procrustes Analysis and Related Algorithms and Examples of Implementations

Overview of Procrustes Analysis and Related Algorithms and Examples of Implementations. Procrustes analysis is a method for finding the optimal rotation, scaling, and translation between corresponding point clouds of two datasets. This method is mainly used when two datasets represent the same object or shape, but need to be aligned by rotation, scaling, or translation.

About Sequential Quadratic Programming

About Sequential Quadratic Programming. Sequential Quadratic Programming (SQP) is an iterative optimization algorithm for solving nonlinear optimization problems with nonlinear constraints. The SQP method is widely used as a numerical solution method for constrained optimization problems, especially in engineering, economics, transportation planning, machine learning, control system design, and many other areas of application.

Overview of Optimization by Integer Linear Programming (ILP) and Examples of Algorithms and Implementations

Overview of Optimization by Integer Linear Programming (ILP) and Examples of Algorithms and Implementations. Integer Linear Programming (ILP) is a method for solving mathematical optimization problems, especially for finding integer solutions under constraints. ILP is a type of Linear Programming (LP) with the additional conditions that the objective function and constraints are linear and the variables take integer values.

Overview of the Sequential Quadratic Programming (SQP) method, algorithm and implementation examples

Overview of the Sequential Quadratic Programming (SQP) method, algorithm and implementation examples. Sequential Quadratic Programming (SQP) is a numerical algorithm for solving non-linear constrained optimisation problems, where the objective function is optimised while satisfying constraints. SQP is characterised by its ‘efficiency’ in obtaining solutions to constrained problems with higher-order accuracy, often converging faster than other methods (e.g. gradient descent or interior point methods), and its ‘generality’ in being applicable to a variety of problems involving non-linear objective functions and constraints, Convergence: in general, it has quadratic convergence to a local optimum solution (when the problem satisfies the appropriate conditions).

Overview of Newton’s method and its algorithm and implementation

Overview of Newton’s method and its algorithm and implementation. Newton’s method (Newton’s method) is one of the iterative optimization algorithms for finding numerical solutions to nonlinear equations and functions, and is mainly used to find roots of equations, making it a suitable method for finding minima and maxima of continuous functions as well. Newton’s method is used in many machine learning algorithms because of its fast convergence.

The Modified Newton Method

The Modified Newton Method. The Modified Newton Method is an algorithm developed to improve the regular Newton-Raphson method to address several issues, and the main objective of the Modified Newton Method will be to improve convergence and numerical stability.

Quasi-Newton Method

Quasi-Newton Method. The Quasi-Newton Method (QNM) is an iterative method for solving nonlinear optimization problems. This algorithm is a generalization of the Newton method, which searches for the minimum of the objective function without computing higher derivatives (Hesse matrix). The quasi-Newton method is relatively easy to implement because it uses an approximation of the Hesse matrix and does not require an exact calculation of the Hesse matrix.

Newton-Raphson Method

Newton-Raphson Method. The Newton-Raphson Method (Newton-Raphson Method) is an iterative method for numerical solution of nonlinear equations and for finding the roots of a function, and the algorithm is used to approximate the zero point of a continuous function, starting from an initial estimated solution. The Newton-Raphson method converges quickly when the function is sufficiently smooth and is particularly effective when first derivatives (gradients) and second derivatives (Hesse matrices) can be computed.

Newton method rescaling

Newton method rescaling. Rescaling of Newton’s method is one of the methods used in numerical optimisation to improve convergence speed and to avoid problems related to singularities and locally optimal solutions. to improve convergence and stability of the calculation.

How to deal with singularities in Newton’s method

How to deal with singularities in Newton’s method. The Newton method, described in ‘Overview of the Newton method and its algorithm and implementation’, is a powerful method for finding solutions to non-linear equations, but problems can arise at singularities (e.g. points where the Jacobi matrix is singular or approximately singular). There are several methods for dealing with singularities, and the appropriate method should be chosen according to the type of problem and the characteristics of the solution.

How to improve linear convergence in Newton’s method

How to improve linear convergence in Newton’s method. The Newton method, described in ‘Overview of the Newton method and its algorithm and implementation’, is a very powerful method, especially for solving convex optimisation problems and non-linear equations, but the convergence speed is sometimes only linear. The following methods have been proposed to improve this.

Overview of penalty function methods, algorithms and implementation examples

Overview of penalty function methods, algorithms and implementation examples. The Penalty Function Method is a method for converting a constrained optimisation problem into an unconstrained optimisation problem, which allows existing unconstrained optimisation algorithms (e.g. the gradient method and the Newton method described in ‘Overview, algorithm and implementation of the Newton method’) to be used to solve the constrained problem. This makes it possible to solve constrained problems using existing unconstrained optimisation algorithms (e.g. the gradient method and the Newton method described in ‘Overview, algorithms and implementation of the Newton method’).

Overview, algorithms and implementation examples of Trust-Region Methods

Overview, algorithms and implementation examples of Trust-Region Methods. Trust-Region Methods are one of the algorithms for solving non-linear optimisation problems, designed to overcome the problems of the Newton method as described in ‘Overview, Algorithms and Implementations of Gradient Descent Methods and Newton Methods’. In this method, the approach is to approximate the optimisation problem within a small domain (the confidence region) and iteratively find the optimal solution within that domain.

Alternative Methods of Numerical Differentiation in Calculating Derivatives of Newton’s Method

Alternative Methods of Numerical Differentiation in Calculating Derivatives of Newton’s Method. In Newton’s method, the derivative \(f'(x)\) is used to find the roots of the function\(f(x)\), but when it is difficult to find the derivative analytically or the function is only given numerically, it is necessary to consider alternative methods of numerical differentiation. A standard method is to approximate derivatives using finite differences, and the following methods are commonly used.

Overview of the Leapfrog Method and Examples of Algorithms and Implementations

Overview of the Leapfrog Method and Examples of Algorithms and Implementations. The Leapfrog Method is a type of time integration method for numerically solving time-evolving equations of motion (especially Hamiltonian dynamical systems), and is often used to solve Newton’s equation of motion (F=ma). It is often used in molecular dynamics simulations and celestial mechanics.

The vanishing gradient problem and its countermeasures

The vanishing gradient problem and its countermeasures. The vanishing gradient problem is one of the problems that occur mainly in deep neural networks and often occurs when the network is very deep or when a specific architecture is used.

Overview of the Hilbert Wand Transform and Examples of Algorithms and Implementations

Overview of the Hilbert Wand Transform and Examples of Algorithms and Implementations. The Hilbert transform (Hilbert transform) is an operation widely used in signal processing and mathematics, and it can be a technique used to introduce an analyticity (analytic property) of a signal. The Hilbert transform converts a real-valued signal into a complex-valued signal, and the complex-valued signal obtained by the Hilbert transform can be used to extract phase and amplitude information from the original real-valued signal.

About Residual Coupling

About Residual Coupling. Residual Connection is a method for directly transferring information across layers in deep learning networks, which was introduced to address the problem of gradient loss and gradient explosion, especially when training deep networks. Residual coupling was proposed by Kaiming He et al. at Microsoft Research in 2015 and has since been very successful.

Overview of Maximum Likelihood Estimation and its Algorithm and Implementation

Overview of Maximum Likelihood Estimation and its Algorithm and Implementation. Maximum Likelihood Estimation (MLE) is an estimation method used in statistics. This method is used to estimate the parameters of a model based on given data or observations. Maximum likelihood estimation attempts to maximize the probability that data will be observed when the values of the parameters are changed. This section provides an overview of this maximum likelihood estimation method, its algorithm, and an example implementation in python.

Dual Problem and Lagrange Multiplier Method

Dual Problem and Lagrange Multiplier Method. Dual problems are an important concept in mathematical optimization theory. In optimization problems, when considering the problem of minimizing or maximizing an objective function under given constraints, there exists a related dual problem.

The dual problem has the same properties as the original optimization problem, but the roles of the constraints and the objective function are exchanged. Specifically, if the original problem is to minimize the objective function, the dual problem is to maximize the objective function.

This section describes this dual problem based on the Lagrange multiplier method.

Overview of the Gradient Method and Examples of Algorithms and Implementations

Overview of the Gradient Method and Examples of Algorithms and Implementations. The gradient method is one of the widely used methods in machine learning and optimization algorithms, whose main goal is to iteratively update parameters in order to find the minimum (or maximum) value of a function. In machine learning, the goal is usually to minimize the cost function (also called loss function). For example, in regression and classification problems, a cost function is defined to represent the error between predicted and actual values, and it helps to find the parameter values that minimize this cost function.

This section describes various algorithms for this gradient method and examples of implementations in various languages.

Stochastic Gradient Descent (SGD) Overview, Algorithm and Implementation Examples

Stochastic Gradient Descent (SGD) Overview, Algorithm and Implementation Examples. Stochastic Gradient Descent (SGD) is an optimization algorithm widely used in machine learning and deep learning. parameters are updated. The basic concepts and features of SGD are described below.

Overview of Natural Gradient Descent and Examples of Algorithms and Implementations

Overview of Natural Gradient Descent and Examples of Algorithms and Implementations. Natural Gradient Descent is a type of Stochastic Gradient Descent (SGD) described in “Overview of Stochastic Gradient Descent (SGD), Algorithms, and Implementation Examples. It is a type of Stochastic Gradient Descent (SGD), which is an optimization method for efficiently updating model parameters, and is an approach that takes into account the geometric structure of the model parameter space and appropriately scales the gradient information.

Gaussian-Hermite Integration Overview, Algorithm and Implementation

Gaussian-Hermite Integration Overview, Algorithm and Implementation. Gaussian-Hermite Integration is a method of numerical integration often used for stochastic problems, especially those in which the probability density function has a Gaussian distribution (normal distribution), and for integrals such as the wave function in quantum mechanics. The Gauss-Hermite polynomial is used to approximate the integral. This section provides an overview, algorithm, and implementation of the Gauss-Hermite integral.

Overview of the Ornstein-Uhlenbeck process and examples of algorithms and implementations

Overview of the Ornstein-Uhlenbeck process and examples of algorithms and implementations. The Ornstein-Uhlenbeck process is a type of stochastic process, especially one used to model the motion of a random variable in continuous time. The process has been widely applied in various fields, including physics, finance, statistics, and machine learning. The Ornstein-Uhlenbeck process is obtained by introducing resilience into Brownian motion (or Wiener process). Normally, Brownian motion represents random fluctuations, but in the Ornstein-Uhlenbeck process, a recovery force is added to that random fluctuation to move it back toward some average.

Overview of Model Predictive Control (MPC) and Examples of Algorithms and Implementations

Overview of Model Predictive Control (MPC) and Examples of Algorithms and Implementations. Model Predictive Control (MPC) is a control theory technique that uses a model of the control target to predict future states and outputs, and an online optimization method to calculate optimal control inputs. MPC is used in a variety of industrial and control applications.

Broyden-Fletcher-Goldfarb-Shanno (BFGS) Method

Broyden-Fletcher-Goldfarb-Shanno (BFGS) Method.The Broyden-Fletcher-Goldfarb-Shanno (BFGS) method is a type of numerical optimization algorithm for solving nonlinear optimization problems. The BFGS method is known as the quasi-Newton method and provides effective solutions to many real-world optimization problems.

Limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) Method

Limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) Method. The Limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) method is an algorithm that uses the “Broyden-Fletcher -The L-BFGS method, like the BFGS method, is a form of the quasi-Newton method that uses an inverse approximation of the Hesse matrix The L-BFGS method, like the BFGS method, is a form of quasi-Newtonian method that minimizes the objective function using an inverse approximation of the Hesse matrix. However, the L-BFGS method is designed to reduce memory consumption and is particularly suited to high-dimensional problems.

Conjugate Gradient Method

Conjugate Gradient Method. The conjugate gradient method is a numerical algorithm used for solving linear systems of linear equations and nonlinear optimization problems. It can also be applied as a quasi-Newtonian method for nonlinear optimization problems.

Trust Region Method

Trust Region Method. The Trust Region Method is an optimization algorithm for solving nonlinear optimization problems, which is used to find a solution under constraints in minimizing (or maximizing) an objective function. The Trust Region Method is suitable for constrained optimization problems and nonlinear least squares problems, and is particularly useful for finding globally optimal solutions.

Stochastic Gradient Langevin Dynamics (SGLD) Overview, Algorithm and Implementation Examples

Stochastic Gradient Langevin Dynamics (SGLD) Overview, Algorithm and Implementation Examples. Stochastic Gradient Langevin Dynamics (SGLD) is a stochastic optimization algorithm that combines stochastic gradient and Monte Carlo methods. SGLD is widely used in Bayesian machine learning and Bayesian statistical modeling to estimate the posterior distribution.

Stochastic Gradient Hamiltonian Monte Carlo (SGHMC) Overview, Algorithm, and Implementation Examples

Stochastic Gradient Hamiltonian Monte Carlo (SGHMC) Overview, Algorithm, and Implementation Examples. Stochastic Gradient Hamiltonian Monte Carlo (SGHMC) is a type of Hamiltonian Monte Carlo (HMC), which is a stochastic sampling method combined with a stochastic gradient method and is used to estimate the posterior distribution of large data sets and high-dimensional parameter spaces. data sets and high-dimensional parameter space, making it suitable for Bayesian statistical inference.

Overview of the ε-Greedy Method (ε-Greedy) and Examples of Algorithms and Implementations

Overview of the ε-Greedy Method (ε-Greedy) and Examples of Algorithms and Implementations. The ε-greedy method (ε-greedy) is a simple and effective strategy for dealing with the trade-off between search and exploitation (exploitation and exploitation), such as reinforcement learning. The algorithm is a method to adjust the probability of choosing the optimal action and the probability of choosing a random action.

Overview of Markov Decision Processes (MDPs) and Examples of Algorithms and Implementations

Overview of Markov Decision Processes (MDPs) and Examples of Algorithms and Implementations. A Markov Decision Process (MDP) is a mathematical framework in reinforcement learning that is used to model decision-making problems in environments where agents receive rewards associated with states and actions. and Markov properties of the process.

Overview of the Davidon-Fletcher-Powell (DFP) method, its algorithm, and examples of its implementation

Overview of the Davidon-Fletcher-Powell (DFP) method, its algorithm, and examples of its implementation. The DFP method (Davidon-Fletcher-Powell method) is one of the numerical optimization methods and is particularly suitable for nonlinear optimization problems. This method is characterized by using a quadratic approximation approach to find the optimal search direction, and the DFP method belongs to the category of quasi-Newton methods, which seek the optimal solution while updating the approximation of the inverse of the Hesse matrix.

Overview of the Frank-Wolfe method and examples of applications and implementations

Overview of the Frank-Wolfe method and examples of applications and implementations. The Frank-Wolfe method is a numerical algorithm for solving non-linear optimisation problems, proposed by Marguerite Frank and Philippe Wolfe in 1956. The Frank-Wolfe method is also related to linear programming problems and can be applied to continuous optimisation problems. However, its convergence speed may be slower than that of general optimisation algorithms, and therefore other efficient algorithms may be preferred for high-dimensional problems. The Frank-Wolff method is useful in large-scale and constrained optimisation problems and is widely used in machine learning, signal processing and image processing. The Frank-Wolff method is also often used in combination with other optimisation methods.

Stochastic Optimization

With the development of the Internet, sensor technology, and computers, a large amount of data of various types is now available, and machine learning technology has been attracting attention and making great progress in extracting meaningful information from large amounts of data. While the availability of a wide variety of data has made it possible to solve a wide range of problems, it has also created the problem of increased computational complexity due to the increased size of the data.

In order to fully utilize machine learning technology, it is necessary to learn large amounts of data as efficiently as possible. Stochastic optimization is a powerful method for solving large-scale learning problems with large amounts of data and is a fundamental component of current machine learning. In this section, we will discuss various methods of stochastic optimization in machine learning.

In the following pages of this blog discuss stochastic optimization.

Statistical Learning Theory

Theories on the statistical properties of machine learning algorithms can be used to theoretically elucidate the performance of machine learning algorithms and the effects of data set size and complexity to improve model selection and the learning process. These theories include the theory of generalization error, the theory of algorithm performance, and the theory of stochastic gradient descent.

In the following pages of this blog, we will discuss the details of this statistical learning theory.

Sequential optimization for machine learning

In today’s highly information-oriented society, gaining useful insights from data has become an extremely important issue for making important scientific discoveries and promoting industry and economic society. Today, data are becoming larger, more dimensional, and more diverse. As a fundamental technology for appropriately handling such data, a field called machine learning has emerged in recent years and is developing rapidly. In traditional academic fields that deal with data, such as statistics and data science, modeling techniques have been developed mainly to perform statistical inference appropriately.

Machine learning, on the other hand, is unique in that it focuses more on the development of efficient computational methods. In the era of big data, various methods developed in the field of machine learning are having a significant impact on society.

In the following pages of this blog, we will discuss the method of continuous optimization, which is an indispensable computational method for constructing machine learning algorithms.

Algorithms and Mathematics

Can you program that formula?

Can you program that formula?. The theme of the article is programming, but unlike most programming books, in addition to algorithms and code, it includes mathematical proofs and historical background on mathematical discoveries from ancient to modern times.

To be more specific, the theme is generic programming. Generic programming is a programming technique that emerged in the 1980s, and the C++ TL (Standard Template Library) was developed in the 1990s. Generic programming is a programming technique that focuses on designing algorithms and data structures to make them work in the most common environments without reducing efficiency.