Algorithms and Data Structures


Algorithms and data structures are fundamental concepts in computer science and are closely related.

An algorithm is a process that takes an input, processes it by some procedure, and finally returns an output; it is an important method used in many fields, including computer science, mathematics, physics, economics, and psychology. An algorithm can also be described as a procedure that clearly defines the steps or method for solving a given problem.

Algorithms are closely related to mathematics. Mathematics allows us to formalize abstract problems, transform them into concrete ones, and then express them as algorithms; a specific example is the design of search algorithms using set theory and logic from discrete mathematics.

Such a mathematical approach is used not only in the development of algorithms but also in their evaluation. Algorithms are evaluated in terms of correctness, efficiency, and generality: a correct algorithm returns the correct output for all inputs; an efficient algorithm is fast, requiring little processing time and few resources; a general-purpose algorithm can process different inputs with the same procedure. All of these evaluations have a mathematical representation.

Specific algorithms include basic algorithms such as sorting, searching, pattern recognition, databases, error correction, and encryption, as well as simulation algorithms such as Markov chain Monte Carlo methods. Some examples are listed below, followed by a short Python sketch of two of them.

  • Bubble Sort Algorithm: The bubble sort algorithm is used to sort an array of numbers in ascending or descending order. The algorithm sorts the entire array by comparing two adjacent elements and repeating the exchange as necessary.
  • Dijkstra’s Shortest Path Algorithm: Dijkstra’s algorithm is used to solve the shortest path problem on graphs. The algorithm repeatedly fixes the unvisited node with the smallest tentative distance from the start and updates the distances of its neighbors, eventually finding the shortest path to every node.
  • Quick Sort Algorithm: The quick sort algorithm is used to sort an array similar to bubble sort. This algorithm sorts the entire array by dividing the array, selecting an element called a pivot for each sub-array, and moving smaller and larger values to the left and right of the pivot, respectively.
  • Linear Search Algorithm: The linear search algorithm is used to search for a given element in a data structure such as an array or list. The algorithm compares elements sequentially from the beginning of the array and continues until the specified element is found.
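
As a concrete illustration of two of the entries above, here is a minimal Python sketch of bubble sort and linear search (illustrative only; in practice Python's built-in sorted() and list.index() would be used):

```python
def bubble_sort(arr):
    """Sort a list in ascending order by repeatedly swapping
    adjacent elements that are out of order."""
    a = list(arr)
    n = len(a)
    for i in range(n - 1):
        for j in range(n - 1 - i):
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
    return a


def linear_search(arr, target):
    """Compare elements sequentially from the front;
    return the index of target, or -1 if absent."""
    for i, v in enumerate(arr):
        if v == target:
            return i
    return -1


print(bubble_sort([5, 1, 4, 2, 8]))       # [1, 2, 4, 5, 8]
print(linear_search([5, 1, 4, 2, 8], 4))  # 2
```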

Data structures are methods for efficiently storing, retrieving, and manipulating data. In programming, it is important to select the appropriate data structure in order to manipulate data efficiently. Typical data structures include the following (a short Python sketch of a stack and a queue follows the list).

  • Array: A data structure for storing the same type of data continuously. Access to elements is fast, but adding and deleting elements is costly.
  • List: A data structure that allows elements to be added or deleted while maintaining data order. Access to elements is time consuming.
  • Stack: A data structure that allows elements to be added and removed in a LIFO (Last In, First Out) fashion. It allows access to elements from the top of the stack at high speed.
  • Queue: A data structure that allows elements to be added or deleted in a FIFO (First In, First Out) fashion. It enables fast access to elements from the head of the queue.
  • Tree: A hierarchical data structure that can have multiple child nodes from the root node. It allows for fast data insertion and retrieval.
  • Graph: A data structure composed of nodes and edges that can express complex relationships. Various algorithms exist for graph search and shortest path search.
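
As a small illustration of the stack and queue entries above, here is a minimal Python sketch (Python lists already behave as stacks, and collections.deque provides an efficient queue):

```python
from collections import deque

# Stack: LIFO. Python lists append and pop at the tail in O(1).
stack = []
stack.append(1)
stack.append(2)
stack.append(3)
print(stack.pop())  # 3 -- the last element pushed comes out first

# Queue: FIFO. deque removes from the head in O(1),
# whereas list.pop(0) would cost O(n).
queue = deque()
queue.append(1)
queue.append(2)
queue.append(3)
print(queue.popleft())  # 1 -- the first element added comes out first
```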

In order to manipulate a data structure efficiently, the appropriate algorithm must be used. For example, when searching for an element in a sorted array, a binary search algorithm is more efficient (O(log n)) than a linear search (O(n)). Likewise, when finding the shortest path in a graph, various algorithms exist, such as Dijkstra’s shortest path algorithm and the Bellman-Ford algorithm.
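
A minimal sketch of binary search on a sorted list (the standard library's bisect module offers a production-quality version):

```python
def binary_search(sorted_arr, target):
    """Return the index of target in a sorted list, or -1 if absent.
    Each step halves the search range, giving O(log n) time."""
    lo, hi = 0, len(sorted_arr) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if sorted_arr[mid] == target:
            return mid
        if sorted_arr[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1


print(binary_search([1, 2, 4, 5, 8], 5))  # 3
```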

Thus, a particular data structure has a particular algorithm that deals with it, and different data structures may have different algorithms to solve the same problem.

Furthermore, the choice of algorithm can significantly change the efficiency of the data structure. For example, a list is more efficient than an array when inserting elements in the middle of a list, and an array is more efficient when accessing elements in a list randomly.

Algorithms are especially important in computer programming, where many programs work by executing algorithms designed to solve a particular problem. In programming, designing an algorithm is one of the most fundamental and important steps in the design of a computer program.

An algorithm is also a function: a chain of operations linked in a certain order. Such a chain of operations is programming itself, and it is the link between the world of mathematics and the world of programming. The data structures and analysis methods that appear in the world of algorithms are the basis for the use of various programming languages and machine learning / artificial intelligence techniques.

In this blog, we discuss the theory and implementation of various algorithms below.

Basic Algorithms

This is the third edition of a comprehensive textbook on the theory of computer algorithms, written for teaching at MIT by four world-renowned experts in this fundamental field of computer science. The previous editions had already established themselves as world-standard textbooks on algorithms and data structures, and this edition has again been extensively revised, with new chapters and sections added after a comprehensive review of the text.
The book not only explains algorithms in an easy-to-understand manner, but also explains in scientific detail what concepts are necessary and how they are backed up by analysis before the final algorithm is designed.
The book also contains 957 exercises at the end of each section and 158 problems of various levels at the end of each chapter, making it a textbook for undergraduate and graduate courses, a handbook for technical specialists, and an encyclopedia of algorithms.

A search algorithm refers to a family of computational methods used to find a target within a problem space. These algorithms have a wide range of applications in a variety of domains, including information retrieval, combinatorial optimization, game playing, route planning, and more. This section describes various search algorithms, their applications, and specific implementations.

Self-Adaptive Search Algorithms are a family of algorithms used in evolutionary computation and optimization, characterized by adaptively adjusting their internal parameters and strategies to the problem. These algorithms are designed to adapt to changes in the nature of the problem and its environment in order to find the optimal solution efficiently. This section describes various such algorithms and examples of implementations.

Multi-Objective Search Algorithm (Multi-Objective Optimization Algorithm) is an algorithm for optimizing multiple objective functions simultaneously. Multi-objective optimization aims to find a balanced solution (Pareto optimal solution set) among multiple optimal solutions rather than a single optimal solution, and such problems have been applied to many complex systems and decision-making problems in the real world. This section provides an overview of this multi-objective search algorithm and examples of algorithms and implementations.

Exploratory Ranking is a technique for identifying items that are likely to be of interest to users in ranking tasks such as information retrieval and recommendation systems. This technique aims to find the items of most interest to the user among ranked items based on the feedback given by the user.

  • Overview of Maximum Marginal Relevance (MMR) and Examples of Algorithms and Implementations

Maximum Marginal Relevance (MMR) is a ranking method for information retrieval and information filtering that aims to optimize the ranking of documents provided to users by information retrieval systems. MMR was developed as a method for selecting documents relevant to the user’s interests from among multiple documents. The method ranks documents based on both relevance and diversity, specifically emphasizing the selection of documents that are highly relevant to the query but have low similarity to documents already selected.
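
A minimal sketch of greedy MMR selection, assuming documents and the query are already represented as vectors (e.g., TF-IDF); the parameter lam trades relevance against diversity:

```python
import numpy as np

def mmr(query_vec, doc_vecs, k=3, lam=0.5):
    """Greedily pick k documents maximizing
    lam * sim(d, query) - (1 - lam) * max similarity to already-selected docs."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

    relevance = [cos(d, query_vec) for d in doc_vecs]
    selected, remaining = [], list(range(len(doc_vecs)))
    while remaining and len(selected) < k:
        def score(i):
            redundancy = max((cos(doc_vecs[i], doc_vecs[j]) for j in selected),
                             default=0.0)
            return lam * relevance[i] - (1 - lam) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


docs = [np.array(v, dtype=float) for v in ([1, 0], [0.9, 0.1], [0, 1])]
print(mmr(np.array([1.0, 0.0]), docs, k=2, lam=0.3))  # [0, 2]: relevant, then diverse
```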

  • Overview of the Minimax Method and Examples of Algorithms and Implementations

The minimax method is a type of search algorithm widely used in game theory and artificial intelligence, which is used to select the optimal move in a perfect information game (a game in which both players know all information). Typical games include chess, shogi, Othello, and Go.

  • Alpha-beta pruning: overview, algorithm, and implementation examples

Alpha-beta pruning is a type of search algorithm used in the fields of artificial intelligence and computer games. It is commonly used in combination with tree search algorithms such as the minimax method described in “Overview of the Minimax Method and Examples of Algorithms and Implementations”. The algorithm efficiently finds a solution by cutting off unnecessary branches when searching the tree structure of a game: the possible combinations of moves are represented as a tree, and branches that cannot affect the final decision are removed during the search, reducing computation time.
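
A minimal sketch covering both entries above: minimax with alpha-beta pruning over a generic game tree. The children and evaluate functions are placeholders to be supplied for a concrete game:

```python
def minimax(node, depth, alpha, beta, maximizing, children, evaluate):
    """Minimax with alpha-beta pruning.
    children(node) -> successor states; evaluate(node) -> score
    from the maximizing player's point of view."""
    succ = children(node)
    if depth == 0 or not succ:
        return evaluate(node)
    if maximizing:
        value = float("-inf")
        for child in succ:
            value = max(value, minimax(child, depth - 1, alpha, beta,
                                       False, children, evaluate))
            alpha = max(alpha, value)
            if beta <= alpha:  # prune: the opponent will never allow this branch
                break
        return value
    value = float("inf")
    for child in succ:
        value = min(value, minimax(child, depth - 1, alpha, beta,
                                   True, children, evaluate))
        beta = min(beta, value)
        if beta <= alpha:
            break
    return value


# Toy tree: internal nodes are lists of children, leaves are scores.
tree = [[3, 5], [2, 9]]
children = lambda n: n if isinstance(n, list) else []
evaluate = lambda n: n
print(minimax(tree, 4, float("-inf"), float("inf"), True, children, evaluate))  # 3
```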

  • Overview of Monte Carlo Tree Search and Examples of Algorithms and Implementations

Monte Carlo Tree Search (MCTS), a type of decision tree search, is a probabilistic method for exploring the state space of a game to find the optimal action, and is a particularly effective approach in games and decision-making problems.

  • Overview of UCT (Upper Confidence Bounds for Trees), Algorithm and Example Implementation

UCT (Upper Confidence Bounds for Trees) is an algorithm used in the selection phase of Monte Carlo Tree Search (MCTS) that balances exploration and exploitation when choosing which node to descend into: frequently visited nodes have better-estimated values and are exploited, while rarely visited nodes receive an exploration bonus so that they are given an appropriate chance to be explored.
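
The selection rule is the UCB1 formula, \( \frac{w_i}{n_i} + c \sqrt{\frac{\ln N}{n_i}} \); a one-function sketch (the constant c is a tunable assumption, often \( \sqrt{2} \)):

```python
import math

def uct_score(wins, visits, parent_visits, c=math.sqrt(2)):
    """UCB1 value used by UCT: an exploitation term (average reward)
    plus an exploration bonus that shrinks as a node is visited more."""
    if visits == 0:
        return float("inf")  # unvisited children are always tried first
    return wins / visits + c * math.sqrt(math.log(parent_visits) / visits)
```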

  • Overview of Information Set Monte Carlo Tree Search (ISMCTS) and Examples of Algorithms and Implementations

Information Set Monte Carlo Tree Search (ISMCTS) is a variant of Monte Carlo Tree Search (MCTS) used in games such as incomplete information games (e.g. poker) and information hiding games (e.g. Go, Shogi). The characteristic feature of this method is that it handles groups of game states, called information sets, when searching the game tree by applying MCTS.

  • Overview of Nested Monte Carlo Search (NMC) and Examples of Algorithms and Implementations

Nested Monte Carlo Search (NMC) is a type of Monte Carlo Tree Search (MCTS), which is a method for efficiently exploring a search space.

  • Rapid Action Value Estimation (RAVE) Overview, Algorithm, and Example Implementation

Rapid Action Value Estimation (RAVE) is a game tree search method developed as an extension of the Monte Carlo Tree Search (MCTS) described in “Overview of Monte Carlo Tree Search and Examples of Algorithms and Implementations”. RAVE is used to estimate the value of moves selected during game tree search: while standard MCTS estimates a move’s value only from the playouts in which that move was actually chosen at that node, RAVE also shares statistics from playouts in which the move appeared elsewhere, aiming to find good moves more quickly while statistics are still sparse.

A ranking algorithm is a method for sorting a given set of items in order of most relevance to the user, and is widely used in various fields such as search engines, online shopping, and recommendation systems. This section provides an overview of common ranking algorithms.

  • Random Forest Ranking Overview, Algorithm and Implementation Examples

Random Forest is a very popular ensemble learning method in the field of machine learning (a method that combines multiple machine learning models to obtain better performance than individual models). This approach combines multiple Decision Trees to build a more powerful model. There are many variations in ranking features using random forests.

  • Diversity-Promoting Ranking Overview, Algorithm, and Implementation Example

Diversity-Promoting Ranking is one of the methods that play an important role in information retrieval and recommendation systems, aiming to make search results and lists of recommended items more diverse and balanced. Usually, the purpose of ranking is to display items that match the user’s interests at the top, but this can cause multiple items with similar content and characteristics to appear at the top. For example, in a product recommendation system, similar items or items in the same category often dominate the top of the list. Because these items are similar, they may not adequately cover the user’s interests, leading to information bias and limited choices; diversity-promoting ranking is used to address these issues.

  • Overview of Rank SVM, Algorithm and Implementation Example

Rank SVM (Ranking Support Vector Machine) is a type of machine learning algorithm applied to ranking tasks, especially for ranking problems in information retrieval and recommendation systems. Related papers include “Optimizing Search Engines using Clickthrough Data” and “Ranking Support Vector Machine with Kernel Approximation”.

  • Diversified Top-k Retrieval (DTkR) Overview, Algorithm and Example Implementation

Diversified Top-k Retrieval (DTkR) is a method for obtaining diversified top-k search results in information retrieval and ranking tasks, aiming to obtain search results with different perspectives and diversity rather than simple Top-k results. In general Top-k search, the objective is simply to obtain the top k results with the highest scores, but similar results tend to rank high and lack diversity. On the other hand, DTkR aims to make the search results more diverse and different, and can perform information retrieval with diversity that cannot be obtained with simple Top-k search results.

  • Overview of Submodular Diversification and examples of algorithms and implementations

Submodular Diversification is a method for selecting the top k items with diversity in information retrieval and ranking tasks. The basis of Submodular Diversification is the submodular function, also described in “Submodular Optimisation and Machine Learning”: a set function \( f: 2^V \rightarrow \mathbb{R} \) satisfying the diminishing-returns property that for all \( A \subseteq B \subseteq V \) and \( x \in V \setminus B \), \( f(A \cup \{x\}) - f(A) \geq f(B \cup \{x\}) - f(B) \).
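
A minimal sketch of the standard greedy algorithm for monotone submodular maximization under a cardinality constraint (which achieves a \( 1 - 1/e \) approximation guarantee); the coverage function in the example is an illustrative assumption:

```python
def greedy_submodular(V, f, k):
    """Greedily add the element with the largest marginal gain f(S+x) - f(S)."""
    S = set()
    for _ in range(k):
        best = max((x for x in V if x not in S),
                   key=lambda x: f(S | {x}) - f(S), default=None)
        if best is None:
            break
        S.add(best)
    return S


# Example: pick documents covering as many distinct topics as possible.
items = {"d1": {"a", "b"}, "d2": {"b", "c"}, "d3": {"c"}, "d4": {"a", "d"}}
f = lambda S: len(set().union(*(items[i] for i in S)) if S else set())
print(greedy_submodular(items.keys(), f, 2))  # {'d1', 'd2'} (ties broken by order)
```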

  • Overview of Cluster-based Diversification and examples of algorithms and implementations

Cluster-based Diversification is a method for introducing diversity into a recommendation system using clustering of items. In this method, similar items are grouped into the same cluster and diversity is achieved by selecting items from different clusters.

  • Overview, algorithms and implementation examples of neural ranking models

A neural ranking model is a type of machine learning model used in search engines and recommendation systems, where the main objective is to sort items (e.g. web pages, products, etc.) in the best ranking based on given queries and user information. For a typical search engine, it is important to display first the web pages that are most relevant to the user’s query, and to achieve this, the search engine considers a number of factors to determine the ranking of web pages. These include keyword match, page credibility and the user’s previous click history.

  • Overview, algorithms and implementation examples of personalised ranking

Personalised ranking is a ranking method that presents items in the order most suitable for each user. While general ranking systems present items in the same order for all users, personalised ranking takes into account the individual preferences and behaviour of each user. Its purpose is to show items likely to interest the user at a higher rank, thereby increasing engagement, actions such as purchases and clicks, and conversion rates, and to help users find the information and products they are looking for more quickly, which increases user satisfaction.

  • Overview of Beam Search and Examples of Algorithms and Implementations

Beam Search is a search algorithm mainly applied to combinatorial optimization and finding meaningful solutions. It is mainly used in areas such as machine translation, speech recognition, and natural language processing.
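
A minimal generic beam search sketch; expand and score are placeholders (in machine translation, for example, expand would append one token to a partial sentence and score would be its log-probability):

```python
def beam_search(start, expand, score, beam_width=3, steps=5):
    """Keep only the best beam_width partial solutions at each step,
    instead of the full frontier that breadth-first search would keep."""
    beam = [start]
    for _ in range(steps):
        candidates = []
        for state in beam:
            candidates.extend(expand(state))
        if not candidates:
            break
        beam = sorted(candidates, key=score, reverse=True)[:beam_width]
    return max(beam, key=score)


# Toy usage: grow bit strings, preferring those with the most 1s.
best = beam_search("", lambda s: [s + "0", s + "1"],
                   lambda s: s.count("1"), beam_width=2, steps=4)
print(best)  # '1111'
```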

  • Heuristic Search (Hill Climbing, Greedy Search, etc.) Based Structural Learning

Structural learning based on heuristic search is a method that uses heuristic techniques to search over the architectures and hyperparameters of machine learning models in order to find the optimal model or structure; heuristics are intuitive, simple rules or approaches. Below we describe some common methods related to heuristic-search-based structure learning.
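
A minimal hill-climbing sketch over a generic neighborhood (the integer objective is a toy assumption; for structure learning, a state would encode an architecture or hyperparameter setting and the objective a validation score):

```python
def hill_climbing(initial, neighbors, objective, max_iters=1000):
    """Greedy local search: repeatedly move to the best neighbor
    until no neighbor improves the objective (a local optimum)."""
    current = initial
    for _ in range(max_iters):
        best = max(neighbors(current), key=objective, default=current)
        if objective(best) <= objective(current):
            return current
        current = best
    return current


# Toy example: maximize -(x - 3)^2 over the integers.
f = lambda x: -(x - 3) ** 2
print(hill_climbing(0, lambda x: [x - 1, x + 1], f))  # 3
```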

Random Walk is a basic concept used in graph theory and probability theory to describe random movement patterns in graphs and to help understand the structure and properties within a graph.
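
A minimal random walk sketch on an adjacency-list graph (the three-node graph is an illustrative assumption):

```python
import random

def random_walk(adjacency, start, steps):
    """At each step, move to a uniformly random neighbor of the current node."""
    node, path = start, [start]
    for _ in range(steps):
        node = random.choice(adjacency[node])
        path.append(node)
    return path


g = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"]}
print(random_walk(g, "A", 5))  # e.g. ['A', 'C', 'B', 'A', 'B', 'C']
```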

Message passing in machine learning is an effective approach to data and problems with graph structures, and is a widely used technique, especially in methods such as Graph Neural Networks (GNN).
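
A minimal sketch of one round of neighborhood aggregation, the core operation that GNN layers build on (mean aggregation here is one simple choice among several):

```python
import numpy as np

def message_passing_round(adjacency, features):
    """Each node's new feature is the mean of its own feature and
    the features of its neighbors (the 'messages' from adjacent nodes)."""
    new = {}
    for node, neigh in adjacency.items():
        msgs = [features[n] for n in neigh] + [features[node]]
        new[node] = np.mean(msgs, axis=0)
    return new


g = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
feats = {"A": np.array([1.0, 0.0]), "B": np.array([0.0, 1.0]), "C": np.array([1.0, 1.0])}
print(message_passing_round(g, feats))
```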

Meta-heuristics is an organic combination of empirical methods (heuristics) for solving difficult optimization problems. Recently, it has become a popular optimization algorithm among practitioners as a framework for solving practical problems with ease. When solving practical problems, a robust solution can be designed in a relatively short time if one has some programming skills, an eye for selecting metaheuristics, and a knack for design.

  • Metaheuristics and Applications Reading Notes

This book systematically summarizes algorithms and applications of meta-heuristics, a new family of optimization methods. The algorithms section covers a wide range of state-of-the-art methods such as memetic algorithms, artificial immune systems, ant colony optimization, and particle swarm optimization, while the applications section covers scheduling problems, delivery planning problems, and applications to structurally optimal design. The book can be used as a handbook and is a must-have for students, engineers, and researchers involved in systems engineering and information technology.

  • About LLE (Locally Linear Embedding)

LLE (Locally Linear Embedding) is a nonlinear dimensionality reduction algorithm that embeds high-dimensional data into a lower dimension. It assumes that the data is locally linear and reduces the dimension while preserving the local structure of the data. It is primarily used for tasks such as clustering, data visualization, and feature extraction.

  • About Multidimensional Scaling (MDS)

Multidimensional Scaling (MDS) is a statistical method for visualizing multivariate data that provides a way to place data points in a low-dimensional space (usually two or three dimensions) while preserving distances or similarities between the data. This technique is used to transform high-dimensional data into easily understandable low-dimensional plots that help visualize data features and clustering.

  • About t-SNE (t-distributed Stochastic Neighbor Embedding)

t-SNE is a nonlinear dimensionality reduction algorithm that embeds high-dimensional data into lower dimensions. t-SNE is mainly used for tasks such as data visualization and clustering, and its particular strength is its ability to preserve the nonlinear structure of high-dimensional data. The main idea of t-SNE is to reflect the similarity of high-dimensional data in a low-dimensional space.

  • About UMAP (Uniform Manifold Approximation and Projection)

UMAP is a nonlinear dimensionality reduction method for high-dimensional data, which aims to embed the data in a lower dimension while preserving its structure. It is used for visualization and clustering in the same way as t-SNE (t-distributed Stochastic Neighbor Embedding) described in “About t-SNE (t-distributed Stochastic Neighbor Embedding)” but adopts a different approach in some respects.
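
LLE, MDS, and t-SNE are all available in scikit-learn's manifold module; UMAP is provided by the separate umap-learn package, whose umap.UMAP class offers a similar fit_transform interface. A minimal comparison sketch (dataset and parameters are illustrative):

```python
from sklearn.datasets import load_digits
from sklearn.manifold import MDS, TSNE, LocallyLinearEmbedding

X, _ = load_digits(return_X_y=True)
X = X[:300]  # subsample so that MDS and t-SNE run quickly

methods = {
    "LLE": LocallyLinearEmbedding(n_components=2, n_neighbors=10),
    "MDS": MDS(n_components=2),
    "t-SNE": TSNE(n_components=2, perplexity=30),
}
for name, model in methods.items():
    Z = model.fit_transform(X)  # 2-D embedding suitable for plotting
    print(name, Z.shape)        # (300, 2)
```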

Dynamic Programming is a mathematical method for solving optimization problems, especially those with overlapping subproblems. Dynamic programming provides an efficient solution method because it dramatically reduces the amount of exponential computation by saving and reusing the results once computed. This section describes various algorithms and specific implementations in python for this dynamic programming.

It is an effective approach for programming and algorithm design to record the results of a calculation in a memo and reuse them to improve efficiency while avoiding the wasteful repetition of the same calculation. One such approach is dynamic programming.
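
Two classic dynamic programming sketches: memoized Fibonacci (top-down, caching results) and the 0/1 knapsack problem (bottom-up table):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    """Memoization turns the exponential naive recursion into O(n)."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)


def knapsack(values, weights, capacity):
    """0/1 knapsack: dp[c] = best value achievable with capacity c."""
    dp = [0] * (capacity + 1)
    for v, w in zip(values, weights):
        for c in range(capacity, w - 1, -1):  # iterate down so each item is used once
            dp[c] = max(dp[c], dp[c - w] + v)
    return dp[capacity]


print(fib(50))                                     # 12586269025
print(knapsack([60, 100, 120], [10, 20, 30], 50))  # 220
```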

Introduction to string matching algorithms for search engines (n-grams, meta word indexing, etc.)

An introduction to PageRank, the algorithm that determines search rankings and one of the search engine algorithms that dramatically improved Google.
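
A minimal PageRank sketch via power iteration (the damping factor 0.85 is the conventional choice; the three-page link graph is an illustrative assumption):

```python
import numpy as np

def pagerank(adjacency, damping=0.85, iters=100):
    """Power iteration on the column-stochastic link matrix:
    rank = (1 - d)/n + d * M @ rank, repeated until it stabilizes."""
    n = len(adjacency)
    M = np.zeros((n, n))
    for i, links in enumerate(adjacency):
        if links:
            for j in links:
                M[j, i] = 1.0 / len(links)  # page i splits its vote among its links
        else:
            M[:, i] = 1.0 / n  # dangling page: distribute its vote evenly
    rank = np.full(n, 1.0 / n)
    for _ in range(iters):
        rank = (1 - damping) / n + damping * M @ rank
    return rank


# Pages: 0 links to 1, 1 links to 2, 2 links to 0 and 1.
print(pagerank([[1], [2], [0, 1]]))
```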

Exponential Smoothing is a statistical method used for forecasting and smoothing time series data, especially for forecasting future values based on past observations. Exponential smoothing is a simple but effective method that allows for weighting against time and adjusting for the effect of past data.
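
A minimal sketch of simple exponential smoothing, \( s_t = \alpha x_t + (1 - \alpha) s_{t-1} \); the series and \( \alpha \) are illustrative:

```python
def exponential_smoothing(series, alpha=0.3):
    """Larger alpha weights recent observations more heavily;
    smaller alpha smooths more aggressively."""
    s = [series[0]]
    for x in series[1:]:
        s.append(alpha * x + (1 - alpha) * s[-1])
    return s


data = [10, 12, 13, 12, 15, 16, 18]
smoothed = exponential_smoothing(data)
print(smoothed)
print("one-step-ahead forecast:", smoothed[-1])  # the last smoothed value
```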

Introduction to nearest neighbor methods, decision trees, and neural networks as basic algorithms for pattern recognition.

One of the biggest differences between a database and other information storage methods is that the information in the database has a predefined structure. Another characteristic is “consistency”.

Consistency means that the information contained in the database does not contradict itself even as data moves in and out of the database. Here, three techniques for achieving these characteristics are described: the first is write-ahead logging, the second is two-phase commit, and the last is the relational database.

A database is data with a structure, the simplest of which is a table structure. Sometimes it is more efficient to divide the table structure into multiple parts instead of just one.

This simple structure saves computer memory and also has the advantage that, when data changes, all the repeated information (the class information) can be changed in one place instead of record by record.

The data shared by the two split tables (the class number in the example above) is called the key. The database uses this key to access data in chunks, and the structure that organizes these chunks for fast key-based lookup is called a “B-tree” in computer science.

There are two main types of compression technology: “lossless compression” and “lossy compression”. Lossless compression can reproduce the original data perfectly after compression, so it would normally be sufficient on its own, but it has the problem that the compression ratio cannot be made very high, which is why lossy compression is also used.

The basic principle of “lossless compression” is to find the repetitive parts of the data and omit them.
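
The simplest concrete instance of this principle is run-length encoding; a minimal sketch (real lossless compressors combine more sophisticated techniques such as dictionary coding and entropy coding):

```python
def rle_encode(s):
    """Collapse runs of repeated characters into (char, count) pairs."""
    out, i = [], 0
    while i < len(s):
        j = i
        while j < len(s) and s[j] == s[i]:
            j += 1
        out.append((s[i], j - i))
        i = j
    return out


def rle_decode(pairs):
    return "".join(ch * n for ch, n in pairs)


encoded = rle_encode("aaaabbbcca")
print(encoded)              # [('a', 4), ('b', 3), ('c', 2), ('a', 1)]
print(rle_decode(encoded))  # 'aaaabbbcca'
```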

An introduction to data compression algorithms used in image information (JPEG) and other applications.

This article is about what computers (algorithms) cannot do. The algorithms here are not about hard problems like driving a car perfectly or evaluating a student’s performance, but essentially about problems that cannot be computed.

The subject of the explanation is a checking tool that detects bugs and crashes in software. The conclusion is that it can be proven impossible for any program analysis tool to find all possible crash locations.

It defines the three jobs that a computer does. The first is to perform calculations, the second is to store data, and the third is to transmit data. From a software point of view, the function of storing data is represented by database technology, and the function of transmitting data is represented by the Internet and other web technologies.

When considering the transmission and storage of data, the most important thing is to store or transmit the data completely and accurately. On the other hand, hardware such as memory and communication channels is always subject to noise and malfunction, and 100% reliable operation cannot be guaranteed. Therefore, it is necessary to ensure “complete accuracy” in software, and one of the technologies to achieve this is error correction.

The method of storing and transmitting data with “perfect accuracy” is to increase the redundancy of the data. For example, suppose the number “5123” is transmitted as the words “five one two three”. In the former case, a single-character error destroys the original data completely, but in the latter case, even if one of the letters in “five” is corrupted to “fave”, it can very likely be corrected back to “five” from the surrounding context.

The “signature” of a digital signature is defined as something that anyone can read but no one can copy. This is the complete opposite of the concept of “digital” data, which anyone can copy. Digital signatures provide a solution to this paradox.

The purpose of a digital signature is not for you to sign something you send, as with a regular letter, but for your computer to verify that something sent to you really was signed by the claimed sender.

There are two requirements for a signature: (1) the place where the original signature is kept must be a trustworthy place, and (2) the signature cannot be easily forged by a third party.

The simplest cryptographic technique starts with “having a secret to share”. When you send some data to a specific person, you share a secret number with that person alone (e.g., 322). Then, to send 8 as data, you send the sum to the other person (322 + 8 = 330), and the other person subtracts the shared number from the number received.

To make this more practical, a much longer shared number is used, one that cannot easily be guessed. A 3-digit secret allows only 999 possibilities, and a computer can easily try all combinations at that level; but with a sufficiently long secret, the number of combinations becomes larger than a trillion times a trillion. It would take on the order of a billion years to try them all on a current computer, so we can conclude that it is a secure method.
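
A toy sketch of this additive shared-secret scheme, including the brute-force attack that a short secret cannot resist (the attack assumes the attacker can recognize a plausible plaintext):

```python
def encrypt(message, shared_secret):
    """The sender adds the shared secret to the message number."""
    return message + shared_secret


def decrypt(ciphertext, shared_secret):
    """The receiver subtracts the same shared secret."""
    return ciphertext - shared_secret


secret = 322  # agreed in advance between the two parties
ciphertext = encrypt(8, secret)
print(ciphertext, decrypt(ciphertext, secret))  # 330 8

# A 3-digit secret falls instantly to exhaustive search.
for guess in range(1, 1000):
    if decrypt(ciphertext, guess) == 8:  # attacker recognizes the plaintext
        print("secret found:", guess)    # 322
        break
```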

Sorting (rearranging) data is the basis of algorithms. Sorting algorithms include the following.

Bubble sort repeats the process of comparing and swapping adjacent values; quick sort chooses a reference value called a pivot and repeatedly divides the data into two groups, one above and one below the pivot; merge sort decomposes the data down to the smallest unit (single elements) and repeatedly sorts and merges from there.

“Selection sort” involves finding the minimum value among the first and subsequent elements and exchanging it with the first element, then finding the minimum among the second and subsequent elements and exchanging it with the second element, and so on. “Insertion sort” takes the first two elements and reorders them if they are in reverse order, then takes the third element and inserts it in the appropriate place among the first two, and so on.

In “heap sort”, the original data is first converted into an ordered tree (a heap); children and parents in the tree are compared, and if a parent is smaller than a child, the parent and child are exchanged.
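
A minimal heap sort sketch using Python's heapq module; note that heapq maintains a min-heap (each parent is no larger than its children), the mirror image of the max-heap variant described above:

```python
import heapq

def heap_sort(arr):
    """Build a binary min-heap in O(n), then pop the minimum n times,
    yielding the elements in ascending order (O(n log n) total)."""
    heap = list(arr)
    heapq.heapify(heap)
    return [heapq.heappop(heap) for _ in range(len(heap))]


print(heap_sort([5, 1, 4, 2, 8]))  # [1, 2, 4, 5, 8]
```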

The theme of the book is programming, but unlike most programming books, in addition to algorithms and code, it includes mathematical proofs and historical background on mathematical discoveries from ancient to modern times.

To be more specific, the theme is generic programming: a programming technique that emerged in the 1980s and led to the development of the C++ STL (Standard Template Library) in the 1990s. Generic programming focuses on designing algorithms and data structures so that they work in the most general settings without loss of efficiency.

This book discusses the basics of Python programming and basic algorithm construction as shown in the table of contents below.

The book begins with basic knowledge of Python and how to handle basic data structures such as vectors, arrays, stacks, queues, lists, trees, and graphs in Python; it then covers search algorithms such as linear search, binary search, and tree search; sorting algorithms such as selection sort, bubble sort, insertion sort, quick sort, merge sort, and heap sort; specific algorithms such as hashing, the Euclidean algorithm, string search, and the shortest path problem; and finally builds understanding through visualization of algorithms.

A recommendation system is an information filtering system that attempts to predict user preferences and tastes for items, aiming to provide useful information to the user. It either draws on the user’s own behavioral history or recommends items that other users like; these two approaches form the basis of the two types of algorithms used in recommendation systems (content-based filtering and collaborative filtering).
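
A minimal sketch of user-based collaborative filtering on a toy ratings matrix (the matrix and the cosine-similarity weighting are illustrative assumptions):

```python
import numpy as np

# Rows: users, columns: items; 0 means "not rated".
R = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
], dtype=float)

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9)

def recommend(user, R, k=1):
    """Score unrated items by a similarity-weighted sum of all users' ratings."""
    sims = np.array([cosine(R[user], R[other]) for other in range(len(R))])
    sims[user] = 0.0               # exclude the user themselves
    scores = sims @ R              # weighted votes for every item
    scores[R[user] > 0] = -np.inf  # drop items the user already rated
    return np.argsort(scores)[::-1][:k]


print(recommend(0, R))  # [2]: the one unrated item, scored by similar users
```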

Mathematics in Machine Learning

According to Wikipedia, “Mathematics is generally classified as a formal science, distinct from the natural sciences. Ultimately, regardless of methodology, mathematical results are not the result of experimentation or observation as in the natural sciences.” It goes on as follows.

Formal science deals with abstract structures described by symbolic systems: mathematics abstracts the structure (model) of things using symbols, and draws results by making inferences with the abstracted model.

By using mathematics, algorithms used in machine learning can be described rigorously and without error. The inference functions used in artificial intelligence also rely on the formal scientific rigor of mathematics. This rigor is the opposite of a highly ambiguous symbolic system such as natural language.

In the following pages of this blog, I will summarize the information that I think is necessary for the use of these mathematics in computers such as artificial intelligence and machine learning.

Mathematics of Optimization

This book explains the mathematics of optimization in detail and in an easy-to-understand manner, making extensive use of diagrams so that the reader can understand concrete problems. The goal is to promote an intuitive understanding of problems and solution methods and to enable the reader to solve specific problems. The text is presented concisely, with side notes for additional explanations. Practice problems are provided at the end of each chapter, and answers to all problems are included at the end of the book. For the student learning optimization for the first time, this is a good book because it starts with a mathematical review, making it very easy to understand and learn.

Stochastic Optimization

With the development of the Internet, sensor technology, and computers, a large amount of data of various types is now available, and machine learning technology has been attracting attention and making great progress in extracting meaningful information from large amounts of data. While the availability of a wide variety of data has made it possible to solve a wide range of problems, it has also created the problem of increased computational complexity due to the increased size of the data.

In order to fully utilize machine learning technology, it is necessary to learn from large amounts of data as efficiently as possible. Stochastic optimization is a powerful method for solving large-scale learning problems with large amounts of data and is a fundamental component of current machine learning.

The following pages of this blog discuss various methods of stochastic optimization in machine learning.
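
A minimal sketch of mini-batch stochastic gradient descent on least-squares linear regression (the problem, learning rate, and batch size are illustrative assumptions):

```python
import numpy as np

def sgd_linear_regression(X, y, lr=0.01, epochs=100, batch_size=8, seed=0):
    """At each step, estimate the gradient from a small random batch
    instead of the full dataset, trading noise for cheap updates."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        idx = rng.permutation(n)
        for start in range(0, n, batch_size):
            batch = idx[start:start + batch_size]
            grad = 2 * X[batch].T @ (X[batch] @ w - y[batch]) / len(batch)
            w -= lr * grad
    return w


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=200)
print(sgd_linear_regression(X, y))  # close to [1.0, -2.0, 0.5]
```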

Statistical Learning Theory

The purpose of this content is to understand the theory of the statistical properties of machine learning algorithms. For this purpose, we mainly discuss (a) the uniform law of large numbers, (b) universal kernels, and (c) discriminant fitting loss. Many works at home and abroad describe (a), and Foundations of Machine Learning (Mohri et al., MIT Press) develops a clear theory of (a) with Rademacher complexity as its main axis. On the other hand, there are few works that systematically explain (b) and (c).

In the following pages of this blog, we will discuss the details of this statistical learning theory.

Continuous optimization for machine learning

In today’s highly information-oriented society, gaining useful insights from data has become an extremely important issue for making scientific discoveries and promoting industry, the economy, and society. Data are becoming larger, higher-dimensional, and more diverse. As a fundamental technology for appropriately handling such data, the field of machine learning has emerged in recent years and is developing rapidly. In traditional academic fields that deal with data, such as statistics and data science, modeling techniques have been developed mainly to perform statistical inference appropriately.

Machine learning, on the other hand, is unique in that it focuses more on the development of efficient computational methods. In the era of big data, the various methods developed in the field of machine learning are having a significant impact on society.

In the following pages of this blog, we discuss continuous optimization, an indispensable computational method for constructing machine learning algorithms.

Graph algorithms

Graphs are a way of expressing connections between things, such as objects and states. Since many problems can be reduced to graphs, many graph algorithms have been proposed.

In the following pages of this blog, we discuss basic algorithms for graph data, such as search algorithms, shortest path algorithms, minimum spanning tree algorithms, network flow algorithms, DAGs based on strongly connected component decomposition, SAT, LCA, and decision trees, as well as applications such as knowledge data processing and Bayesian processing based on graph structures. A minimal Dijkstra sketch follows as an example.
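
A minimal Dijkstra sketch with a binary heap, for a weighted graph given as an adjacency dict (the example graph is an illustrative assumption; edge weights must be non-negative):

```python
import heapq

def dijkstra(graph, start):
    """Shortest distances from start; graph: {node: [(neighbor, weight), ...]}."""
    dist = {start: 0}
    pq = [(0, start)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry; a shorter path was already found
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return dist


g = {"A": [("B", 1), ("C", 4)], "B": [("C", 2), ("D", 5)], "C": [("D", 1)], "D": []}
print(dijkstra(g, "A"))  # {'A': 0, 'B': 1, 'C': 3, 'D': 4}
```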

Markov chain Monte Carlo methods

MCMC is an abbreviation for Markov chain Monte Carlo method, an algorithm for drawing samples (generating random numbers) from a multivariate probability distribution.

MCMC is just a computational method, and in order to solve a task it is necessary to organize the problem so that it can be solved by MCMC (statistical modeling). This is similar to the way that deciding “what to calculate” matters more than the arithmetic itself in an arithmetic problem.

Matrix decomposition and optimization are needed to fit a curve by the least-squares method or to estimate the parameters of a probability distribution by maximum likelihood. In Bayesian statistics, by contrast, the “answer” is first given in the form of a distribution, the posterior distribution, which makes it a perfect match for MCMC, which generates samples from a distribution. The final information needed, such as point estimates, errors, and prediction intervals, is then obtained from the many samples generated by MCMC.

The difference between MCMC and optimization is that MCMC keeps moving and generating samples forever, while optimization stops somewhere (at the true optimal solution when it works, or somewhere short of it when it does not). In fact, MCMC may also move close to an optimum, but even then it keeps moving around the optimal solution, expressing error in the Bayesian sense.

MCMC is a general-purpose method for handling probability distributions, and is also used in statistical physics and frequentist statistics. In Bayesian statistics, various computational methods are used, such as the Laplace approximation, Kalman filters, sequential Monte Carlo methods, and variational Bayesian methods; MCMC is one of them.

In the following pages of this blog, we discuss the basic theory of Markov chain Monte Carlo methods, specific algorithms, and implementation code.
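
As a concrete illustration, here is a minimal random-walk Metropolis sampler, one of the simplest MCMC algorithms (the standard-normal target and the proposal scale are illustrative assumptions):

```python
import numpy as np

def metropolis(log_target, x0, steps=10000, proposal_scale=1.0, seed=0):
    """Sample a 1-D density known only up to a normalizing constant:
    propose a random step, accept with probability min(1, p(new)/p(old))."""
    rng = np.random.default_rng(seed)
    x, samples = x0, []
    for _ in range(steps):
        proposal = x + proposal_scale * rng.normal()
        if np.log(rng.random()) < log_target(proposal) - log_target(x):
            x = proposal  # accept; otherwise stay at the current point
        samples.append(x)
    return np.array(samples)


# Target: standard normal, via its log-density up to a constant.
samples = metropolis(lambda x: -0.5 * x * x, x0=0.0)
print(samples.mean(), samples.std())  # approximately 0 and 1
```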
