Machine Learning Technologies

This page is intended as a comprehensive guide to the topic areas listed below for understanding and applying machine learning techniques. Clicking on an item in the table of contents will take you to the corresponding summary page.

Fundamental Theory and Mathematics

Machine Learning Technology

Machine Learning Technology. Machine learning is a technique that abstracts and models patterns and rules from training data and generalizes them to new data. This section describes the basic concepts of machine learning and how it works.

Problem setting and quantification

Problem Solving Methods, Thinking, and Design of Experiments. In machine learning, it is important to quantify goals using frameworks such as PDCA and KPIs. If the problem is unclear, hypotheses are formulated using deductive and abductive reasoning, verified while avoiding confirmation bias, and quantified using Fermi estimation. This section describes problem-solving methods, ways of thinking, and experimental design.

Implementation and application

The implementation and application of machine learning are described through programming topics such as Clojure and functional programming, Python and machine learning, the R language and machine learning, and C/C++ and various machine learning algorithms.

In addition, the following artificial intelligence technologies are described as the basis for machine learning: hardware technology, natural language processing technology, knowledge data and its utilization, semantic web technology, ontology technology, chatbot technology, agent technology, and user interface technology.

In addition, the ICT technologies for the platforms (PF) that utilize these technologies are described, including IT infrastructure technology, web technology, microservices and multi-agent systems, database technology, and search technology.

Mathematics in Machine Learning

Mathematics in Machine Learning. Machine learning is a technique for analyzing data and discovering patterns based on mathematical theory. Mathematics includes basic areas such as arithmetic, algebra, geometry, and analysis, as well as applied areas such as probability theory, statistics, and mathematical optimization. This section describes these mathematical topics.

Algorithms and Data Structures

Algorithms and Data Structures. Algorithms are problem-solving procedures, while data structures are efficient ways of storing and manipulating data. They include sorting, searching, encryption, Markov chain Monte Carlo methods, and more, and selecting the appropriate algorithm for a task is important. This section describes the theory, algorithms, and implementations.
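
As a minimal, illustrative sketch of why algorithm selection matters (this code is not from the summarized page), binary search finds a value in a sorted list in O(log n) steps instead of the O(n) of a linear scan:

```python
def binary_search(sorted_list, target):
    """Return the index of target in sorted_list, or -1 if absent."""
    lo, hi = 0, len(sorted_list) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if sorted_list[mid] == target:
            return mid
        elif sorted_list[mid] < target:
            lo = mid + 1   # target must lie in the upper half
        else:
            hi = mid - 1   # target must lie in the lower half
    return -1

print(binary_search([1, 3, 5, 7, 11], 7))  # -> 3
```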

Data Processing and Preprocessing

Basic Machine Learning and Data Analysis

General Machine Learning and Data Analysis. Basic machine learning tasks include regression to predict continuous values, classification to assign data to categories, clustering to group data, dimensionality reduction to process high-dimensional data, and methods to learn sequence patterns. This section describes the theory, specific algorithms, and implementations of these techniques.
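
As one concrete instance of the regression task, the following sketch fits a linear regression with scikit-learn; the toy data (y roughly 2x + 1) is an assumption made purely for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data: y is roughly 2*x + 1 with noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 2 * X[:, 0] + 1 + rng.normal(0, 0.5, size=100)

model = LinearRegression().fit(X, y)      # learn coefficients from data
print(model.coef_, model.intercept_)      # close to [2.0] and 1.0
print(model.predict(np.array([[5.0]])))   # predict a continuous value
```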

Noise Removal, Data Cleansing, and Interpolation of Missing Values

Noise Removal, Data Cleansing, and Interpolation of Missing Values in Machine Learning. Noise removal, data cleansing, and missing value interpolation in machine learning are essential processes for improving data quality and model performance. Noise removal removes unwanted information such as sensor noise and measurement errors, while data cleansing and missing value interpolation align the data by correcting inaccuracies and missing values. This section describes these methods and examples of their implementation.
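
A minimal sketch of this kind of cleaning with pandas (the sensor values and the outlier threshold are assumptions for illustration): an obviously spurious spike is treated as missing, then the gaps are filled by linear interpolation.

```python
import numpy as np
import pandas as pd

# Sensor readings with a spurious spike and missing values
s = pd.Series([1.0, 1.2, np.nan, 1.1, 99.0, np.nan, 1.3])

s = s.mask(s > 10)   # treat the outlier spike as missing (simple noise removal)
s = s.interpolate()  # linearly interpolate all missing values
print(s)
```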

Parallel and Distributed Processing in Machine Learning

Parallel distributed processing in machine learning. Training machine learning models on large amounts of data requires high-speed parallel distributed processing. Parallel distributed processing distributes work among multiple computers and performs multiple processes at the same time, enabling high-speed processing. This section describes specific implementations of these parallel distributed processing techniques.
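
A minimal single-machine sketch of the split-process-combine idea using Python's standard multiprocessing module (the workload, a sum of squares, is a placeholder):

```python
from multiprocessing import Pool

def partial_sum(chunk):
    """Work performed independently on one shard of the data."""
    return sum(x * x for x in chunk)

if __name__ == "__main__":
    data = list(range(1_000_000))
    chunks = [data[i::4] for i in range(4)]      # split the data four ways
    with Pool(processes=4) as pool:
        results = pool.map(partial_sum, chunks)  # process shards in parallel
    print(sum(results))                          # combine the partial results
```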

Machine learning with small data

Small data learning, combining logic with machine learning, and local and ensemble learning. Small data refers to data sets with a limited number of samples, and lack of data is a challenge in machine learning. To cope with this, techniques such as data augmentation, transfer learning, model simplification, and cross-validation are utilized to achieve high accuracy even with small data. This section summarizes approaches to machine learning with small data.
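
As a sketch of one of the techniques named above, k-fold cross-validation with scikit-learn reuses every sample for both training and validation, which gives a more reliable performance estimate when data is scarce (the iris dataset stands in for a small dataset here):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# 5-fold cross-validation: each point is validated exactly once
scores = cross_val_score(model, X, y, cv=5)
print(scores.mean(), scores.std())
```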

Models and Algorithms

Deep Learning

Deep Learning. Deep learning is a machine learning technology that uses neural networks that mimic the structure of neurons in the brain to solve complex problems such as image recognition, speech recognition, and natural language processing with high accuracy. It is more versatile than conventional machine learning because it can automatically extract features from large amounts of data. Furthermore, Python tools such as TensorFlow/Keras and PyTorch can be used to simplify model building and training. This section summarizes the theory of deep learning, its applications in various fields, and how to use the tools.
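
A minimal sketch using PyTorch, one of the tools named above; the network sizes and the random toy batch are assumptions for illustration, not a real task:

```python
import torch
import torch.nn as nn

# A minimal feed-forward network: 4 inputs -> 8 hidden units -> 3 classes
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 3))
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

X = torch.randn(32, 4)            # a random toy batch
y = torch.randint(0, 3, (32,))    # random class labels

for _ in range(100):              # a short training loop
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()               # backpropagate gradients
    optimizer.step()              # update the weights
print(loss.item())
```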

Automatic Generation by Machine Learning

Automatic Generation by Machine Learning. Automatic generation by machine learning involves a computer learning patterns and regularities in data and generating new data based on them. There are various approaches to automatic generation, including deep learning approaches, probabilistic approaches, and simulation approaches. This section describes various approaches and specific implementations of machine learning-based automatic generation techniques.
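
As a tiny example of the probabilistic approach (an illustrative sketch, not from the page summarized here), a first-order Markov chain learns word-to-word transition patterns from text and generates new text by sampling them:

```python
import random
from collections import defaultdict

text = "the cat sat on the mat and the cat ran"
words = text.split()

# Learn first-order transition frequencies between consecutive words
transitions = defaultdict(list)
for a, b in zip(words, words[1:]):
    transitions[a].append(b)

# Generate new text by sampling from the learned transitions
random.seed(0)
word, output = "the", ["the"]
for _ in range(6):
    nxts = transitions[word]
    if not nxts:            # dead end: no observed continuation
        break
    word = random.choice(nxts)
    output.append(word)
print(" ".join(output))
```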

Reinforcement Learning

Theory and algorithms of various reinforcement learning techniques and their implementation in Python. Reinforcement learning is a type of machine learning in which an agent chooses actions in an environment and learns a policy that maximizes rewards. The environment is modeled as a Markov decision process, and TD learning and Q-learning are typical methods. This section describes the theory, algorithms, and Python implementation of reinforcement learning.
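
A minimal Q-learning sketch on an assumed toy environment (a 5-state corridor with a reward at the right end); the environment and hyperparameters are illustrative choices:

```python
import numpy as np

# A 5-state corridor: actions move left/right, reward 1 only at the right end
n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.9, 0.1
rng = np.random.default_rng(0)

for _ in range(500):                       # episodes
    s = 0
    while s != n_states - 1:
        # epsilon-greedy action selection
        a = rng.integers(n_actions) if rng.random() < eps else int(np.argmax(Q[s]))
        s_next = max(s - 1, 0) if a == 0 else s + 1
        r = 1.0 if s_next == n_states - 1 else 0.0
        # TD update: move Q(s,a) toward r + gamma * max_a' Q(s',a')
        Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
        s = s_next

print(np.argmax(Q, axis=1))  # learned policy: mostly "move right" (action 1)
```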

Online Learning and Online Prediction

Online Learning and Online Prediction. Online learning is a learning method that updates the model each time data arrives sequentially, and is suitable for large-scale data analysis and continuously generated data. Online prediction is a framework that leverages this style of learning to address decision-making problems. This section describes its theory, implementation, and applications.
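
A minimal sketch of the one-sample-at-a-time update, here stochastic gradient descent for a linear model on an assumed data stream (y = 3x + 0.5 plus noise):

```python
import numpy as np

rng = np.random.default_rng(0)
w, b, lr = 0.0, 0.0, 0.01

# Data arrives one sample at a time; the model is updated immediately.
for _ in range(5000):
    x = rng.uniform(-1, 1)
    y = 3.0 * x + 0.5 + rng.normal(0, 0.1)   # one new observation
    err = (w * x + b) - y
    w -= lr * err * x                        # single-sample gradient step
    b -= lr * err
print(w, b)  # approaches 3.0 and 0.5
```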

Probabilistic approaches in machine learning

Probabilistic approaches in machine learning. Probabilistic generative models are methods that model the distribution of data and generate new data from it, and are used in both supervised and unsupervised learning. Distributions such as the Gaussian and beta distributions are assumed for modeling, and maximum likelihood and Bayesian estimation are used for learning. Typical models include LDA, HMM, BM, AE, VAE, and GAN, with applications in natural language processing, speech recognition, and statistical analysis. This section describes the theory, implementation, and applications of probabilistic generative models.
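
The simplest instance of the model-then-generate idea: fit a Gaussian to data by maximum likelihood, then draw new samples from the fitted model (toy data assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=5.0, scale=2.0, size=10_000)

# For a Gaussian model, the maximum likelihood estimates are the
# sample mean and the (biased) sample standard deviation.
mu_hat = data.mean()
sigma_hat = data.std()
print(mu_hat, sigma_hat)   # close to 5.0 and 2.0

# New data can then be generated from the fitted model
samples = rng.normal(mu_hat, sigma_hat, size=5)
print(samples)
```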

Machine Learning with Bayesian Inference and Graphical Models

Machine learning with Bayesian inference and graphical models. Machine learning using Bayesian inference is a statistical learning method that, following Bayes' theorem, the fundamental law of probability, computes the posterior probability distribution of an unknown variable given observed data, and then derives estimators for the unknown variable and predictive distributions for new data from that posterior distribution. This section describes the basic theory, implementation, and the graphical model approach to machine learning techniques based on Bayesian inference.
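
The textbook coin-flip example makes the posterior-then-predict pipeline concrete; the counts and the uniform prior below are assumptions for illustration:

```python
# Coin-flip example: Beta prior, Bernoulli likelihood -> Beta posterior
heads, tails = 7, 3            # observed data
a0, b0 = 1.0, 1.0              # Beta(1,1) = uniform prior

# Conjugacy: the posterior is Beta(a0 + heads, b0 + tails)
a_post, b_post = a0 + heads, b0 + tails
mean = a_post / (a_post + b_post)   # posterior mean estimator
print(mean)                         # 8/12 ~ 0.667

# The predictive probability that the next flip is heads
# equals the posterior mean for a Bernoulli likelihood.
print("P(next flip = heads) =", mean)
```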

Nonparametric Bayesian and Gaussian Processes

Nonparametric Bayes and Gaussian Processes. Nonparametric Bayesian models are stochastic models in infinite-dimensional spaces, computed using efficient search algorithms such as Markov chain Monte Carlo methods. Major applications include clustering, structural change estimation, factor analysis, and sparse modeling. A Gaussian process is a flexible method for handling smooth functions that performs Bayesian estimation of the probability distribution over functions, with the posterior obtained by fitting to real data. This section describes the theory and implementation of these methods.
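
A compact sketch of the Gaussian process posterior mean in plain NumPy, assuming an RBF kernel and noise-free toy observations of sin(x); the formula is the standard GP regression posterior mean, K(X*, X) K(X, X)^{-1} y:

```python
import numpy as np

def rbf(a, b, ell=0.5):
    """RBF (squared-exponential) kernel matrix between point sets a and b."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / ell) ** 2)

X = np.array([-2.0, -1.0, 0.0, 1.5])    # training inputs
y = np.sin(X)                           # training targets
Xs = np.linspace(-3, 3, 7)              # test inputs

noise = 1e-4
K = rbf(X, X) + noise * np.eye(len(X))  # kernel matrix + jitter for stability
Ks = rbf(Xs, X)

mu = Ks @ np.linalg.solve(K, y)         # posterior mean at the test inputs
print(mu)
```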

Topic Model Theory and Implementation

Topic Model Theory and Implementation. Topic models are probabilistic generative models that extract latent topics from a set of documents and are used to understand the content of documents. This allows one to estimate which topics are addressed in a document, which is useful in large-scale text data analysis. Typical models include LDA and PLSA, which estimate topic distributions and word distributions based on word frequencies. Topic models have been applied not only to text analysis, but also to music, images, and video. Applications include news article analysis, social media analysis, recommendations, image classification, and music genre classification.
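
A minimal LDA run with scikit-learn on four assumed toy documents (two finance-like, two sports-like), showing how topics are estimated from word-frequency counts:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "stock market price trading investor",
    "soccer match goal team player",
    "market investor stock economy",
    "team player coach soccer league",
]

# LDA operates on word-frequency counts
counts = CountVectorizer().fit(docs)
X = counts.transform(docs)

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
words = counts.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top = [words[i] for i in topic.argsort()[-3:]]   # 3 most weighted words
    print(f"topic {k}:", top)
```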

Application to special data

Graph Data Algorithm and Machine Learning

Graph data processing algorithms and their application to machine learning/artificial intelligence tasks. Graphs are a way to represent connections between objects, and many problems can be transformed into graph problems. Related algorithms include search algorithms, shortest path algorithms, minimum spanning tree algorithms, network flow algorithms, and strongly connected component decomposition. Also described are algorithms involving DAGs, SAT, LCA, and decision trees, as well as applications such as graph-structure-based knowledge data processing and Bayesian processing.
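
As a sketch of the shortest-path category named above, here is Dijkstra's algorithm over an adjacency-dict graph (the small example graph is assumed for illustration):

```python
import heapq

def dijkstra(graph, start):
    """Shortest-path distances from start in a non-negatively weighted graph."""
    dist = {start: 0}
    heap = [(0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                      # stale queue entry, skip
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

g = {"A": [("B", 1), ("C", 4)], "B": [("C", 2)], "C": []}
print(dijkstra(g, "A"))   # {'A': 0, 'B': 1, 'C': 3}
```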

Graph Neural Networks

Graph Neural Networks. Graph neural networks (GNNs) are a technique that applies deep learning to graph data, extracting features from the data and constructing a neural network based on the feature representations. This captures complex data patterns and builds models with nonlinearities. The difference from conventional deep learning is that while the latter performs matrix operations based on the grid structure of image and text data, GNNs specialize in graph structures and process data that combines nodes and edges. This section describes the algorithms, implementation examples, and application examples of GNNs.
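
A single message-passing layer in the style of a graph convolutional network, written in plain NumPy as a sketch (the 4-node toy graph and random features/weights are assumptions; a real GNN would learn W by backpropagation):

```python
import numpy as np

# Toy graph: 4 nodes on a path, adjacency matrix with self-loops added
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
A_hat = A + np.eye(4)

# Symmetric normalization D^{-1/2} (A + I) D^{-1/2}, as in a GCN layer
D_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt

X = np.random.default_rng(0).normal(size=(4, 3))   # node features
W = np.random.default_rng(1).normal(size=(3, 2))   # weights (would be learned)

# One layer: aggregate neighbor features, transform, apply a nonlinearity
H = np.maximum(A_norm @ X @ W, 0)                  # ReLU
print(H.shape)   # (4, 2): a new 2-dim embedding per node
```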

Simulation, Data Science and Artificial Intelligence

Simulation, Data Science and Artificial Intelligence. Large-scale computer simulations have become an effective tool in a variety of fields, including astronomy, meteorology, materials science, and biology. However, only a limited number of simulations can be performed purely on the basis of fundamental laws, and data science is required to set the parameters and initial values that form the premise of the calculations. Modern data science is closely related to simulation and closely intertwined with artificial intelligence. This section discusses simulation, data science, and artificial intelligence.

Analysis of Time Series Data

Time Series Data Analysis. Time-series data is data that changes over time; examples include stock prices, temperatures, and traffic volumes. By applying machine learning, historical data can be learned and unknown data can be predicted for business decision-making and risk management. Typical methods include ARIMA, LSTM, Prophet, and state-space models, which learn from past data to predict the future. This section describes the theory, algorithms, and applications of time series data analysis.
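
As a minimal sketch of the learn-from-the-past idea, here is an AR(1) model, the simplest autoregressive relative of the ARIMA family named above, fitted by least squares on a simulated series (the coefficient 0.8 is an assumption of the toy example):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate an AR(1) series: x_t = 0.8 * x_{t-1} + noise
x = np.zeros(500)
for t in range(1, 500):
    x[t] = 0.8 * x[t - 1] + rng.normal(0, 1)

# Fit the AR(1) coefficient by least squares on (x_{t-1}, x_t) pairs
phi = np.dot(x[:-1], x[1:]) / np.dot(x[:-1], x[:-1])
print(phi)                        # close to 0.8

# One-step-ahead forecast from the last observation
print("forecast:", phi * x[-1])
```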

Anomaly detection and change detection

Anomaly detection and change detection techniques. Machine learning anomaly detection is a technique for detecting anomalies that deviate from normal conditions, while change detection is a technique for detecting changes in conditions. They are used to detect anomalous behavior such as manufacturing line failures, network attacks, and fraudulent financial transactions. Techniques for anomaly and change detection include Hotelling's T2 method, Bayesian methods, neighborhood methods, mixture distribution models, support vector machines, Gaussian process regression, and sparse structure learning, and these approaches are described.
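
A compact sketch of the first method named above, Hotelling's T2, which scores a point by its squared Mahalanobis distance from the mean of normal data (the 2-D Gaussian training data is an assumption for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
normal_data = rng.normal(0, 1, size=(500, 2))   # data from normal operation

mu = normal_data.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(normal_data, rowvar=False))

def t2_score(x):
    """Hotelling's T2: squared Mahalanobis distance from the normal mean."""
    d = x - mu
    return d @ cov_inv @ d

print(t2_score(np.array([0.1, -0.2])))   # small: looks normal
print(t2_score(np.array([4.0, 4.0])))    # large: flagged as an anomaly
```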

Structural Learning

Structural Learning. Learning the structure of data is important for interpreting that data. Structural learning includes basic methods such as hierarchical clustering and decision trees, as well as relational data learning, graph structure learning, and sparse structure learning.
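
A minimal hierarchical clustering sketch with SciPy, assuming two well-separated groups of toy 2-D points; the dendrogram built by linkage is cut into two clusters:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
# Two well-separated groups of 2-D points
X = np.vstack([rng.normal(0, 0.3, (5, 2)), rng.normal(5, 0.3, (5, 2))])

Z = linkage(X, method="ward")                    # build the cluster hierarchy
labels = fcluster(Z, t=2, criterion="maxclust")  # cut the tree into 2 clusters
print(labels)                                    # e.g. [1 1 1 1 1 2 2 2 2 2]
```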

Explainability and Optimization

Explainable Machine Learning

Explainable machine learning. Explainable machine learning is a technique for presenting the reasons or rationale behind the results output by machine learning algorithms in an explainable form. The two main current approaches are (A) interpretation by inherently interpretable machine learning models and (B) model-independent post-hoc interpretation.
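
As one instance of approach (B), a bare-bones permutation-importance sketch: shuffle one feature at a time and see how much accuracy drops, without looking inside the model (evaluating on the training data here is a simplification for illustration):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(random_state=0).fit(X, y)
base = model.score(X, y)

# Permutation importance: destroy one feature's information and
# measure the accuracy drop (works for any model).
rng = np.random.default_rng(0)
for j in range(X.shape[1]):
    Xp = X.copy()
    rng.shuffle(Xp[:, j])
    print(f"feature {j}: importance ~ {base - model.score(Xp, y):.3f}")
```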

Recommendation Technology

Recommendation Technology. Recommendation technology using machine learning analyzes a user's past behavior history, preference data, and other data to provide better personalized recommendations. The recommendation process consists of the following flow: (1) create a user profile, (2) extract features of items, (3) train a machine learning model, and (4) generate recommendations from the trained model. This section describes the specific implementation and theory behind this recommendation technology.
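
A bare-bones item-based collaborative filtering sketch (one common way to realize the flow above; the rating matrix is a toy assumption): unrated items are scored by their similarity to items the user has already rated highly.

```python
import numpy as np

# Rows = users, columns = items; entries are ratings (0 = unrated)
R = np.array([[5, 4, 0, 1],
              [4, 5, 1, 0],
              [1, 0, 5, 4],
              [0, 1, 4, 5]], dtype=float)

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)

# Score the unrated items of user 0 by similarity of their rating
# columns to the items the user has already rated.
user = R[0]
for item in np.where(user == 0)[0]:
    score = sum(user[j] * cosine(R[:, item], R[:, j])
                for j in np.where(user > 0)[0])
    print(f"item {item}: score {score:.2f}")
```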

Machine Learning Based on Sparsity

Sparsity-based machine learning. Machine learning based on sparsity is a technique used for feature selection and dimensionality reduction of high-dimensional data, exploiting the property that many elements of the data are close to zero while only a few are non-zero. It uses methods such as linear regression and logistic regression with L1 regularization to perform feature selection and dimensionality reduction and improve interpretability. This method is widely used in sensor data processing, image processing, and natural language processing.
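
A minimal sketch of L1-regularized linear regression (the Lasso) with scikit-learn; the synthetic data, in which only 3 of 50 features actually matter, is an assumption for illustration:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))       # 50 features...
w_true = np.zeros(50)
w_true[:3] = [4.0, -2.0, 3.0]        # ...but only 3 actually matter
y = X @ w_true + rng.normal(0, 0.1, 200)

# L1 regularization drives most coefficients exactly to zero,
# performing feature selection as part of the fit.
model = Lasso(alpha=0.1).fit(X, y)
print(np.nonzero(model.coef_)[0])    # indices of the selected features
```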

Overview of Kernel Methods and Support Vector Machines

An overview of kernel methods and support vector machines. The kernel method is a technique used in machine learning to handle nonlinear relationships. It measures the similarity between data points with a kernel function, which corresponds to an inner product between feature representations of the input data. The kernel method is mainly used in algorithms such as the support vector machine (SVM), kernel principal component analysis (KPCA), and Gaussian processes (GP). This section provides an overview of the kernel method, its theory, implementation, and various applications, mainly in relation to support vector machines.
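
A short sketch of an RBF-kernel SVM with scikit-learn on an assumed nonlinearly separable problem (labels depend on distance from the origin), where the kernel lets the SVM learn a circular boundary without explicit feature engineering:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Nonlinearly separable: the label depends on distance from the origin
X = rng.uniform(-1, 1, size=(300, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 < 0.5).astype(int)

# RBF kernel: k(x, x') = exp(-gamma * ||x - x'||^2)
model = SVC(kernel="rbf", gamma=2.0).fit(X, y)
print(model.score(X, y))
print(model.predict([[0.0, 0.0], [0.9, 0.9]]))   # inside vs outside the circle
```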

Relational Data Learning

Relational Data Learning. Relational data is data that represents “relationships” among N objects, expressed in matrix form. Relational data learning is a learning method for extracting patterns from this matrix data and is mainly applied to two tasks: prediction and knowledge extraction. Prediction is the problem of estimating the value of unobserved data, while knowledge extraction is the task of analyzing data characteristics and extracting useful knowledge and rules. This section provides a theoretical overview, algorithms, and applications of this learning method.
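
One common way to attack the prediction task above is low-rank matrix factorization; the following gradient-descent sketch fills in unobserved entries of an assumed toy relation matrix (sizes, rank, and learning rate are illustrative choices):

```python
import numpy as np

# Observed relational matrix (0 = unobserved entry to predict)
R = np.array([[5, 3, 0], [4, 0, 1], [0, 1, 5]], dtype=float)
mask = R > 0

rng = np.random.default_rng(0)
k = 2
U = rng.normal(0, 0.1, (3, k))   # latent factors for the row objects
V = rng.normal(0, 0.1, (3, k))   # latent factors for the column objects

lr, reg = 0.05, 0.01
for _ in range(2000):
    E = mask * (R - U @ V.T)         # error on observed entries only
    U += lr * (E @ V - reg * U)      # gradient steps on both factors
    V += lr * (E.T @ U - reg * V)

print(np.round(U @ V.T, 1))          # unobserved entries are now predicted
```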

Causal Inference and Causal Search Techniques

Statistical Causal Inference and Causal Search. While machine learning approaches exist that derive “correlations” from vast amounts of data, deriving “causal relationships” is expected to have applications in the medical and manufacturing industries. Analyzing causal relationships requires a statistical approach that differs from general machine learning. Specifically, “causal inference” is a technique to identify and verify causal relationships, while “causal search” is a technique to discover causal relationships, and they differ in purpose and approach. The theory, implementation, and application of these causal inference and search techniques are described.
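
A tiny simulated illustration of why correlation differs from causation: with a confounder present, the naive group difference is biased, while regression adjustment (one basic causal-inference technique) recovers the true effect. All numbers below are assumptions of the toy simulation.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
z = rng.normal(size=n)                            # confounder
t = (z + rng.normal(size=n) > 0).astype(float)    # treatment depends on z
y = 2.0 * t + 3.0 * z + rng.normal(size=n)        # true causal effect of t is 2

# Naive correlation-based estimate is biased by the confounder z
print("naive:", y[t == 1].mean() - y[t == 0].mean())

# Adjusting for z (regressing y on t and z) recovers the causal effect
X = np.column_stack([t, z, np.ones(n)])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print("adjusted:", beta[0])                       # close to 2.0
```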

Submodular Optimization and Machine Learning

Submodular Optimization and Machine Learning. A submodular function can be regarded as a discrete analogue of a convex function and plays an important role in combinatorial optimization problems. Combinatorial optimization is the procedure of selecting a subset from a collection of possible choices, and by exploiting submodularity, efficient solutions to such optimization problems can be derived. This makes it widely applicable in information theory, machine learning, economics, the social sciences, and other fields, with uses such as social network analysis, image segmentation, and ad placement. Here we describe the theory and implementation of machine learning approaches to submodular optimization.
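
A small sketch of greedy maximization of a coverage function, a classic submodular function (the sets are an assumed toy instance); thanks to diminishing returns, the greedy choice is provably within a (1 - 1/e) factor of optimal:

```python
# Pick k sets that together cover as many elements as possible.
sets = {
    "s1": {1, 2, 3},
    "s2": {3, 4},
    "s3": {4, 5, 6, 7},
    "s4": {1, 5},
}
k = 2
covered, chosen = set(), []
for _ in range(k):
    # Greedily take the set with the largest marginal coverage gain
    best = max(sets, key=lambda s: len(sets[s] - covered))
    chosen.append(best)
    covered |= sets[best]
print(chosen, covered)   # ['s3', 's1'] {1, 2, 3, 4, 5, 6, 7}
```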

Theory and Algorithms for the Bandit Problem

Theory and Algorithms for the Bandit Problem. The bandit problem is a type of reinforcement learning in which an agent selects the arm with the highest reward from multiple alternatives (arms). The agent finds the optimal arm by pulling the arms many times to obtain rewards while the reward of each arm is unknown. The following assumptions are made in solving this problem: (1) the agent selects arms independently, (2) each arm generates a reward according to a probability distribution, (3) the rewards of the arms are observable but their probability distributions are unknown, and (4) the agent obtains rewards by repeatedly pulling the arms. Algorithms used to solve this problem include (1) the ε-greedy method (combining random and optimal selection), (2) the UCB algorithm (preferentially selecting uncertain arms), and (3) Thompson sampling (sampling the next arm to select from the posterior distribution). Bandit problems have also been applied to real-world problems such as ad delivery and treatment selection.
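
A minimal ε-greedy simulation on an assumed 3-armed Bernoulli bandit (the true arm means below are, of course, hidden from the agent):

```python
import numpy as np

rng = np.random.default_rng(0)
true_means = [0.2, 0.5, 0.7]   # unknown to the agent
counts = np.zeros(3)
values = np.zeros(3)           # running estimate of each arm's reward
eps = 0.1

for _ in range(10_000):
    # Explore with probability eps, otherwise exploit the best estimate
    arm = rng.integers(3) if rng.random() < eps else int(np.argmax(values))
    reward = float(rng.random() < true_means[arm])       # Bernoulli reward
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean

print(values)                # estimates approach [0.2, 0.5, 0.7]
print(np.argmax(values))     # best arm identified: 2
```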

Other Machine Learning

This section discusses topological data analysis.

Topological data analysis is a method of analyzing a data set using a “soft” geometry called topology. Machine learning is an operation to find a model that fits given data well, and a model is a space expressed in terms of some parameters. From this viewpoint, the essence of machine learning is to find a projection (function) from the data points to the space of the model.

Topology, on the other hand, is often illustrated by the coffee cup and doughnut example: if a coffee cup is made of an unbreakable clay-like material and is deformed little by little, it can eventually be transformed into a doughnut.
