General Machine Learning and Data Analysis


In this page, we summarize general algorithms for machine learning and data analysis as follows.

General Machine Learning

Bowtie analysis is a risk management technique that is used to organise risks in a visually understandable way. The name comes from the fact that the resulting diagram of the analysis resembles the shape of a bowtie. The combination of bowtie analysis with ontologies and AI technologies is a highly effective approach to enhance risk management and predictive analytics and to design effective responses to risk.

  • Overview of Iterative Optimization Algorithms and Examples of Implementations

Iterative optimization algorithms are an approach that iteratively improves an approximate solution in order to find the optimal solution to a given problem. These algorithms are particularly useful in optimization problems and are used in a variety of fields. The following is an overview of iterative optimization algorithms.

  • Overview of interpolation methods and examples of algorithms and implementations

Interpolation is a method of estimating or complementing values between known data points, connecting points in a data set to generate a smooth curve or surface, which can then be used to estimate values at unknown points. Several major interpolation methods are discussed below.
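As a rough illustration, here is a minimal sketch in Python (assuming NumPy and SciPy are available) comparing linear and cubic-spline interpolation on hypothetical sample points:

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Known data points (hypothetical sample values)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.0, 0.8, 0.9, 0.1, -0.8])

# Points where values are unknown and must be estimated
x_new = np.linspace(0.0, 4.0, 9)

y_linear = np.interp(x_new, x, y)      # piecewise-linear interpolation
y_spline = CubicSpline(x, y)(x_new)    # smooth cubic-spline interpolation

print(y_linear)
print(y_spline)
```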

Mini-batch learning is one of the most widely used and efficient learning methods in machine learning, which is computationally more efficient and applicable to large data sets compared to the usual Gradient Descent method. This section provides an overview of mini-batch learning. Mini-batch learning is a learning method in which multiple samples (called mini-batches) are processed in batches, rather than the entire dataset at once, and the gradient of the loss function is calculated for each mini-batch and the parameters are updated using the gradient.
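As a minimal sketch (Python/NumPy, on a hypothetical linear regression problem), mini-batch learning amounts to computing the gradient and updating the parameters one small batch at a time:

```python
import numpy as np

# Hypothetical data: linear regression with squared loss
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
true_w = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ true_w + rng.normal(scale=0.1, size=1000)

w = np.zeros(5)           # parameters to learn
lr, batch_size = 0.1, 32  # learning rate and mini-batch size

for epoch in range(20):
    idx = rng.permutation(len(X))             # shuffle once per epoch
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]
        Xb, yb = X[batch], y[batch]
        grad = 2 * Xb.T @ (Xb @ w - yb) / len(batch)  # gradient on the mini-batch only
        w -= lr * grad                                 # parameter update
print(w)
```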

Feature engineering refers to the extraction of useful information from a dataset and the creation of input features that machine learning models can use to make predictions and classification, and is an important process in the context of machine learning and data analysis. This section describes various methods and implementations of feature engineering.

Negative Log-Likelihood (NLL) is a loss function for optimising the parameters of models in statistics and machine learning, especially those often used in models based on probability distributions (such as classification models). It is a measure of a model’s performance based on the probability that the observed data were predicted by the model, and its purpose is to optimise the parameters of the model so that the model can explain the observed data with a high probability.
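A minimal sketch of how the NLL of a classifier can be computed, assuming predicted class probabilities are already available (hypothetical values, Python/NumPy):

```python
import numpy as np

def negative_log_likelihood(probs, labels):
    """Mean NLL for a classifier: probs is (n_samples, n_classes),
    labels are integer class indices."""
    eps = 1e-12  # avoid log(0)
    picked = probs[np.arange(len(labels)), labels]
    return -np.mean(np.log(picked + eps))

# Hypothetical predicted class probabilities for 3 samples, 2 classes
probs = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]])
labels = np.array([0, 1, 1])
print(negative_log_likelihood(probs, labels))
```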

Contrastive Divergence (CD) is a learning algorithm mainly used for training Restricted Boltzmann Machines (RBM), a generative model for modelling the probability distribution of data, and CD is a method for efficiently learning its parameters.

Noise Contrastive Estimation (NCE) is a method for estimating the parameters of a probabilistic model, and is a particularly effective approach for processing large data sets and high-dimensional data.

Negative sampling is a learning algorithm in natural language processing and machine learning, especially used in word embedding models such as Word2Vec as described in ‘Word2Vec’. It is a method for selective sampling of infrequent data (negative examples) for efficient learning of large datasets.

  • Model Quantization and Distillation

Model quantization (Quantization) and distillation (Knowledge Distillation) are methods for improving the efficiency of machine learning models and reducing resources during deployment.

  • Overview of Model Distillation with Soft Target and Examples of Algorithms and Implementations

Model distillation by soft target (Soft Target) is a technique for transferring the knowledge of a large and computationally expensive teacher model to a small and efficient student model. Typically, soft target distillation focuses on teaching the probability distribution of the teacher model to the student model in a class classification task. Below we provide an overview of model distillation by soft targets.
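As a rough sketch of the soft-target loss (Python/NumPy, hypothetical logits; the temperature T and the T² scaling follow the usual formulation, and a real setup would combine this with the ordinary hard-label loss):

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def soft_target_loss(teacher_logits, student_logits, T=4.0):
    """Cross-entropy between temperature-softened teacher and student distributions."""
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    return -np.mean(np.sum(p_teacher * np.log(p_student + 1e-12), axis=1)) * T * T

# Hypothetical logits for a 3-class task
teacher = np.array([[4.0, 1.0, 0.5], [0.2, 3.0, 0.1]])
student = np.array([[2.5, 0.8, 0.3], [0.5, 2.0, 0.4]])
print(soft_target_loss(teacher, student))
```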

  • Model Lightening through Pruning and Quantization

Model lightening is an important technique for converting deep learning models into smaller, faster, and more energy-efficient models. There are various approaches to model lightening, including pruning and quantization. The following is a list of some of the most common approaches.

  • Overview of Post-training Quantization and Examples of Algorithms and Implementations

Post-training quantization is a method of quantizing a model after the training of a neural network has been completed. This method converts the weights and activations of the model, which are usually expressed in floating-point numbers, into a form expressed in low-bit numbers such as integers, which reduces the model’s memory usage and improves inference speed. The following is an overview of post-training quantization.
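A minimal sketch of the core idea, symmetric int8 quantization of already-trained weights (Python/NumPy, hypothetical weights; real toolchains also calibrate activations and use per-channel scales):

```python
import numpy as np

def quantize_int8(weights):
    """Symmetric linear quantization of float weights to int8, plus the scale
    needed to dequantize them at inference time."""
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)   # hypothetical trained weights
q, scale = quantize_int8(w)
print(np.abs(w - dequantize(q, scale)).max())  # quantization error
```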

  • Overview of Model Distillation with FitNet and Examples of Algorithms and Implementations

FitNet is a model distillation method that allows small student models to learn knowledge from large teacher models. Below we provide an overview of model distillation with FitNet.

Quantization-Aware Training (QAT) is one of the training methods for effectively quantizing neural networks. Quantization is the process of expressing the weights and activations of a model in low-bit numbers, such as integers, instead of floating-point numbers. Quantization-Aware Training incorporates this quantization into the training process so that the resulting model takes the effects of quantization into account.

  • Attention Transfer Model Distillation Overview, Algorithm, and Implementation Examples

Attention Transfer is one of the methods for model distillation in deep learning. Model distillation is a method for transferring knowledge from a large and computationally demanding model (teacher model) to a small and lightweight model (student model). This allows student models to perform as well as teacher models while reducing the use of computational resources and memory.

  • Measures for Dealing with Unknown Models in Machine Learning

Measures for machine learning models to deal with unknown data have two aspects: ways to improve the generalization performance of the model and ways to design how the model should deal with unknown data.

  • Overview of Hard Negative Mining and Examples of Algorithms and Implementations

Hard Negative Mining is a method of focusing on difficult negative samples (negative examples) in the field of machine learning, especially in tasks such as anomaly detection and object detection. This allows the model to deal with more difficult cases and is expected to improve performance.

  • How to Deal with Overfitting in Machine Learning

Overfitting is a phenomenon in which a machine learning model overfits the training data, resulting in poor generalization performance for new data.

Similarity is a concept that describes the degree to which two or more objects or things have common features or properties and are considered similar to each other, and plays an important role in evaluating, classifying, and grouping objects in terms of comparison and relatedness. This section describes the concept of similarity and general calculation methods for various cases.

The main approaches to using artificial intelligence techniques to extract emotions include (1) natural language processing, (2) speech recognition, (3) image recognition, and (4) biometric analysis. These methods are combined with algorithms such as machine learning and deep learning, and are basically detected using large amounts of training data. Approaches that combine different modalities (text, voice, images, biometric information, etc.) to comprehensively understand emotions are also more accurate methods.

Methods for extracting emotion from textual data specifically involve dividing sentences into tokens, using machine learning algorithms to understand word meaning and context, and training models on an emotion analysis dataset so that they can predict the emotional context of unknown text.

  • Auto-Grading (automatic grading) technology

Auto-grading refers to the process of using computer programmes and algorithms to automatically assess and score learning activities and assessment tasks. This technology is mainly used in the fields of education and assessment.

In machine learning tasks, recall is an indicator mainly used for classification tasks. To achieve 100% recall means, in the case of a general task, to extract all the data (positives) that should be found without omission, and this is something that frequently appears in tasks involving real-world risks.

However, achieving 100% recall is generally difficult, as it is limited by the characteristics of the data and the complexity of the problem. In addition, the pursuit of 100% recall may lead to an increase in the percentage of false positives (i.e., mistaking an originally negative result for a positive one), so it is necessary to consider the balance between these two factors.

This section describes the issues that must be considered in order to achieve 100% recall, as well as approaches and specific implementations to address these issues.
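As an illustrative sketch (Python with scikit-learn, hypothetical imbalanced data), one can sweep the decision threshold and inspect what happens to precision when recall is pushed to 100%:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced binary task
X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

scores = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
precision, recall, thresholds = precision_recall_curve(y_te, scores)

# Best precision among thresholds whose recall is still 1.0 -- note how low it is
full_recall = recall[:-1] >= 1.0
print("precision at 100% recall:", precision[:-1][full_recall].max())
```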

When performing real-world machine learning tasks, one often encounters cases where different labels are assigned to things that should have been assigned the same label. In this article, we discuss how to deal with such cases of inaccurate teacher data in machine learning.

Fermi estimation is a method for making rough estimates when precise calculations or detailed data are not available, and is named after the physicist Enrico Fermi. Fermi estimation is widely used as a means to quickly find approximate answers to complex problems using logical thinking and appropriate assumptions. In this article, we will discuss how Fermi estimation can be examined using artificial intelligence techniques.

Search algorithms refer to a family of computational methods used to find a target within a problem space. These algorithms have a wide range of applications in a variety of domains, including information retrieval, combinatorial optimization, game play, route planning, and more. This section describes various algorithms, their applications, and specific implementations with respect to these search algorithms.

Self-Adaptive Search Algorithms are a family of algorithms used in the context of evolutionary computation and optimization, characterized by adaptive adjustment of the parameters and strategies within the algorithm to the problem. These algorithms are designed to adapt to changes in the nature of the problem and the environment in order to efficiently find the optimal solution. This section describes various algorithms and examples of implementations with respect to this self-adaptive search algorithm.

Multi-Objective Search Algorithm (Multi-Objective Optimization Algorithm) is an algorithm for optimizing multiple objective functions simultaneously. Multi-objective optimization aims to find a balanced solution (Pareto optimal solution set) among multiple optimal solutions rather than a single optimal solution, and such problems have been applied to many complex systems and decision-making problems in the real world. This section provides an overview of this multi-objective search algorithm and examples of algorithms and implementations.

k-means is one of the algorithms used in the machine learning task called clustering, a method that can be used in a variety of tasks. Clustering here refers to the method of dividing data points into groups (clusters) with similar characteristics. The k-means algorithm aims to divide the given data into a specified number of clusters. This section describes the various algorithms of this k-means and their specific implementations.
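A minimal usage sketch with scikit-learn on hypothetical 2-D data (the number of clusters is specified in advance, as described above):

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical 2-D data with three groups
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(100, 2)) for c in ([0, 0], [5, 5], [0, 5])])

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(km.cluster_centers_)   # learned cluster centres
print(km.labels_[:10])       # cluster assignment of the first 10 points
```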

Decision Tree is a tree-structured classification and regression method used as a predictive model for machine learning and data mining. Since decision trees can construct conditional branching rules in the form of a tree to predict classes (classification) and values (regression) based on data characteristics (features), the resulting models are a white box, as described in “Explainable Machine Learning”. This section describes various algorithms for decision trees and concrete examples of their implementation.

The issue of having only a small amount of training data (small data) appears in various tasks as a factor that reduces the accuracy of machine learning. Machine learning with small data can be approached in various ways, taking into account data limitations and the risk of overfitting. This section discusses the details of each approach and implementation examples.

  • Overview of SMOTE (Synthetic Minority Over-sampling Technique), Algorithm and Implementation Examples

SMOTE (Synthetic Minority Over-sampling Technique) is a technique for oversampling the minority class by synthesizing new samples from existing minority-class samples in datasets with imbalanced class distributions. It is used to improve model performance, primarily in class classification tasks in machine learning. An overview of SMOTE is given below.
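A minimal usage sketch, assuming the imbalanced-learn package is installed (hypothetical imbalanced data):

```python
from collections import Counter
from sklearn.datasets import make_classification
from imblearn.over_sampling import SMOTE

# Hypothetical imbalanced dataset (roughly 9:1 majority/minority)
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
print("before:", Counter(y))

X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)
print("after: ", Counter(y_res))
```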

Active learning in machine learning (Active Learning) is a strategic approach to effectively selecting labeled data to improve model performance. Typically, training machine learning models requires large amounts of labeled data, but since labeling is costly and time consuming, active learning increases the efficiency of data collection.

  • Target Domain-Specific Fine Tuning in Machine Learning Technology

Target domain-specific fine tuning refers to the process, in machine learning techniques, of adjusting a general pre-trained model so that it becomes more suitable for a specific task or a set of tasks related to a particular domain. It is a form of transfer learning and is performed in the following steps.

Ensemble Learning is a type of machine learning that combines multiple machine learning models to build a more powerful predictive model. Combining multiple models rather than a single model can improve the prediction accuracy of the model. Ensemble learning has been used successfully in a variety of applications and is one of the most common techniques in machine learning.

Transfer learning, a type of machine learning, is a technique for applying a model or knowledge learned in one task to a different task. Transfer learning is usually useful when a new task requires little data or high performance. This section provides an overview of transfer learning and various algorithms and implementation examples.

Automatic machine learning (AutoML) refers to methods and tools for automating the process of designing, training, and optimizing machine learning models. AutoML is particularly useful for users with limited machine learning expertise or those seeking to develop efficient models, with the following main goals. This section provides an overview of AutoML and examples of various implementations.

  • Overview of Question-Answering Learning and Examples of Algorithms and Implementations

Question Answering (QA) is a branch of natural language processing in which the task is to generate appropriate answers to given questions. It has many applications, including information retrieval, knowledge-based query processing, customer support, and improving work efficiency. This paper provides an overview of question-answering learning, its algorithms, and various implementations.

The EM algorithm (Expectation-Maximization Algorithm) is an iterative optimization algorithm widely used in statistical estimation and machine learning. In particular, it is often used for parameter estimation of stochastic models with latent variables.

Here, we provide an overview of the EM algorithm, the flow of applying the EM algorithm to mixed models, HMMs, missing value estimation, and rating prediction, respectively, and an example implementation in python.
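As a rough sketch of the EM iteration for a two-component 1-D Gaussian mixture (Python with NumPy/SciPy, hypothetical data and initial values):

```python
import numpy as np
from scipy.stats import norm

# Hypothetical 1-D data drawn from two Gaussians
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2, 1, 300), rng.normal(3, 1.5, 700)])

# Initial parameters: mixing weights, means, standard deviations
pi, mu, sigma = np.array([0.5, 0.5]), np.array([-1.0, 1.0]), np.array([1.0, 1.0])

for _ in range(100):
    # E-step: responsibilities of each component for each point
    dens = np.vstack([p * norm.pdf(x, m, s) for p, m, s in zip(pi, mu, sigma)])
    resp = dens / dens.sum(axis=0)
    # M-step: re-estimate parameters from the responsibilities
    nk = resp.sum(axis=1)
    pi = nk / len(x)
    mu = (resp * x).sum(axis=1) / nk
    sigma = np.sqrt((resp * (x - mu[:, None]) ** 2).sum(axis=1) / nk)

print(pi, mu, sigma)
```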

The EM (Expectation Maximization) algorithm can also be used as a method for solving the Constraint Satisfaction Problem. This approach is particularly useful when there is incomplete information, such as missing or incomplete data. This paper describes various applications of the constraint satisfaction problem using the EM algorithm and its implementation in python.

Labeling of image information can be achieved by various machine learning approaches, as described below. This time, we would like to consider the fusion of these machine learning approaches and the constraint satisfaction approach, which is a rule-based approach. These approaches can be extended to labeling text data using natural language processing, etc.

LightGBM is a Gradient Boosting Machine (GBM) framework developed by Microsoft, a machine learning tool designed to build fast and accurate models for large data sets. Here we describe its implementation in python, R, and Clojure.

Generalized Linear Model (GLM) is one of the statistical modeling and machine learning methods used for stochastic modeling of the relationship between response variables (objective variables) and explanatory variables (features). This section provides an overview of this generalized linear model and its implementation in various languages (python, R, and Clojure).

Stochastic optimization represents a method for solving optimization problems involving stochastic elements, and stochastic optimization in machine learning is a widely used method for optimizing the parameters of a model. Whereas in general optimization problems, the goal is to find optimal values of parameters to minimize or maximize the objective function, stochastic optimization is particularly useful when the objective function contains noise or randomness caused by various factors, such as data variability or observation error.

In stochastic optimization, random factors and stochastic algorithms are used to find the optimal solution. For example, in the field of machine learning, stochastic optimization methods are frequently used to optimize parameters such as the weights and biases of neural networks. In SGD (Stochastic Gradient Descent), a typical method, optimization is performed by randomly selecting samples from the data set and updating the parameters based on those samples, so that the model can be trained efficiently without using the entire data set at once.

This section describes implementations in python for SGD and mini-batch gradient descent, Adam, genetic algorithms, and Monte Carlo methods and examples of their application to parameter tuning, feature selection and dimensionality reduction, and k-means.

Multi-Task Learning is a machine learning method that simultaneously learns multiple related tasks. Usually, each task has a different data set and objective function, but Multi-Task Learning aims to incorporate these tasks into a model at the same time so that they can complement each other by utilizing their mutual relevance and shared information.

Here, we provide an overview of methods such as shared parameter models, model distillation, transfer learning, and multi-objective optimization for multi-task learning, and discuss examples of applications in natural language processing, image recognition, speech recognition, and medical diagnosis, as well as a simple implementation in python.

Sparse modeling is a technique that takes advantage of sparsity in the representation of signals and data. Sparsity refers to the property that non-zero elements in data or signals are limited to a very small portion. The purpose of sparse modeling is to efficiently represent data by utilizing sparsity, and to perform tasks such as noise removal, feature selection, and compression.

This section provides an overview of sparse modeling algorithms such as Lasso, compressed sensing, Ridge regularization, elastic nets, Fused Lasso, group regularization, message passing algorithms, and dictionary learning, and describes their implementation in various applications such as image processing, natural language processing, recommendation, machine learning, signal processing, and brain science.
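A minimal sketch of sparse modeling with Lasso in scikit-learn, on a hypothetical regression problem where only a few features are truly relevant:

```python
import numpy as np
from sklearn.linear_model import Lasso

# Hypothetical regression problem where only a few features matter
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
true_w = np.zeros(50)
true_w[:5] = [3.0, -2.0, 1.5, 0.5, 4.0]   # sparse ground truth
y = X @ true_w + rng.normal(scale=0.5, size=200)

lasso = Lasso(alpha=0.1).fit(X, y)
print("non-zero coefficients:", np.sum(lasso.coef_ != 0))
```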

Robust Principal Component Analysis (RPCA) is a method for finding a basis in data, and is characterized by its robustness to data containing outliers and noise. This paper describes various applications of RPCA and its concrete implementation using python.

  • About LLE (Locally Linear Embedding)

LLE (Locally Linear Embedding) is a nonlinear dimensionality reduction algorithm that embeds high-dimensional data into a lower dimension. It assumes that the data is locally linear and reduces the dimension while preserving the local structure of the data. It is primarily used for tasks such as clustering, data visualization, and feature extraction.

  • About Multidimensional Scaling (MDS)

Multidimensional Scaling (MDS) is a statistical method for visualizing multivariate data that provides a way to place data points in a low-dimensional space (usually two or three dimensions) while preserving distances or similarities between the data. This technique is used to transform high-dimensional data into easily understandable low-dimensional plots that help visualize data features and clustering.

t-SNE is a nonlinear dimensionality reduction algorithm that embeds high-dimensional data into lower dimensions. t-SNE is mainly used for tasks such as data visualization and clustering, and its particular strength is its ability to preserve the nonlinear structure of high-dimensional data. The main idea of t-SNE is to reflect the similarity of high-dimensional data in a low-dimensional space.
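A minimal usage sketch with scikit-learn's TSNE on the digits dataset (parameter values are illustrative):

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

# Embed the 64-dimensional digits data into 2 dimensions
X, y = load_digits(return_X_y=True)
emb = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)
print(emb.shape)   # (1797, 2) -- low-dimensional coordinates for plotting
```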

  • About UMAP (Uniform Manifold Approximation and Projection)

UMAP is a nonlinear dimensionality reduction method for high-dimensional data, which aims to embed the data in a lower dimension while preserving its structure. It is used for visualization and clustering in the same way as t-SNE (t-distributed Stochastic Neighbor Embedding) described in “About t-SNE (t-distributed Stochastic Neighbor Embedding)” but adopts a different approach in some respects.

DBSCAN is a popular clustering algorithm in data mining and machine learning that aims to discover clusters based on the spatial density of data points rather than assuming the shape of the clusters. This section provides an overview of this DBSCAN, its algorithm, various application examples, and a concrete implementation in python.

Statistical analysis in general considers how a sample is described in terms of summary statistics and how population parameters can be inferred from it. Such an analysis tells us something about the population in general and the sample in particular, but does not provide a very precise description of the individual elements. This is because much information is lost by reducing the data to two statistics, the mean and the standard deviation.

We often want to go further and establish a relationship between two or more variables or to predict another variable from one variable. This is where correlation and regression studies come in. Correlation concerns the strength and direction of the relationship between two or more variables. Regression determines the nature of this relationship from which predictions can be made.

Linear regression is an elementary machine learning algorithm: given a sample of data, the model learns a linear equation and is able to make predictions about new, unknown data. To this end, we will use Incanter, a statistical library for Clojure, to describe how matrices can be manipulated to determine the relationship between the height and weight of Olympic athletes.

Although it is useful to know that two variables are correlated, it is not enough to predict the weight of an Olympic swimmer from his/her height, and vice versa, using the data described in “Statistical Analysis and Correlation Evaluation Using Clojure/Incanter”. In establishing the correlation, we measured the strength and sign of the relationship, but not the slope. In order to make predictions, it is necessary to know what the rate of change of one variable will be when the other variable changes by one unit.
What is needed is an equation that relates the specific value of one variable, called the independent variable, to the expected value of the other variable, called the dependent variable. For example, in a linear equation that predicts weight from height, height is the independent variable and weight is the dependent variable.

The line represented by this equation is called the regression line. The term was introduced by Sir Francis Galton, a 19th century English scientist who, along with his student Karl Pearson (who defined the correlation coefficient), developed various methods to study linear relationships, which collectively came to be called regression methods.

In the previous article, “Regression Analysis Using Clojure (1) Single Regression Model,” we discussed how to construct a regression line with a single independent variable. However, when considering a real problem, it is often desirable to construct a model with multiple independent variables. This problem is called a multiple regression problem. Each independent variable needs its own coefficient, so rather than assigning a separate letter to each one, we specify a single parameter vector, β (beta), to hold all the coefficients.
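Although the articles in this series use Clojure/Incanter, the same computation can be sketched language-agnostically; here is a minimal Python/NumPy version with hypothetical data, where β is obtained by solving the least-squares (normal equation) problem:

```python
import numpy as np

# Hypothetical heights (cm) and a second predictor (age), predicting weight (kg)
height = np.array([170, 180, 165, 190, 175], dtype=float)
age    = np.array([24, 28, 22, 30, 26], dtype=float)
weight = np.array([65, 80, 58, 92, 72], dtype=float)

# Design matrix with an intercept column; beta holds all coefficients at once
X = np.column_stack([np.ones_like(height), height, age])
beta = np.linalg.lstsq(X, weight, rcond=None)[0]   # least-squares solution
print(beta)   # [intercept, coefficient for height, coefficient for age]
```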

The input image becomes a feature vector after a series of processing. The final step of class recognition is classification, which assigns a class (e.g., “dog” or “cat”) to this feature vector. The algorithm that performs classification is called a classifier.

In this section, we will discuss the Bayes decision rule for constructing a classifier.

Continuing from the previous article, we will discuss classifiers using the perceptron, deep learning, and SVM.

When considering class recognition, if we can predict the posterior probability of a class with a discriminant function that takes a value between 0 and 1, we can quantify the degree to which the input data belongs to the target class. However, since the output of a linear discriminant function ranges from -∞ to +∞, it is difficult to interpret it directly as a posterior probability. Therefore, a probabilistic discriminant function that extends the linear discriminant function is used to predict the posterior probability of a class. Logistic regression and softmax regression, which are approaches using probabilistic discriminant functions, are important elements of neural networks.

    In actual images, some disturbance or noise is added, and if we use local features obtained from images that are affected by disturbance as they are, we may not be able to obtain the expected recognition accuracy. Therefore, statistical feature extraction is necessary to convert the observed data into features that are advantageous for recognition based on the established statistical structure of the data.

    Statistical feature extraction means that the extracted local features are further extracted based on the probability statistical structure of the data, and transformed into robust features that are not easily affected by noise or disturbances. Statistical feature extraction can be applied not only to local features but also to various features in image recognition.

    Statistical feature extraction can be classified according to the presence or absence of external criteria, i.e., teacher information such as which class the data belongs to. When there is no external criterion, principal component analysis is used as a feature extraction method. When there is an external criterion, Fisher’s linear discriminant analysis is used for feature extraction in class recognition, canonical correlation analysis is used for maximizing the correlation between two sets of variables, and the partial least squares method is used for maximizing their covariance. Although these seem to be different methods at first glance, they are deeply related to each other.

    Object detection aims to find a rectangular region in an image that surrounds an object such as a person or a car. Many object detection methods propose multiple candidate object regions and use object class recognition methods to determine which object these regions are classified as. Since the number of candidate object regions proposed from images is often huge, methods with low computational cost are often used for object class recognition.

    The sliding window method, the selective search method, and the branch-and-bound method are methods for proposing candidate object regions from images. There are also several methods to classify them, such as Exemplar-SVM, Random Forest, and R-CNN (Regions with CNN features).

    A decision tree learner is a powerful classifier that uses a tree structure to model the relationships between features and potential outcomes.

    A key feature of the decision tree algorithm is its flowchart-like tree structure, which is not used only by the learner: the output of the model can be read by humans, providing strong hints as to why and how the model works well (or does not) for a particular task.

    Such a mechanism can be particularly useful when the classification mechanism must be transparent for legal reasons, or when the results are shared with others to make business practices between organizations explicit.

    In this article, we will discuss classification using decision trees in R.

    The data to be used is the bank.zip of the Bank Marketing Data Set, which is downloaded from the machine learning data repository of the University of Hamburg. This data set contains data from a financial credit survey in Germany.

    Classification rules represent knowledge in the form of logical if-else statements that assign classes to unlabeled instances. They consist of an “antecedent” and a “consequent,” which together form the hypothesis that “if this happens, that happens.” A simple rule asserts something like, “If the hard disk is making a ticking sound, it will soon make an error.” The antecedent consists of a specific combination of feature values, whereas the consequent specifies the class value that is assigned when the conditions of the rule are met.

    Classification rule learning is often used in the same way as decision tree learning. Classification rules can be specifically used in applications that generate knowledge for future actions, such as the following. (1) identifying conditions that cause hardware errors in mechanical devices, (2) describing key characteristics of groups of people belonging to customer segments, (3) extracting conditions that herald a significant drop or rise in stock market prices.

    In this article, we will discuss the extraction of rules using decision trees with R.

    As in the previous article, we will use data from the UCI repository, in this case the mushroom dataset (“Is the mushroom edible or poisonous?”), to extract the rules.

    In this section, we describe an algorithm that learns how to sort alignments by presenting a number of correct alignments (positive examples) and incorrect alignments (negative examples). The main difference between these approaches is that the techniques in this section require some sample data for learning. This can be provided by the algorithm itself, such as only a subset of the correspondences to be judged, or it can be determined by the user, or it can be brought from external resources.

    PCA is a method of dimensionally compressing multidimensional data. For example, for data in a two- or three-dimensional space, if the data varies widely (has large variance) along a certain one-dimensional axis (straight line) or two-dimensional plane, the data is said to be biased toward that line or plane, and it can be represented as data on that one-dimensional axis or two-dimensional plane instead of in the original two- or three-dimensional space.

    The actual algorithm assumes a basis vector e that represents a new subspace (the specific one- or two-dimensional axis mentioned above) and, using the center (mean) of the data distribution c0, solves the minimization problem min E(c0, e) by the method of Lagrange multipliers.
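Solving this constrained minimization with Lagrange multipliers leads to an eigenvalue problem for the covariance matrix, so in practice the basis vectors are its top eigenvectors. A minimal sketch in Python/NumPy with hypothetical 2-D data:

```python
import numpy as np

# Hypothetical 2-D data that is stretched along one direction
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2)) @ np.array([[3.0, 0.0], [1.0, 0.5]])

c0 = X.mean(axis=0)                      # centre of the data distribution
cov = np.cov((X - c0).T)                 # covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)   # eigendecomposition (ascending order)

e = eigvecs[:, -1]                       # basis vector of the 1-D subspace
X_proj = (X - c0) @ e                    # data represented on the new axis
print(e, X_proj[:5])
```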

    It is also a benchmark for Hinton’s autoencoder as a dimension compression technique.

    Sequential pattern mining is a special case of structural data mining in which data mining finds statistically related patterns among data examples whose values are delivered in sequences.

    Tasks performed in these include “efficient database and index construction of sequence information,” “extraction of frequently occurring patterns,” “comparison of sequence similarities,” and “completion of missing sequence data.”

    Specific examples of applications include analysis of gene and protein sequence information (e.g., sequences of the nucleotide bases A, G, C, and T) and expression analysis of their functions, as well as extraction of patterns of items purchased together in large-scale transactions in stock trading and e-commerce (e.g., the other items that customers who buy onions and potatoes tend to purchase). It can also be used for process mining such as workflow analysis. Here we describe a typical algorithm, apriori.

    FP-Growth (Frequent Pattern-Growth) is an efficient algorithm for data mining and frequent pattern mining, and is a method used to extract frequent patterns (itemsets) from transaction data sets. In this paper, we describe various applications of the FP-Growth algorithm and an example implementation in python.

    Introduction to nearest neighbor methods, decision trees, and neural networks as basic algorithms for pattern recognition.

      Recommendation Technology

      Recommendation technology using machine learning analyzes the user’s past behavior history, preference data, and other data to provide better personalized recommendations based on that data. The recommendation technology consists of the following flow. (1) create a user profile, (2) extract features of items, (3) train a machine learning model, and (4) generate recommendations from the created model.

      In this blog, specific implementation and theory of this recommendation technology are described in the following pages.

      Noise Removal, Data Cleansing, and Interpolation of Missing Values in Machine Learning

      Noise removal and data cleansing and missing value interpolation in machine learning are important processes for improving data quality and the performance of predictive models.

      Noise reduction is a technique for removing unwanted information or random errors in data. Noise can be caused by a variety of factors, including sensor noise, measurement error, and data entry errors. Noise can negatively impact the training and prediction of machine learning models, and the goal of noise removal is to obtain reliable data and improve model performance. Data cleansing, including the interpolation of missing values, is the process of cleaning a dataset to resolve issues such as inaccuracies, incompleteness, duplicates, and missing values.

      The following pages of this blog provide a variety of information about noise removal, data cleansing, and missing value interpolation in this machine learning process, including example implementations.

      Model validation

      Statistical Hypothesis Testing is a method in statistics that probabilistically evaluates whether a hypothesis is true or not, and is used not only to evaluate statistical methods, but also to evaluate the reliability of predictions and to select and evaluate models in machine learning. It is also used in the evaluation of feature selection as described in “Explainable Machine Learning,” and in the verification of the discrimination performance between normal and abnormal as described in “Anomaly Detection and Change Detection Technology,” and is a fundamental technology. This section describes various statistical hypothesis testing methods and their specific implementations.

      Epidemiology is the study of the frequency and distribution of health-related events in specific populations, the elucidation of their causes, and the application of research findings to the prevention and control of health problems. Epidemiology began in the 19th century with the study of cholera by John Snow.

      Often we hear first-time students say, “I don’t understand statistics.” There are also mathematical scientists who make similar statements despite having a deep knowledge of statistics. This opinion about statistics cannot be said to be generally uninformed or unfair. Statistics is concerned with “methods of inference” and is based on the “concept of probability.” But the question “What is rational inference?” is not an easy one to answer. As is well known, the best minds from the time of the Greeks to the present day have addressed this difficult question, yet the controversy surrounding it is far from settled. Furthermore, the concept of “probability” is difficult to grasp in a straightforward manner, and this, too, is a matter of constant controversy. In view of this fact, those who feel that they do not understand statistics or probability may be the ones with sound common sense. If someone thinks that he or she can easily understand these concepts, he or she may need to recheck his or her level of understanding. It is not without reason that statistics is difficult to understand.

      In real-world applications of SVM, hyperparameters such as regularization parameters need to be determined appropriately. This problem is called model selection and is one of the important topics in machine learning. To address this issue, we first describe the cross-validation method, which is one of the model selection methods.

      In the k-fold cross-validation method, training and evaluation are performed k times. To illustrate, in the first round, the training data are the (k−1)/k × n_all examples collected from D2 to Dk, and the set of (1/k) × n_all examples included in D1 is the evaluation data. The classifier is trained using the training data, and the resulting classifier is evaluated using the evaluation data. Similarly, in the second round, the training data is the set of (k−1)/k × n_all examples in D1 and D3 to Dk, and the set of (1/k) × n_all examples in D2 is used as the evaluation data. The same steps are repeated until the k-th round.
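A minimal usage sketch of k-fold cross-validation with scikit-learn (k = 5, illustrative model and data):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# k = 5: each fold is used once as evaluation data, the rest for training
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(SVC(C=1.0), X, y, cv=cv)
print(scores, scores.mean())
```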

      Here we describe a regularization path tracking algorithm for SV classification. This algorithm is often used to solve the dual problem of SV classification. First, we review the dual representation of the decision function and the dual problem, and then derive that the optimal solution for the dual variables is piecewise linear in the regularization parameter C. We then use these results to describe the regularization path tracking algorithm for SV classifiers.

      From Clojure for Data Science. Describes techniques for evaluating clustering, one of the unsupervised learning tasks (curse of dimensionality, Mahalanobis distance, Davies-Bouldin index, Dunn index, squared error, RMSE, cluster number estimation, inter-cluster density, intra-cluster density).

      Ensemble Learning

      Ensemble learning methods are the next trend after deep learning. It is a state-of-the-art machine learning method that trains multiple learners using typical methods such as boosting and bagging, and then combines them. It is known to be much more accurate than single learning methods and has actually been used successfully in many situations.
      Chapter 1 deals with the background knowledge of ensemble methods, Chapters 2-5 deal with the core knowledge of ensemble methods, Chapter 5 discusses recent work on information-theoretic diversity and diversity generation, and Chapter 6 describes advanced ensemble methods. This book is a must-read for researchers, engineers, and students involved in artificial intelligence and machine learning.

      When the data is distributed in a complex way in the feature space, a nonlinear classifier becomes effective. To construct a nonlinear classifier, kernel methods and neural networks can be used. In this section, we describe ensemble learning, which constructs a nonlinear classifier by combining multiple simple classifiers. Collective learning is also called ensemble learning.

      As collective learning, we describe bagging, which generates subsets from the training data set and trains a predictor on each subset. This method is particularly effective for unstable learning algorithms. An unstable learning algorithm is one in which small changes in the training data set have a large impact on the structure and parameters of the predictor being learned. Neural networks and decision trees are examples of unstable learning algorithms.

      The bootstrap method is a way of generating diverse subsets from a finite set of data: it creates M new data sets by repeating random sampling with replacement from the original data set M times.
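A minimal sketch of bootstrap resampling in Python/NumPy (hypothetical data; each resampled set is drawn with replacement and therefore differs slightly, which is what gives bagging its diversity):

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=10, scale=2, size=100)   # hypothetical finite data set

M = 5   # number of bootstrap data sets
bootstrap_sets = [rng.choice(data, size=len(data), replace=True) for _ in range(M)]

# The mean of each resampled set varies, reflecting the diversity of the subsets
print([round(b.mean(), 2) for b in bootstrap_sets])
```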

      In machine learning, many of us have probably seen the term “ensemble learning” at least once. In fact, this idea is an essential part of machine learning. Furthermore, there are three methods of ensemble learning: “bagging,” “boosting,” and “stacking.” What is the difference between them? In this lecture, we will explain the basics of ensemble learning, from the definition of what it is to the differences between the methods, as well as the advantages and cautions in learning.

      In this article, we will discuss collective (ensemble) learning, a learning method that combines simple learning algorithms. As a representative example, we will discuss boosting and give a theoretical evaluation of its prediction accuracy.

      Parallel and Distributed Processing in Machine Learning

      The learning process of machine learning requires high-speed parallel distributed processing to handle large amounts of data. Parallel distributed processing distributes processing among multiple computers and performs multiple processes at the same time, enabling high-speed processing.

      The following pages of this blog describe specific implementations of these parallel and distributed processing techniques.

      Data Analysis

      With the development of information technology (IT), vast amounts of data have been accumulated, and attempts to create new knowledge and value by analyzing this “big data” are spreading. In this context, “prescriptive analysis” has been attracting attention as a method for analyzing big data. Prescriptive analysis is an analytical method that attempts to derive the optimal solution for an “objective” from a complex combination of conditions. This article looks at the characteristics of prescriptive analysis, including its differences from “explanatory analysis” and “predictive analysis,” which are used in many big data analyses, and the scope of benefits that can be derived from its application.

      Meta-heuristic Algorithms

      Meta-heuristics is an organic combination of empirical methods (heuristics) for solving difficult optimization problems. Recently, it has become a popular optimization algorithm among practitioners as a framework for solving practical problems with ease. When solving practical problems, robust solutions can be designed in a relatively short time if one has some programming skills, an eye for selecting meta-heuristics, and a knack for design.

      • Overview of genetic algorithms, application examples, and implementation examples

      Genetic algorithms (GA) are a type of evolutionary computation: optimization algorithms that imitate the evolutionary process in nature and are used for optimization, search, machine learning, and machine design, and that have been applied to a wide variety of problems. The basic elements and mechanism of the genetic algorithm are described below.
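As a rough sketch of the basic GA loop (selection, crossover, mutation) in Python/NumPy, maximizing a hypothetical one-dimensional objective encoded as a bit string:

```python
import numpy as np

rng = np.random.default_rng(0)

def fitness(x):
    """Hypothetical objective: maximize f(x) = -(x - 3)^2 for x in [0, 8)."""
    return -(x - 3.0) ** 2

def decode(bits):
    # 8-bit chromosome decoded to a real value in [0, 8)
    return int("".join(map(str, bits)), 2) / 256.0 * 8.0

pop = rng.integers(0, 2, size=(20, 8))          # initial population of bit strings
for generation in range(50):
    scores = np.array([fitness(decode(ind)) for ind in pop])
    parents = pop[np.argsort(scores)[-10:]]     # selection: keep the better half
    children = []
    for _ in range(10):
        a, b = parents[rng.integers(10)], parents[rng.integers(10)]
        point = rng.integers(1, 8)              # single-point crossover
        child = np.concatenate([a[:point], b[point:]])
        flip = rng.random(8) < 0.05             # mutation: flip bits with small probability
        child[flip] = 1 - child[flip]
        children.append(child)
    pop = np.vstack([parents, children])

best = max(pop, key=lambda ind: fitness(decode(ind)))
print(decode(best))   # should be close to 3
```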

      • Overview of Genetic Programming (GP) and its algorithms and implementations

      Genetic Programming (GP) is a type of evolutionary algorithm that is widely used in machine learning and optimization. An overview of GP is given below.

      • Overview of Gene Expression Programming (GEP) and Examples of Algorithms and Implementations

      Gene Expression Programming (GEP) is a type of evolutionary algorithm, a method that is particularly suited for the evolutionary generation of mathematical expressions and programs. This technique is used to evolve the form of a mathematical expression or program to help find the best solution for a particular task or problem. The main features and overview of GEP are described below.

      Particle Swarm Optimization (PSO) is a type of evolutionary computation algorithm inspired by swarming behavior in nature, modeling the behavior of flocks of birds and fish. PSO is characterized by its ability to search a wider search space than genetic algorithms, which tend to fall into local solutions. PSO is widely used to solve machine learning and optimization problems, and numerous studies and practical examples have been reported.

      • Overview of the Cultural Algorithm and Examples of Application and Implementation

      Cultural Algorithm is a type of evolutionary algorithm that extends evolutionary algorithms by introducing cultural elements; genetic algorithms and genetic programming are representative examples of the underlying evolutionary algorithms. The Cultural Algorithm adds a cultural component to these evolutionary algorithms so that it takes into account not only the evolution of individuals but also the transfer of knowledge and information between individuals.

      Evolutionary algorithms are optimisation techniques designed based on the principles of natural selection and genetic information transfer in evolutionary biology. In evolutionary algorithms, candidate solutions are represented as individuals, and individuals are evolved through genetic manipulation (crossover, mutation, etc.) to search for the optimal solution.

      In the previous article, we described an iterative similarity computation approach using a set of similarity equations. In this article, we will discuss optimization approaches that further extend these approaches. In this article, we first discuss two of these approaches: expectation maximization and particle swarm optimization.

      In the previous article, we discussed machine learning for sorting alignments. In this article, we will discuss tuning approaches for matching.
