Overview of various machine learning techniques for anomaly and change detection


Anomaly and Change Detection with Machine Learning

Overview

Anomaly detection with machine learning is a technique for detecting situations that differ significantly from normal conditions. It is used to detect many kinds of abnormal behavior, such as failures on manufacturing lines, network attacks, and irregularities in financial transactions.

Change detection with machine learning, on the other hand, is a technique for detecting a transition from one state to another. It is used, for example, to detect changes over time in sensor data or image data.

Both anomaly detection and change detection use machine learning to learn patterns in data and to detect anomalous situations or changes. Three main approaches are used.

- Unsupervised anomaly/change detection: learns the characteristics of normal (steady-state) data patterns and detects patterns that differ from them. Principal component analysis and kernel density estimation are typical methods.
- Supervised anomaly/change detection: uses labeled examples of both normal data and anomalous (or changed) data and learns to distinguish between them. Classification and regression algorithms are typical methods.
- Semi-supervised anomaly/change detection: trains only on normal data and, at detection time, flags data that the learned model of normal behavior does not explain well. It is often used in deep learning-based detection.
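As a rough illustration of the unsupervised approach, the sketch below learns a principal subspace from normal data only and scores new points by their squared reconstruction error (points far from the subspace score high). It assumes NumPy; the function names, the synthetic data, and the choice of k = 1 component are illustrative assumptions, not taken from the article.

```python
import numpy as np

def fit_pca(X_normal, k):
    """Learn the mean and the top-k principal directions of normal data."""
    mean = X_normal.mean(axis=0)
    # Rows of Vt are principal directions, sorted by explained variance
    _, _, Vt = np.linalg.svd(X_normal - mean, full_matrices=False)
    return mean, Vt[:k]

def pca_anomaly_score(X, mean, components):
    """Squared reconstruction error after projecting onto the normal subspace."""
    Z = (X - mean) @ components.T        # coordinates in the principal subspace
    X_hat = Z @ components + mean        # reconstruction from k components
    return np.sum((X - X_hat) ** 2, axis=1)

rng = np.random.default_rng(0)
# Hypothetical normal data: points close to a 1-D line in 3-D space
t = rng.normal(size=(500, 1))
X_train = t @ np.array([[1.0, 2.0, -1.0]]) + 0.05 * rng.normal(size=(500, 3))

mean, comps = fit_pca(X_train, k=1)
X_new = np.array([[1.0, 2.0, -1.0],    # lies on the learned line
                  [1.0, -2.0, 1.0]])   # lies off the line
scores = pca_anomaly_score(X_new, mean, comps)
print(scores)
```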

However, anomaly detection and change detection differ in the following respects.

- Anomaly detection aims to find data that differ from normal data, whereas in change detection the changes to be detected are not necessarily anomalies.
- In anomaly detection, normal data is usually collected and unsupervised learning is used to detect anomalous data (because anomalous data is scarce), whereas in change detection the target data is acquired periodically and changes are generally detected with either supervised or unsupervised learning.
- Anomaly detection is widely used in areas such as system monitoring, quality control, and security, whereas change detection is often used with IoT devices, in business applications, and in monitoring the natural environment.
- Anomaly detection aims to find abnormal data and requires high accuracy, whereas change detection aims to detect departures from the previous state, and the accuracy it requires depends on how much the state of the monitored object changes and on the purpose of detection.

Anomaly detection with machine learning faces the following challenges.

- Data quality: because anomaly detection with machine learning depends on data, data quality affects detection accuracy. Too few anomalous samples, too much noise, or missing values all reduce accuracy.
- Amount of data: anomalous data is often far scarcer than normal data, so there may not be enough of it for anomaly detection. Because anomalous data can be hard to collect, efficient ways of collecting it are needed.
- Algorithm selection: many anomaly detection algorithms exist, and accuracy varies greatly with the choice. Which algorithm is best depends on the nature of the data and the system, so choosing an appropriate one is important.
- Threshold setting: anomaly detection requires a decision threshold, which is difficult to set. Know-how is needed to balance a high detection rate against a low false-positive rate.
- Temporal changes: anomaly detection needs to be robust to changes over time, but the state of the system may drift and the detection algorithm may fail to keep up.
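One common, simple way to handle threshold setting is to fix an acceptable false-positive rate on held-out normal data and take the corresponding quantile of its anomaly scores. The sketch below assumes NumPy; the chi-square distributed validation scores stand in for whatever scores a real detector produces and are purely hypothetical.

```python
import numpy as np

def quantile_threshold(normal_scores, false_positive_rate=0.01):
    """Pick a threshold so that about `false_positive_rate` of normal data is flagged."""
    return np.quantile(normal_scores, 1.0 - false_positive_rate)

rng = np.random.default_rng(1)
# Hypothetical anomaly scores of held-out normal data
val_scores = rng.chisquare(df=3, size=10_000)
threshold = quantile_threshold(val_scores, false_positive_rate=0.01)

new_scores = np.array([0.5, 2.0, 25.0])
flags = new_scores > threshold        # True = reported as anomalous
print(threshold, flags)
```

Lowering `false_positive_rate` raises the threshold, trading missed anomalies for fewer false alarms.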

Change detection also faces the following challenges.

- Missing data and anomalies: if the data used for change detection contains missing or anomalous values, machine learning algorithms may not make appropriate predictions, which is why data preprocessing is important.
- Model selection: a variety of machine learning algorithms are used for change detection, but the best one depends on the problem, so an appropriate model must be chosen.
- Lack of labels: supervised change detection requires labeled data. If labeled data is insufficient, machine learning algorithms may not make accurate predictions, so the shortage of labels has to be addressed.
- Model adaptability: changes may occur at different frequencies and times, so models need to be designed to adapt flexibly to them.
- Parameter optimization: machine learning models have many parameters, and setting them appropriately is important. Change detection also requires parameter optimization, but because this is time-consuming and costly, efficient methods are needed.

About Anomaly and Change Detection Technologies

From the Machine Learning Professional Series book “Anomaly and Change Detection.”

In any business setting, it is extremely important to be able to detect changes or signs of anomalies. For example, detecting a change in sales lets us take the next step quickly, and detecting signs of abnormality in an operating chemical plant can prevent serious accidents. This capability is highly relevant to digital transformation and artificial intelligence.

The traditional approach to these problems has been to encode past cases as “rules”, for example, “IF (temperature ≥ 28°C) AND (humidity ≥ 75%) THEN uncomfortable”. However, the number of rules that humans can state explicitly is far smaller than the diversity of the real world. The question of “who will make the rules?” has been called the “knowledge acquisition bottleneck” in the history of artificial intelligence research and has become a major issue.

In recent years, statistical machine learning techniques have made it possible to build practical anomaly and change detection systems. This general-purpose approach uses the probability distribution p(x) of the possible values of an observation x to express the conditions for anomalies and changes mathematically.

Here we look at specific anomaly and change detection problems. The most typical is outlier detection, which finds anomalous points in a data set, shown in red in the figure below.

There is also a type of problem in which the behavior of the observations changes, as in the red part of the figure below, rather than individual values deviating. This problem is called change detection or change-point detection.

There are also problems involving discrete data, such as spam mail detection and change detection on discrete data.

The key to analyzing these problems is to learn a probability distribution suited to the nature of the data, and to decide how to connect that distribution to a degree of anomaly or change.

This blog covers the following topics related to anomaly and change detection.

Implementations

Overview of Anomaly Detection Techniques and Various Implementations

Anomaly detection is a technique for detecting anomalous behavior or patterns in a data set or system. It models the behavior and patterns of normal data and detects anomalies by evaluating deviations from them. An anomaly is an unexpected data point or abnormal behavior, captured as a difference from, or an outlier relative to, normal data. Anomaly detection can be performed with both supervised and unsupervised learning.

This section provides an overview of anomaly detection techniques and application examples, together with implementations of statistical, supervised, unsupervised, and deep learning-based anomaly detection.

Overview of Change Detection Techniques and Examples of Implementations

Change detection is a method for detecting changes or anomalies in the state of data or systems. It compares two periods, a learning period (past data) and a test period (current data): normal conditions and patterns are modeled from the learning-period data and compared with the test-period data to detect abnormalities and changes.
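The learning-period/test-period comparison can be sketched minimally as follows: fit a Gaussian to the window just before time t and score the window just after t by its average negative log-likelihood under that Gaussian. This assumes NumPy and a one-dimensional series; the window lengths and the synthetic mean shift are illustrative assumptions.

```python
import numpy as np

def window_change_score(x, t, train_len, test_len):
    """Average negative log-likelihood of x[t:t+test_len] under a Gaussian
    fitted to the learning window x[t-train_len:t]."""
    train = x[t - train_len:t]
    test = x[t:t + test_len]
    mu, var = train.mean(), train.var() + 1e-12
    nll = 0.5 * np.log(2 * np.pi * var) + (test - mu) ** 2 / (2 * var)
    return nll.mean()

rng = np.random.default_rng(2)
# Hypothetical series whose mean shifts from 0 to 3 at t = 300
x = np.concatenate([rng.normal(0, 1, 300), rng.normal(3, 1, 300)])
scores = {t: window_change_score(x, t, train_len=100, test_len=20) for t in (150, 300)}
print(scores)   # the score at t = 300 is clearly higher
```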

This section provides an overview of change detection technology and application examples, together with concrete implementations of reference-model-based, statistical, machine learning-based, and sequence-based change detection.

Similarity in machine learning

Similarity describes the degree to which two or more objects share features or properties and can be regarded as alike. It plays an important role in evaluating, classifying, and grouping objects by comparison and relatedness. This section describes the concept of similarity and common ways of computing it in various cases.
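Three common similarity measures can be sketched as follows: cosine similarity for vectors, a similarity derived from Euclidean distance, and the Jaccard coefficient for sets. This assumes NumPy; the 1 / (1 + d) mapping from distance to similarity is one illustrative choice among several.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1 = same direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def euclidean_similarity(a, b):
    """Map the Euclidean distance d >= 0 to a similarity in (0, 1]."""
    return 1.0 / (1.0 + float(np.linalg.norm(np.asarray(a) - np.asarray(b))))

def jaccard_similarity(s, t):
    """|intersection| / |union| of two sets."""
    s, t = set(s), set(t)
    return len(s & t) / len(s | t) if s | t else 1.0

a = np.array([1.0, 2.0, 0.0])
b = np.array([2.0, 4.0, 0.0])
print(cosine_similarity(a, b))       # ~1.0: same direction, different length
print(euclidean_similarity(a, b))    # < 1: the vectors are some distance apart
print(jaccard_similarity({"x", "y", "z"}, {"y", "z", "w"}))  # 0.5
```

Note that a and b are maximally similar by cosine but not by Euclidean distance, which is why the right measure depends on the case.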

Overview of Self-Supervised Learning and Examples of Various Algorithms and Implementations

Self-supervised learning is a type of machine learning that can be regarded as a form of supervised learning. Whereas supervised learning trains models on labeled data, self-supervised learning derives the training signal from the data itself instead of from external labels. This section describes various algorithms, applications, and implementations of self-supervised learning.

Robust Principal Component Analysis Overview and Implementation Examples

Robust principal component analysis (RPCA) is a method for finding a basis in data that is robust to outliers and noise. This section describes various applications of RPCA and a concrete implementation in Python.
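One well-known formulation of RPCA is principal component pursuit, which splits a matrix M into a low-rank part L and a sparse part S (the outliers). The sketch below solves it with a basic augmented Lagrangian iteration using singular value thresholding and soft-thresholding. It assumes NumPy; the default λ = 1/sqrt(max(m, n)), the step μ, the tolerance, and the synthetic data are common textbook choices used here as assumptions, not the article's own implementation.

```python
import numpy as np

def shrink(X, tau):
    """Soft-thresholding: the proximal operator of the L1 norm."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def svt(X, tau):
    """Singular value thresholding: the proximal operator of the nuclear norm."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * shrink(s, tau)) @ Vt

def rpca_pcp(M, lam=None, tol=1e-7, max_iter=1000):
    """Principal component pursuit: M ~ L (low rank) + S (sparse outliers)."""
    m, n = M.shape
    lam = 1.0 / np.sqrt(max(m, n)) if lam is None else lam
    mu = m * n / (4.0 * np.abs(M).sum())
    S = np.zeros_like(M)
    Y = np.zeros_like(M)          # Lagrange multipliers for L + S = M
    norm_M = np.linalg.norm(M)
    for _ in range(max_iter):
        L = svt(M - S + Y / mu, 1.0 / mu)
        S = shrink(M - L + Y / mu, lam / mu)
        residual = M - L - S
        Y += mu * residual
        if np.linalg.norm(residual) / norm_M < tol:
            break
    return L, S

rng = np.random.default_rng(0)
L_true = rng.normal(size=(60, 2)) @ rng.normal(size=(2, 60))   # rank-2 structure
S_true = np.zeros((60, 60))
mask = rng.random((60, 60)) < 0.05                             # 5% gross outliers
S_true[mask] = rng.uniform(-10, 10, size=mask.sum())
L_hat, S_hat = rpca_pcp(L_true + S_true)
rel_err = np.linalg.norm(L_hat - L_true) / np.linalg.norm(L_true)
print(rel_err)
```

Large entries of `S_hat` mark the cells treated as outliers, which is how RPCA is typically used for anomaly detection.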

Overview of Online Learning, Various Algorithms, Application Examples and Specific Implementations

Online learning is a method of learning by updating a model sequentially as data arrives. Unlike ordinary batch learning, the model is updated every time new data arrives. This section describes various online learning algorithms and application examples, as well as example implementations in Python.
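A minimal example of the online setting is a Gaussian model whose mean and variance are updated one sample at a time with Welford's algorithm, with each new point scored against the model learned so far before it is absorbed. This is plain Python; the class name, the warm-up of 5 samples, and the 3-sigma rule are illustrative assumptions.

```python
import math

class OnlineGaussian:
    """Running mean and variance updated one sample at a time (Welford's algorithm)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0   # sum of squared deviations from the current mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self):
        return self.m2 / self.n if self.n > 0 else 0.0

    def z_score(self, x):
        sd = math.sqrt(self.variance)
        return abs(x - self.mean) / sd if sd > 0 else 0.0

model = OnlineGaussian()
stream = [10.1, 9.8, 10.3, 9.9, 10.0, 10.2, 15.0]
flagged = []
for x in stream:
    # Score against the model learned so far, then update the model
    if model.n >= 5 and model.z_score(x) > 3.0:
        flagged.append(x)
    model.update(x)
print(flagged)   # [15.0]
```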

Overview of Structural Learning and Various Applications and Implementations

Structural learning is a branch of machine learning concerned with learning structures and relationships in data, usually within an unsupervised or semi-supervised framework. It aims to identify and model the patterns, relationships, or structures present in data to reveal the hidden structure behind it, and it targets many kinds of structure, such as graphs, trees, and networks.

This section discusses various applications and concrete implementations of structural learning.

Overview of RNN and Examples of Algorithms and Implementations

RNN (Recurrent Neural Network) is a type of neural network for modeling time-series and sequence data. It retains past information and combines it with new input, and it is widely used for tasks such as speech recognition, natural language processing, video analysis, and time-series prediction.
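The "retain past information and combine it with new input" idea corresponds to the recurrence h_t = tanh(W_xh x_t + W_hh h_(t-1) + b). The sketch below runs the forward pass of such a simple (Elman) RNN in NumPy with random, untrained weights; the sizes and weight scales are illustrative assumptions, and training is out of scope.

```python
import numpy as np

def rnn_forward(xs, W_xh, W_hh, b_h, h0=None):
    """Run a simple RNN over a sequence; returns the hidden state at every step."""
    h = np.zeros(W_hh.shape[0]) if h0 is None else h0
    hs = []
    for x_t in xs:
        # The new state combines the current input with the previous state
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
        hs.append(h)
    return np.stack(hs)

rng = np.random.default_rng(0)
input_dim, hidden_dim, T = 3, 8, 10
W_xh = rng.normal(scale=0.5, size=(hidden_dim, input_dim))
W_hh = rng.normal(scale=0.5, size=(hidden_dim, hidden_dim))
b_h = np.zeros(hidden_dim)
xs = rng.normal(size=(T, input_dim))
hs = rnn_forward(xs, W_xh, W_hh, b_h)
print(hs.shape)   # (10, 8): one hidden state per time step
```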

Overview of LSTM and Examples of Algorithms and Implementations

LSTM (Long Short-Term Memory) is a type of recurrent neural network (RNN) and a highly effective deep learning model, mainly for time-series data and natural language processing (NLP) tasks. LSTM retains historical information and models long-term dependencies, which makes it suitable for learning long-term as well as short-term information.
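The mechanism behind long-term memory is the gated, additive update of the cell state c. The sketch below implements one LSTM step in NumPy, with the input, forget, and output gates and the candidate values stacked in single weight matrices; the stacking order, sizes, and random weights are assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, U, b):
    """One LSTM step. W (4H, D), U (4H, H), b (4H,) stack the parameters of the
    input gate (i), forget gate (f), output gate (o) and candidate (g)."""
    H = h.shape[0]
    z = W @ x + U @ h + b
    i = sigmoid(z[0:H])              # how much new information to write
    f = sigmoid(z[H:2 * H])          # how much of the old cell state to keep
    o = sigmoid(z[2 * H:3 * H])      # how much of the cell state to expose
    g = np.tanh(z[3 * H:4 * H])      # candidate values
    c_new = f * c + i * g            # additive update helps preserve long-term memory
    h_new = o * np.tanh(c_new)
    return h_new, c_new

rng = np.random.default_rng(0)
D, H = 3, 4
W = rng.normal(scale=0.3, size=(4 * H, D))
U = rng.normal(scale=0.3, size=(4 * H, H))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x_t in rng.normal(size=(20, D)):
    h, c = lstm_step(x_t, h, c, W, U, b)
print(h.shape, c.shape)
```

When the forget gate saturates near 1 and the input gate near 0, the cell state passes through a step almost unchanged, which is what lets information survive over many steps.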

About GRU (Gated Recurrent Unit)

GRU (Gated Recurrent Unit) is a type of recurrent neural network (RNN) widely used in deep learning models, especially for time-series and sequence data. Like the LSTM described in “Overview of LSTM and Examples of Algorithms and Implementations,” the GRU is designed to model long-term dependencies, but it has a lower computational cost than the LSTM.
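Part of the GRU's lower cost comes from using two gates (update and reset) plus a candidate state, three parameter blocks instead of the LSTM's four, and no separate cell state. The NumPy sketch below follows the Cho et al. (2014) formulation; other libraries place the reset gate slightly differently, and the sizes and random weights are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x, h, W, U, b):
    """One GRU step. W (3H, D), U (3H, H), b (3H,) stack the parameters of the
    update gate (z), reset gate (r) and candidate state (n)."""
    H = h.shape[0]
    Wz, Wr, Wn = W[:H], W[H:2 * H], W[2 * H:]
    Uz, Ur, Un = U[:H], U[H:2 * H], U[2 * H:]
    bz, br, bn = b[:H], b[H:2 * H], b[2 * H:]
    z = sigmoid(Wz @ x + Uz @ h + bz)          # update gate: how much of the old state to keep
    r = sigmoid(Wr @ x + Ur @ h + br)          # reset gate: how much history feeds the candidate
    n = np.tanh(Wn @ x + Un @ (r * h) + bn)    # candidate state
    return z * h + (1.0 - z) * n

rng = np.random.default_rng(0)
D, H = 3, 4
W = rng.normal(scale=0.3, size=(3 * H, D))
U = rng.normal(scale=0.3, size=(3 * H, H))
b = np.zeros(3 * H)
h = np.zeros(H)
for x_t in rng.normal(size=(20, D)):
    h = gru_step(x_t, h, W, U, b)

gru_params = W.size + U.size + b.size    # 3H(D + H + 1)
lstm_params = 4 * H * (D + H + 1)        # an LSTM of the same size has four blocks
print(h.shape, gru_params, lstm_params)  # GRU uses 3/4 of the LSTM's parameters
```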

About Bidirectional RNN (BRNN)

A bidirectional recurrent neural network (BRNN) is an RNN model that considers past and future information simultaneously. BRNNs are particularly useful for processing sequence data and are widely used in tasks such as natural language processing and speech recognition.

About Deep RNN

Deep RNN (Deep Recurrent Neural Network) is a type of recurrent neural network (RNN) built by stacking multiple RNN layers. Deep RNNs help model complex relationships in sequence data and extract more sophisticated feature representations. Typically, the layers are stacked in the depth direction, and each layer processes the sequence along the time axis, taking the hidden states of the layer below as its input.

About Stacked RNN

A stacked RNN (Stacked Recurrent Neural Network) is a recurrent neural network (RNN) architecture in which multiple RNN layers are stacked on top of each other. This allows more complex sequence data to be modeled and long-term dependencies to be captured effectively.

AnoGAN Overview, Algorithm and Implementation Example

AnoGAN (Anomaly GAN) is a method that uses a Generative Adversarial Network (GAN) for anomaly detection, applied especially to medical imaging and to quality inspection in manufacturing. AnoGAN learns only from normal data and uses what it has learned to find anomalies. Building on the conventional GAN (Goodfellow et al., 2014), it trains a generator (G) and a discriminator (D) to obtain a generative model that captures the characteristics of normal data.

Overview of SkipGANomaly, Algorithm, and Example Implementations

SkipGANomaly is a GAN-based anomaly detection method (GANs are described in “Overview of GANs and Various Applications and Examples of Implementations”) that improves on conventional GANomaly by introducing skip connections.

About Echo State Network (ESN)

Echo State Network (ESN) is a form of reservoir computing: a type of recurrent neural network (RNN) used for prediction, analysis, and pattern recognition on time-series and sequence data, where it can perform well across a variety of tasks.

Reservoir computing

Reservoir computing (RC) is a machine learning approach based on recurrent neural networks (RNNs) that is particularly effective for processing time-series data. It simplifies the learning of complex dynamic patterns by keeping part of the network (the reservoir) fixed with random connections, so that only the output layer has to be trained.
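An echo state network is the standard instance of this idea, and a minimal version fits in a few lines of NumPy: a fixed random reservoir is driven by the input, and only a linear readout is fitted by ridge regression, here to predict a sine wave one step ahead. The reservoir size, the spectral radius of 0.9 (a common heuristic for the echo state property), the ridge strength, and the washout length are illustrative assumptions.

```python
import numpy as np

def make_reservoir(n_res, spectral_radius=0.9, input_scale=0.5, seed=0):
    """Fixed random recurrent weights, rescaled to the given spectral radius."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(-0.5, 0.5, size=(n_res, n_res))
    W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))
    W_in = rng.uniform(-input_scale, input_scale, size=n_res)
    return W, W_in

def run_reservoir(u, W, W_in):
    """Drive the (untrained) reservoir with a scalar input sequence u."""
    x = np.zeros(W.shape[0])
    states = []
    for u_t in u:
        x = np.tanh(W @ x + W_in * u_t)
        states.append(x)
    return np.array(states)

def with_bias(states):
    return np.hstack([states, np.ones((len(states), 1))])

def fit_readout(states, targets, ridge=1e-6):
    """Only the linear readout is trained, by ridge regression."""
    X = with_bias(states)
    return np.linalg.solve(X.T @ X + ridge * np.eye(X.shape[1]), X.T @ targets)

u = np.sin(0.2 * np.arange(1200))
W, W_in = make_reservoir(100)
states = run_reservoir(u[:-1], W, W_in)   # state after seeing u[0..t]
targets = u[1:]                           # next value u[t+1]
washout, split = 100, 800                 # discard initial transient; train/test split
w_out = fit_readout(states[washout:split], targets[washout:split])
mse = np.mean((with_bias(states[split:]) @ w_out - targets[split:]) ** 2)
print(mse)
```

For anomaly detection, a large one-step prediction error from such a model is one possible anomaly signal.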

Overview of DBSCAN (Density-Based Spatial Clustering of Applications with Noise) and Examples of Applications and Implementations

DBSCAN is a popular clustering algorithm in data mining and machine learning that discovers clusters from the spatial density of data points instead of assuming a cluster shape. This section provides an overview of DBSCAN, its algorithm, various application examples, and a concrete implementation in Python.
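Because DBSCAN labels points in low-density regions as noise, it doubles as a simple anomaly detector. The sketch below uses scikit-learn's `DBSCAN`, which marks noise points with the label -1; the synthetic clusters and the `eps` and `min_samples` values are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
cluster_a = rng.normal(loc=(0, 0), scale=0.2, size=(50, 2))
cluster_b = rng.normal(loc=(5, 5), scale=0.2, size=(50, 2))
outlier = np.array([[2.5, -3.0]])
X = np.vstack([cluster_a, cluster_b, outlier])

labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)
print(sorted(set(labels)))   # cluster ids, with -1 for noise
print(labels[-1])            # label of the isolated point
```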

Dirichlet Process (DP) Overview, Algorithm and Implementation Examples

The Dirichlet Process (DP) is a powerful tool for dealing with infinite-dimensional probability distributions and plays a central role in Bayesian nonparametric models, which are applied to clustering and topic modeling.

Hierarchical Dirichlet Process (HDP) Overview, Algorithm and Implementation Examples

The Hierarchical Dirichlet Process (HDP) is a Bayesian nonparametric method for handling infinite mixture models.

Chinese Restaurant Process Overview, Algorithm and Implementation Example

The Chinese Restaurant Process (CRP) is a probabilistic model used to give an intuitive explanation of the Dirichlet Process (DP), described in “Overview of the Dirichlet Process (Dirichlet Process, DP), Algorithms, and Examples of Implementations.” It is used especially often for clustering problems.
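The seating rule of the CRP can be simulated directly: the (n+1)-th customer joins existing table k with probability n_k / (n + α) and opens a new table with probability α / (n + α), where n_k is the number of customers already at table k. The NumPy sketch below assumes a concentration parameter α chosen by the user; the function name is illustrative.

```python
import numpy as np

def chinese_restaurant_process(n_customers, alpha, seed=0):
    """Sample table assignments from a CRP with concentration alpha."""
    rng = np.random.default_rng(seed)
    counts = []          # counts[k] = number of customers at table k
    assignments = []
    for n in range(n_customers):
        # Existing table k: counts[k] / (n + alpha); new table: alpha / (n + alpha)
        probs = np.array(counts + [alpha], dtype=float) / (n + alpha)
        k = int(rng.choice(len(probs), p=probs))
        if k == len(counts):
            counts.append(1)      # open a new table (a new cluster)
        else:
            counts[k] += 1        # "rich get richer": popular tables attract more
        assignments.append(k)
    return assignments, counts

assignments, counts = chinese_restaurant_process(100, alpha=2.0)
print(len(counts), counts)
```

Larger α tends to produce more tables, i.e. more clusters, without fixing their number in advance.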

Stick-breaking Process Overview, Algorithm, and Implementation Example

The stick-breaking process is a typical way to understand the Dirichlet Process (DP), described in “Overview of the Dirichlet Process (DP), Algorithms, and Examples of Implementations,” intuitively: a stick of length 1 is broken at random over and over, each time breaking off a random fraction of what remains, to generate an infinite-dimensional probability distribution. This gives a visually and mathematically elegant construction of the discrete probability measure of a Dirichlet process.
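A truncated version of the construction takes only a few lines: draw β_k ~ Beta(1, α), set the weight π_k = β_k ∏_{j<k}(1 - β_j), and pair each weight with an atom drawn from the base distribution. The NumPy sketch below assumes a standard normal base distribution and truncation at a finite number of sticks, both illustrative choices.

```python
import numpy as np

def stick_breaking(alpha, n_atoms, seed=0):
    """Truncated stick-breaking construction of a DP draw sum_k pi_k * delta(theta_k)."""
    rng = np.random.default_rng(seed)
    betas = rng.beta(1.0, alpha, size=n_atoms)          # fraction broken off each time
    remaining = np.concatenate([[1.0], np.cumprod(1.0 - betas[:-1])])  # stick left before break k
    weights = betas * remaining
    atoms = rng.normal(size=n_atoms)   # hypothetical base distribution H = N(0, 1)
    return weights, atoms

weights, atoms = stick_breaking(alpha=2.0, n_atoms=200)
print(weights[:5], weights.sum())      # the sum approaches 1 as n_atoms grows
```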

Elasticsearch and Machine Learning

Elasticsearch is an open source distributed search engine for search, analysis, and data visualization. It also integrates machine learning (ML) technology and can be used as a platform for data-driven insights and predictions. This section describes various uses and specific implementations of machine learning in Elasticsearch.

ElasticSearch Plug-ins and Implementation Examples

Elasticsearch is an open source distributed search engine that provides many features to enable fast text search and data analysis. Various plug-ins are also available to extend the functionality of Elasticsearch. This section describes these plug-ins and their specific implementations.

Theory and application of anomaly and change detection

Basic concept of anomaly and change detection – Neyman-Pearson Decision Rule

For anomaly and change detection, statistical machine learning techniques have made it possible to build practical systems. This general-purpose approach describes the conditions for anomalies and changes mathematically, using the probability distribution p(x) of the possible values of an observation x.

Specific anomaly and change detection problems include (1) outlier detection, which finds anomalous points in a data set, and (2) problems in which the behavior of the observations changes rather than individual values deviating.

The key to analyzing these problems is to learn a probability distribution suited to the nature of the data and to decide how to connect that distribution to a degree of anomaly or change. There are two general frameworks: (1) the case in which, in addition to the M-dimensional vector x, a label y indicating whether the data is abnormal or normal (or whether it is a change point) is observed together with x and used to build the anomaly detection model, and (2) the case in which a probability model is specified in order to pick out the very small number of abnormal states.
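For framework (1), the Neyman-Pearson decision rule flags x when the log-likelihood ratio ln p(x | y=1) - ln p(x | y=0) exceeds a threshold; if the anomalous class is instead treated as roughly uniform, the score reduces to -ln p(x | y=0) up to a constant. The sketch below assumes one-dimensional Gaussian class-conditional densities whose parameters are hypothetical and already estimated from labeled data.

```python
import numpy as np

def gaussian_logpdf(x, mean, var):
    return -0.5 * np.log(2 * np.pi * var) - (x - mean) ** 2 / (2 * var)

# Hypothetical class-conditional models (y=0: normal, y=1: anomalous)
normal_mean, normal_var = 0.0, 1.0
anomal_mean, anomal_var = 3.0, 1.0

def anomaly_score(x):
    """Log-likelihood ratio ln p(x|y=1) - ln p(x|y=0); compare it with a threshold."""
    return gaussian_logpdf(x, anomal_mean, anomal_var) - gaussian_logpdf(x, normal_mean, normal_var)

x = np.array([-1.0, 1.5, 4.0])
print(anomaly_score(x))   # approximately [-7.5, 0.0, 7.5]
```

With equal variances the ratio is linear in x (here 3x - 4.5), so choosing the threshold amounts to choosing a cut-off on x that trades detection rate against false positives.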

Implementation of a simple anomaly detection algorithm using Clojure

Anomaly detection is a machine learning technique that determines whether a given set of values for selected features representative of a system differs unexpectedly from the values normally observed. Applications include detecting structural and operational defects in manufacturing, network intrusion detection, system monitoring, and medical diagnosis. As a machine learning problem, anomaly detection can be viewed as an extension of binary classification.

One approach to anomaly detection uses a probability distribution model built from training data. Another is the proximity-based approach, which determines how close a set of observed values is to the remaining values in the sample data.

Whether a set of observed values is anomalous can also be judged from the density of the surrounding data. In this density-based approach, a set of input values is classified as anomalous if the density of the data around it is low.
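A common way to combine the proximity and density ideas is to score each query point by its distance to the k-th nearest training point: that distance is large exactly where the surrounding data is sparse. The sketch is in Python with NumPy rather than Clojure, to match the other examples; k = 5 and the synthetic data are illustrative assumptions.

```python
import numpy as np

def knn_anomaly_scores(X_train, X_query, k=5):
    """Distance to the k-th nearest training point (large = low local density)."""
    # Pairwise Euclidean distances, shape (n_query, n_train)
    d = np.linalg.norm(X_query[:, None, :] - X_train[None, :, :], axis=2)
    return np.sort(d, axis=1)[:, k - 1]

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 2))
X_query = np.array([[0.0, 0.0], [6.0, 6.0]])
s = knn_anomaly_scores(X_train, X_query)
print(s)   # the far-away point gets a much larger score
```

If the query points are themselves training points, each point's nearest neighbor is itself, so the k-th neighbor should then be counted excluding the point.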

Anomaly detection by Hotelling’s T2 method -Mahalanobis distance and chi-square distribution

Consider a data set D = {x(1), x(2), …, x(N)} consisting of N observations in M dimensions. Hotelling’s T2 method assumes that this data contains no anomalous samples (or that any it contains are overwhelmingly few), and that each sample independently follows the multivariate normal distribution

$$\mathcal{N}(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{M/2}|\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x - \mu)^\top \Sigma^{-1} (x - \mu)\right)$$

As the degree of anomaly of an observation x’, we use what is often called the (squared) Mahalanobis distance, which describes how far x’ is from the sample mean, emphasizing the “distance” aspect:

$$a(x') = (x' - \hat{\mu})^\top \hat{\Sigma}^{-1} (x' - \hat{\mu})$$

where $\hat{\mu}$ and $\hat{\Sigma}$ are the sample mean and sample covariance matrix estimated from D.

To set the threshold separating normal from abnormal, we explicitly derive the probability distribution followed by the degree of anomaly a, based on the multivariate normal assumption. The reasoning is a hypothesis test such as: “This value is so rare that it occurs in fewer than 5% of normal cases, so it must not be normal.”

Here, the degree of anomaly a follows a chi-square distribution with M degrees of freedom (with scale factor 1, to a good approximation when N is much larger than M), regardless of the physical units or numerical values of the data.
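Putting these steps together: estimate the mean and covariance from normal data, compute the squared Mahalanobis distance of each new point, and compare it with a quantile of the chi-square distribution with M degrees of freedom. The sketch assumes NumPy and SciPy (`scipy.stats.chi2`), a 1% false-alarm rate, and synthetic data; the chi-square threshold relies on the large-N approximation.

```python
import numpy as np
from scipy.stats import chi2

def fit_hotelling(X):
    """Maximum-likelihood mean and inverse covariance of (assumed normal) data."""
    mu = X.mean(axis=0)
    sigma = np.cov(X, rowvar=False, bias=True)
    return mu, np.linalg.inv(sigma)

def hotelling_scores(X, mu, sigma_inv):
    """Squared Mahalanobis distance a(x') = (x' - mu)^T Sigma^{-1} (x' - mu)."""
    D = X - mu
    return np.einsum("ij,jk,ik->i", D, sigma_inv, D)

rng = np.random.default_rng(0)
M = 3
X_train = rng.multivariate_normal(np.zeros(M), np.diag([1.0, 4.0, 9.0]), size=2000)
mu, sigma_inv = fit_hotelling(X_train)
threshold = chi2.ppf(0.99, df=M)      # flags about 1% of normal data

X_new = np.array([[0.5, -1.0, 2.0],   # unremarkable once scaled by the variances
                  [4.0, 0.0, 0.0]])   # 4 standard deviations out on the first axis
flags = hotelling_scores(X_new, mu, sigma_inv) > threshold
print(threshold, flags)
```

Because a is measured in units of the estimated covariance, the same threshold works whatever the physical units of the variables.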

Anomaly detection by simple Bayesian method – Differences from binary classification

One thing that makes anomaly detection difficult is that there can be so many variables that the problem becomes unmanageable. The naive (simple) Bayes method addresses this with a simple idea: split the problem up variable by variable (dimension by dimension).

The name “naive Bayes” comes from combining the Bayesian decision rule and the Neyman-Pearson decision rule with the simple (naive) assumption that the variables are independent of each other; it is “a modeling method that treats each variable as independent and applies this to the degree of anomaly.”

This article also discusses the naive Bayes algorithm.
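For word-count data such as e-mail, the independence assumption turns the log-likelihood ratio into a sum of per-word terms. The sketch below fits multinomial word probabilities with add-one smoothing for a normal class and a spam class and scores a document by that sum. It assumes NumPy; the four-word vocabulary and the counts are entirely hypothetical.

```python
import numpy as np

def fit_word_probs(doc_counts, alpha=1.0):
    """Per-word multinomial probabilities with add-alpha (Laplace) smoothing."""
    totals = doc_counts.sum(axis=0) + alpha
    return totals / totals.sum()

def naive_bayes_score(x, theta_anomalous, theta_normal):
    """Log-likelihood ratio of a word-count vector x; a sum of independent
    per-word contributions because words are modeled as independent."""
    return float(x @ (np.log(theta_anomalous) - np.log(theta_normal)))

# Hypothetical vocabulary: ["free", "winner", "meeting", "report"]
normal_docs = np.array([[0, 0, 3, 2], [0, 0, 1, 4], [1, 0, 2, 2]])
spam_docs = np.array([[4, 2, 0, 0], [3, 3, 0, 1]])
theta_normal = fit_word_probs(normal_docs)
theta_spam = fit_word_probs(spam_docs)

s_spam = naive_bayes_score(np.array([2, 1, 0, 0]), theta_spam, theta_normal)
s_normal = naive_bayes_score(np.array([0, 0, 2, 1]), theta_spam, theta_normal)
print(s_spam, s_normal)   # positive = spam-like, negative = normal-like
```

Each term `log(theta_spam[w]) - log(theta_normal[w])` also shows how much word w contributes to the anomaly score.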

Anomaly detection by nearest neighbor method -Riemannian measurement and dealing with multimodal distributions

Hotelling’s T2 method described above is effective in practice only when the observations are concentrated around a roughly constant value. This article discusses the nearest neighbor method, a simple method without that limitation. In the nearest neighbor method, the essential issue is how to define the distance between data points, so I introduce metric learning, which improves the method by optimizing the Riemannian metric used for the distance, and describe the margin-maximizing nearest neighbor method, one of the most effective methods in practice.

In the previous sections we discussed anomaly detection methods using the normal distribution and the multinomial distribution. This article discusses anomaly detection using a special distribution called the empirical distribution, $p_{\mathrm{emp}}(x) = \frac{1}{N}\sum_{n=1}^{N}\delta(x - x^{(n)})$. The term “empirical distribution” reflects the fact that, by the nature of the delta function, it is nonzero only at the “experienced” values in D and zero everywhere else. Since a probability density that is zero everywhere except at the sample locations is too inflexible, we consider obtaining the probability density p(x’) at an arbitrary location x’ from the empirical distribution.
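One standard way to obtain p(x’) from the empirical distribution is to replace each delta function with a Gaussian kernel (kernel density estimation) and use -ln p(x’) as the anomaly score. The sketch assumes NumPy and SciPy (`scipy.special.logsumexp` for numerical stability); the bandwidth of 0.5 and the bimodal synthetic data are illustrative assumptions. It also shows that multimodal data poses no problem: the gap between the two modes scores as anomalous.

```python
import numpy as np
from scipy.special import logsumexp

def kde_log_density(X_train, X_query, bandwidth=0.5):
    """Gaussian kernel density estimate: each delta of the empirical
    distribution is smoothed into a Gaussian of width `bandwidth`."""
    n, m = X_train.shape
    sq = np.sum((X_query[:, None, :] - X_train[None, :, :]) ** 2, axis=2)
    log_k = -sq / (2 * bandwidth ** 2) - 0.5 * m * np.log(2 * np.pi * bandwidth ** 2)
    # log((1/n) * sum_i K(x', x_i)), computed stably
    return logsumexp(log_k, axis=1) - np.log(n)

rng = np.random.default_rng(0)
X_train = np.vstack([rng.normal(-3, 0.5, size=(100, 1)),
                     rng.normal(3, 0.5, size=(100, 1))])
X_query = np.array([[-3.0], [0.0], [3.0]])
scores = -kde_log_density(X_train, X_query)   # anomaly score = -ln p(x')
print(scores)   # the midpoint between the two modes scores highest
```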

Sequential update anomaly detection using mixture distribution model-Jensen’s inequality and EM method

Many practical systems, such as air conditioning systems, have multiple operating modes. This article describes anomaly detection with a Gaussian mixture model, a natural formulation for such systems. The maximum likelihood solution for the parameters of a mixture model cannot be obtained analytically, but an iterative parameter estimation formula can be derived with a technique called the EM algorithm. In addition, I explain weighted maximum likelihood estimation with time-dependent weights, to handle systems whose behavior changes over time. Sequentially updated anomaly detection with a Gaussian mixture is one of the most widely used anomaly detection methods in practice.

When the parameters are obtained by maximizing the weighted log-likelihood, the difficulty is that “there is a sum inside the log.” Jensen’s inequality is the tool used to deal with this difficulty.
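The batch version of this idea can be sketched with scikit-learn's `GaussianMixture`, which fits the mixture by EM; `score_samples` returns the log-density, so its negative serves as the anomaly score. Note that this sketch does not include the time-dependent weighting for sequential updates described above. The two synthetic "operating modes" and their parameters are illustrative assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Hypothetical system with two operating modes
mode_a = rng.normal(loc=(20.0, 1.0), scale=0.5, size=(300, 2))
mode_b = rng.normal(loc=(26.0, 3.0), scale=0.5, size=(300, 2))
X_train = np.vstack([mode_a, mode_b])

gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0).fit(X_train)

X_new = np.array([[20.0, 1.0], [23.0, 2.0], [26.0, 3.0]])
scores = -gmm.score_samples(X_new)   # anomaly score = negative log-likelihood
print(scores)   # the point between the two modes is the outlier
```

A single Gaussian fitted to the same data would place its mean near (23, 2) and judge that point the most normal, which is why a mixture is the natural model for multi-mode systems.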

Anomaly detection using support vector data description method -Dual problems and Lagrangian functions and data cleansing

Hotelling’s T2 method, a classical method, builds the anomaly detection model by assuming that all data follow a single normal distribution. Approaches using mixture distribution models and Bayesian estimation, on the other hand, gave up on a single distribution and built the model from the local spread of data around the point of interest. Here, I describe an approach that returns to the idea of Hotelling’s T2 method but uses a technique called the “kernel trick” to represent the density variations of the distribution indirectly.
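scikit-learn does not ship SVDD itself, but its `OneClassSVM` with a Gaussian (RBF) kernel is closely related: for kernels whose value K(x, x) is constant, as with the Gaussian kernel, the one-class SVM is known to yield the same decision boundary as SVDD. The sketch below uses it as a stand-in; `nu` plays a role similar to SVDD's trade-off parameter, roughly bounding the fraction of training points left outside the boundary. The values `gamma=0.5`, `nu=0.05` and the data are illustrative assumptions.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X_train = rng.normal(size=(300, 2))     # hypothetical normal data

model = OneClassSVM(kernel="rbf", gamma=0.5, nu=0.05).fit(X_train)
X_new = np.array([[0.0, 0.0], [4.0, 4.0]])
print(model.decision_function(X_new))   # positive inside the boundary, negative outside
print(model.predict(X_new))             # +1 = normal, -1 = anomaly
```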

Anomaly detection for directional data – Analysis Using Von Mises Fisher Distribution and Chi-Square

In real problems, observed values are almost always normalized or standardized. For example, when comparing a one-page document with a 100-page document, we may look at normalized word-frequency vectors, and for such data a normal distribution is neither mathematically nor practically appropriate. This article discusses anomaly detection for directional data, that is, vectors of unit length. In the world of directional data, the von Mises-Fisher distribution plays a central role.

The most natural distribution for unit-length vectors is the von Mises-Fisher distribution. It has two parameters: the mean direction and the concentration parameter.
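Since the von Mises-Fisher log-density is κ μ^T x plus a constant, the negative log-likelihood of a unit vector x is κ(1 - μ^T x) up to an additive constant, so 1 - μ^T x works as an anomaly score once the mean direction μ is estimated (as the normalized average of the unit vectors). The NumPy sketch below assumes hypothetical word-count vectors over a three-word vocabulary; estimating κ and the threshold is left out.

```python
import numpy as np

def normalize_rows(X):
    return X / np.linalg.norm(X, axis=1, keepdims=True)

def fit_mean_direction(X):
    """Maximum-likelihood mean direction: the normalized mean of the unit vectors."""
    m = normalize_rows(X).mean(axis=0)
    return m / np.linalg.norm(m)

def vmf_anomaly_score(X, mu):
    """1 - mu^T x for unit vectors x (0 = same direction as mu)."""
    return 1.0 - normalize_rows(X) @ mu

# Hypothetical word-count vectors of documents with very different lengths
docs = np.array([[10, 5, 1], [20, 11, 2], [100, 48, 9]], dtype=float)
mu = fit_mean_direction(docs)
new_docs = np.array([[200, 100, 20],   # long document, usual word mix
                     [1, 2, 30]], dtype=float)   # short document, unusual mix
scores = vmf_anomaly_score(new_docs, mu)
print(scores)
```

Document length drops out entirely; only the direction, i.e. the word mix, affects the score.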

Anomaly detection using Gaussian process regression -Output anomaly detection for input, application to design of experiments

This article describes an anomaly detection technique for systems whose input-output pairs can be observed. The relationship between input and output is modeled as a response surface (or regression curve), and anomalies are detected as deviations from that surface. In practice, the input-output relationship is often nonlinear, so this article discusses Gaussian process regression, which has a wide range of engineering applications among nonlinear regression techniques.

When both input and output are observed, the most natural way to detect anomalies is to check whether the output for a given input deviates significantly from the value expected under normal behavior. In this sense, the problem can be called response anomaly detection.
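A minimal response anomaly score is the squared standardized deviation of the observed output from the Gaussian process predictive mean at that input. The sketch uses scikit-learn's `GaussianProcessRegressor` with an RBF kernel plus a `WhiteKernel` for observation noise, so the predictive standard deviation includes the noise level. The sine-shaped "normal" response, the kernel choice, and the injected deviation are illustrative assumptions.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
X_train = np.sort(rng.uniform(0, 10, size=(40, 1)), axis=0)
y_train = np.sin(X_train).ravel() + 0.1 * rng.normal(size=40)   # hypothetical normal response

kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=0.01)
gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True, random_state=0).fit(X_train, y_train)

def response_anomaly_score(gpr, X, y):
    """Squared standardized deviation of the observed output from the predicted response."""
    mean, std = gpr.predict(X, return_std=True)
    return ((y - mean) / std) ** 2

X_new = np.array([[2.0], [5.0]])
y_new = np.array([np.sin(2.0), np.sin(5.0) + 2.0])   # second output is off the response surface
s = response_anomaly_score(gpr, X_new, y_new)
print(s)
```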

Change detection using subspace method -Singular spectral transform method for time series data

This article describes change detection for time series data using the subspace method. Singular spectrum transformation (SST) builds trajectory matrices, whose columns are sliding windows of the series, for a past interval and for a test interval, extracts the dominant left singular vectors of each, and measures the degree of change by how far the two subspaces diverge.
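The SST score at time t can be sketched in NumPy as follows. One common form of the score is 1 minus the largest singular value of U^T Q, where U spans the top-r past subspace and Q the top-m test subspace; that singular value is the cosine of the smallest angle between the two subspaces. The window length w, the number of windows n, the ranks r and m, and the synthetic frequency change are illustrative assumptions.

```python
import numpy as np

def trajectory_matrix(x, start, w, n):
    """Hankel matrix whose n columns are the windows x[start+j : start+j+w]."""
    return np.column_stack([x[start + j:start + j + w] for j in range(n)])

def sst_score(x, t, w=20, n=20, r=2, m=1):
    """Singular spectrum transformation change score at time t (0 = no change)."""
    past = trajectory_matrix(x, t - w - n + 1, w, n)   # windows ending at or before t-1
    test = trajectory_matrix(x, t, w, n)               # windows starting at t
    U = np.linalg.svd(past, full_matrices=False)[0][:, :r]   # dominant past subspace
    Q = np.linalg.svd(test, full_matrices=False)[0][:, :m]   # dominant test subspace
    # Largest singular value of U^T Q = cosine of the smallest principal angle
    return 1.0 - np.linalg.svd(U.T @ Q, compute_uv=False)[0]

rng = np.random.default_rng(0)
t_axis = np.arange(400)
# Hypothetical series whose period changes from 20 to 7 at t = 200
x = np.where(t_axis < 200, np.sin(2 * np.pi * t_axis / 20), np.sin(2 * np.pi * t_axis / 7))
x = x + 0.05 * rng.normal(size=400)
score_stationary, score_change = sst_score(x, 100), sst_score(x, 200)
print(score_stationary, score_change)
```

Because only the shape of the signal enters through the subspaces, SST reacts to changes in behavior (here, frequency) even when the values stay within the same range.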

Anomaly detection using sparse structure learning – Graph models and L1 regularization that link broken dependencies between variables to anomalies.

Anomaly detection using sparse structure learning – Graph models and L1 regularization that link broken dependencies between variables to anomalies. When monitoring a system described by many variables, it is important to know how much each variable contributes to an individual anomaly. Few methods can do this, apart from those that rely on extreme simplifying assumptions, such as naive Bayes. In this section, we discuss how to calculate the degree of anomaly of individual variables based on the idea of linking a breakdown in the dependencies among variables to an anomaly. To learn these dependencies, we use a calculation method that efficiently extracts only the essential ones. This makes it possible to automatically extract the modular structure inherent in the system, even when its apparent dimension is high.
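A minimal sketch of per-variable anomaly scores, assuming scikit-learn's `GraphicalLasso` as the sparse structure learner and an illustrative regularization strength `alpha`. For a Gaussian graphical model with precision matrix Λ, the conditional distribution of variable i given all the others is Gaussian, and its negative log-likelihood is ½ ln(2π/Λᵢᵢ) + (Σⱼ Λᵢⱼ dⱼ)² / (2Λᵢᵢ), with d the centered observation. A variable whose value breaks its learned dependencies therefore receives a high score. The function name is hypothetical.

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

def variable_anomaly_scores(X_train, x, alpha=0.05):
    """Per-variable anomaly scores for one observation x (shape: n_features)."""
    model = GraphicalLasso(alpha=alpha).fit(X_train)
    Lam = model.precision_           # sparse precision matrix; zeros mean conditional independence
    d = x - model.location_          # center with the estimated mean
    diag = np.diag(Lam)
    # -log p(x_i | all other variables) under the fitted Gaussian graphical model
    return 0.5 * np.log(2 * np.pi / diag) + (Lam @ d) ** 2 / (2 * diag)

rng = np.random.default_rng(0)
z1, z2 = rng.standard_normal((2, 500))
X_train = np.column_stack([z1, z1, z2, z2]) + 0.1 * rng.standard_normal((500, 4))
# Variables 0 and 1 keep their usual relationship; variables 2 and 3 break it
scores = variable_anomaly_scores(X_train, np.array([1.0, 1.0, 2.0, -2.0]))
print(scores)
```

The scores single out variables 2 and 3 even though no single value is extreme on its own; it is the broken dependency between them that is flagged.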

Anomaly detection using density ratio estimation – Anomaly Estimation from Unsupervised Data Using the Kullback-Leibler Density Ratio Estimation Method

Anomaly detection using density ratio estimation – Anomaly Estimation from Unsupervised Data Using the Kullback-Leibler Density Ratio Estimation Method. In this article, I will discuss the problem of “finding abnormal samples in data that may contain them, based on data that is known to be normal.” This is basically an outlier detection problem, but instead of computing the degree of abnormality for each sample separately, we deliberately consider the probability distribution of the entire data set and formulate the task as estimating a ratio of probability densities. The main advantages of this formulation are that it provides a systematic way to determine the parameters of the anomaly detection model, and that it is expected to improve detection accuracy by suppressing, to some extent, the adverse effects of noise on individual samples.
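A minimal sketch of KLIEP-style density ratio estimation for this setting, assuming a Gaussian kernel model r(x) = Σₗ θₗ K(x, cₗ) with centers on the normal samples and a fixed width `sigma` (normally chosen by cross-validation). The optimizer maximizes the mean of log r over the normal samples subject to the mean of r over the test samples being 1 and θ ≥ 0, using projected gradient ascent. The anomaly score is −log r(x), so test samples that the normal data cannot explain score high. The function names, step size, and iteration count are illustrative assumptions.

```python
import numpy as np

def gaussian_kernel(A, B, sigma):
    """Gaussian kernel matrix between the rows of A and B."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dist / (2 * sigma**2))

def kliep_anomaly_scores(X_normal, X_test, sigma=1.0, lr=1e-4, n_iter=1000):
    """Score each test sample by -log r(x), with r(x) modeling p_normal(x) / p_test(x)."""
    centers = X_normal                                   # kernel centers on the normal samples
    A = gaussian_kernel(X_normal, centers, sigma)        # basis functions at normal samples
    b = gaussian_kernel(X_test, centers, sigma).mean(axis=0)  # mean basis value over test samples
    theta = np.ones(len(centers)) / b.sum()              # start with mean r over test = 1
    for _ in range(n_iter):
        # Gradient ascent on the mean of log r over the normal samples
        theta = theta + lr * A.T @ (1.0 / (A @ theta))
        # Project back onto: mean r over test samples = 1, theta >= 0
        theta = theta + (1.0 - b @ theta) * b / (b @ b)
        theta = np.maximum(theta, 0.0)
        theta = theta / (b @ theta)
    r_test = gaussian_kernel(X_test, centers, sigma) @ theta
    return -np.log(r_test + 1e-12)

rng = np.random.default_rng(0)
X_normal = rng.standard_normal((200, 2))
X_test = np.vstack([rng.standard_normal((95, 2)),             # ordinary samples
                    4.0 + 0.3 * rng.standard_normal((5, 2))])  # a small abnormal cluster
scores = kliep_anomaly_scores(X_normal, X_test)
print(scores[-5:], np.median(scores[:95]))
```

The model is fitted to the whole test set at once rather than sample by sample, which is the formulation the paragraph above describes.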

Change detection using density ratio estimation – Detection of structural changes using the Kullback-Leibler density ratio estimation method

Change detection using density ratio estimation – Detection of structural changes using the Kullback-Leibler density ratio estimation method. In this article, we discuss a method for change detection that estimates the density ratio directly, without making arbitrary assumptions about the distributions. Change detection examines macroscopically whether the distribution as a whole has changed, but the same framework can also handle the microscopic problem of detecting changes in individual dependencies among variables. This problem setting, structural change detection, has attracted attention in recent years as a powerful tool for monitoring complex systems.

Because it emphasizes the role of probability distributions, the change detection problem is sometimes called the distributional change detection problem. It resembles the two-sample test in statistics, except that it is not formulated within the framework of statistical hypothesis testing.

The degree of change is defined exactly as the Kullback-Leibler divergence, which measures how much two distributions differ.
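As a worked equation (a sketch of one common convention, not necessarily the exact notation of the linked article): let p be the density of a reference window, p′ that of a test window, and r(x) = p′(x)/p(x) their density ratio. Once r has been estimated directly, for example with the KLIEP method, the KL divergence used as the degree of change can be approximated by averaging ln r̂ over the N′ samples x′⁽ⁿ⁾ of the test window, without estimating either density separately:

```latex
\mathrm{KL}(p' \,\|\, p)
  = \int p'(x) \ln \frac{p'(x)}{p(x)} \, dx
  = \mathbb{E}_{x \sim p'}\!\left[ \ln r(x) \right]
  \approx \frac{1}{N'} \sum_{n=1}^{N'} \ln \hat{r}\!\left( x'^{(n)} \right)
```

Which window plays the role of p′ is a modeling choice; since the KL divergence is not symmetric, swapping them gives a different (though related) score.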

Application of sparse models to anomaly detection

Application of sparse models to anomaly detection. We describe the graphical lasso (sparse structure learning of Gaussian graphical models), which introduces sparsity into the relationships among variables, and its application to anomaly detection (as an extension of the Hotelling T² score).
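A minimal sketch of a Hotelling-style score computed with a sparse precision matrix, assuming scikit-learn's `GraphicalLasso` and an illustrative `alpha`. With the plain inverse sample covariance, (x − μ)ᵀΛ(x − μ) is the usual Hotelling T² (squared Mahalanobis) statistic; substituting the graphical-lasso estimate is intended to give a better-conditioned Λ when variables are many or strongly correlated. The function name is hypothetical.

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

def sparse_t2_scores(X_train, X_test, alpha=0.1):
    """Hotelling-style score (x - mu)^T Lambda (x - mu) using a graphical-lasso precision."""
    model = GraphicalLasso(alpha=alpha).fit(X_train)
    D = X_test - model.location_
    # Row-wise quadratic form d_i^T Lambda d_i
    return np.einsum("ij,jk,ik->i", D, model.precision_, D)

rng = np.random.default_rng(0)
X_train = rng.standard_normal((300, 5))
X_test = np.array([[0.0] * 5, [3.0] * 5])
print(sparse_t2_scores(X_train, X_test))
```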

Application of Nonparametric Bayesian Structural Change Estimation

Application of Nonparametric Bayesian Structural Change Estimation. In this article, we discuss structural change estimation for time series data as an application of nonparametric Bayesian models. One of the problems in analyzing time series data is estimating changes in the structure of the data. Analyzing changes in the properties of data is an important topic that has been studied extensively under the name of change detection. Here, we describe a method that uses a statistical model such as the Dirichlet process.

The basic idea is to assume that each data point is generated by one of several models with certain probabilities, and to estimate structural changes in the data by estimating how this generation process changes over time.
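A deliberately simplified sketch of this idea, assuming scikit-learn's `BayesianGaussianMixture` with a Dirichlet-process prior (a truncated variational approximation, so `max_components` is only an upper bound on the number of models). Each observation is assigned to the mixture component that most likely generated it, and times at which the assigned component switches are reported as candidate structural changes. This model ignores temporal ordering; time-series models such as the sticky HDP-HMM add that persistence. The function names are hypothetical.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

def dp_regime_labels(y, max_components=10, seed=0):
    """Assign each observation to a component of a truncated DP Gaussian mixture."""
    X = np.asarray(y, dtype=float).reshape(-1, 1)
    model = BayesianGaussianMixture(
        n_components=max_components,                      # truncation level, an upper bound only
        weight_concentration_prior_type="dirichlet_process",
        max_iter=500,
        random_state=seed,
    ).fit(X)
    return model.predict(X)

def label_changes(labels):
    """Time indices at which the assigned component differs from the previous step."""
    return [t for t in range(1, len(labels)) if labels[t] != labels[t - 1]]

rng = np.random.default_rng(0)
y = np.concatenate([rng.normal(0.0, 1.0, 100), rng.normal(8.0, 1.0, 100)])
labels = dp_regime_labels(y)
print(label_changes(labels))
```

Because nothing links neighboring time steps, isolated label flips can appear inside a regime; the structural change shows up as the point where the dominant component switches.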

On failure risk analysis and ontology (FMEA, HAZID)

On failure risk analysis and ontology (FMEA, HAZID). FMEA stands for Failure Mode and Effect Analysis, and is a systematic method of analyzing potential failures for the purpose of preventing failure problems.

FTA (Fault Tree Analysis) is a related failure analysis method, but it works top-down: an undesirable event for the product is assumed first, and the possible paths to that failure or accident are described as a tree together with their probabilities of occurrence. FMEA, by contrast, is a bottom-up method that focuses not on the failure itself (the loss of function) but on the failure modes that cause it. Specifically, after the system information (structure, functions, components, etc.) is organized in preparation, an FMEA sheet is prepared that lists the assumed failure modes and their effects.
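The way FTA attaches probabilities to a tree can be sketched with two gate functions, assuming the basic events are statistically independent (real fault trees often need to handle common-cause failures, which this ignores). An AND gate multiplies input probabilities; an OR gate takes the complement of the probability that none of its inputs occur. The tree and the probabilities below are hypothetical.

```python
from math import prod

def and_gate(probs):
    """Output event occurs only if every (independent) input event occurs."""
    return prod(probs)

def or_gate(probs):
    """Output event occurs if at least one (independent) input event occurs."""
    return 1.0 - prod(1.0 - p for p in probs)

# Hypothetical tree: the top event occurs if both pumps fail, or if power is lost
p_both_pumps_fail = and_gate([0.01, 0.02])
p_top = or_gate([p_both_pumps_fail, 0.001])
print(p_top)
```

Here the top-event probability comes out at roughly 0.0012, dominated by the single power-loss event rather than the redundant pump pair.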

HAZID is an abbreviation for Hazard Identification Study, a method for the safety assessment of plants and systems that identifies potential risks (hazards) and evaluates their magnitude. Hazards are identified using the What-if method or its improved version, the Structured What-if Technique (SWIFT). In this method, a structured worksheet guides brainstorming around questions such as “What if…?”, “How could…?”, and “Is it possible…?”, in order to anticipate various problems in advance.

This article introduces FMEA and HAZID, followed by specific examples of combining them with ontologies.

Time series forecasting with GRU

Time series forecasting with GRU. We describe advanced methods for improving the performance and generalization of RNNs. Taking temperature prediction as an example, we use time-series data such as temperature, pressure, and humidity recorded by sensors installed on the roof of a building. Using these data, we tackle the difficult problem of predicting the temperature 24 hours after the last data point, and discuss the challenges that arise when dealing with time series data.

Specifically, I describe an approach that uses GRU (Gated Recurrent Unit) layers together with recurrent dropout, stacking of recurrent layers, and other optimization techniques.
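To show what a GRU layer computes at each time step, here is a minimal NumPy sketch of a single GRU cell run over a sequence, with randomly initialized (untrained) weights. The gate convention (new state = (1 − z)·h + z·h̃) is one of several equivalent formulations; libraries differ in details such as where the reset gate is applied. In Keras, the techniques mentioned above correspond to arguments of `keras.layers.GRU` such as `dropout` and `recurrent_dropout`, and stacking requires `return_sequences=True` on every GRU layer except the last. The class and function names are hypothetical.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

class GRUCell:
    def __init__(self, n_in, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        s = 1.0 / np.sqrt(n_hidden)
        # Input weights W, recurrent weights U, bias b for each gate: update z, reset r, candidate h
        self.W = {g: rng.uniform(-s, s, (n_hidden, n_in)) for g in "zrh"}
        self.U = {g: rng.uniform(-s, s, (n_hidden, n_hidden)) for g in "zrh"}
        self.b = {g: np.zeros(n_hidden) for g in "zrh"}

    def step(self, x, h):
        z = sigmoid(self.W["z"] @ x + self.U["z"] @ h + self.b["z"])  # how much to update
        r = sigmoid(self.W["r"] @ x + self.U["r"] @ h + self.b["r"])  # how much past state feeds the candidate
        h_tilde = np.tanh(self.W["h"] @ x + self.U["h"] @ (r * h) + self.b["h"])
        return (1.0 - z) * h + z * h_tilde

def encode_sequence(cell, xs):
    """Run the cell over a sequence and return the final hidden state."""
    h = np.zeros(cell.U["z"].shape[0])
    for x in xs:
        h = cell.step(x, h)
    return h

rng = np.random.default_rng(1)
cell = GRUCell(n_in=3, n_hidden=8)   # e.g. temperature, pressure, humidity as inputs
h = encode_sequence(cell, rng.standard_normal((24, 3)))
print(h.shape)
```

In a forecasting model, the final hidden state would feed a trained dense layer that outputs the temperature 24 hours ahead; training is omitted here.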

Fault Diagnosis System Based on Ontology for Fleet Case Reused

Fault Diagnosis System Based on Ontology for Fleet Case Reused. In order to minimize the effects of unexpected system failures, the efficiency of fault diagnosis must be improved. When classical diagnostic techniques are considered, unexpected events are detected from a local perspective, i.e., at the equipment level. However, when complex systems are considered, classical techniques are useless because the whole system may not be monitored and the performance may vary due to the interaction between the equipment and the environment.

Maintenance personnel use their knowledge of component degradation mechanisms, built on multiple technologies of mechanical, electrical, electronic, or software nature, to formulate hypotheses about the causes of failures and performance anomalies based on the occurrence of symptoms.

An ontology-based approach to this problem is described using a ship as the case study.

Bow tie analysis, ontologies and AI technologies

Bow tie analysis, ontologies and AI technologies. Bowtie analysis is a risk management technique that is used to organise risks in a visually understandable way. The name comes from the fact that the resulting diagram of the analysis resembles the shape of a bowtie. The combination of bowtie analysis with ontologies and AI technologies is a highly effective approach to enhance risk management and predictive analytics and to design effective responses to risk.

Anomaly Detection Using Autoencoders (external link)

Anomaly Detection Using Autoencoders (external link). This page introduces a Python implementation of anomaly detection in images using autoencoders. As a simple example, detection on MNIST data is presented.
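A minimal sketch of the autoencoder idea, substituting scikit-learn's `MLPRegressor` trained to reproduce its own input (a small fully connected autoencoder) and synthetic vectors in place of MNIST images. The assumption is that normal data lies near a low-dimensional structure that the narrow middle layer can capture, so inputs off that structure reconstruct poorly. Layer sizes and the function names are illustrative.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def fit_autoencoder(X, code_size=3, seed=0):
    """Train a small MLP to reproduce its input through a narrow middle layer."""
    ae = MLPRegressor(hidden_layer_sizes=(16, code_size, 16), activation="tanh",
                      solver="lbfgs", max_iter=2000, random_state=seed)
    return ae.fit(X, X)

def reconstruction_errors(ae, X):
    """Mean squared reconstruction error per sample; large values suggest anomalies."""
    return ((ae.predict(X) - X) ** 2).mean(axis=1)

rng = np.random.default_rng(0)
W = rng.standard_normal((3, 10))
X_normal = 0.3 * rng.standard_normal((500, 3)) @ W + 0.05 * rng.standard_normal((500, 10))
X_anomalous = 2.0 * rng.standard_normal((20, 10))   # off the 3-D structure of the normal data
ae = fit_autoencoder(X_normal)
print(reconstruction_errors(ae, X_normal).mean(), reconstruction_errors(ae, X_anomalous).mean())
```

A detection threshold would typically be set from the distribution of reconstruction errors on held-out normal data, for example a high percentile.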

Anomaly Detection Using GAN (external link)

Anomaly Detection Using GAN (external link). Anomaly detection from image information using AnoGAN, ADGAN, EfficientGAN, and f-AnoGAN.

Anomaly Detection with AnoGAN (external link)

Anomaly Detection with AnoGAN (external link). AnoGAN stands for Anomaly Detection with Generative Adversarial Networks, and it is recognized as the first attempt at anomaly detection using GANs.

The mechanism of AnoGAN is simple: the GAN is trained thoroughly on normal images only, so the generator learns to produce normal images and the discriminator learns what normal images look like. Given a new image, AnoGAN searches for the generated image that best matches it. An abnormal image cannot be reproduced well by a generator that has only learned normal images, so a large mismatch indicates an anomaly.

As a result, AnoGAN is said to be the first proposal to use a GAN to judge whether an image is abnormal, rather than in the conventional role of a GAN, which is mainly to produce precise, realistic images.

The background to this lies in medicine and other fields. In some pathology cases, for example, normal images are plentiful but datasets of abnormal images are scarce, and without examples of the abnormal patterns a detector cannot be trained in a supervised way. This led to the reverse idea of learning only normal images and detecting anything that deviates from them.
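For reference, the anomaly score in the original AnoGAN paper (Schlegl et al., 2017) combines two terms, as sketched below. After training, a query image x is mapped back to the latent space by gradient descent on a latent vector; z_Γ denotes the vector reached after the last update, G is the generator, f is the output of an intermediate discriminator layer, and λ is a weighting factor. R(x) measures how poorly the best generated image reproduces x pixel by pixel, and D(x) measures the same mismatch in discriminator feature space:

```latex
R(x) = \sum \left| x - G(z_\Gamma) \right|, \qquad
D(x) = \sum \left| f(x) - f\big(G(z_\Gamma)\big) \right|, \qquad
A(x) = (1 - \lambda)\, R(x) + \lambda\, D(x)
```

The same weighted combination of the two losses is also the objective minimized during the latent-space search.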

Change point detection using Cauchy distribution in R/Stan (external link)

Change point detection using Cauchy distribution in R/Stan (external link). Stan can be used to detect change points in time series data. One of the time series datasets bundled with R is the Nile River flow data (Nile), which is known to have changed abruptly between 1898 and 1899. This page shows how to detect that change point in the Nile data.
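For comparison, here is a much simpler, non-Bayesian stand-in (not the Cauchy/Stan model on the linked page): an exhaustive least-squares search for a single shift in the mean. It assumes exactly one change point and a constant mean on each side, which roughly matches the situation in the Nile series, and it is demonstrated on synthetic data. The function name is hypothetical.

```python
import numpy as np

def single_mean_shift(y):
    """Index k that best splits y into two constant-mean segments y[:k] and y[k:]."""
    y = np.asarray(y, dtype=float)
    best_k, best_sse = None, np.inf
    for k in range(1, len(y)):
        left, right = y[:k], y[k:]
        # Total squared error when each segment is summarized by its own mean
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best_sse:
            best_k, best_sse = k, sse
    return best_k

rng = np.random.default_rng(0)
y = np.concatenate([rng.normal(0.0, 1.0, 30), rng.normal(4.0, 1.0, 70)])
print(single_mean_shift(y))   # index of the first observation after the estimated shift
```

A Bayesian model such as the one on the linked page additionally provides uncertainty about when the change happened, and heavy-tailed level changes let it accommodate an abrupt jump without assuming exactly one.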

Anomaly detection using Mahalanobis distance (external link)

Anomaly detection using Mahalanobis distance (external link). Anomaly detection based on the Mahalanobis distance is suited to data generated by a stable process. Its sensitivity decreases as the dimension of the data increases, so it may not be suitable for capturing subtle changes in values, but it is powerful when grossly deviating values appear in otherwise stable data.
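A minimal sketch of Mahalanobis-distance anomaly detection, assuming approximately Gaussian normal data. For Gaussian data with known mean and covariance, the squared Mahalanobis distance follows a chi-squared distribution with d degrees of freedom, so a threshold for a chosen false-alarm rate can be read from `scipy.stats.chi2`. With estimated parameters this is only approximate for large samples. The function names are hypothetical.

```python
import numpy as np
from scipy.stats import chi2

def mahalanobis_sq(X_train, X_test):
    """Squared Mahalanobis distance of each test row from the training distribution."""
    mu = X_train.mean(axis=0)
    prec = np.linalg.inv(np.cov(X_train, rowvar=False))
    D = X_test - mu
    return np.einsum("ij,jk,ik->i", D, prec, D)

def chi2_threshold(dim, false_alarm_rate=0.01):
    """Threshold on the squared distance for a given false-alarm rate (Gaussian assumption)."""
    return chi2.ppf(1.0 - false_alarm_rate, df=dim)

rng = np.random.default_rng(0)
X_train = rng.standard_normal((1000, 3))
X_test = np.array([[0.0, 0.0, 0.0], [4.0, 0.0, 0.0]])
print(mahalanobis_sq(X_train, X_test), chi2_threshold(3))
for dim in (1, 10, 100):
    print(dim, chi2_threshold(dim))
```

The threshold rises from about 6.6 for d = 1 to about 23.2 for d = 10. A deviation confined to one variable adds the same amount to the distance regardless of d, but must clear a higher bar as d grows, which is one way to see the drop in sensitivity mentioned above.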

RADM: Real-Time Anomaly Detection in Multivariate Time Series Based on Bayesian Network (external link)

RADM: Real-Time Anomaly Detection in Multivariate Time Series Based on Bayesian Network (external link). Aiming at anomaly detection in multivariate time series (MTS), we propose a real-time anomaly detection algorithm for MTS based on Hierarchical Temporal Memory (HTM) and a Bayesian Network (BN), called RADM. First, we use an HTM model to evaluate the real-time anomalies of each univariate time series (UTS) in the MTS. Second, a model of anomalous state detection in MTS based on naive Bayes is designed to analyze the validity of the above MTS. Lastly, considering the real-time monitoring of the system states of terminal nodes in a cloud platform, we use a ternary time series of CPU utilization, network speed, and memory occupancy ratio as data samples, and through experimental simulation we verify that RADM can take advantage of the specific relevance in MTS and make a more effective judgment of system anomalies.

Anomaly Detection in Multivariate Time Series Using Fuzzy AdaBoost and Dynamic Naive Bayesian Classifier

Anomaly Detection in Multivariate Time Series Using Fuzzy AdaBoost and Dynamic Naive Bayesian Classifier. This paper presents a novel method to detect anomalies using Fuzzy AdaBoost and a Dynamic Naive Bayesian Classifier. The Dynamic Naive Bayesian Classifier (DNBC) is an extension of Hidden Markov Models (HMM) that is used here to model multivariate observation sequences typically generated by multiple sensors associated with monitoring processes. The Fuzzy AdaBoost (FAB) method is used to ensemble multiple DNBCs to classify an instance as an anomaly. FAB needs the Footprint of Uncertainty (FOU) of the error, which is used to figure out the misclassification error and update the weights of the data samples during the boosting process. Here, we introduce an approach to initialize the intervals of the FOU using the statistical properties of data belonging to the normal class. The efficacy of the proposed method is demonstrated through a case study on a stuck pipe problem that occurs during oil well drilling.

Multivariate Gaussian distribution for anomaly detection (external link)

Multivariate Gaussian distribution for anomaly detection (external link). Consider variables T1, T2, P1, F1, and Power, each of which stays within a particular range when the operation is normal. When there is an abnormality in the system, these parameters behave anomalously. In simple terms, finding this hidden anomalous behavior in the data is the “anomaly detection” problem.

Deux Ex Machina


Specialized in AI system design and decision-making architecture.
Focused on externalizing decision logic using Ontology, DSL, and Behavior Trees, and building multi-agent systems.