Relational Data Learning


Overview of Relational Data Analysis with Machine Learning

Relational data is data that represents what kind of “relationship” exists for any given pair of N objects. Considering a matrix as a form of representing this relational data, the relationships are the matrix elements themselves, and relational data learning can be said to be learning to extract patterns in this matrix data.

There are two types of tasks to which relational data learning is applied: “prediction” and “knowledge extraction.”

Prediction is the problem of estimating the values of unobserved data using a statistical model designed for and learned from observed data. Knowledge extraction is the task of extracting useful knowledge, rules, or information that leads to knowledge, either by analyzing the characteristics of the relational data itself or by appropriately modeling the given observed data.

There are various algorithms used for this relational data learning, such as multilayer perceptron, support vector machine, decision tree, random forest, deep learning, anomaly detection, etc. By using them, data classification, prediction, anomaly detection, clustering, and feature selection can be realized.

Applications of machine learning for relational data analysis include the following.

  • Marketing: Relational data analysis using machine learning can be used to analyze consumer preferences and behavior patterns in the field of marketing. For example, data such as past purchase history and click history can be used to analyze consumer preferences for product segmentation, customer targeting, and promotion optimization.
  • Sales forecasting: When a company wants to forecast future sales based on past sales data, it can use machine learning to analyze relational data. Specifically, the company can extract features such as sales per month, product category, and sales volume per region from historical sales data and use machine learning algorithms to forecast future sales.
  • Medical: Relational data analysis using machine learning will enable early detection of diseases and optimization of treatment methods in the medical field. For example, based on past patient data, it will be possible to predict and diagnose diseases and predict the effectiveness of treatment.
  • Finance: Relational data analysis using machine learning will enable the prediction of customer credit risk and fraud detection in the financial sector. For example, models for customer credit scoring and fraud detection can be built based on data such as past customer transaction and borrowing history.
  • IoT: Relational data analysis using machine learning enables anomaly detection and predictive maintenance from sensor data in the IoT field. For example, it can predict equipment failures and optimal maintenance timing based on data collected from multiple sensors.

About Relational Data Learning

From “Relational Data Learning” in the Machine Learning Professional Series.

Relational data, which is expressed around the relationship between “something” and “something,” appears in various fields and has an important place in data analysis. In the simplest case, relational data is data that represents what kind of “relationship” exists for any pair of N objects.

If we consider a matrix as a form of representing the relationship between “something” and “something,” the data representing the relationship would be the elements in the matrix itself. One of the mathematical approaches to matrices is the “matrix factorization method”. The matrix factorization method was originally conceived based on linear algebra, but it has been developed in recent years by creating various derivative techniques such as non-negative matrix factorization to improve generalization performance and qualitative interpretability, and tensor factorization to apply to higher-order data. In this book, relational data analysis is introduced from the viewpoint of this matrix factorization method.

There are two major tasks to which this “relational data analysis” can be applied: prediction and knowledge extraction.

A prediction problem is the problem of estimating the values of unobserved data using a statistical model designed for and learned from observed data. There are two typical prediction problems, and both can be realized as the problem of predicting missing values in a relational data matrix. Another important example of a prediction problem is the estimation of information dissemination, or information diffusion, in a network.

The above problems are often formulated primarily as supervised learning problems.

Knowledge extraction problems are designed to analyze the properties of the relational data itself by computing the graph features, or to extract some useful knowledge or information that can lead to knowledge by appropriately modeling the given observation data. The specific tasks include community extraction and clustering in a broad sense.

Many knowledge extraction tasks can be formulated as unsupervised learning problems. This is because no one knows what kind of clusters exist in real-world relational data, or whether there really are clusters at all, so no labeled training (teacher) data can be created. In the knowledge extraction task we must therefore take an unsupervised approach: build a computational model that reflects a hypothesis (intention) of the form “if there are clusters, they should have these properties,” and then examine the computed results against the actual data.

Here, we discuss the general idea behind clustering of ordinary objects and clustering of relational data. Clustering of ordinary objects means finding samples that lie close together in the feature space according to the objects' features. When the features are easy to interpret, as in the example shown below, it appears to be a simple task that the human eye can grasp at a glance.

(Figure from “Relational Data Learning”)

In contrast to the above example, the relational data clustering task of “separating nodes with similar connections,” shown below, is not as easy to understand visually.

(Figure from “Relational Data Learning”)

The above example is actually equivalent to the one shown in the figure below, with the arrangement and order changed.

(Figure from “Relational Data Learning”)

Thus, we can see that relational data is not intuitive.
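
The effect of reordering can be reproduced on a toy example. The sketch below uses a hypothetical six-node graph (the group assignment and node numbering are assumptions made for illustration, not data from the book) in which nodes are linked only within their own group. Printed in the original node order the adjacency matrix looks like a checkerboard, while sorting the nodes by group makes the two blocks obvious.

```python
import numpy as np

# Hypothetical toy graph: nodes 0, 2, 4 form one group and nodes 1, 3, 5 another.
# Edges exist only between nodes of the same group (no self-loops).
groups = np.array([0, 1, 0, 1, 0, 1])
A = (groups[:, None] == groups[None, :]).astype(int)
np.fill_diagonal(A, 0)

# Reorder rows and columns together so that nodes of the same group are adjacent.
order = np.argsort(groups, kind="stable")
A_sorted = A[np.ix_(order, order)]

print("original order:\n", A)
print("sorted by group:\n", A_sorted)
```

Both matrices describe exactly the same graph; only the arrangement of rows and columns differs. In a real clustering task the group labels are of course unknown, and finding such an ordering is the problem to be solved.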

In this blog, we will discuss the following items regarding this relational data learning.

Implementation

Relational Data Learning is a machine learning method for relational data (e.g., graphs, networks, tabular data, etc.). Conventional machine learning is usually applied only to individual instances (e.g., vectors or matrices), but relational data learning considers multiple instances and the relationships among them.

This section discusses various applications of relational data learning and specific implementations of algorithms such as spectral clustering, matrix factorization, tensor decomposition, probabilistic block models, graph neural networks, graph convolutional networks, graph embedding, and metapath walks.

A graph neural network (GNN) is a type of neural network for data with a graph structure, in which nodes and edges are used to express relationships between elements. Examples of graph-structured data include social networks, road networks, chemical molecular structures, and knowledge graphs.

This section provides an overview of GNNs and various examples and Python implementations.

A Graph Convolutional Network (GCN) is a type of neural network that enables convolution operations on data with a graph structure. While regular convolutional neural networks (CNNs) are effective for grid-like data such as images, GCNs were developed as a deep learning method for non-grid data with very complex structures, such as graph data and network data.
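
As a rough illustration of what “convolution on a graph” means, the following NumPy sketch implements a single layer using the widely used propagation rule ReLU(D^-1/2 (A + I) D^-1/2 H W). The choice of this particular rule, the function name `gcn_layer`, and the toy path graph are assumptions made for this example; the linked article may use a different formulation or a dedicated library.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    A_hat = A + np.eye(A.shape[0])             # add self-loops
    deg = A_hat.sum(axis=1)                    # node degrees including self-loops
    D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt   # symmetric normalization
    return np.maximum(A_norm @ H @ W, 0.0)     # ReLU activation

# Toy path graph 0 - 1 - 2 with one-hot node features (illustrative values).
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
H = np.eye(3)
W = np.ones((3, 2))
print(gcn_layer(A, H, W))
```

Each output row mixes a node's own features with those of its neighbors, which is the graph analogue of a convolution kernel sliding over adjacent pixels.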

ChebNet (Chebyshev network) is a type of Graph Neural Network (GNN), which is one of the main methods for performing convolution operations on graph-structured data. ChebNet is an approximate implementation of convolution operations on graphs using Chebyshev polynomials, which are used in signal processing.
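
The Chebyshev approximation can be sketched in a few lines. The example below assumes the usual recursion T_0 = x, T_1 = L~ x, T_k = 2 L~ T_(k-1) - T_(k-2), where L~ = 2L / lambda_max - I is the rescaled graph Laplacian. The function name `cheb_filter` and the coefficient vector `theta` are hypothetical names for this illustration, and in a real ChebNet the coefficients would be learned.

```python
import numpy as np

def cheb_filter(L, x, theta):
    """Apply sum_k theta[k] * T_k(L_tilde) @ x using the Chebyshev recursion."""
    lam_max = np.linalg.eigvalsh(L).max()               # largest Laplacian eigenvalue
    L_tilde = 2.0 * L / lam_max - np.eye(L.shape[0])    # rescale spectrum to [-1, 1]
    T_prev, T_curr = x, L_tilde @ x                     # T_0 x and T_1 x
    out = theta[0] * T_prev
    if len(theta) > 1:
        out = out + theta[1] * T_curr
    for k in range(2, len(theta)):
        T_prev, T_curr = T_curr, 2.0 * (L_tilde @ T_curr) - T_prev
        out = out + theta[k] * T_curr
    return out

# Laplacian (D - A) of the path graph 0 - 1 - 2 and a toy graph signal.
L = np.array([[1, -1, 0],
              [-1, 2, -1],
              [0, -1, 1]], dtype=float)
x = np.array([1.0, 0.0, -1.0])
print(cheb_filter(L, x, theta=[0.5, 0.3, 0.2]))
```

Because only matrix-vector products with the Laplacian are needed, a filter of order K reaches at most K hops from each node without computing an eigendecomposition at filtering time (here the largest eigenvalue is computed exactly only for simplicity).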

A Graph Attention Network (GAT) is a deep learning model that uses an attention mechanism to learn the representations of nodes in a graph structure. When updating a node's representation, it weights each neighboring node according to a learned attention score.
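
To make the attention step concrete, here is a minimal single-head sketch in NumPy. It assumes the commonly described scoring function e_ij = LeakyReLU(a^T [W h_i || W h_j]) followed by a softmax over each node's neighbors (including the node itself); the function name `gat_attention` and the random weights are assumptions for illustration only.

```python
import numpy as np

def gat_attention(A, H, W, a, slope=0.2):
    """Single-head graph attention: returns (attention weights, new node features)."""
    Z = H @ W                                     # projected features, shape (N, F)
    F = Z.shape[1]
    # a^T [z_i || z_j] splits into a source term plus a target term.
    e = (Z @ a[:F])[:, None] + (Z @ a[F:])[None, :]
    e = np.where(e > 0, e, slope * e)             # LeakyReLU
    mask = (A + np.eye(A.shape[0])) > 0           # attend to neighbors and self only
    e = np.where(mask, e, -np.inf)
    e = e - e.max(axis=1, keepdims=True)          # numerically stable softmax
    alpha = np.exp(e)
    alpha = alpha / alpha.sum(axis=1, keepdims=True)
    return alpha, alpha @ Z

rng = np.random.default_rng(0)
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)            # path graph 0 - 1 - 2
alpha, H_new = gat_attention(A, np.eye(3), rng.standard_normal((3, 2)),
                             rng.standard_normal(4))
print(alpha)
```

Each row of `alpha` sums to one and is zero for non-neighbors, so the new representation of a node is a weighted average of its neighborhood.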

  • Graph Isomorphism Network (GIN) Overview, Algorithm and Example Implementation

Graph Isomorphism Network (GIN) is a neural network model designed to be highly expressive in distinguishing graph structures. The graph isomorphism problem is the problem of determining whether two graphs have the same structure, and it is an important problem in many fields.

GraphSAGE (Graph SAmple and aggreGatE) is a graph embedding algorithm for learning node embeddings (vector representations) from graph data. By sampling and aggregating the local neighborhood information of nodes, it effectively learns the embedding of each node. This approach makes it possible to obtain high-performance embeddings for large graphs.
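
The sample-and-aggregate idea can be sketched as follows. This is a simplified single layer with a mean aggregator, assuming the usual pattern of concatenating a node's own features with the mean of a fixed-size random sample of neighbor features, applying a linear map and ReLU, and normalizing the result. The function name `sage_mean_layer`, the star graph, and the weights are hypothetical.

```python
import numpy as np

def sage_mean_layer(neighbors, H, W, num_samples, rng):
    """One GraphSAGE-style layer with a mean aggregator (illustrative sketch)."""
    out = np.zeros((H.shape[0], W.shape[1]))
    for v, nbrs in enumerate(neighbors):
        if nbrs:
            k = min(num_samples, len(nbrs))
            sampled = rng.choice(nbrs, size=k, replace=False)   # sample neighbors
            agg = H[sampled].mean(axis=0)                       # aggregate by mean
        else:
            agg = np.zeros(H.shape[1])                          # isolated node
        out[v] = np.maximum(np.concatenate([H[v], agg]) @ W, 0.0)
    norms = np.linalg.norm(out, axis=1, keepdims=True)
    return out / np.maximum(norms, 1e-12)                       # L2-normalize rows

neighbors = [[1, 2, 3], [0], [0], [0]]    # star graph centered on node 0
H = np.eye(4)                             # one-hot input features
W = np.ones((8, 3))                       # maps [self || aggregated] to 3 dimensions
Z = sage_mean_layer(neighbors, H, W, num_samples=2, rng=np.random.default_rng(0))
print(Z)
```

Because each node only looks at a bounded number of sampled neighbors, the cost per node does not grow with the node's degree, which is what makes the approach practical on large graphs.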

Causal inference is a methodology for inferring whether one event or phenomenon is a cause of another event or phenomenon. Causal exploration is the process of analyzing data and searching for potential causal candidates in order to identify causal relationships.

This section discusses various applications of causal inference and causal exploration, as well as a time-lag example.

Elasticsearch is an open source distributed search engine that provides many features to enable fast text search and data analysis. Various plug-ins are also available to extend the functionality of Elasticsearch. This section describes these plug-ins and their specific implementations.

Theory and application

Clustering, one of the most typical problems in knowledge discovery from relational data, is described. As a solution to this problem, we give a theoretical overview of spectral clustering for symmetric relational data (relational data with no direction and a single domain), including algorithms.
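
As a minimal sketch of the idea, the code below splits an undirected graph into two clusters using the sign of the eigenvector belonging to the second-smallest eigenvalue of the unnormalized graph Laplacian. This is only the simplest two-way variant; the function name `spectral_bipartition` and the toy graph (two triangles joined by one edge) are assumptions for illustration, and the article's algorithms may use normalized Laplacians and k-means on several eigenvectors.

```python
import numpy as np

def spectral_bipartition(A):
    """Two-way spectral clustering of a symmetric adjacency matrix."""
    L = np.diag(A.sum(axis=1)) - A           # unnormalized graph Laplacian D - A
    eigvals, eigvecs = np.linalg.eigh(L)     # eigenvalues in ascending order
    fiedler = eigvecs[:, 1]                  # eigenvector of 2nd-smallest eigenvalue
    return (fiedler > 0).astype(int)

# Two triangles {0, 1, 2} and {3, 4, 5} joined by the single edge 2 - 3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

labels = spectral_bipartition(A)
print(labels)   # the two triangles receive different labels (which is 0 or 1 is arbitrary)
```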

The stochastic block model, a probabilistic model applicable to general asymmetric relational data, and its extension, the infinite relational model, are described, together with the theory and algorithms for clustering asymmetric relational data.

The basic idea, algorithm, and extensions of matrix factorization are described. As examples, consider the task of recommending movies, where customers are rows, movies are columns, and the element values are ratings, or the task of finding which movies belong to the same genre as a given movie. Also described are the data compression task that arises when the matrix is huge, as with a service such as Netflix that has an enormous number of users, and the task in which the probability of word j appearing in document i is represented as a matrix that is further decomposed into a matrix of the probability of a pattern appearing in a document and a matrix of the probability of a word being generated from a pattern.
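
The generative side of the stochastic block model is easy to sketch: each node has a cluster label, and a directed link from node i to node j is drawn with a probability that depends only on the pair of labels. In the example below the names `z` (cluster assignments), `eta` (block link probabilities), and `sample_sbm` are assumptions for illustration, and the probability values are made up.

```python
import numpy as np

def sample_sbm(z, eta, rng):
    """Sample a directed adjacency matrix with P(X[i, j] = 1) = eta[z[i], z[j]]."""
    P = eta[z[:, None], z[None, :]]              # per-pair link probabilities
    return (rng.random(P.shape) < P).astype(int)

z = np.array([0, 0, 0, 1, 1, 1])                 # cluster label of each node
eta = np.array([[0.9, 0.1],                      # asymmetric block probabilities:
                [0.6, 0.2]])                     # cluster 1 often links to cluster 0
X = sample_sbm(z, eta, np.random.default_rng(0))
print(X)
```

Clustering with this model runs in the opposite direction: given an observed matrix X, infer the labels z and the block probabilities eta that most plausibly generated it.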
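
For the rating-prediction use case, a minimal sketch of matrix factorization with missing entries might look like the following. It assumes a squared-error objective over the observed entries with L2 regularization, minimized by plain gradient descent; the function name `factorize`, the hyperparameters, and the tiny rating matrix are all made up for illustration.

```python
import numpy as np

def factorize(R, mask, rank=2, lr=0.01, reg=0.01, epochs=2000, seed=0):
    """Fit R ~ U @ V.T on observed entries (mask == 1) by gradient descent."""
    rng = np.random.default_rng(seed)
    U = 0.1 * rng.standard_normal((R.shape[0], rank))   # customer factors
    V = 0.1 * rng.standard_normal((R.shape[1], rank))   # movie factors
    for _ in range(epochs):
        E = mask * (R - U @ V.T)                        # error on observed entries only
        U, V = U + lr * (E @ V - reg * U), V + lr * (E.T @ U - reg * V)
    return U, V

# Toy ratings: 4 customers x 3 movies, with two entries treated as unobserved.
R = np.outer([1.0, 2.0, 2.0, 1.0], [1.0, 2.0, 2.0])
mask = np.ones_like(R)
mask[0, 2] = 0
mask[3, 0] = 0

U, V = factorize(R, mask)
R_hat = U @ V.T                                         # includes predictions for gaps
rmse = np.sqrt(np.sum(mask * (R - R_hat) ** 2) / mask.sum())
print("RMSE on observed entries:", rmse)
```

The entries of `R_hat` at the masked positions serve as the predicted ratings, and the rows of `V` can be compared to find movies with similar latent profiles.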

Previously, we represented the relationship between two types of objects, such as customers and movies, by mapping them to the two “axes” of a matrix, i.e., rows and columns, and then we introduced tensors to represent the relationship between objects with three or more axes, and discussed data representation and computation using these tensors.

Just as matrix factorization is used to analyze matrix data, tensor factorization is used to analyze tensor data. In this section, we will discuss the basic concepts and algorithms of tensor decomposition.
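
As a small illustration of one common form of tensor decomposition, the CP (CANDECOMP/PARAFAC) model expresses a third-order tensor as a sum of rank-one terms, X[i, j, k] = sum_r A[i, r] * B[j, r] * C[k, r]. The sketch below only shows how a tensor is rebuilt from given factor matrices; the function name `cp_reconstruct` and the random factors are assumptions, and fitting the factors to data (for example by alternating least squares) is not shown.

```python
import numpy as np

def cp_reconstruct(A, B, C):
    """Rebuild a 3rd-order tensor from CP factors: X[i,j,k] = sum_r A[i,r] B[j,r] C[k,r]."""
    return np.einsum("ir,jr,kr->ijk", A, B, C)

rng = np.random.default_rng(0)
A = rng.random((2, 3))    # e.g. customers x rank
B = rng.random((4, 3))    # e.g. movies x rank
C = rng.random((5, 3))    # e.g. time periods x rank
X = cp_reconstruct(A, B, C)
print(X.shape)
```

With two factor matrices instead of three, the same formula reduces to ordinary matrix factorization, which is why tensor factorization is described as its higher-order counterpart.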

When the data is given as pairs of a scalar output y and an M-dimensional vector x, D = {(y^(n), x^(n)) | n = 1, …, N}, we will discuss how to obtain a sparse solution in linear regression, either by sparsifying the regression coefficients or by sparsifying in the sense of eliminating unneeded samples rather than variables.
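
For the coefficient-sparsifying case, one standard approach is L1-regularized regression (the lasso). The sketch below minimizes 0.5 * ||y - Xw||^2 + lam * ||w||_1 with the iterative soft-thresholding algorithm (ISTA). Choosing the lasso and ISTA here is an assumption for illustration, as are the names `soft_threshold` and `lasso_ista`; the linked article may use a different solver.

```python
import numpy as np

def soft_threshold(v, t):
    """Shrink each entry of v toward zero by t, setting small entries exactly to zero."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(X, y, lam, n_iter=500):
    """Minimize 0.5 * ||y - X w||^2 + lam * ||w||_1 by proximal gradient descent."""
    step = 1.0 / np.linalg.norm(X, 2) ** 2      # 1 / Lipschitz constant of the gradient
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y)                # gradient of the squared-error term
        w = soft_threshold(w - step * grad, step * lam)
    return w

# With an identity design matrix the solution is soft_threshold(y, lam).
X = np.eye(3)
y = np.array([3.0, 0.5, -2.0])
print(lasso_ista(X, y, lam=1.0))
```

The middle coefficient is driven exactly to zero because its magnitude is below the threshold, which is the sense in which the L1 penalty produces a sparse solution.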

As a use case, I will discuss the case where the observed data is contaminated with noise and not all samples can be trusted, as well as the case where a machine is constantly monitored by a fixed number of sensors by extending it to graph data.

An overview of graph neural networks using the Deep Graph Library, together with their applications: the Message Passing Neural Network (MPNN) framework for estimating physical properties of compounds, convolution for image data, and the Transformer algorithm (as used in BERT), which applies attention to learn hidden-layer weights for natural language processing.

This paper describes an anomaly detection technique for systems where input-output pairs can be observed. In this case, the relationship between input and output is modeled in the form of a response surface (or regression curve), and anomalies are detected in the form of deviations from the surface. In practical use, the relationship between input and output is often non-linear. In this article, we will discuss the Gaussian process regression method, which has a wide range of engineering applications among the nonlinear regression techniques.
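
A minimal sketch of this idea: fit a Gaussian process to normal input-output pairs, then score a new pair by its negative log predictive density, so that outputs far from the predicted response surface receive high scores. The RBF kernel, the fixed hyperparameters, and the names `rbf` and `gp_anomaly_score` are assumptions for illustration; in practice the kernel parameters would be estimated from data.

```python
import numpy as np

def rbf(a, b, length=1.0):
    """RBF (squared-exponential) kernel matrix between 1-D input arrays a and b."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length ** 2)

def gp_anomaly_score(x_train, y_train, x_new, y_new, noise=0.1):
    """Negative log predictive density of each (x_new, y_new) pair under a GP."""
    K = rbf(x_train, x_train) + noise ** 2 * np.eye(len(x_train))
    k_star = rbf(x_train, x_new)
    mean = k_star.T @ np.linalg.solve(K, y_train)                 # predictive mean
    var = 1.0 + noise ** 2 - np.sum(k_star * np.linalg.solve(K, k_star), axis=0)
    return 0.5 * np.log(2 * np.pi * var) + 0.5 * (y_new - mean) ** 2 / var

x_train = np.linspace(0.0, 5.0, 20)
y_train = np.sin(x_train)                                         # nonlinear response
x_new = np.array([2.5, 2.5])
y_new = np.array([np.sin(2.5), np.sin(2.5) + 3.0])                # normal vs. deviating
print(gp_anomaly_score(x_train, y_train, x_new, y_new))
```

The second test point has the same input as the first but an output far from the learned curve, so it receives the larger anomaly score.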

The change detection problem is described with time series data in mind. Change detection cannot be achieved simply by calculating the anomaly level of a single sample. Based on this concept, change detection is formulated abstractly as a sequential density estimation problem. As the simplest concrete example, we describe a change detection technique called the singular spectrum transformation method. The singular spectrum transformation method is one of the most important change detection methods in practical use because of its versatility and robustness against noise.

First, the cumulative sum method, a classical change detection technique, is described. As an example, suppose that the concentration of a certain chemical substance in a reactor of a chemical plant is monitored from moment to moment. In this case, a monitoring technique such as Hotelling’s T² method, which calculates the degree of abnormality separately for each observation, would be inappropriate. What we want to do here is detect whether some abnormal situation is occurring continuously, i.e., change detection, rather than catch a single, sudden abnormal value.
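
One common formulation of this method compares the dominant subspace of sliding windows taken before a time point with the subspace of windows taken after it. The sketch below assumes that formulation, with the change score defined as 1 minus the squared largest singular value of U^T Q, where U and Q hold the top left singular vectors of the past and present trajectory matrices. The function names, the window parameters, and the test signals are assumptions for illustration.

```python
import numpy as np

def trajectory_matrix(x, end, w, k):
    """Stack the k length-w sliding windows whose last window ends at index `end`.
    Requires end - w - (k - 1) >= 0 and end <= len(x)."""
    return np.column_stack([x[end - w - i:end - i] for i in range(k - 1, -1, -1)])

def sst_score(x, t, w=8, k=8, r=2, lag=16):
    """Change score at time t: 1 - (largest singular value of U^T Q)^2."""
    past = trajectory_matrix(x, t, w, k)
    present = trajectory_matrix(x, t + lag, w, k)
    U = np.linalg.svd(past, full_matrices=False)[0][:, :r]       # past subspace
    Q = np.linalg.svd(present, full_matrices=False)[0][:, :r]    # present subspace
    overlap = np.linalg.svd(U.T @ Q, compute_uv=False)[0]
    return 1.0 - overlap ** 2

n = np.arange(100)
steady = np.sin(0.3 * n)                                         # no change
changed = np.where(n < 50, np.sin(0.3 * n), np.sin(1.2 * n))     # frequency shift at 50
print(sst_score(steady, 50), sst_score(changed, 50))
```

For the steady signal the two subspaces coincide and the score is close to zero, while the frequency shift yields a score close to one. Because the score depends on subspaces rather than on individual values, it is relatively insensitive to the amplitude of additive noise.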
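
The reactor example can be sketched with a one-sided cumulative sum detector. The version below accumulates how far each reading exceeds a target level plus a slack value, resets to zero when the sum would go negative, and raises an alarm once the sum passes a threshold. The function name `cusum` and all numeric values are assumptions chosen for illustration.

```python
def cusum(xs, target, slack, threshold):
    """Return the index of the first upward-change alarm, or None if no alarm occurs."""
    s = 0.0
    for t, x in enumerate(xs):
        s = max(0.0, s + (x - target - slack))   # accumulate excess over target + slack
        if s > threshold:
            return t
    return None

# A sustained shift from 0.0 to 1.0 starting at index 10.
shifted = [0.0] * 10 + [1.0] * 10
# A single spike that a per-sample detector might flag, but which is not a change.
spike = [0.0] * 5 + [2.0] + [0.0] * 5

print(cusum(shifted, target=0.0, slack=0.5, threshold=2.0))   # 14
print(cusum(spike, target=0.0, slack=0.5, threshold=2.0))     # None
```

The sustained shift is flagged a few samples after it begins, whereas the isolated spike never accumulates enough evidence, which is exactly the distinction between change detection and single-sample outlier detection described above.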

As a fusion of logic and probability, we describe statistical relational learning (SRL), developed in North America, which introduces logical expressions to improve the descriptive power of Bayesian networks, with logic used as a kind of convenient (macro-like) function. Specifically, the probabilistic relational model (PRM), Markov logic network (MLN), and probabilistic soft logic (PSL) are described.

Wikipedia’s infoboxes contain rich structured information of various entities, which have been explored by the DBpedia project to generate large scale Linked Data sets. Among all the infobox attributes, those attributes having hyperlinks in their values identify semantic relations between entities, which are important for creating RDF links between DBpedia’s instances. However, quite a few hyperlinks have not been annotated by editors in infoboxes, which causes lots of relations between entities to be missing in Wikipedia. In this paper, we propose an approach for automatically discovering the missing entity links in Wikipedia’s infoboxes, so that the missing semantic relations between entities can be established. Our approach first identifies entity mentions in the given infoboxes, and then computes several features to estimate the possibilities that a given attribute value might link to a candidate entity. A learning model is used to obtain the weights of different features, and predict the destination entity for each attribute value. We evaluated our approach on the English Wikipedia data. The experimental results show that our approach can effectively find the missing relations between entities, and it significantly outperforms the baseline methods in terms of both precision and recall.

The goal of this work is to learn a measure supporting the detection of strong relationships between Linked Data entities. Such relationships can be represented as paths of entities and properties, and can be obtained through a blind graph search process traversing Linked Data. The challenge here is therefore the design of a cost-function that is able to detect the strongest relationship between two given entities, by objectively assessing the value of a given path. To achieve this, we use a Genetic Programming approach in a supervised learning method to generate path evaluation functions that compare well with human evaluations. We show how such a cost-function can be generated only using basic topological features of the nodes of the paths as they are being traversed (i.e. without knowledge of the whole graph), and how it can be improved through introducing a very small amount of knowledge about the vocabularies of the properties that connect nodes in the graph.
