Overview of Structural Deep Network Embedding (SDNE), algorithms and examples of implementation

Overview of Structural Deep Network Embedding (SDNE)

Structural Deep Network Embedding (SDNE) is a type of graph autoencoder, a model that extends the autoencoder to graph-structured data.

An autoencoder, also described in “Autoencoder,” is a neural network that performs unsupervised learning by encoding given data into a low-dimensional vector in the latent space.

There are many variations of graph autoencoders, depending on whether node attributes are also used, what encoders and decoders are used, and what the objective function is.

SDNE is a model reported by Wang et al. in “Structural Deep Network Embedding”: a stacked autoencoder that aims to preserve first-order and second-order proximity simultaneously. SDNE focuses on capturing the structural features between the nodes of a graph, and the model can transform the graph into a low-dimensional, dense vector representation while preserving structural information such as the links between adjacent nodes. Such a low-dimensional representation allows for efficient processing of large graphs and estimation of similarities and relationships among nodes.

The first-order proximity of SDNE is expressed as follows.

\[L_{1st}=\displaystyle\sum_{i,j=1}^n\mathbf{A}_{i,j}||\mathbf{h}_i^{(K)}-\mathbf{h}_j^{(K)}||_2^2\]

where \(\mathbf{h}_i^{(K)}\) is the low-dimensional representation of node i at the K-th layer (K is the number of hidden layers). The second-order proximity is expressed by the following equation.

\[L_{2nd}=\displaystyle\sum_{i=1}^n||(\hat{\mathbf{x}}_i-\mathbf{x}_i)\odot\mathbf{b}_i||_2^2\]

where \(\mathbf{x}_i\) is the i-th row of the input adjacency matrix (the vector of node i’s connections to the other nodes), \(\hat{\mathbf{x}}_i\) is the i-th row of the autoencoder’s output matrix, and \(\mathbf{b}_i\) is a vector that penalizes reconstruction errors on existing connections: if \(A_{i,j}=0\), then \(b_{i,j}=1\), and if \(A_{i,j}=1\), then \(b_{i,j}=\beta>1\) (β is a hyperparameter).

With this design, SDNE preserves adjacency information in the middle layer of the autoencoder, so that nodes that are close in the graph are learned to remain close in the low-dimensional space.

Using these, the objective function of SDNE is defined as follows.

\[L=L_{2nd}+\alpha L_{1st}+\lambda L_{reg}\]

where α and λ are hyperparameters and \(L_{reg}\) is the L2 regularization term to prevent overfitting.

\[L_{reg}=\frac{1}{2}\displaystyle\sum_{k=1}^K(||\mathbf{W}^{(k)}||_F^2+||\hat{\mathbf{W}}^{(k)}||_F^2)\]

SDNE’s loss function is thus designed to minimize the reconstruction error and preserve adjacency simultaneously, so that the low-dimensional representation reproduces the original graph structure as faithfully as possible.
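To make the notation concrete, below is a minimal numpy sketch that evaluates the three loss terms on a toy 4-node graph. The adjacency matrix, the hyperparameter values, and the randomly generated “learned” quantities are all hypothetical stand-ins for illustration.

import numpy as np

# Toy 4-node undirected graph (hypothetical example)
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

beta, alpha, lam = 10.0, 0.2, 1e-4  # hypothetical hyperparameter values

# Penalty matrix B: b_ij = beta where an edge exists, 1 otherwise
B = np.where(A > 0, beta, 1.0)

rng = np.random.default_rng(0)
H = rng.normal(size=(4, 2))       # stand-in for the embeddings h_i^(K)
X_hat = rng.uniform(size=(4, 4))  # stand-in for the reconstructed rows
W = rng.normal(size=(4, 2))       # stand-in encoder weights (one layer)
W_hat = rng.normal(size=(2, 4))   # stand-in decoder weights

# L_1st: adjacency-weighted distances between embeddings of linked nodes
L1 = sum(A[i, j] * np.sum((H[i] - H[j]) ** 2)
         for i in range(4) for j in range(4))

# L_2nd: reconstruction error with the penalty mask applied elementwise
L2 = np.sum(((X_hat - A) * B) ** 2)

# L_reg: Frobenius-norm regularization over encoder and decoder weights
L_reg = 0.5 * (np.sum(W ** 2) + np.sum(W_hat ** 2))

print(L2 + alpha * L1 + lam * L_reg)  # total objective L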


Algorithms related to Structural Deep Network Embedding (SDNE)

The following is a description of algorithms related to SDNE.

1. DeepWalk:

Overview of DeepWalk: DeepWalk is a method that combines random walks with a skip-gram-style neural network to convert a graph structure into low-dimensional vectors. For more information on DeepWalk, please refer to “DeepWalk Overview, Algorithm and Example Implementation“.

Relation to SDNE: Like random walk-based methods such as DeepWalk, SDNE aims to capture the adjacency information of a graph, but it does not rely on random walks; instead, it feeds adjacency vectors into a deep autoencoder, enabling deeper learning and more effective graph embeddings.

2. LINE (Large-scale Information Network Embedding):

Overview of LINE: LINE is a method for efficiently embedding large-scale information networks. For details on LINE, please refer to “Overview of LINE (Large-scale Information Network Embedding), Algorithm and Example Implementation“.

Relation to SDNE: Like LINE, SDNE makes effective use of first-order and second-order proximity to capture the graph structure, but SDNE incorporates non-linearity through autoencoders, enabling it to learn richer representations.

3. GraREP (Graph Representation learning):

Overview of GraREP: GraREP is a method that aims to capture different hierarchical features of a graph structure. For more information on GraREP, please see “GraREP Overview, Algorithm and Example Implementation“.

Relation to SDNE: SDNE also incorporates information from different levels of the graph structure to capture its features; it integrates this information effectively through its autoencoder to learn more precise embeddings.

4. GCN (Graph Convolutional Networks):

Overview of GCN: GCN is a type of neural network that can perform convolution on graph data. A GCN updates the representation of each node using information from neighboring nodes. For more information on GCNs, see “Graph Convolutional Neural Networks (GCNs): Overview, Algorithms, and Examples of Implementations“.

Relation to SDNE: SDNE also considers neighbor information to capture the graph structure effectively; like GCN, it has a structure that reflects neighbor information, but it learns its feature representations through an autoencoder rather than graph convolutions.

Application of Structural Deep Network Embedding (SDNE)

The following are examples of SDNE applications.

1. Social network analysis: In social networks, it is important to understand the relationships among users and the structure of the network. SDNE effectively captures the relationships among users and groups in a social network and helps to identify node similarities and communities.

Clustering of users: Similar users can be grouped together using the low-dimensional vector representations learned by SDNE.

Link prediction: Using the node embeddings learned by SDNE, new links between nodes can be predicted.
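As a concrete illustration of these two uses, here is a minimal sketch that assumes embeddings is an (n_nodes x d) array such as the one returned by get_embeddings() in the implementation section later in this article; the random array below is only a stand-in. The same pattern of clustering plus similarity scoring also applies to the item, protein, and advertising examples that follow.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import cosine_similarity

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(34, 128))  # stand-in for SDNE node embeddings

# Clustering of users: group nodes whose embeddings are similar
labels = KMeans(n_clusters=4, n_init=10).fit_predict(embeddings)

# Link prediction: score a candidate edge (u, v) by embedding similarity;
# high-scoring node pairs without an existing edge are candidate links
u, v = 0, 33
score = cosine_similarity(embeddings[u:u + 1], embeddings[v:v + 1])[0, 0]
print(labels, score)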

2. Recommendation systems: A recommendation system recommends the most suitable items to individual users based on their behavioral history and item characteristics. SDNE can learn feature representations that capture the relationships among users and items to make effective recommendations.

Item embedding learning: SDNE is used to learn low-dimensional embeddings that capture similarities between items.

User-personalized recommendation: Recommend the most suitable items for a particular user using the user’s feature representation learned by SDNE.

3. Bioinformatics: Bioinformatics involves the analysis of biological data, such as genes and proteins, to understand their functions and interactions. SDNE is used to extract features of biological networks and to estimate protein functions and protein-protein interactions.

Prediction of protein-protein interactions: Predict novel protein-protein interactions using protein embeddings learned by SDNE.

Identification of disease-associated genes: Use the relationships between disease-associated genes and proteins learned by SDNE to understand the mechanisms of disease.

4. Targeting of Internet advertisements: Internet advertising displays effective advertisements based on user attributes and behaviors. SDNE can learn similarities and associations among users to improve targeting and personalized advertising.

User segmentation: Identify groups of similar users using the user feature representations learned by SDNE.

Measuring the effectiveness of advertising: Analyze the characteristics of users who participated in the advertising campaign using SDNE to evaluate the effectiveness of the advertising.

SDNE contributes to a variety of data analysis and information inference tasks by effectively capturing graph structure and learning low-dimensional dense representations.

Example implementation of Structural Deep Network Embedding (SDNE)

As an example of implementing Structural Deep Network Embedding (SDNE), we show how to use NetworkX, a Python graph processing library, together with Keras, a deep learning library.

1. Installation of required libraries: First, install the following libraries.

pip install networkx
pip install keras
pip install tensorflow

2. SDNE Implementation: The following are the basic steps to implement SDNE.

Define the SDNE class: First, define the SDNE class, which is responsible for training the network and obtaining the vector representations. Note that this is a simplified, single-hidden-layer autoencoder that reconstructs adjacency rows; it does not implement the full SDNE loss described above.

import numpy as np
from keras.layers import Input, Dense
from keras.models import Model
from keras.optimizers import Adam
import networkx as nx
from sklearn.preprocessing import MinMaxScaler

class SDNE:
    def __init__(self, graph, embedding_size=128, alpha=0.3, beta=0.3, nu1=1e-6, nu2=1e-6, epochs=50):
        self.graph = graph
        self.embedding_size = embedding_size
        # alpha, beta, nu1 and nu2 correspond to the loss weights of the
        # original paper; this simplified example keeps them but does not
        # use them in the training loss
        self.alpha = alpha
        self.beta = beta
        self.nu1 = nu1
        self.nu2 = nu2
        self.epochs = epochs

    def preprocess_graph(self):
        # Build the adjacency matrix, scale it to [0, 1], and add
        # self-loops so that each node also reconstructs itself
        self.node_num = len(self.graph.nodes())
        self.adj_matrix = nx.adjacency_matrix(self.graph).toarray()
        self.adj_matrix = MinMaxScaler().fit_transform(self.adj_matrix)
        self.adj_matrix = self.adj_matrix + np.eye(self.node_num)

    def create_model(self):
        # Single-hidden-layer autoencoder:
        # adjacency row -> embedding -> reconstructed adjacency row
        x = Input(shape=(self.node_num,))
        h = Dense(self.embedding_size, activation='relu')(x)
        y = Dense(self.node_num, activation='sigmoid')(h)

        self.model = Model(inputs=x, outputs=y)
        self.model.compile(optimizer=Adam(learning_rate=0.001),
                           loss='binary_crossentropy')

    def train_model(self):
        # Train the autoencoder to reconstruct each node's adjacency row
        self.model.fit(self.adj_matrix, self.adj_matrix,
                       epochs=self.epochs, verbose=1)

    def get_embeddings(self):
        # The hidden-layer activations serve as the node embeddings
        hidden_layer = Model(inputs=self.model.input,
                             outputs=self.model.layers[1].output)
        node_embeddings = hidden_layer.predict(self.adj_matrix)
        return node_embeddings

Example of SDNE usage: The following is an example of actually using SDNE to obtain graph embeddings.

# Load data and create networks
G = nx.karate_club_graph()
sdne = SDNE(G, embedding_size=128, epochs=50)

# Network pre-processing
sdne.preprocess_graph()

# Model Creation
sdne.create_model()

# Model Learning
sdne.train_model()

# Getting Embedded
embeddings = sdne.get_embeddings()
print(embeddings)

In this example, the karate_club_graph() function is used to read the social network of a karate club, and SDNE is applied to obtain a 128-dimensional node embedding.
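Note that the simplified model above trains with a plain binary cross-entropy loss rather than the weighted loss of the paper. As a sketch under that caveat, the second-order loss L_2nd with the penalty vector b_i could be written as a custom Keras loss roughly as follows, assuming the adjacency rows are kept binary (i.e., the MinMaxScaler step is skipped); the first-order term, which requires pairs of node embeddings per batch, is omitted here.

import tensorflow as tf

def weighted_reconstruction_loss(beta=10.0):
    # Sketch of L_2nd: reconstruction errors on existing edges (A_ij = 1)
    # are penalized beta times more strongly than errors on non-edges
    def loss(y_true, y_pred):
        b = 1.0 + (beta - 1.0) * y_true  # the penalty vector b_i
        return tf.reduce_sum(tf.square((y_pred - y_true) * b), axis=-1)
    return loss

# It could then replace binary cross-entropy in create_model(), e.g.:
# self.model.compile(optimizer=Adam(learning_rate=0.001),
#                    loss=weighted_reconstruction_loss(beta=10.0))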

Challenges of Structural Deep Network Embedding (SDNE) and how to deal with them

The following describes the main challenges of SDNE and the measures taken to address them.

1. Graph size and computational cost:

Challenge:
When SDNE is applied to large graphs, the computational cost can be high, and as the number of nodes and edges in the graph increases, the computation time required for model training and embedding increases.

Solution:
Mini-batch learning: Implement mini-batch learning so that even large graphs can be trained efficiently.
Distributed learning: Use multiple compute nodes to parallelize training and increase computation speed.
Graph sampling: Sample and train on a subset of the graph rather than the entire graph (a sketch of mini-batch training and sampling follows below).
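As a brief sketch of the mini-batch and sampling mitigations, reusing the SDNE class from the implementation section above (the sample size and batch size below are arbitrary):

import numpy as np
import networkx as nx

G = nx.karate_club_graph()
rng = np.random.default_rng(0)

# Graph sampling: train on a random induced subgraph instead of the full graph
nodes = rng.choice(list(G.nodes()), size=20, replace=False)
sdne = SDNE(G.subgraph(nodes).copy(), embedding_size=32, epochs=20)
sdne.preprocess_graph()
sdne.create_model()

# Mini-batch learning: pass batch_size to Keras fit() instead of full-batch
sdne.model.fit(sdne.adj_matrix, sdne.adj_matrix, batch_size=8, epochs=20)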

2. Non-linearity and overfitting:

Challenge:
SDNE learns non-linear feature representations using autoencoders, but there is a risk of overfitting, which is especially likely with high-dimensional data and complex graph structures.

Solution:
Dropout: Use dropout to suppress overfitting.
Regularization: Introduce L1 and L2 regularization to control model complexity.
Early stopping: Stop training when the validation loss is no longer improving to prevent overfitting (a sketch follows below).
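A sketch of how dropout, L2 regularization, and early stopping could be added to the simplified encoder/decoder from the implementation section; the layer sizes and hyperparameter values here are illustrative.

from keras.layers import Input, Dense, Dropout
from keras.models import Model
from keras.regularizers import l2
from keras.callbacks import EarlyStopping

node_num, embedding_size = 34, 64  # illustrative sizes

x = Input(shape=(node_num,))
h = Dense(embedding_size, activation='relu',
          kernel_regularizer=l2(1e-4))(x)  # L2 weight regularization
h = Dropout(0.2)(h)                        # dropout on the hidden layer
y = Dense(node_num, activation='sigmoid',
          kernel_regularizer=l2(1e-4))(h)

model = Model(inputs=x, outputs=y)
model.compile(optimizer='adam', loss='binary_crossentropy')

# Early stopping: halt training when the validation loss stops improving
early_stop = EarlyStopping(monitor='val_loss', patience=5,
                           restore_best_weights=True)
# model.fit(X, X, validation_split=0.1, epochs=200, callbacks=[early_stop])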

3. Graph feature extraction and representation learning:

Challenge:
SDNE captures the local adjacency structure of a graph well, but it can be difficult for it to represent the global patterns and features of a graph effectively.

Solution:
Deeper architectures: Use deeper network architectures to extract more complex features.
Domain knowledge: Leverage domain-specific properties of the graph to design more effective feature representations.
Tailored loss functions: Optimize learning by introducing loss functions suited to the particular properties of the graph.

4. Dimensionality reduction and information loss:

Challenge:
SDNE compresses high-dimensional node features into low-dimensional embeddings, which necessarily entails some information loss: parts of the original graph structure and features may be lost in the low-dimensional representation.

Solution:
Select an appropriate embedding dimension: Adjust the embedding dimension to minimize information loss (a simple sweep is sketched below).
Interpret latent features: Interpret the latent features of the low-dimensional embeddings to supplement the lost information.
Multiple embedding dimensions: Learn embeddings of several different dimensionalities to preserve a diversity of information.
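The dimension sweep mentioned above could be sketched as follows, reusing the SDNE class from the implementation section and comparing the final reconstruction loss for each embedding size:

import networkx as nx

G = nx.karate_club_graph()
for dim in [8, 16, 32, 64, 128]:
    sdne = SDNE(G, embedding_size=dim, epochs=20)
    sdne.preprocess_graph()
    sdne.create_model()
    history = sdne.model.fit(sdne.adj_matrix, sdne.adj_matrix,
                             epochs=20, verbose=0)
    # Lower final reconstruction loss suggests less information loss
    print(dim, history.history['loss'][-1])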

5. Dealing with noise and incomplete data:

Challenge:
Graph data may contain imperfections such as noise or missing values, and SDNE needs to learn robustly in their presence.

Solution:
Data imputation: Implement methods to impute missing data to deal with incomplete data.
Noise reduction: Use filtering to remove noise, together with loss functions that are robust to noise.
Ensemble learning: Combine multiple SDNE models to obtain robust embeddings.

Reference Information and Reference Books

For more information on graph data, see “Graph Data Processing Algorithms and Applications to Machine Learning/Artificial Intelligence Tasks”. Also see “Knowledge Information Processing Techniques” for details specific to knowledge graphs. For more information on deep learning in general, see “About Deep Learning”.

Reference books include:

Hands-On Graph Neural Networks Using Python: Practical techniques and architectures for building powerful graph and deep learning apps with PyTorch

Graph Neural Networks: Foundations, Frontiers, and Applications

Introduction to Graph Neural Networks

Graph Neural Networks in Action
