Overview of Adversarial Attack Models in GNN with Algorithms and Implementation Examples

Adversarial Attack Models

An adversarial attack is one of the most widely studied attacks against machine learning models, especially models that take inputs such as images, text, and audio. Adversarial attacks aim to make a machine learning model misclassify by applying slight perturbations (noise or manipulations) to the input. Such attacks can reveal security vulnerabilities, help assess model robustness, and provide insights related to model explainability as described in “Explainable Machine Learning”.

Types of adversarial attacks include:

1. Non-targeted Attack: A non-targeted attack is not concerned with which class the adversarial sample falls into; it simply aims to make the model misclassify.

2. Targeted Attack: A targeted attack aims to misclassify data into a specific wrong class. That is, the attacker chooses a target class and generates an adversarial sample that will be classified into that class.

3. White-box Attack: A white-box attack occurs when the attacker has full access to the structure and parameters of the model under attack. The attacker uses this information to generate an optimal adversarial sample.

4. Black-box Attack: A black-box attack occurs when the attacker has no access to the internal information of the model under attack. The attacker observes the model's behavior to infer the relationship between inputs and outputs and generates adversarial samples based on this inference.

Adversarial attack methods include the following:

1. Fast Gradient Sign Method (FGSM): FGSM uses the model's gradient information to generate an adversarial sample. Specifically, it perturbs the input data by adding a small value proportional to the sign of the loss gradient (a minimal sketch of FGSM and PGD follows this list).

2. Projected Gradient Descent (PGD): PGD generates perturbations by repeating multiple FGSM-like steps. The perturbation produced at each step is clipped so that it stays within a certain range, and the final perturbation is obtained after the last iteration.

3. DeepFool: DeepFool searches for the smallest perturbation that changes the model's output. It is efficient when the decision boundary is (locally) linear, but can be harder to apply to strongly nonlinear models.

4. CW Attack (Carlini-Wagner Attack): The CW attack generates adversarial samples using various distance measures (L2, L∞, etc.). By formulating the attack as a nonlinear optimization problem, it achieves small perturbations and high attack success rates.
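As a concrete illustration of the two gradient-based methods above, the following is a minimal PyTorch sketch of FGSM and PGD for a generic classifier. The model, the cross-entropy loss, and the hyperparameters (epsilon, alpha, number of steps) are illustrative assumptions rather than any specific published implementation.

import torch
import torch.nn.functional as F

def fgsm(model, x, y, epsilon=0.03):
    # Single-step attack: move each input element by epsilon in the
    # direction of the sign of the loss gradient.
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    return (x_adv + epsilon * x_adv.grad.sign()).detach()

def pgd(model, x, y, epsilon=0.03, alpha=0.01, steps=10):
    # Iterative attack: repeat small FGSM-like steps and project the result
    # back into the L-infinity ball of radius epsilon around the original input.
    x_orig = x.clone().detach()
    x_adv = x_orig.clone()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = (x_orig + torch.clamp(x_adv - x_orig, -epsilon, epsilon)).detach()
    return x_adv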

Countermeasures and defenses against adversarial attacks include the following:

1. Adversarial Training: Adversarial training retrains the model on a training set that includes adversarial samples, which makes the model more robust against such samples (a minimal training-loop sketch follows this list).

2. Adversarial Detection: Another approach is to detect adversarial samples. This uses characteristic features of adversarial perturbations so that adversarial inputs can be identified.

3. Building Robust Models: More robust model architectures can be employed to increase resistance to adversarial attacks. Examples include ensemble learning and dropout.
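As a minimal sketch of how adversarial training can be wired into an ordinary training loop, the step below mixes FGSM-perturbed samples into each update. The model, optimizer, batch (x, y), and epsilon are placeholder assumptions; in practice, stronger attacks such as PGD are often used to generate the training-time adversarial examples.

import torch.nn.functional as F

def adversarial_training_step(model, optimizer, x, y, epsilon=0.03):
    # Generate FGSM adversarial examples for the current batch.
    x_adv = x.clone().detach().requires_grad_(True)
    F.cross_entropy(model(x_adv), y).backward()
    x_adv = (x_adv + epsilon * x_adv.grad.sign()).detach()

    # Train on a mix of clean and adversarial samples.
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x), y) + F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()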

In general, neural networks are vulnerable to such small, carefully crafted perturbations, as reported in “Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images”.

Adversarial Attacks in GNNs

Graph neural networks share the vulnerabilities described above, and robust graph neural networks need to be constructed for practical applications. Adversarial attacks and defenses on graphs are surveyed in “Adversarial Attacks and Defenses on Graphs: A Review, A Tool and Empirical Studies” by Jin et al. In this paper, adversarial attacks on graphs are classified based on the following attacker attributes:

  • Attacker’s ability: whether the attack is carried out after training (evasion attack) or during training (poisoning attack).
  • Type of attack: modifying node features, adding or removing edges, or adding false nodes.
  • Attacker’s goal: misclassify a few test nodes (targeted attack) or degrade overall accuracy (untargeted attack).
  • Attacker knowledge: whether the attacker knows all model parameters and training inputs and labels (white-box attack), has limited knowledge (gray-box attack), or has no knowledge at all (black-box attack).

Typical adversarial attack algorithms are then grouped into white-box, gray-box, and black-box attacks, and each group is further divided into targeted and untargeted attacks, with representative methods introduced for each.

In addition, defenses against these attacks are classified into the following five categories, each of which is introduced in the survey:

  • Adversarial training: adversarial examples are included in the training set so that the trained model can correctly classify future adversarial examples.
  • Adversarial perturbation detection: the model is defended by searching for characteristic differences between adversarial and clean nodes and edges.
  • Certifiable robustness: reasoning about the safety of a graph neural network and attempting to certify its robustness.
  • Graph purification: attacks are prevented at training time rather than afterwards. There are two approaches: preprocessing, in which the graph is cleaned before training, and graph learning, in which a clean graph structure is learned by avoiding adversarial patterns during graph neural network training (a toy preprocessing sketch follows this list).
  • Attention mechanism: instead of removing adversarial perturbations, a robust graph neural network is learned by down-weighting (penalizing) adversarial edges and nodes.
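As a toy illustration of the preprocessing style of graph purification, the sketch below removes edges whose endpoint features are very dissimilar, on the assumption that adversarially inserted edges often connect dissimilar nodes. The cosine-similarity measure and the threshold are illustrative assumptions, not a specific published defense.

import torch
import torch.nn.functional as F

def purify_graph(x, edge_index, threshold=0.01):
    # x: [num_nodes, num_features] node feature matrix
    # edge_index: [2, num_edges] edge list in PyTorch Geometric format
    src, dst = edge_index
    sim = F.cosine_similarity(x[src], x[dst], dim=1)
    keep = sim >= threshold          # keep only edges between sufficiently similar nodes
    return edge_index[:, keep]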

In summary, adversarial attacks in GNNs can affect tasks such as node classification, link prediction, and graph generation, and attack methods include targeted attacks, path attacks, node deletion attacks, link attacks, and so on. Countermeasures include adversarial training, regularization of the graph structure, importance evaluation, and cleanup mechanisms.

“Adversarial Attacks and Defenses on Graphs: A Review, A Tool and Empirical Studies” also introduces DeepRobust, a repository developed by the authors for graph attacks and defenses, and presents experimental results of attacks and defenses on graphs.

Algorithms Related to Adversarial Attacks in GNNs

Some typical adversarial attack algorithms in GNNs are described below.

1. Metattack: Metattack is a poisoning attack against the node classification task. Rather than targeting a single node, it perturbs the graph structure during training so that the overall classification performance of the trained model degrades. Roughly, the attack proceeds as follows (a simplified sketch follows this list):

  • Estimate the impact of candidate perturbations: meta-gradients of the training loss with respect to the graph structure are used to estimate how much each candidate edge insertion or deletion would harm the model.
  • Greedy selection of perturbations: based on the estimated impact, the most damaging perturbation is applied, and the process is repeated until the perturbation budget is exhausted.
  • Loss maximization: the perturbations are chosen so as to maximize the training (or self-training) loss of a surrogate model, thereby misleading the classification results.
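As a rough illustration of this idea, the following is a heavily simplified, self-contained sketch of a gradient-based structure poisoning attack in the spirit of Metattack. It uses ordinary gradients of the training loss with respect to a dense adjacency matrix instead of true meta-gradients, assumes a fixed, already-trained surrogate GCN whose weights W1 and W2 are passed in, and greedily flips one edge per step; all function names, shapes, and hyperparameters are illustrative assumptions.

import torch
import torch.nn.functional as F

def gcn_forward(X, A, W1, W2):
    # Two-layer GCN on a dense adjacency matrix, using the normalized
    # propagation matrix D^{-1/2} (A + I) D^{-1/2}.
    A_hat = A + torch.eye(A.size(0))
    d_inv_sqrt = A_hat.sum(dim=1).clamp(min=1e-8).pow(-0.5)
    A_norm = d_inv_sqrt.unsqueeze(1) * A_hat * d_inv_sqrt.unsqueeze(0)
    H = torch.relu(A_norm @ X @ W1)
    return F.log_softmax(A_norm @ H @ W2, dim=1)

def structure_attack(X, A, y, train_mask, W1, W2, budget=10):
    # Greedily flip `budget` edges so as to increase the surrogate's training loss.
    A = A.clone().float()
    for _ in range(budget):
        A_var = A.clone().requires_grad_(True)
        loss = F.nll_loss(gcn_forward(X, A_var, W1, W2)[train_mask], y[train_mask])
        grad = torch.autograd.grad(loss, A_var)[0]
        # A positive gradient on a non-edge suggests adding it increases the loss;
        # a negative gradient on an existing edge suggests removing it does.
        score = grad * (1 - 2 * A)
        score.fill_diagonal_(float('-inf'))   # never modify self-loops
        idx = int(torch.argmax(score))
        i, j = idx // A.size(0), idx % A.size(0)
        A[i, j] = A[j, i] = 1 - A[i, j]       # flip the chosen edge symmetrically
    return A

In the actual Metattack formulation the inner training of the surrogate model is differentiated through (meta-gradients), whereas this sketch keeps the surrogate weights fixed for brevity.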

2. PGD Attack: A PGD (Projected Gradient Descent) attack is one of the adversarial attack methods for GNNs and is typically a white-box attack, in which the attacker has knowledge of the model parameters. The specific steps are as follows (a full implementation appears in the example section below):

  • Initialize perturbations: Initialize perturbations to node features.
  • Iterate PGD: Update the perturbation and perform the attack. In each iteration step, perturbations are updated to maximize the objective function and clipped to stay within a certain range.
  • Maximize Objective Function: Attack to maximize the objective function in order to mislead the node classification result.

3. GCN Attack: GCN (Graph Convolutional Network) Attack is an adversarial attack technique against the GCN model, a neural network applied to graph data and used for tasks such as node classification. The attack applies perturbations to node features or edges, similarly to Metattack and PGD Attack, with the goal of misclassifying particular nodes.

4. RL-based Attack: Another family of adversarial attack techniques uses reinforcement learning. These methods define states, actions, and rewards for an attacker that manipulates the graph, and learn an optimal attack policy. RL-based attacks are used for exploratory attacks and for finding promising attack nodes.

Example of an Adversarial Attack Implementation in a GNN

An example implementation of an adversarial attack on a GNN is shown below. The following example uses Python and PyTorch Geometric.

Importing libraries:

import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv
from torch_geometric.datasets import Planetoid

import random

Loading the dataset: the Planetoid loader is used; Planetoid provides several citation graph datasets (Cora, Citeseer, PubMed).

dataset_name = 'Cora'
dataset = Planetoid(root='./data/' + dataset_name, name=dataset_name)
data = dataset[0]

Defining the GNN model: define a simple two-layer GCN (Graph Convolutional Network). The forward pass optionally accepts a replacement feature matrix so that perturbed (adversarial) node features can be evaluated later.

class GCN(torch.nn.Module):
    def __init__(self):
        super(GCN, self).__init__()
        self.conv1 = GCNConv(dataset.num_features, 16)
        self.conv2 = GCNConv(16, dataset.num_classes)

    def forward(self, data, x=None):
        # Allow perturbed node features to be passed in; default to the original ones
        x = data.x if x is None else x
        edge_index = data.edge_index
        x = F.relu(self.conv1(x, edge_index))
        x = F.dropout(x, training=self.training)
        x = self.conv2(x, edge_index)
        return F.log_softmax(x, dim=1)

Adversarial attack implementation: a PGD (Projected Gradient Descent) attack is used against the GNN model by perturbing the node feature matrix.

def pgd_attack(model, data, target_node, epsilon=0.1, alpha=0.01, num_iters=20):
    model.eval()
    node_feat = data.x.clone().detach()
    node_feat.requires_grad = True

    for _ in range(num_iters):
        # Forward pass with the (current) perturbed features
        logits = model(data, node_feat)
        # Maximize the loss of the target node so that its prediction flips
        loss = F.nll_loss(logits[[target_node]], data.y[[target_node]])
        loss.backward()

        # Gradient-ascent step in the direction of the sign of the gradient
        perturbation = alpha * node_feat.grad.detach().sign()

        perturbed_feat = node_feat.detach() + perturbation
        # Project back into the L-infinity ball of radius epsilon around the original features
        diff = torch.clamp(perturbed_feat - data.x, -epsilon, epsilon)
        perturbed_feat = data.x + diff

        node_feat = perturbed_feat.clone().detach()
        node_feat.requires_grad = True

    return perturbed_feat.detach()

Execution:

# Model initialization
model = GCN()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=5e-4)

# training
model.train()
for epoch in range(200):
    optimizer.zero_grad()
    out = model(data)
    loss = F.nll_loss(out[data.train_mask], data.y[data.train_mask])
    loss.backward()
    optimizer.step()

# Target node selection
target_node = random.choice(data.train_mask.nonzero().view(-1)).item()

# Adversarial attack
perturbed_feat = pgd_attack(model, data, target_node)

# Evaluation of attack results
model.eval()
output = model(data)
print("Original prediction:", output[target_node].argmax().item())
output = model(data, perturbed_feat)
print("Adversarial prediction:", output[target_node].argmax().item())

In this example, a GCN model is trained to perform node classification on the Cora dataset, and a PGD attack is used to mount an adversarial attack. The node to be attacked is selected at random, the perturbation generated by PGD is added to the node features, and the difference from the original features is clipped to an epsilon-ball to obtain the adversarial node features. Finally, the trained model is evaluated on both the original and the adversarial node features, and the predictions for the target node are compared.

Reference Information and Reference Books

For more information on graph data, see “Graph Data Processing Algorithms and Applications to Machine Learning/Artificial Intelligence Tasks”. Also see “Knowledge Information Processing Techniques” for details specific to knowledge graphs. For more information on deep learning in general, see “About Deep Learning”.

Reference books include:

Hands-On Graph Neural Networks Using Python: Practical techniques and architectures for building powerful graph and deep learning apps with PyTorch

Graph Neural Networks: Foundations, Frontiers, and Applications

Introduction to Graph Neural Networks

Graph Neural Networks in Action
