Overview of the Girvan-Newman Algorithm and Examples of Implementations


Overview of Girvan-Newman Algorithm

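The Girvan-Newman algorithm is a method from graph theory for detecting the community structure of a network. It repeatedly identifies the edges with the highest betweenness centrality, which tend to be the edges that connect different communities. By removing these edges, the network is partitioned into communities.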

The algorithm is outlined as follows:

1. Computation of the betweenness centrality of edges:

The first step is to compute the betweenness centrality of each edge in the network. The betweenness centrality of an edge measures how many of the shortest paths between pairs of vertices pass through that edge.

2. Deletion of edges with high betweenness centrality:

The edge with the highest betweenness centrality is removed, the betweenness centrality of the remaining edges is recomputed, and the process repeats. This gradually splits the network into separate communities.

3. Community detection:

As edge removal proceeds, the network is split into smaller and smaller communities. The division at each step is recorded, and the division that best satisfies a chosen quality criterion (commonly modularity) is taken as the community structure.

Edge betweenness centrality reflects how much of the traffic between different communities is carried by an edge: edges that join two communities lie on many shortest paths, whereas edges inside a community lie on few. Removing the edges with high betweenness centrality is therefore expected to expose the community boundaries and partition the network into smaller communities.
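The role of edge betweenness centrality can be illustrated with a small sketch. It assumes a toy graph that is not part of the original article: two 5-node cliques joined by a single bridge edge, built with NetworkX's barbell_graph. The bridge is expected to receive the highest score, because every shortest path between the two cliques has to cross it.

```python
import networkx as nx

# Assumed toy graph: two 5-node cliques (nodes 0-4 and 5-9)
# joined by a single bridge edge between nodes 4 and 5.
G = nx.barbell_graph(5, 0)

# Edge betweenness centrality: share of shortest paths that use each edge.
edge_bc = nx.edge_betweenness_centrality(G)

# The highest-scoring edge is the one Girvan-Newman would remove first.
top_edge = max(edge_bc, key=edge_bc.get)
print("Edge with highest betweenness:", top_edge)
```

Removing that single edge disconnects the two cliques, which is exactly the split one would expect a community detection method to find in this graph.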

The Girvan-Newman algorithm is useful for understanding community structure in real-world networks and is applied in various fields such as social network analysis and the analysis of biological networks.

Girvan-Newman Algorithm Procedure

The basic steps of the Girvan-Newman algorithm are as follows:

1. Betweenness centrality calculation:

Calculate the betweenness centrality of each edge in the network. The betweenness centrality of an edge measures how many of the shortest paths between pairs of vertices pass through that edge.

2. Removal of edges with high betweenness centrality:

Remove the edge with the highest betweenness centrality, then recompute the betweenness centrality of the remaining edges before choosing the next edge to remove. As edges are removed, the network is partitioned and different communities are formed.

3. Recording the community structure at each step:

Record the community structure of the network (its connected components) as edges are removed at each step. This makes it possible to track the progress of the partitioning.

4. Optimal partition selection:

Continue removing edges and recording the community structure, then select the best partition among those recorded. The optimal partition is defined as one that satisfies certain conditions regarding, for example, the number of communities and the size of each community, or one that maximizes a quality measure such as modularity.

This procedure detects the community structure of the network. Removing edges with high betweenness centrality cuts the ties between communities first, while the ties inside each community remain. The approach assumes that communities are densely connected internally and only sparsely connected to one another.
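The four steps above can be sketched with NetworkX's built-in girvan_newman generator, which performs the edge removal and yields one partition per split. Using modularity as the selection criterion in step 4 is an assumption made for this sketch; the procedure itself leaves the criterion open. The argument weight=None is passed so that modularity is computed on the unweighted graph, matching the unweighted betweenness used by default.

```python
import networkx as nx
from networkx.algorithms import community

G = nx.karate_club_graph()

best_partition, best_q = None, float("-inf")

# girvan_newman yields one partition (a tuple of node sets) per split level.
for partition in community.girvan_newman(G):
    # Assumed criterion: modularity of the partition on the original graph.
    q = community.modularity(G, partition, weight=None)
    if q > best_q:
        best_partition, best_q = partition, q

print("Number of communities:", len(best_partition))
print("Modularity:", round(best_q, 3))
```

Any other criterion, such as a target number of communities or a minimum community size, could replace the modularity comparison inside the loop.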

Application of the Girvan-Newman algorithm

The Girvan-Newman algorithm is used to detect community structure and has been applied to various network analyses in the real world. The following describes examples of its application.

1. Social network analysis:

In social networks, where individual users form communities, the Girvan-Newman algorithm is used to detect these communities. Examples include Twitter follower networks and Facebook friend networks.

2. Biological networks:

In fields such as molecular biology and neuroscience, the Girvan-Newman algorithm is applied to analyze protein-protein interaction networks and the structure of connections in neural circuits. This makes it possible to discover functionally related groups or modules.

3. Internet link analysis:

The Girvan-Newman algorithm is used to analyze the hyperlink structure between web pages and the relatedness of web sites. This allows related web pages to be grouped into the same community.

4. Transportation networks:

The Girvan-Newman algorithm is used in transportation networks such as road, rail, and air networks to divide regions and cities based on traffic flow. This helps in understanding traffic connections and impacts between different regions.

5. Modeling of information propagation:

In modeling influencers and information diffusion, the Girvan-Newman algorithm can help analyze community structure and identify information propagation paths.

These examples illustrate the importance of community detection in network analysis. The Girvan-Newman algorithm identifies community structure within a network and contributes to understanding network structure and interactions.

Example implementation of the Girvan-Newman algorithm

Below is an example implementation of the Girvan-Newman algorithm using Python. In actual use, a data structure representing the network structure is required, and an example using the NetworkX library is shown here.

import networkx as nx
from networkx.algorithms import community
import matplotlib.pyplot as plt

def girvan_newman_algorithm(graph):
    # Work on a copy so the caller's graph is not modified
    g = graph.copy()

    # Compute the layout once so successive drawings are comparable
    pos = nx.spring_layout(g, seed=42)
    nx.draw(g, pos, with_labels=True, font_weight='bold')
    plt.title("Initial Graph")
    plt.show()

    partitions = []
    num_components = nx.number_connected_components(g)

    while g.number_of_edges() > 0:
        # Recompute edge betweenness centrality on the current graph
        edge_betweenness = nx.edge_betweenness_centrality(g)
        # Obtain the edge with the maximum edge betweenness centrality
        max_betweenness_edge = max(edge_betweenness, key=edge_betweenness.get)
        # Show and remove that edge
        print("Removing edge:", max_betweenness_edge)
        g.remove_edge(*max_betweenness_edge)

        # Record and draw the state only when a removal creates a new community
        components = [set(c) for c in nx.connected_components(g)]
        if len(components) > num_components:
            num_components = len(components)
            partitions.append(components)
            nx.draw(g, pos, with_labels=True, font_weight='bold')
            plt.title(f"Graph Split into {num_components} Communities")
            plt.show()

    # Select the recorded partition with the highest modularity on the original graph
    best = max(partitions, key=lambda p: community.modularity(graph, p, weight=None))
    print("Best Communities:", [sorted(c) for c in best])
    return best

# Creating graphs (using NetworkX)
G = nx.karate_club_graph()

# Application of the Girvan-Newman algorithm
best_communities = girvan_newman_algorithm(G)

In this example, the social network of Zachary's karate club is used: edges are removed in order of their edge betweenness centrality, and the intermediate and final community structures are visualized.
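When only a fixed number of communities is wanted, the edge removal does not have to run to the end. The following sketch uses NetworkX's built-in girvan_newman generator together with itertools.takewhile to stop at a target size. The target k = 4 is a hypothetical value chosen for illustration, not a recommendation from this article.

```python
import itertools

import networkx as nx
from networkx.algorithms import community

G = nx.karate_club_graph()
k = 4  # hypothetical target number of communities

# Each level of the generator has one more community than the previous one.
levels = community.girvan_newman(G)

# Keep consuming levels while they contain at most k communities.
partitions = list(itertools.takewhile(lambda p: len(p) <= k, levels))

# The last level kept is the partition with exactly k communities.
final = partitions[-1]
print([sorted(c) for c in final])
```

Because the generator is lazy, the remaining, more expensive levels are never computed.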

Challenges and Countermeasures for the Girvan-Newman Algorithm

The Girvan-Newman algorithm is a powerful community structure detection method, but it also has limitations. The main challenges and countermeasures to address them are described below.

1. High computational cost:

Challenge: The Girvan-Newman algorithm is computationally expensive for large networks because it recalculates the betweenness centrality of all edges after every edge removal.
Solution: Approximate methods and fast methods for computing betweenness centrality have been proposed to deal with large networks. In addition, techniques such as subsampling and divide-and-conquer may be combined to reduce the computational cost.

2. Community size imbalance:

Challenge: The Girvan-Newman algorithm forms communities by deleting edges at each step, but this can lead to unbalanced community sizes, for example a few very large communities alongside many very small ones.
Solution: If unbalanced community sizes are a problem, post-processing the communities after applying the algorithm or using it in combination with other methods may be considered.

3. Handling of overlapping communities:

Challenge: The Girvan-Newman algorithm produces a non-overlapping partition, but in real-world networks a node may belong to several communities at once.
Solution: To detect overlapping communities, the Girvan-Newman algorithm is combined with an overlapping community detection algorithm.

4. Solution instability:

Challenge: When several edges tie for the highest betweenness centrality, the order in which the edges are processed determines which one is removed, so different runs or implementations may produce different community partitions.
Solution: Run the algorithm several times with different tie-breaking orders and check whether the final community structure is stable, or combine the results of multiple runs.
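As a sketch of the first countermeasure (approximate betweenness centrality), NetworkX's girvan_newman accepts a most_valuable_edge callback, and edge_betweenness_centrality accepts a parameter k that estimates the scores from k sampled source nodes. The helper name sampled_most_valuable_edge and the values k=10 and seed=0 are assumptions made for illustration; a suitable sample size depends on the network.

```python
import networkx as nx
from networkx.algorithms import community

def sampled_most_valuable_edge(graph, k=10, seed=0):
    """Hypothetical helper: pick the edge to remove from sampled betweenness."""
    # The sample size cannot exceed the number of nodes.
    k = min(k, graph.number_of_nodes())
    # Estimate edge betweenness from k sampled source nodes only.
    bc = nx.edge_betweenness_centrality(graph, k=k, seed=seed)
    return max(bc, key=bc.get)

G = nx.karate_club_graph()

# Plug the approximate edge selector into the built-in generator.
levels = community.girvan_newman(G, most_valuable_edge=sampled_most_valuable_edge)
first_split = next(levels)
print([sorted(c) for c in first_split])
```

Because the scores are estimates, the partitions may differ from those of the exact algorithm. Varying the seed and comparing the results is also one way to probe the stability issue described in the fourth challenge.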
