Knowledge graph-based recommendation systems

A knowledge graph is a graph that represents relationships between entities (people, objects, concepts, etc.). Because it is a data format that can represent entities linked by multiple relationships, recommendation using knowledge graphs is one of the recommendation methods that can more accurately reflect user preferences and interests.

Common methods of recommendation using knowledge graphs include the following:

Obtaining an embedded representation of an entity: by obtaining an embedded representation (embedding) of an entity from the knowledge graph, the characteristics of that entity can be captured. An embedded representation is a numerical vector that represents the entity and is used to calculate similarity.
Mapping users and items onto the knowledge graph: by mapping users and items onto the knowledge graph, the user’s interests and the item’s features can be matched on the knowledge graph.
Recommendation using graph similarity: by calculating similarity on the knowledge graph, the relevance between the user and the item can be assessed. For example, it is possible to recommend items with similar characteristics to items liked by the user.
Recommendation based on path traversal on the graph: based on path traversal on the knowledge graph, it is possible to understand the links between users and items and recommend items that are more relevant. For example, a user who has watched a film can be recommended films by the same director or featuring the same actors.
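
The path-traversal idea in the last item can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the triples, the relation names (`watched`, `directed_by`, `features`) and the function `recommend_by_paths` are hypothetical, and candidates are ranked simply by how many user -> film -> attribute <- film paths reach them.

```python
from collections import defaultdict

# Hypothetical toy knowledge graph as (head, relation, tail) triples.
TRIPLES = [
    ("user1", "watched", "film_a"),
    ("film_a", "directed_by", "director_x"),
    ("film_b", "directed_by", "director_x"),
    ("film_a", "features", "actor_y"),
    ("film_b", "features", "actor_y"),
    ("film_c", "features", "actor_y"),
    ("film_d", "directed_by", "director_z"),
]


def recommend_by_paths(triples, user, attribute_relations=("directed_by", "features")):
    """Rank unseen films by the number of shared-attribute paths from the user."""
    watched = {t for h, r, t in triples if h == user and r == "watched"}

    films_by_attr = defaultdict(set)   # attribute -> films that have it
    attrs_by_film = defaultdict(set)   # film -> its attributes
    for h, r, t in triples:
        if r in attribute_relations:
            films_by_attr[t].add(h)
            attrs_by_film[h].add(t)

    path_counts = defaultdict(int)
    for film in watched:
        for attr in attrs_by_film[film]:
            for candidate in films_by_attr[attr]:
                if candidate not in watched:
                    path_counts[candidate] += 1

    # Sort by path count (descending), then by name for a stable order.
    return sorted(path_counts.items(), key=lambda kv: (-kv[1], kv[0]))


print(recommend_by_paths(TRIPLES, "user1"))
# [('film_b', 2), ('film_c', 1)]
```

Here film_b shares both the director and an actor with the watched film, so it is ranked above film_c, which shares only the actor. The shared attributes also serve as a simple explanation for each recommendation.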

Recommendation using knowledge graphs is expected to provide more detailed and accurate recommendations than conventional recommendation methods, as it can take into account not only the characteristics of an item, but also the entities to which the item belongs and the relationships between entities.

Recommendation using Matrix Factorization and knowledge graphs

Matrix Factorization and knowledge graphs are both widely used methods in recommendation systems, and some methods combine both.

Matrix Factorization is a method for modelling the interaction between a user and an item: the characteristics of the user and the item are each expressed as a numerical vector, and their inner product gives a predicted rating. A knowledge graph, on the other hand, is a graph that expresses the relationships between entities, and relevance can be evaluated by calculating the distance and similarity of entities on the graph.

One method that combines these two techniques is Matrix Factorization using embedded representations of entities on a knowledge graph. In this method, embedded representations of entities are obtained from the knowledge graph and used to represent the characteristics of users and items. This makes it possible to calculate the predicted ratings of users for items while taking into account the relationships between entities.
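
As a rough sketch of this combination, the NumPy example below adds a fixed knowledge-graph embedding to each item's learned factor vector before taking the inner product with the user vector. Everything here is assumed for illustration: the toy ratings, the random stand-in for pre-trained entity embeddings (`kg_item_emb`), and the choice of simply summing the two item vectors, which is only one of several possible ways to combine them.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, dim = 3, 4, 4

# Observed (user, item, rating) triples; hypothetical toy data.
ratings = [(0, 0, 5.0), (0, 1, 4.0), (1, 1, 5.0),
           (1, 2, 1.0), (2, 2, 2.0), (2, 3, 5.0)]

# Stand-in for item entity embeddings pre-trained on a knowledge graph.
# They are kept fixed; only P and Q are learned.
kg_item_emb = rng.normal(scale=0.1, size=(n_items, dim))

P = rng.normal(scale=0.1, size=(n_users, dim))  # user factors
Q = rng.normal(scale=0.1, size=(n_items, dim))  # learned item offsets


def predict(u, i):
    """Predicted rating: inner product of user vector and combined item vector."""
    return float(P[u] @ (Q[i] + kg_item_emb[i]))


def mse():
    return float(np.mean([(r - predict(u, i)) ** 2 for u, i, r in ratings]))


mse_before = mse()
lr, reg = 0.02, 0.01
for _ in range(1000):
    for u, i, r in ratings:
        item_vec = Q[i] + kg_item_emb[i]
        err = r - P[u] @ item_vec
        grad_p = err * item_vec - reg * P[u]
        grad_q = err * P[u] - reg * Q[i]
        P[u] += lr * grad_p
        Q[i] += lr * grad_q
mse_after = mse()

print(f"MSE before: {mse_before:.3f}, after: {mse_after:.3f}")
```

Because the entity embedding is part of every item vector, two items that are close in the knowledge graph start from similar representations even when one of them has few ratings.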

In addition, recommendation using knowledge graphs can take into account the links between users and items based on path traversal on the graph, making it possible, for example, to recommend films by the same director or featuring the same actors to a user who has watched a film. This is one of the methods that make use of the strengths of recommendation based on the knowledge graph.

When mapping users and items onto the knowledge graph, the following steps can be considered:

Extracting user and item attributes: first, the attributes of the user and item are extracted. This can be done using information such as the films the user has watched and the products they have purchased. Items can also be assigned attributes such as genre, director, actor, etc., and these attributes can be represented as entities on the knowledge graph.
Searching for entities on the knowledge graph: the next step is to search for entities on the knowledge graph. Entity retrieval can use natural language processing techniques such as named entity recognition, and can also make use of existing knowledge graphs.
Obtaining embedded representations of user and item entities: transform user and item entities into embedded representations on the knowledge graph. This can be done using methods based on path traversal on the graph or using an adjacency matrix on the graph.
Performing mapping: finally, map users and items onto the knowledge graph using the embedded representations of users and items. This enables recommendations to take into account the relationships between entities on the knowledge graph.
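
The steps above can be sketched as follows. This is a minimal illustration, not a definitive pipeline: the embeddings in `ENTITY_EMB`, the alias table standing in for entity linking, and the choice of representing a user as the mean of the embeddings of the items in their history are all assumptions made for the example.

```python
import math

# Hypothetical pre-trained entity embeddings from a knowledge graph.
ENTITY_EMB = {
    "film_a": [0.9, 0.1, 0.0],
    "film_b": [0.8, 0.2, 0.1],
    "film_c": [0.0, 0.1, 0.9],
    "film_d": [0.1, 0.9, 0.2],
}

# Stand-in for entity linking: surface string -> entity id.
ALIASES = {"Film A": "film_a", "Film C": "film_c"}


def link_entity(text):
    """Return the entity id, or None when the item is not in the graph."""
    return ALIASES.get(text)


def user_embedding(history):
    """Map a user onto the graph as the mean embedding of linked history items."""
    vecs = [ENTITY_EMB[e] for e in map(link_entity, history) if e in ENTITY_EMB]
    if not vecs:
        return None  # nothing could be linked (cold start)
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def rank_items(history):
    """Rank unseen entities by cosine similarity to the user embedding."""
    u = user_embedding(history)
    if u is None:
        return []
    seen = {link_entity(h) for h in history}
    scored = [(e, cosine(u, v)) for e, v in ENTITY_EMB.items() if e not in seen]
    return [e for e, _ in sorted(scored, key=lambda s: -s[1])]


print(rank_items(["Film A", "Unknown Film"]))  # ['film_b', 'film_d', 'film_c']
print(rank_items(["Unknown Film"]))            # []
```

The second call shows the failure mode in its simplest form: when no item in the history can be linked to an entity, the user cannot be placed on the graph at all.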

However, mapping onto the knowledge graph has several issues. Items and users that do not exist in the knowledge graph cannot be found by entity search. In addition, even for entities that exist in the knowledge graph, there may be too little information about the entity to retrieve an appropriate embedded representation. Various measures are needed to address these problems.

GNN-based solutions to the challenges of recommendation techniques in knowledge graphs

The solution to the challenges in knowledge graph recommendation techniques can be implemented using GNNs (Graph Neural Networks) as follows.

1. challenge: sparsity of knowledge graphs and missing data
Knowledge graphs are often sparse and may lack relationships. This sparsity and missing data have a negative impact on the performance of the recommendation system.

Solution with GNN:
– Use link prediction: by using GNNs, missing links (relationships) can be predicted. For example, potential relationships between users and items can be learnt, complementing missing data in the knowledge graph.
– Graph embedding: a GNN can learn the embedding of nodes (users and items) and use this embedding to predict missing links. For example, it can predict what relationships exist between users and items and make recommendations.
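
To make the idea concrete, the NumPy sketch below applies two rounds of GCN-style normalised neighbourhood propagation, without any learned weights, and scores candidate links by the dot product of the resulting node vectors. The toy graph, the one-hot input features and the function name `link_score` are assumptions for illustration; a trained GNN would add weight matrices and non-linearities between the propagation steps.

```python
import numpy as np

# Nodes 0-2 are users, 3-5 are items (hypothetical toy graph).
edges = [(0, 3), (0, 4), (1, 3), (2, 5)]
n = 6

A = np.zeros((n, n))
for u, i in edges:
    A[u, i] = A[i, u] = 1.0  # treat interactions as undirected

A_hat = A + np.eye(n)                           # add self-loops
deg = A_hat.sum(axis=1)
A_norm = A_hat / np.sqrt(np.outer(deg, deg))    # D^-1/2 (A + I) D^-1/2

X = np.eye(n)               # one-hot input features
H = A_norm @ A_norm @ X     # two propagation rounds, no learned weights


def link_score(u, i):
    """Dot-product score for a candidate user-item link."""
    return float(H[u] @ H[i])


# User 1 shares item 3 with user 0, who also interacted with item 4.
print(f"user 1 - item 4: {link_score(1, 4):.3f}")
print(f"user 1 - item 5: {link_score(1, 5):.3f}")
```

The missing link between user 1 and item 4 receives a positive score because the two nodes have overlapping two-hop neighbourhoods, whereas item 5 sits in a disconnected part of the graph and scores zero.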

2. challenge: modelling complex relationships
Knowledge graphs have multiple types of nodes and different types of relationships, such as users, items and attributes. It is difficult to model these relationships properly and utilise them for recommendation.

Solution with GNN:
– Handling multiple relationships: in GNNs, different weights can be assigned to different types of edges (e.g. ‘purchase’, ‘review’, etc.). This allows different types of relationships to be learnt separately and complex relationships to be modelled appropriately.
– Learning embeddings per relationship: by learning different embeddings for each relationship type, interactions between nodes with different relationships can be learnt explicitly separately. This enables complex interactions between users and items to be captured efficiently.
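
A single layer with one weight matrix per relation type can be sketched in NumPy as below. It follows the general pattern of relational graph convolution (a self-connection plus a mean over incoming neighbours for each relation, each with its own weights), but the toy edges, the random weights and the function name `relational_layer` are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
n_nodes, in_dim, out_dim = 4, 3, 2
relations = ["purchase", "review"]

# (source, relation, target) edges; hypothetical toy data.
typed_edges = [
    (0, "purchase", 2),
    (1, "purchase", 3),
    (0, "review", 3),
    (1, "review", 2),
]

X = rng.normal(size=(n_nodes, in_dim))                            # input node features
W_self = rng.normal(size=(in_dim, out_dim))                       # self-connection weights
W_rel = {r: rng.normal(size=(in_dim, out_dim)) for r in relations}  # one matrix per relation


def relational_layer(X):
    """One message-passing layer with relation-specific weight matrices."""
    out = X @ W_self
    for r in relations:
        agg = np.zeros_like(X)
        count = np.zeros(n_nodes)
        for s, rel, t in typed_edges:
            if rel == r:
                agg[t] += X[s]   # message flows from source to target
                count[t] += 1
        norm = np.maximum(count, 1)[:, None]   # mean over neighbours, avoid divide-by-zero
        out += (agg / norm) @ W_rel[r]
    return np.maximum(out, 0)  # ReLU


H = relational_layer(X)
print(H.shape)  # (4, 2)
```

Because 'purchase' and 'review' edges pass through different matrices, the same neighbour contributes differently depending on how it is connected.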

3. challenge: scalability for large data
Knowledge graphs can become very large, especially in large systems such as Netflix, which raises the issue of scalability. Constraints on computational resources and the growing amount of data make it difficult to learn and reason efficiently.

Solution with GNN:
– Graph sampling: it is common in GNNs to perform learning by sampling partial sub-graphs instead of the whole graph, which reduces memory consumption and computational complexity. In particular, sampling is used for efficient learning in large knowledge graphs.
– Distributed GNN: As the size of the graph increases, distributed learning can be used to distribute the computational load. In distributed GNNs, parts of the graph are processed by multiple computation nodes, improving scalability through parallel processing.
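
The sampling idea can be illustrated with a small pure-Python neighbour sampler. This is a sketch under assumptions: the adjacency dictionary, the `fanouts` parameter and the function `sample_subgraph` are hypothetical, and production systems would normally rely on a library sampler rather than code like this.

```python
import random


def sample_subgraph(adj, seeds, fanouts, rng):
    """Sample at most fanouts[k] neighbours per node at hop k, starting from seeds."""
    nodes = set(seeds)
    frontier = list(seeds)
    sampled_edges = []
    for fanout in fanouts:
        next_frontier = []
        for node in frontier:
            neighbours = adj.get(node, [])
            chosen = rng.sample(neighbours, min(fanout, len(neighbours)))
            for nb in chosen:
                sampled_edges.append((nb, node))  # message flows nb -> node
                if nb not in nodes:
                    nodes.add(nb)
                    next_frontier.append(nb)
        frontier = next_frontier
    return nodes, sampled_edges


# Hypothetical graph: node 0 is linked to 50 items, each item to one attribute.
adj = {0: list(range(1, 51))}
for i in range(1, 51):
    adj[i] = [0, 100 + i]
    adj[100 + i] = [i]

nodes, sampled = sample_subgraph(adj, seeds=[0], fanouts=[5, 2], rng=random.Random(0))
print(len(nodes), len(sampled))  # 11 15
```

The full graph has 101 nodes, but the mini-batch for seed node 0 touches only 11 of them, which is what keeps memory and computation bounded as the graph grows.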

4. challenge: understanding users’ diverse intentions
Users may interact with the same item with different intentions, and it is difficult to make recommendations that take this diversity into account. For example, one user may express interest in an item by rating it, while another may do so by writing a review or sharing it.

Solution with GNN:
– User embedding segmentation: using GNNs, it is possible to learn embeddings that distinguish between different user intentions. For example, learning embeddings for different intentions (ratings, reviews, purchases, etc.) based on the user’s behavioural history enables recommendations to be tailored to the user’s diverse needs.
– Multi-task learning: multi-task learning, which simultaneously learns diverse user behaviours, can be applied to GNNs to simultaneously learn different types of recommendations (e.g. product recommendations, review recommendations) and improve their accuracy.
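
One way to picture multi-task learning over shared embeddings is a weighted sum of per-behaviour losses, each computed with its own projection ("head"). The sketch below only evaluates such a loss and does not train anything; the random embeddings stand in for GNN outputs, and the behaviour names, labels and `task_weights` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 4
user_emb = rng.normal(size=(3, dim))   # shared embeddings, e.g. GNN outputs
item_emb = rng.normal(size=(5, dim))

# One projection matrix per behaviour type.
heads = {b: rng.normal(size=(dim, dim)) for b in ("purchase", "review")}

# Labelled (user, item, label) examples per behaviour; hypothetical data.
examples = {
    "purchase": [(0, 1, 1.0), (0, 2, 0.0), (1, 3, 1.0)],
    "review": [(0, 1, 0.0), (2, 4, 1.0)],
}
task_weights = {"purchase": 1.0, "review": 0.5}


def bce(logit, label):
    """Numerically stable binary cross-entropy on a logit."""
    return max(logit, 0) - logit * label + np.log1p(np.exp(-abs(logit)))


def multitask_loss():
    """Weighted sum of the mean loss of each behaviour-specific task."""
    total = 0.0
    per_task = {}
    for b, rows in examples.items():
        losses = [bce(user_emb[u] @ heads[b] @ item_emb[i], y) for u, i, y in rows]
        per_task[b] = float(np.mean(losses))
        total += task_weights[b] * per_task[b]
    return total, per_task


total, per_task = multitask_loss()
print(f"total: {total:.3f}", per_task)
```

In training, gradients from every task would flow back into the shared embeddings, while each head specialises in one behaviour.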

5. challenge: utilising time-series data
User behaviour and item ratings change over time. This requires recommendations to be based on time-series, but it is difficult to handle temporal factors properly.

Solution with GNN:
– Temporal GNNs: use time-aware GNNs (e.g. Temporal Graph Neural Networks described in “Temporal Graph Neural Network overview and implementation examples”) to handle changes over time. This allows modelling how user behaviour changes over time and making appropriate recommendations in real time.
– Dynamic graphs: where the knowledge graph changes over time, dynamic graph GNNs can be used to learn relationships that change over time. This enables recommendations based on temporal changes.
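
A much simpler device than a temporal GNN, shown here only to illustrate how time can enter the graph, is to down-weight old interactions with an exponential decay and pass the result to the model as edge weights. The half-life value, the timestamp unit (days) and the function `decayed_edge_weights` are assumptions for this sketch.

```python
def decayed_edge_weights(interactions, now, half_life_days=30.0):
    """Weight each (user, item, timestamp_in_days) edge by 0.5 ** (age / half_life)."""
    weights = {}
    for user, item, t in interactions:
        age = now - t
        w = 0.5 ** (age / half_life_days)
        # For repeated interactions, keep the strongest (most recent) weight.
        weights[(user, item)] = max(w, weights.get((user, item), 0.0))
    return weights


# Hypothetical interaction log with timestamps in days.
interactions = [
    ("user1", "item1", 0.0),
    ("user1", "item2", 60.0),
    ("user1", "item1", 30.0),
]
weights = decayed_edge_weights(interactions, now=60.0)
print(weights)
# {('user1', 'item1'): 0.5, ('user1', 'item2'): 1.0}
```

An interaction from today keeps full weight, one from a half-life ago counts half as much, and the weights can be recomputed whenever the graph is refreshed.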

GNNs can address the challenges of recommendation techniques in knowledge graphs discussed above (sparsity, complexity of relationships, scalability, diverse user intentions and time-series data). Methods such as link prediction, embedding learning, sampling, multi-task learning and dynamic graphs can be used to build more accurate, flexible and scalable recommendation systems.

Implementation example

An example of the implementation of a recommendation technique in a knowledge graph using a graph neural network (GNN) is shown below. The example shows how a GNN model can be built using the PyTorch Geometric (PyG) library and a simple knowledge graph to perform link prediction. Link prediction is the task of predicting missing relationships and is useful for predicting the relationship between users and items in a recommendation system.

1. install the required libraries: first, you need to install PyTorch Geometric. Install it with the following command.

pip install torch torchvision torchaudio
pip install torch-geometric

2. creating a simple knowledge graph: a knowledge graph consists of users, items and the relationships (purchased) that connect them. In this example, a simple knowledge graph is created.

import torch
from torch_geometric.data import Data

# Node information
nodes = torch.tensor([0, 1, 2, 3])  # 0: user1, 1: user2, 2: item1, 3: item2

# Edge information (connections between nodes)
edges = torch.tensor([
    [0, 2],  # user1 -> item1
    [1, 3],  # user2 -> item2
    [0, 3],  # user1 -> item2
    [1, 2],  # user2 -> item1
]).t().contiguous()

# Edge features (relational information).
edge_attr = torch.tensor([1, 1, 1, 1], dtype=torch.float)  # ‘1’ is a purchased relationship

# Create graphical data.
data = Data(x=torch.ones((4, 1)), edge_index=edges, edge_attr=edge_attr)

print(data)

3. defining the GNN model: a simple GNN model is then defined. This model uses a Graph Convolutional Network (GCN) layer to embed node features.

import torch.nn as nn
import torch.nn.functional as F
from torch_geometric.nn import GCNConv

class GNNRecommender(nn.Module):
    def __init__(self, in_channels, hidden_channels, out_channels):
        super(GNNRecommender, self).__init__()
        self.conv1 = GCNConv(in_channels, hidden_channels)
        self.conv2 = GCNConv(hidden_channels, out_channels)

    def forward(self, data):
        x, edge_index, edge_attr = data.x, data.edge_index, data.edge_attr
        x = F.relu(self.conv1(x, edge_index, edge_attr))
        x = self.conv2(x, edge_index, edge_attr)
        return x

# Model definition.
model = GNNRecommender(in_channels=1, hidden_channels=16, out_channels=2)

4. training the model: next, the model is trained. To make link predictions, binary cross-entropy is used as the loss function and the parameters are optimised based on the training data.

import torch.optim as optim

# Setting up for optimisation.
optimizer = optim.Adam(model.parameters(), lr=0.01)
loss_fn = nn.BCEWithLogitsLoss()

# training function
def train(model, data):
    model.train()
    optimizer.zero_grad()
    out = model(data)

    # For link prediction, score each edge by the dot product of its two
    # endpoint embeddings, giving one logit per edge (shape [num_edges]).
    src, dst = data.edge_index
    pred = (out[src] * out[dst]).sum(dim=-1)
    # Note: edge_attr is all ones here, so only positive edges are seen;
    # a realistic setup would also include sampled negative edges with label 0.
    loss = loss_fn(pred, data.edge_attr)

    loss.backward()
    optimizer.step()
    return loss.item()

# Execution of training
for epoch in range(100):
    loss = train(model, data)
    print(f"Epoch {epoch+1}, Loss: {loss:.4f}")
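
The training loop above uses only observed edges as targets, so the model never sees an example of a link that should not exist. In practice, link prediction is usually trained with sampled negative edges as well. The sketch below shows one simple way to draw them in plain Python. Since every user-item pair in the four-node toy graph is already an edge, it uses a slightly larger hypothetical graph; the function `sample_negative_edges` and the data are assumptions for illustration, and enumerating all candidate pairs as done here would not scale to large graphs.

```python
import random


def sample_negative_edges(pos_edges, users, items, num_samples, rng):
    """Draw user-item pairs that do not appear among the positive edges."""
    positives = set(pos_edges)
    candidates = [(u, i) for u in users for i in items if (u, i) not in positives]
    return rng.sample(candidates, min(num_samples, len(candidates)))


# Hypothetical graph: users 0-2, items 3-5.
users = [0, 1, 2]
items = [3, 4, 5]
pos_edges = [(0, 3), (0, 4), (1, 3), (2, 5)]

neg_edges = sample_negative_edges(pos_edges, users, items,
                                  num_samples=4, rng=random.Random(0))

# Positive edges get label 1, sampled negatives get label 0.
train_edges = pos_edges + neg_edges
labels = [1.0] * len(pos_edges) + [0.0] * len(neg_edges)
print(train_edges)
print(labels)
```

The combined edge list and labels could then replace the all-ones target used in the training function, with the logits computed for both positive and negative pairs.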

5. obtaining recommendation results: after training, the model can be used to obtain recommendation results. Here, item recommendations are made for specific users.

def recommend(model, data, user_index):
    model.eval()
    with torch.no_grad():
        out = model(data)
        user_embedding = out[user_index]  # Embedding of the specified user
        item_embeddings = out[2:]  # Embedding of items (from node 2 onwards)
        
        # Score each item by the dot product of user and item embeddings
        # (unnormalised, so this is not cosine similarity)
        similarities = torch.matmul(user_embedding, item_embeddings.t())
        # Positions are relative to item_embeddings: 0 -> node 2 (item1), 1 -> node 3 (item2)
        recommended_items = torch.argsort(similarities, descending=True)
        
    return recommended_items

# Item recommendation for user 1
recommended_items = recommend(model, data, user_index=0)
print(f"Recommended items for user 1: {recommended_items}")

6. results: in this example, a system that recommends items to users is built using a GNN. After training, the recommended items for user 1 are output in descending order of score.

Implementation description

Data preparation: nodes (users, items) and edges (relations) of the knowledge graph are created. Using the Data class, graph data is prepared in PyG format.
GNN model: GCNConv layers are used to embed user and item features and learn relationships.
Training: link prediction is trained using binary cross-entropy loss.
Recommendation: after training, user and item embeddings are used to recommend the most suitable items for the user.

References

For more information on recommendation systems using knowledge graphs, see the following link.

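Interpretable recommendation systems using knowledge graphs (ナレッジグラフを使った解釈可能な推薦システム)

A blog post by @joisino_, who had been a machine learning engineering intern at Mercari since August. The post covers work during the internship on a recommendation system using knowledge graphs. An English version of the post is also available.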

Reference books on recommendation techniques in knowledge graphs using GNNs (Graph Neural Networks) are described below.

1. ‘Graph Representation Learning’ by William L. Hamilton
– Abstract: A comprehensive guidebook on GNNs and graph representation learning. It builds on the technology associated with knowledge graphs, explaining the basic theory of GNNs and how to apply them to real-world problems.
– Contents: focuses on the basic concepts of GNNs, algorithms, implementation methods and applications to recommendation systems.

2. ‘Deep Learning on Graphs’ by Yao Ma and Jiliang Tang
– Abstract: describes the design and application of graph-based deep learning techniques, especially GNN-based models. It also touches on how to apply them to recommendation systems and knowledge graphs.
– Contents: starts with the theory of GNNs and presents implementation examples based on concrete applications and case studies.

3. ‘Recommender Systems: The Collaborative Filtering Approach’ by Paolo Cremonesi, Yehuda Koren, Roberto Turrin
– Abstract: Theoretical book on recommender systems in general, with special emphasis on collaborative filtering methods; recommended to be combined with other books on GNN-based recommender techniques.
– Contents: covers the basics and advanced techniques of recommendation systems and provides a foundation for understanding them in conjunction with GNNs.

4. ‘Graph Neural Networks: A Review of Methods and Applications’ by Jie Zhou, Ganqu Cui, et al.
– Abstract: This in-depth review paper on GNNs presents state-of-the-art techniques in the implementation of graph-based data analysis and recommendation systems.
– Contents: comprehensive learning about the architecture of GNNs, application areas and use cases in recommender systems.

5. ‘Hands-On Graph Neural Networks: build and train GNNs with Python and TensorFlow’ by Ankit Jain
– Abstract: This book teaches practical GNN implementations and provides concrete examples of how to apply GNNs to knowledge graphs and recommendation systems.
– Contents: you will learn the steps to actually implement a model of a GNN using Python and TensorFlow, while also understanding how to develop a recommendation system.

6. ‘Graph-Based Semi-Supervised Learning’ by Xiaoyang Li, Wei Wu
– Abstract: This book explains the use of graphs in semi-supervised learning and discusses the development of knowledge graphs and recommendation systems together with GNNs.
– Contents: the book is rich in references for applying semi-supervised learning and graph-based methods to recommendation systems.