Overview of diversity promotion ranking and examples of algorithms and implementations.

Overview of diversity promotion rankings

Diversity promotion ranking (also called diversity-enhanced ranking) is a ranking method for search results and recommendation systems that aims to place a diverse set of items near the top, rather than ordering items on relevance and popularity alone. This gives users access to a wider variety of options, which can increase satisfaction and create opportunities for new discoveries.

Traditional ranking algorithms typically determine the top results for a user’s query based on relevance, click-through rate and popularity. This can concentrate items of the same type or genre at the top, limiting the options available to users. Diversity promotion ranking therefore has the following objectives:

  • Improve the user experience: make it easier for users to discover different types of content and products, thereby increasing satisfaction.
  • Reduce bias: correct rankings that tend to be biased towards certain items or categories and achieve overall balance.
  • Healthier markets: exposure to a diverse range of items creates opportunities for new entrants and minor products and content.

Approaches to ranking that promote diversity include the following.

1. Heuristic methods: a simple rule-based approach that adjusts rankings so that a certain number of specific categories or types are included. An example is ensuring that the top 10 results always include at least three different categories.
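The rule above can be sketched as a small post-processing step. This is a minimal illustration under assumptions, not a standard algorithm: the function name `enforce_category_quota` and the dictionary fields `item_id`, `category` and `score` are hypothetical, and the swap strategy (replace the lowest-scored item of an over-represented category) is only one possible choice.

```python
from collections import Counter

def enforce_category_quota(items, k=10, min_categories=3):
    """Take the top-k items by score, then swap in lower-ranked items until
    at least min_categories distinct categories appear (when possible)."""
    ranked = sorted(items, key=lambda it: it["score"], reverse=True)
    top, rest = ranked[:k], ranked[k:]
    for candidate in rest:
        present = {it["category"] for it in top}
        if len(present) >= min_categories:
            break
        if candidate["category"] in present:
            continue
        # Replace the lowest-scored item whose category appears more than once
        counts = Counter(it["category"] for it in top)
        for i in range(len(top) - 1, -1, -1):
            if counts[top[i]["category"]] > 1:
                top[i] = candidate
                break
    return sorted(top, key=lambda it: it["score"], reverse=True)

# Hypothetical sample data: the three best items all belong to category A
items = [
    {"item_id": 1, "category": "A", "score": 0.9},
    {"item_id": 2, "category": "A", "score": 0.8},
    {"item_id": 3, "category": "A", "score": 0.7},
    {"item_id": 4, "category": "B", "score": 0.6},
    {"item_id": 5, "category": "C", "score": 0.5},
]
top3 = enforce_category_quota(items, k=3, min_categories=2)
print([it["item_id"] for it in top3])  # [1, 2, 4]
```

In this sample, item 3 is dropped in favour of item 4 so that two categories appear in the top three.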

2. Re-ranking approach: re-evaluate and re-order an initial ranking with diversity in mind. The main steps of re-ranking are as follows.

  • Generate an initial ranking.
  • Calculate a diversity score for each item.
  • Adjust the ranking to take the diversity score into account.

3. Multi-Armed Bandit models: dynamically display different items to promote diversity while balancing exploration and exploitation. A key approach is to adjust the displayed items in real time based on user responses.
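As a rough illustration of the exploration/exploitation balance, the following sketch uses an epsilon-greedy policy, which is only one of several bandit strategies. The class name `EpsilonGreedyRanker`, its methods and the click signal are assumptions made for this example.

```python
import random

class EpsilonGreedyRanker:
    """Epsilon-greedy bandit: with probability epsilon show a random item
    (exploration), otherwise show the item with the best observed
    click-through rate (exploitation)."""

    def __init__(self, item_ids, epsilon=0.1, seed=0):
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.shows = {i: 0 for i in item_ids}
        self.clicks = {i: 0 for i in item_ids}

    def estimated_ctr(self, item_id):
        shows = self.shows[item_id]
        return self.clicks[item_id] / shows if shows else 0.0

    def select(self):
        items = list(self.shows)
        if self.rng.random() < self.epsilon:
            return self.rng.choice(items)          # explore
        return max(items, key=self.estimated_ctr)  # exploit

    def update(self, item_id, clicked):
        # Record one impression and whether it was clicked
        self.shows[item_id] += 1
        self.clicks[item_id] += int(clicked)

# With epsilon=0 the ranker always exploits the best-known item
ranker = EpsilonGreedyRanker(["a", "b", "c"], epsilon=0.0)
ranker.update("a", clicked=False)
ranker.update("b", clicked=True)
print(ranker.select())  # b
```

A larger `epsilon` shows rarely displayed items more often, which increases diversity at the cost of short-term click-through rate.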

4. Maximum Coverage Problem: optimise rankings to cover a variety of categories. Typical algorithms select items so as to maximise category coverage, e.g. using a greedy method.
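A greedy method for the coverage formulation might look like the sketch below. The function name `greedy_max_coverage` and the sample topic sets are hypothetical; each item is assumed to be annotated with the set of categories it covers.

```python
def greedy_max_coverage(item_topics, k):
    """Greedily pick up to k items, each time choosing the item that
    covers the most categories not yet covered."""
    covered, chosen = set(), []
    remaining = dict(item_topics)
    for _ in range(min(k, len(remaining))):
        best = max(remaining, key=lambda i: len(remaining[i] - covered))
        if not remaining[best] - covered:
            break  # no remaining item adds a new category
        chosen.append(best)
        covered |= remaining.pop(best)
    return chosen, covered

# Hypothetical items mapped to the categories they cover
item_topics = {
    "a": {"news", "sports"},
    "b": {"news"},
    "c": {"music", "film", "sports"},
    "d": {"tech"},
}
chosen, covered = greedy_max_coverage(item_topics, k=2)
print(chosen)  # ['c', 'a']
```

Item "c" is picked first because it covers three categories; "a" follows because it adds "news" and comes first among the tied candidates.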

Algorithms related to diversity promotion ranking

The algorithms associated with the diversity promotion ranking aim to strike a balance between diversity and relevance. The main algorithms and their overview are presented below.

1. Maximal Marginal Relevance (MMR): the MMR described in “Overview of Maximum Marginal Relevance (MMR) and examples of algorithms and implementations” is an algorithm that balances relevance and diversity by scoring each candidate item against both the query and the items already selected. It is represented by the following equation.

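$$\mathrm{MMR} = \arg\max_{D_i \in S \setminus S'} \left[ \lambda \, \mathrm{Sim}(D_i, Q) - (1 - \lambda) \max_{D_j \in S'} \mathrm{Sim}(D_i, D_j) \right]$$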

The symbols are defined as follows.

  • S: all candidate items
  • S’: set of already selected items
  • Q: query
  • λ: parameter for balancing relevance and diversity
  • Sim(Di,Q): similarity between Di and query Q
  • Sim(Di,Dj): similarity between item Di and item Dj

2. Determinantal point process (DPP): a DPP is a probabilistic model that favours diverse subsets when items are selected at random, which makes it suitable for maximising the coverage and heterogeneity of a set. An overview is as follows.

  • Objective: assign a higher selection probability to subsets of items with high diversity.
  • Probability calculation: the probability of a particular item set being selected is proportional to the determinant of the kernel (similarity) matrix built from the feature vectors of the items in that set.
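The determinant-based probability can be checked numerically with a small sketch. It assumes the L-ensemble form of a DPP, in which a subset Y has probability det(L_Y) / det(L + I), and it assumes a Gram kernel L built from dot products of the feature vectors. The function name and the sample features are hypothetical.

```python
import numpy as np

def dpp_subset_probability(features, subset):
    """Probability of selecting exactly `subset` under an L-ensemble DPP
    whose kernel is the Gram matrix of the feature vectors (an assumption)."""
    L = features @ features.T                       # kernel: pairwise dot products
    L_sub = L[np.ix_(subset, subset)]               # rows and columns of the subset
    normaliser = np.linalg.det(L + np.eye(len(L)))  # sum over all subsets
    return np.linalg.det(L_sub) / normaliser

# Items 0 and 1 are near-duplicates; item 2 points in a different direction
features = np.array([[1.0, 0.0],
                     [0.9, 0.1],
                     [0.0, 1.0]])
p_similar = dpp_subset_probability(features, [0, 1])
p_diverse = dpp_subset_probability(features, [0, 2])
print(p_similar < p_diverse)  # True
```

The determinant of a near-duplicate pair is close to zero because its kernel rows are almost linearly dependent, so the diverse pair is much more likely to be selected.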

3. Submodular function optimisation: an approach that uses greedy methods to promote diversity. A submodular function has the diminishing-returns property: the gain from adding an item shrinks as the set it is added to grows. The main steps are as follows.

  • Set initialisation: start with an empty set.
  • Greedy addition: at each step, the item with the largest marginal gain is added to the current set.
  • Stop condition: repeat until a specified set size is reached or until there is little or no benefit from the addition.
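The three steps above can be sketched with a concrete submodular objective. The example assumes a facility-location function, f(S) = sum over all items i of the maximum similarity between i and any selected item, which is one common choice and not the only one. The function names, the `min_gain` threshold and the similarity matrix are hypothetical.

```python
import numpy as np

def facility_location(sim, selected):
    """f(S): for every item, similarity to its closest selected item, summed."""
    if not selected:
        return 0.0
    return float(sim[:, selected].max(axis=1).sum())

def greedy_submodular(sim, k, min_gain=1e-9):
    selected, current = [], 0.0                 # set initialisation
    while len(selected) < k:
        candidates = [j for j in range(sim.shape[1]) if j not in selected]
        if not candidates:
            break
        gains = [facility_location(sim, selected + [j]) - current
                 for j in candidates]
        best = int(np.argmax(gains))            # greedy addition
        if gains[best] <= min_gain:             # stop condition
            break
        selected.append(candidates[best])
        current += gains[best]
    return selected

# Items 0/1 and items 2/3 form two groups of similar items
sim = np.array([[1.0,   0.75,  0.125, 0.125],
                [0.75,  1.0,   0.125, 0.125],
                [0.125, 0.125, 1.0,   0.5],
                [0.125, 0.125, 0.5,   1.0]])
print(greedy_submodular(sim, k=2))  # [0, 2]
```

After item 0 is chosen, adding its near-duplicate (item 1) gains only 0.25, while adding item 2 gains 1.25, so the greedy step picks one representative from each group.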

4. Cluster-based re-ranking: uses clustering techniques to promote diversity by dividing items into clusters and selecting items evenly from these clusters. The main steps are as follows.

  • Clustering: dividing items into clusters based on similarity.
  • Intra-cluster ranking: ranking items within each cluster based on their relevance score.
  • Inter-cluster selection: balanced selection of items from each cluster.
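The intra-cluster ranking and inter-cluster selection steps can be sketched as a round-robin over clusters. The sketch assumes the clustering step has already been run and that each item carries a hypothetical `cluster` label; the function name and sample data are also assumptions.

```python
from collections import defaultdict

def cluster_round_robin(items, k):
    """Rank items inside each cluster by score, then take one item per
    cluster in turn until k items have been selected."""
    clusters = defaultdict(list)
    for it in sorted(items, key=lambda it: it["score"], reverse=True):
        clusters[it["cluster"]].append(it)      # intra-cluster ranking
    # Visit clusters in order of their best item's score
    queues = sorted(clusters.values(), key=lambda q: q[0]["score"], reverse=True)
    result = []
    while len(result) < k and any(queues):
        for queue in queues:                    # inter-cluster selection
            if queue and len(result) < k:
                result.append(queue.pop(0))
    return result

# Hypothetical items with pre-assigned cluster labels
items = [
    {"item_id": 1, "cluster": 0, "score": 0.9},
    {"item_id": 2, "cluster": 0, "score": 0.8},
    {"item_id": 3, "cluster": 1, "score": 0.7},
    {"item_id": 4, "cluster": 0, "score": 0.6},
    {"item_id": 5, "cluster": 2, "score": 0.5},
]
reranked = cluster_round_robin(items, k=4)
print([it["item_id"] for it in reranked])  # [1, 3, 5, 2]
```

Items 3 and 5 move ahead of item 2 because every cluster contributes its best item before any cluster contributes a second one.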

5. Latent Factor Diversification (LFD): uses latent factor modelling to generate a diversity-aware ranking. This approach is mainly used in recommendation systems. The main components of LFD are as follows.

  • Latent factor model: decomposes item and user characteristics into latent factors and predicts their relevance.
  • Diversity score: calculates the distance between items in the latent factor space and evaluates diversity.
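The two components can be illustrated with a short sketch. It assumes that latent factors have already been learned (for example by matrix factorisation), that relevance is predicted as a dot product, and that diversity is measured as the mean pairwise Euclidean distance of a list (often called intra-list diversity). The function names and factor values are hypothetical.

```python
from itertools import combinations

import numpy as np

def predict_relevance(user_factors, item_factors):
    """Predicted relevance of every item: dot product with the user's factors."""
    return item_factors @ user_factors

def intra_list_diversity(item_factors, item_indices):
    """Mean pairwise Euclidean distance between the listed items in latent space."""
    pairs = list(combinations(item_indices, 2))
    if not pairs:
        return 0.0
    distances = [np.linalg.norm(item_factors[i] - item_factors[j])
                 for i, j in pairs]
    return float(np.mean(distances))

# Hypothetical two-dimensional latent factors
item_factors = np.array([[1.0, 0.0],
                         [0.9, 0.1],
                         [0.0, 1.0]])
user_factors = np.array([1.0, 0.5])

relevance = predict_relevance(user_factors, item_factors)
diversity_similar = intra_list_diversity(item_factors, [0, 1])
diversity_mixed = intra_list_diversity(item_factors, [0, 2])
print(diversity_similar < diversity_mixed)  # True
```

A diversity-aware ranker can then trade the predicted relevance against this distance-based score, for instance inside an MMR-style selection loop.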

Examples of the application of diversity promotion rankings

The following are examples of the application of diversity promotion rankings.

1. search engines:
Example: search engines such as Google and Bing display pages on different related topics to promote diversity.
Description: providing users with different sources of information by mixing different types of results, e.g. news, blogs, product reviews, official websites, etc., instead of the same type of page (e.g. only news articles, only product pages).
Algorithms: methods such as Maximal Marginal Relevance (MMR) and cluster-based re-ranking are applicable.

2. online shopping sites:
Example: e-commerce sites such as Amazon and Rakuten consider diversity in their product recommendation lists.
Description: instead of recommending only products from the same category or brand, products from different brands and categories are mixed to provide users with a wider range of choices.
Algorithms: methods such as the Determinantal Point Process (DPP) and submodular function optimisation are applicable.

3. video streaming services:
Example: Netflix and YouTube consider diversity in video recommendations.
Description: When recommending relevant videos based on a user’s past viewing history, they do not recommend only videos of the same genre or series, but also videos of different genres and styles.
Algorithms: methods such as Latent Factor Diversification (LFD) and MMR are applicable.

4. news aggregators:
Example: news aggregators such as Google News and Yahoo News display diversity-aware articles.
Description: mixing articles from different perspectives and sources when displaying news articles on a particular topic to provide a balanced presentation of information.
Algorithms: methods such as cluster-based re-ranking and submodular function optimisation are applicable.

5. music streaming services:
Example: Spotify and Apple Music consider diversity in playlist recommendations.
Description: when recommending songs based on a user’s past listening history, they include songs from different artists and genres, rather than only recommending songs from the same artist or genre.
Algorithms: methods such as DPP and MMR are applicable.

Specific examples of their application include the following.

A. Netflix: Netflix uses the following methods to provide a diverse range of content to its users
Personalised recommendations: recommends relevant content based on the user’s viewing history, but also mixes in different genres and types of content for diversity.
A/B testing: conduct A/B testing to verify the effectiveness of the diversity algorithm and compare user engagement and satisfaction.
Hybrid models: use hybrid models that combine multiple algorithms to balance diversity and relevance.

B. Amazon: on Amazon, the following methods are used to promote diversity in product recommendation lists
Relevant and new products: include relevant products as well as new products and products from different categories in the recommendation lists.
Use of customer reviews: display customer reviews from different perspectives to give users access to a greater variety of information.
Recommendation systems: use latent factor models to analyse user preferences and make recommendations that take diversity into account.

C. Google search: Google search uses the following methods to provide diverse information sources for user queries
Re-ranking search results: re-evaluating search results in terms of diversity and mixing different types of pages (news, blogs, videos, etc.).
Tailoring based on user intent: analysing user search intent and providing results from diverse sources accordingly.
Use of MMR: use MMR to rank search results, balancing relevance and diversity.

Examples of diversity promotion ranking implementations

The following section describes a simple re-ranking method using Python. In this example, a method is implemented to generate a ranking while taking into account the diversity of items. Specifically, we show how to balance relevance and diversity using Maximal Marginal Relevance (MMR).

1. Data preparation: first, a dataset of items is prepared. Each item contains a relevance score and a category.

import numpy as np
import pandas as pd

# Creation of sample data
data = {
    'item_id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'category': ['A', 'B', 'A', 'C', 'B', 'A', 'C', 'B', 'C', 'A'],
    'score': [0.9, 0.85, 0.75, 0.7, 0.65, 0.6, 0.55, 0.5, 0.45, 0.4]
}
df = pd.DataFrame(data)

# Initial ranking (in order of score)
initial_ranking = df.sort_values(by='score', ascending=False)

2. Implementation of Maximal Marginal Relevance (MMR): the MMR algorithm is then implemented. This algorithm selects items based on relevance and diversity.

def calculate_similarity(vec1, vec2):
    """Function to calculate similarity between simple vectors (dot product)."""
    return np.dot(vec1, vec2)

def mmr(documents, query, lambda_param=0.5):
    selected_docs = []
    remaining_docs = documents.copy()

    while remaining_docs:
        mmr_scores = []
        for doc in remaining_docs:
            sim_with_query = calculate_similarity(doc['vector'], query['vector'])
            sim_with_selected = max([calculate_similarity(doc['vector'], selected_doc['vector']) for selected_doc in selected_docs], default=0)
            mmr_score = lambda_param * sim_with_query - (1 - lambda_param) * sim_with_selected
            mmr_scores.append(mmr_score)
        
        best_doc_index = np.argmax(mmr_scores)
        best_doc = remaining_docs.pop(best_doc_index)
        selected_docs.append(best_doc)
        
    return selected_docs

# Prepare vectors (temporary vectors are used here).
initial_ranking['vector'] = initial_ranking['score'].apply(lambda x: np.array([x, 1 - x]))

# Prepare query vector (here a temporary vector is used)
query = {'vector': np.array([1, 0])}

# MMR application
ranked_docs = mmr(initial_ranking.to_dict('records'), query, lambda_param=0.7)

# Collect the re-ranked items into a DataFrame
ranked_df = pd.DataFrame(ranked_docs)

3. Display the results: display the results of applying the MMR algorithm.

print(ranked_df[['item_id', 'category', 'score']])

When the code is executed, the MMR algorithm is used to display ranking results that balance relevance and diversity.

The key implementation points are as follows.

  1. Data pre-processing: prepare a dataset containing scores and categories for each item and generate an initial ranking.
  2. Similarity calculation: implement a function to calculate the similarity between items. Here, a simple dot product is used.
  3. Implement MMR algorithm: use the MMR algorithm to select items so as to balance relevance and diversity.
  4. Parameter tuning: by tuning lambda_param, the relevance and diversity weights can be adjusted.

Directions for improvement and extension include the following.

  • More advanced similarity calculation: besides the dot product, more appropriate measures can be used, such as cosine similarity or a similarity derived from Euclidean distance.
  • Parameter optimisation: tune lambda_param on the data, since the best balance between relevance and diversity depends on the dataset.
  • Applying real data: use real user and item data to build more practical ranking systems.
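As one example of the first improvement, the dot product can be swapped for cosine similarity. The sketch below is a possible drop-in replacement for `calculate_similarity`; the function name `cosine_similarity` and the zero-vector handling are assumptions of this example.

```python
import numpy as np

def cosine_similarity(vec1, vec2):
    """Cosine of the angle between two vectors; 0.0 if either vector is zero."""
    denom = np.linalg.norm(vec1) * np.linalg.norm(vec2)
    if denom == 0:
        return 0.0
    return float(np.dot(vec1, vec2) / denom)

same_direction = cosine_similarity(np.array([1.0, 0.0]), np.array([2.0, 0.0]))
orthogonal = cosine_similarity(np.array([1.0, 0.0]), np.array([0.0, 3.0]))
print(same_direction, orthogonal)  # 1.0 0.0
```

Unlike the raw dot product, this measure ignores vector length, so a long vector is not treated as more similar to everything just because of its magnitude.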

Challenges and measures for diversity promotion ranking

The main challenges of diversity promotion ranking and measures to address them are as follows.

Challenges:

  1. Trade-off between relevance and diversity: emphasising diversity may sacrifice relevance in ranking and may include items that users are not interested in.
  2. Increased computational costs: diversity-aware ranking algorithms can be computationally expensive. Execution time is particularly problematic for large datasets.
  3. Lack of understanding of user needs: it is difficult to accurately identify the diverse needs of users, and diversity promotion based on incorrect assumptions can be counterproductive.
  4. Data bias: if training data is biased, diversity-aware algorithms may also produce biased results.
  5. Consistency of user experience: overemphasising diversity can lead to a lack of consistency and confusion in user expectations.

Solutions:

1. Balancing relevance and diversity: use an algorithm such as Maximal Marginal Relevance (MMR) to balance relevance and diversity. MMR exposes a parameter (λ) that sets the relative weights of relevance and diversity, and this parameter is tuned for the application. An example implementation is outlined as follows.

def mmr(documents, query, lambda_param=0.5):
    # ... MMR implementation ...
    return selected_docs

2. Use efficient algorithms: use algorithms with low computational cost or streamline existing ones. For example, pre-computed similarity matrices can reduce the real-time computational load. Caching and precomputation can look as follows.

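import numpy as np

def precompute_similarities(documents):
    # Stack the item vectors and compute every pairwise dot product once
    vectors = np.stack([doc['vector'] for doc in documents])
    return vectors @ vectors.T  # entry [i, j] is the similarity of items i and j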

3. Collect feedback on user needs: collect user feedback and use it to adjust the algorithms. A/B testing and user studies are conducted to assess how users respond to diversity. An example implementation is outlined as follows.

def get_user_feedback(selected_docs):
    # Collect user feedback
    return feedback_scores
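One simple way to turn feedback into a number is to compare click-through rates between a control ranking and a diversified ranking, as in an A/B test. The sketch below assumes feedback arrives as one boolean per impression; the function names and sample logs are hypothetical, and a real test would also need a statistical significance check.

```python
def click_through_rate(impressions):
    """impressions: list of booleans, True when the shown item was clicked."""
    return sum(impressions) / len(impressions) if impressions else 0.0

def compare_variants(control_clicks, diversified_clicks):
    """Summarise an A/B comparison between two ranking variants."""
    ctr_control = click_through_rate(control_clicks)
    ctr_diversified = click_through_rate(diversified_clicks)
    return {
        "control": ctr_control,
        "diversified": ctr_diversified,
        "lift": ctr_diversified - ctr_control,
    }

# Hypothetical impression logs for the two variants
report = compare_variants(
    control_clicks=[True, False, False, False],
    diversified_clicks=[True, True, False, False],
)
print(report)  # {'control': 0.25, 'diversified': 0.5, 'lift': 0.25}
```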

4. Data bias mitigation: implement techniques to detect and mitigate biases in datasets, for example data augmentation and fairness-oriented corrections. An example implementation is outlined as follows.

def mitigate_bias(data):
    # Bias mitigation techniques
    return unbiased_data
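As one possible mitigation technique, training examples can be reweighted so that every category carries the same total weight. This is a sketch of inverse-frequency weighting only; the function name `inverse_frequency_weights` is hypothetical, and other approaches such as resampling or fairness constraints are not shown.

```python
from collections import Counter

def inverse_frequency_weights(categories):
    """Weight each example by n_samples / (n_categories * category_count),
    so that every category contributes the same total weight."""
    counts = Counter(categories)
    n_samples, n_categories = len(categories), len(counts)
    return [n_samples / (n_categories * counts[c]) for c in categories]

# Category A is over-represented three to one
categories = ["A", "A", "A", "B"]
weights = inverse_frequency_weights(categories)
weight_a = sum(w for w, c in zip(weights, categories) if c == "A")
weight_b = sum(w for w, c in zip(weights, categories) if c == "B")
print(round(weight_a, 6), round(weight_b, 6))  # 2.0 2.0
```

The weights can then be passed to a training procedure that accepts per-example weights, so that the majority category does not dominate the learned scores.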

5. Maintain consistency in the user experience: provide a consistent user experience while balancing diversity and relevance. Provide personalised diversity that takes users’ past behaviour and preferences into account. An example implementation is outlined as follows.

def personalized_diversity_ranking(user_profile, documents):
    # Personalized ranking with diversity
    return ranked_docs
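One way to personalise diversity is to derive the MMR weight from how varied a user's history is. The sketch below is an assumption of this example rather than an established method: it maps the entropy of the categories in a user's history to a value between `lam_min` and `lam_max`, which could then be passed as `lambda_param` to an MMR function. All names and default values are hypothetical.

```python
import math
from collections import Counter

def personalised_lambda(history_categories, n_categories, lam_min=0.3, lam_max=0.9):
    """Map the breadth of a user's history to an MMR weight.
    Focused users get a high lambda (relevance first); users with a
    varied history get a low lambda (more diversity)."""
    if not history_categories or n_categories < 2:
        return lam_max
    counts = Counter(history_categories)
    total = len(history_categories)
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    breadth = min(1.0, entropy / math.log(n_categories))  # 0 = focused, 1 = varied
    return lam_max - breadth * (lam_max - lam_min)

lam_focused = personalised_lambda(["A", "A", "A", "A"], n_categories=4)
lam_broad = personalised_lambda(["A", "B", "C", "D"], n_categories=4)
print(lam_focused > lam_broad)  # True
```

Keeping the weight stable per user, rather than changing it on every request, also helps with the consistency concern described above.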

As a concrete example of the above, balancing relevance and diversity with MMR can be implemented as follows.

import numpy as np

def calculate_similarity(vec1, vec2):
    return np.dot(vec1, vec2)

def mmr(documents, query, lambda_param=0.5):
    selected_docs = []
    remaining_docs = documents.copy()

    while remaining_docs:
        mmr_scores = []
        for doc in remaining_docs:
            sim_with_query = calculate_similarity(doc['vector'], query['vector'])
            sim_with_selected = max([calculate_similarity(doc['vector'], selected_doc['vector']) for selected_doc in selected_docs], default=0)
            mmr_score = lambda_param * sim_with_query - (1 - lambda_param) * sim_with_selected
            mmr_scores.append(mmr_score)
        
        best_doc_index = np.argmax(mmr_scores)
        best_doc = remaining_docs.pop(best_doc_index)
        selected_docs.append(best_doc)
        
    return selected_docs

# Sample data
documents = [{'id': 1, 'vector': np.array([0.9, 0.1])}, {'id': 2, 'vector': np.array([0.85, 0.15])}]
query = {'vector': np.array([1, 0])}

# Apply MMR
ranked_docs = mmr(documents, query, lambda_param=0.7)
print([doc['id'] for doc in ranked_docs])

Reference Information and Reference Books

For general machine learning algorithms, including search algorithms, see “Algorithms and Data Structures” or “General Machine Learning and Data Analysis”.

The following reference books are also available.

1. “Introduction to Information Retrieval”

Authors: Christopher D. Manning, Prabhakar Raghavan, Hinrich Schütze
Publisher: Cambridge University Press
Why it’s useful: Covers the theory and implementation of information retrieval, including diversity. Systematically covers relevance, novelty, ranking models (e.g., MMR), evaluation metrics, etc.

2. “Recommender Systems: An Introduction”

Authors: Dietmar Jannach, Markus Zanker, Alexander Felfernig, Gerhard Friedrich
Publisher: Cambridge University Press
Why it’s useful: Full of practical recommender algorithms, including collaborative filtering, content-based filtering, and diversity enhancement methods. Includes chapters dedicated to diversity, novelty, and serendipity.

3. “Mining of Massive Datasets”

Authors: Jure Leskovec, Anand Rajaraman, Jeffrey D. Ullman
Publisher: Cambridge University Press
Why it’s useful: Introduces diversity-aware ranking methods in the context of web search, recommendation, and social network analysis. Also covers theories related to MMR and submodular optimization.

4. “Evaluation of Recommender Systems”

Editors: Guy Shani, Asela Gunawardana
Publisher: Springer
Why it’s useful: Theoretical and practical coverage of diversity, novelty, and serendipity in recommender evaluation criteria. In addition to metrics such as NDCG and MAP, it also details indicators related to diversity.

5. “Recommender Systems Handbook” (2nd Edition)

Editors: Francesco Ricci, Lior Rokach, Bracha Shapira
Publisher: Springer
Why it’s useful: Covers everything from academic research to industrial applications. Chapter 11 “Beyond Accuracy: Other Aspects of Recommender Systems” details diversity and novelty.
