Overview, algorithms and implementation examples of position bias-corrected ranking.


Overview of position bias-corrected rankings.

Position bias-corrected ranking is a method for building rankings that more accurately reflect actual quality and popularity by correcting the click and selection bias that arises from the display position of items in search results and product lists. The correction compensates for the tendency of items displayed at the top to receive higher click rates and items displayed at the bottom to receive lower ones.

Items in search results and listings are more likely to be clicked on if they appear higher up, and less likely to be clicked on if they appear lower down. This ‘position bias’ may not accurately reflect the actual quality or popularity of an item, and the purpose of position bias correction is to correct this bias and provide a ranking that reflects the true value of the item.

There are a number of specific methods for position bias correction; the following are representative:

1. Click-through rate (CTR) normalisation: calculate the average click-through rate per display position and normalise the click-through rate of individual items on that basis. For example, if the top display position has a higher average click-through rate, a click received there is given lower weight than a click received at a lower position.

2. Learning algorithms: use machine learning algorithms to model position bias from click data, thereby estimating the inherent popularity or quality of each item. It is common to use ranking models that specialise in position bias correction (e.g. position bias correction models).

3. A/B testing: different display orders are randomly shown to users and the results are used to estimate position bias. Because the order is randomised, the influence of display position can be isolated and the item itself can be evaluated.
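The randomised approach in item 3 can be sketched as follows. This is a minimal simulation, not a production procedure: the `true_bias` and `relevance` values are made-up assumptions, and `estimate_position_bias` is a hypothetical helper. Because every item is equally likely to appear at every position, the click rate per position divided by the click rate at position 1 approximates the relative position bias.

```python
import random
from collections import defaultdict

def estimate_position_bias(logs):
    """Estimate position bias relative to position 1.

    `logs` is an iterable of (position, clicked) pairs collected while the
    display order was randomised.
    """
    shown = defaultdict(int)
    clicks = defaultdict(int)
    for position, clicked in logs:
        shown[position] += 1
        clicks[position] += clicked
    ctr = {p: clicks[p] / shown[p] for p in shown}
    top = ctr[min(ctr)]  # click rate at the first position, used as reference
    return {p: ctr[p] / top for p in sorted(ctr)}

# Simulated randomised traffic (all numbers are assumptions for illustration).
random.seed(0)
true_bias = {1: 1.0, 2: 0.6, 3: 0.3}  # assumed probability of being examined
relevance = [0.8, 0.5, 0.2]           # assumed click probability once examined

logs = []
for _ in range(20000):
    order = random.sample(relevance, k=len(relevance))  # random display order
    for position, rel in enumerate(order, start=1):
        clicked = int(random.random() < true_bias[position] * rel)
        logs.append((position, clicked))

estimated = estimate_position_bias(logs)
print(estimated)  # values should land close to true_bias
```

Randomising the order costs some user experience during the experiment, so in practice it is usually applied to a small share of traffic.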

Position bias-corrected ranking is an important technique for removing the position bias that affects user selection behaviour and for providing a ranking that reflects the actual value and popularity of an item. It can be used to improve user experience and to evaluate items fairly.

Algorithms associated with position bias-corrected ranking.

Algorithms related to position bias-corrected ranking are used to correct for click and selection bias due to display position and to assess the true popularity and quality of an item. The main algorithms associated with position bias correction are described below.

1. Position Bias Model (PBM): The Position Bias Model assumes that the click rate depends on the display position and corrects for position bias in the following steps:

Feature: Normalise the observed click rate by the expected click rate per position.
Learning: a machine learning algorithm is used to model the relationship between display position and click rate. This estimates the bias for each position and calculates the click rate corrected for it.
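As a minimal sketch of this idea, assume that the probability of a click is the product of a per-position examination probability and a per-item relevance. If the examination probabilities are known, relevance can be estimated as observed clicks divided by expected examinations. The `examination` values, the `impressions` log and the function names below are illustrative assumptions, not part of any library.

```python
from collections import defaultdict

# Assumed examination probabilities per position (hypothetical values;
# in practice they have to be estimated from data).
examination = {1: 1.0, 2: 0.5, 3: 0.25}

# (item_id, position, clicked) impressions, made up for illustration.
impressions = [
    ("a", 1, 1), ("a", 1, 0), ("a", 2, 0), ("a", 2, 1),
    ("b", 3, 1), ("b", 3, 0), ("b", 3, 0), ("b", 3, 0),
]

def raw_ctr(impressions):
    """Uncorrected click rate per item."""
    clicks, shown = defaultdict(int), defaultdict(int)
    for item, _, clicked in impressions:
        clicks[item] += clicked
        shown[item] += 1
    return {item: clicks[item] / shown[item] for item in shown}

def pbm_relevance(impressions, examination):
    """Clicks divided by expected examinations per item."""
    clicks, exposure = defaultdict(float), defaultdict(float)
    for item, position, clicked in impressions:
        clicks[item] += clicked
        exposure[item] += examination[position]
    return {item: clicks[item] / exposure[item] for item in exposure}

raw = raw_ctr(impressions)
corrected = pbm_relevance(impressions, examination)
print(raw)        # item "a" looks better on raw click rate
print(corrected)  # item "b" looks better once position is accounted for
```

In this toy log, item "b" was only ever shown at position 3, so its raw click rate understates it; dividing by the expected examinations reverses the order of the two items.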

2. Rank-Biased Model (RBM): The Rank-Biased Model is an extension of the Position Bias Model that takes into account the impact of an item’s ranking on overall click behaviour.

Inputs: click data for an item and its display position.
Learning: modelling the probability of each item being clicked in the overall ranking and correcting the click rate for the context of the overall ranking.

3. Gradient boosting decision trees (GBDT): GBDT is a machine learning algorithm that performs well for regression and classification problems and can also be applied to position bias correction.

Feature engineering: use display position, item characteristics, user behaviour history, etc. as features.
Learning: using the GBDT, a model to predict the click rate is trained, which improves the prediction accuracy of the click rate by taking into account positional bias as a feature.

4. Random Forest: Random forests can also be used to correct for position bias. Ensemble learning over multiple decision trees can capture the effect of display position when it is supplied as a feature.

Feature selection: use the display position and other relevant features as input.
Learning: build a large number of decision trees to create a predictive model that accounts for position bias.

5. Deep learning: deep learning approaches are particularly useful when large amounts of data are available. For example, recurrent neural networks (RNNs) and transformer models can be used.

Feature extraction: using the user’s click behaviour history and item characteristics as input.
Learning: using deep learning models to predict click rates corrected for position bias.

6. Pairwise Learning: Pairwise learning is a method for learning ranking relationships between pairs of items. Pairs of items displayed in different positions are compared in order to eliminate the influence of display position.

Data preparation: pairs of items in different display positions are created, and the clicked item in each pair is taken as the positive example.
Learning: compare each pair and train a ranking model that corrects for position bias.
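The data preparation step can be sketched as below. The assumption here, which is one common choice rather than the only one, is that each pair is weighted by the inverse of the examination probability of the clicked item's position, so that clicks at rarely examined positions count for more. The `examination` values and the `build_pairs` helper are hypothetical.

```python
from itertools import product

# Assumed examination probabilities per position (hypothetical values).
examination = {1: 1.0, 2: 0.5, 3: 0.25}

# Per-query impressions as (item_id, position, clicked).
sessions = {
    "query1": [(1, 1, 1), (2, 2, 0), (3, 3, 0)],
    "query2": [(1, 1, 0), (2, 2, 1), (3, 3, 0)],
}

def build_pairs(sessions, examination):
    """Return (query, preferred_item, other_item, weight) training pairs."""
    pairs = []
    for query, shown in sessions.items():
        clicked = [row for row in shown if row[2] == 1]
        skipped = [row for row in shown if row[2] == 0]
        for (pos_item, pos_position, _), (neg_item, _, _) in product(clicked, skipped):
            # Clicks at rarely examined positions receive a larger weight.
            weight = 1.0 / examination[pos_position]
            pairs.append((query, pos_item, neg_item, weight))
    return pairs

pairs = build_pairs(sessions, examination)
for pair in pairs:
    print(pair)
```

The resulting weighted pairs can then be passed to any pairwise ranking learner that accepts per-example weights.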

Application of position bias-corrected ranking.

There are a wide range of application cases for position bias-corrected ranking. Specific application cases are described below.

1. search engines:
Examples: Google, Bing
Summary: Search engines display a large number of search results when a user enters a search query. The results that appear at the top tend to have a higher click-through rate.
How it is applied: a position bias correction algorithm is used to remove position bias from the click data and assess the true relevance of the results. This helps the most relevant results for users to be displayed at the top.

2. e-commerce sites:
Examples: Amazon, Rakuten Market
Summary: The order in which product listings are displayed has a significant impact on sales. Products displayed at the top of the list have a higher click-through rate and generate more sales.
How it is applied: A positional bias correction algorithm is used to analyse user click behaviour and create rankings that more accurately reflect the popularity and quality of the products themselves. This makes it easier for users to find the products they are looking for.

3. ad serving platforms:
Examples: Google Ads, Facebook Ads
Summary: Click rates vary greatly depending on where the adverts are displayed. Ads displayed at the top are more likely to be clicked on, while ads displayed at the bottom are less likely to be clicked on.
How it is applied: By analysing ad click data, biases caused by display position are compensated for. This allows the effectiveness of ads to be assessed more accurately and gives advertisers more meaningful impression data.

4. news feeds and social media:
Examples: Facebook, Twitter
Summary: Posts that appear at the top of news feeds and timelines have higher click and engagement rates.
How it is applied: an algorithm analyses user engagement data and compensates for bias due to display position. This helps content that is important to users to be displayed at the top and improves engagement.

5. video streaming services:
Examples: YouTube, Netflix
Summary: The order in which videos are displayed influences the number of views. Videos displayed at the top are more likely to be viewed, while those at the bottom are less likely to be viewed.
How it is applied: position bias is corrected based on viewing data to create a ranking that reflects the actual popularity and quality of each video. This enables efficient recommendation of videos that users are interested in.

6. recommendation systems:
Examples: Spotify, Pandora
Summary: In music and podcast recommendations, the recommended position also influences the click rate and playback frequency.
How it is applied: users’ playback history and click data are analysed and position bias is compensated for, so that content of value to users is recommended.

7. job sites:
Examples: LinkedIn, Indeed
Summary: The order in which jobs are displayed influences the number of applications received. Jobs that appear at the top are more likely to be applied for, while those at the bottom are less likely to be applied for.
How it is applied: job click and application data are analysed to correct for position bias. This helps surface the most relevant job information for jobseekers.

These examples illustrate the importance of improving the user experience through position bias correction and of accurately reflecting the actual value of items and content. By implementing position bias correction, information and products that are useful to users can be provided more effectively.

Example implementation of position bias-corrected ranking.

Below is an example of a basic position bias correction implementation using Python. This example shows a technique for correcting the click-through rate (CTR) based on the display position.

Dataset preparation: first, sample data is prepared. This data includes the display position and click status of each item.

import pandas as pd

# Create sample data.
data = {
    'query': ['query1', 'query1', 'query1', 'query2', 'query2', 'query2', 'query3', 'query3', 'query3'],
    'item_id': [1, 2, 3, 1, 2, 3, 1, 2, 3],
    'position': [1, 2, 3, 1, 2, 3, 1, 2, 3],
    'clicked': [1, 0, 0, 0, 1, 0, 1, 0, 1]
}

df = pd.DataFrame(data)

Calculation of CTR for position bias correction: the click rate (CTR) per position is then calculated to correct for position bias.

# Calculate the click rate for each position.
position_ctr = df.groupby('position')['clicked'].mean().to_dict()

# Normalise each click by the average click rate of its position (vectorised).
# Note: a position with no clicks at all would give a zero denominator.
df['corrected_ctr'] = df['clicked'] / df['position'].map(position_ctr)

Recalculation of corrected ranking: re-ranking for each query based on the corrected CTR.

# Recalculate the ranking by the corrected click rate for each query.
df['rank'] = df.groupby('query')['corrected_ctr'].rank(ascending=False, method='first')

# View results
print(df.sort_values(by=['query', 'rank']))

Full implementation example: below is the complete code integrating the above steps.

import pandas as pd

# Create sample data.
data = {
    'query': ['query1', 'query1', 'query1', 'query2', 'query2', 'query2', 'query3', 'query3', 'query3'],
    'item_id': [1, 2, 3, 1, 2, 3, 1, 2, 3],
    'position': [1, 2, 3, 1, 2, 3, 1, 2, 3],
    'clicked': [1, 0, 0, 0, 1, 0, 1, 0, 1]
}

df = pd.DataFrame(data)

# Calculate the click rate for each position.
position_ctr = df.groupby('position')['clicked'].mean().to_dict()

# Normalise each click by the average click rate of its position (vectorised).
# Note: a position with no clicks at all would give a zero denominator.
df['corrected_ctr'] = df['clicked'] / df['position'].map(position_ctr)

# Recalculate the ranking by the corrected click rate for each query.
df['rank'] = df.groupby('query')['corrected_ctr'].rank(ascending=False, method='first')

# View results
print(df.sort_values(by=['query', 'rank']))

In this example implementation, the following steps are taken:

Data preparation: prepare data on query, item ID, display position and click status.
Calculate click rate: calculate the click rate for each position.
Correcting for positional bias: normalise the click rate by positional bias.
Re-ranking: re-ranking based on corrected click rates.

This basic implementation shows how to correct for position bias on a simple data set. Real systems commonly use more complex models and additional features, and may also use metrics other than click rate (e.g. conversion rate).
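One limitation of the sample data above is that each item always appears at the same position, so the effect of the item and the effect of the position cannot be separated. The following sketch uses a hypothetical log in which items rotate through positions, aggregates the weighted clicks per item across queries, and floors the per-position click rate to avoid extreme weights. The `MIN_CTR` floor and the column name `weighted_click` are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical log in which items rotate through positions across queries.
log = pd.DataFrame({
    "query":    ["q1", "q1", "q1", "q2", "q2", "q2", "q3", "q3", "q3"],
    "item_id":  [1, 2, 3, 2, 3, 1, 3, 1, 2],
    "position": [1, 2, 3, 1, 2, 3, 1, 2, 3],
    "clicked":  [1, 0, 0, 1, 0, 1, 0, 1, 0],
})

# Average click rate per position, floored so that a position with very
# few clicks does not produce a huge (or infinite) weight.
MIN_CTR = 0.05  # assumed floor, to be tuned on real data
position_ctr = log.groupby("position")["clicked"].mean().clip(lower=MIN_CTR)

# Weight each click by the inverse of its position's click rate.
log["weighted_click"] = log["clicked"] / log["position"].map(position_ctr)

# Aggregate per item over all impressions, then order items by score.
item_score = log.groupby("item_id")["weighted_click"].mean()
ranking = item_score.sort_values(ascending=False)
print(ranking)
```

Aggregating per item rather than per impression gives one score per item, which is usually what a re-ranking step needs.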

Challenges and countermeasures for position bias-corrected ranking.

Position bias-corrected ranking faces several challenges. These challenges and ways of addressing them are discussed below.

1. data bias:

Challenge: large amounts of data are needed for position bias correction, but if the data is biased towards a particular query or item, the accuracy of the model may be compromised.

Solution:
Data collection: introduce new data collection methods and ensure data diversity.
Data augmentation: use data augmentation techniques to increase data for queries or items with low data.
Cross-data learning: utilise transfer learning to find common patterns between different queries or items.

2. real-time processing difficulties:

Challenge: performing position bias correction in real time can be difficult due to computational resources and algorithmic complexity.

Solution:
Pre-computation and caching: improve real-time performance by pre-computing the position bias correction in the back-end and caching the results.
Use of approximation algorithms: employ approximation algorithms or on-line learning algorithms that allow real-time processing.
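The pre-computation idea can be sketched as follows: an offline job turns per-position click rates into a lookup table of weights, and the serving path only performs a dictionary lookup. The click rates, the `floor` value and the fallback weight of 1.0 for unseen positions are illustrative assumptions.

```python
def precompute_weights(position_ctr, floor=0.01):
    """Offline step: turn per-position click rates into lookup weights."""
    return {pos: 1.0 / max(ctr, floor) for pos, ctr in position_ctr.items()}

# Offline (for example a nightly batch): hypothetical click rates per position.
WEIGHTS = precompute_weights({1: 0.40, 2: 0.20, 3: 0.10})

def corrected_click(position, clicked, default_weight=1.0):
    """Online step: constant-time lookup, no model evaluation."""
    return clicked * WEIGHTS.get(position, default_weight)

print(corrected_click(position=2, clicked=1))
print(corrected_click(position=9, clicked=1))  # unseen position uses the fallback
```

The table can be refreshed on whatever schedule the batch job runs, independently of request handling.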

3. dynamic content changes:

Challenge: if content is updated frequently, the position bias correction model needs to be updated accordingly.

Solution:
Continuous model updating: re-train the model regularly to reflect bias corrections based on the latest data.
Online learning: introduce online learning methods that sequentially update the model as new data arrive.
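One simple way to update estimates sequentially is an exponential moving average of the click rate per position, sketched below. The class name `OnlinePositionCTR` and its default `learning_rate` and `initial_ctr` values are assumptions; a real system might update a full model rather than a table of rates.

```python
class OnlinePositionCTR:
    """Keeps a running click-rate estimate per position."""

    def __init__(self, learning_rate=0.01, initial_ctr=0.1):
        self.learning_rate = learning_rate
        self.initial_ctr = initial_ctr
        self.ctr = {}

    def update(self, position, clicked):
        """Move the estimate for `position` towards the new observation."""
        current = self.ctr.get(position, self.initial_ctr)
        self.ctr[position] = current + self.learning_rate * (clicked - current)
        return self.ctr[position]

estimator = OnlinePositionCTR(learning_rate=0.5, initial_ctr=0.0)
for clicked in (1, 1, 0):
    print(estimator.update(position=1, clicked=clicked))
```

A larger learning rate reacts faster to content changes but makes the estimate noisier.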

4. selection of evaluation metrics:

Challenge: it is sometimes difficult to select appropriate metrics to assess the effectiveness of position bias correction.

Solution:
Use a variety of evaluation indicators: evaluate using multiple indicators such as conversion rate, engagement rate, etc., in addition to click rate.
A/B testing: conduct A/B tests to evaluate actual user behaviour in order to compare with and without position bias correction.
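As a sketch of how such an A/B comparison might be evaluated, the following applies a two-proportion z-test to conversion counts from a control arm (no correction) and a treatment arm (corrected ranking). The counts are invented for illustration, and the choice of a two-sided z-test is an assumption; other tests may suit a given experiment better.

```python
import math

def two_proportion_z_test(success_a, total_a, success_b, total_b):
    """Return (z, two-sided p-value) for the difference between two rates."""
    p_a = success_a / total_a
    p_b = success_b / total_b
    pooled = (success_a + success_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (p_b - p_a) / se
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical results: control without correction, treatment with correction.
z, p_value = two_proportion_z_test(480, 10000, 540, 10000)
print(f"z = {z:.3f}, p = {p_value:.4f}")
```

The same function can be applied to click rate, conversion rate or any other binary outcome chosen as the evaluation metric.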

5. diversity of user behaviour:

Challenge: different user groups exhibit different behaviour patterns, so one correction model may not be applicable to all users.

Solution:
Personalised models: build different bias correction models for different user segments.
Context-aware models: build models that take into account the context of the user (e.g. past behaviour, time of day, device).
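A minimal sketch of segment-specific correction is to keep a click rate per (segment, position) and fall back to the overall rate for that position when a segment has too little data. The segment names, the `min_impressions` threshold and the `build_segment_ctr` helper are hypothetical.

```python
from collections import defaultdict

def build_segment_ctr(logs, min_impressions=2):
    """Build a lookup of click rate per (segment, position).

    `logs` is an iterable of (segment, position, clicked) tuples.
    """
    by_segment = defaultdict(lambda: [0, 0])  # [clicks, impressions]
    overall = defaultdict(lambda: [0, 0])
    for segment, position, clicked in logs:
        for counter in (by_segment[(segment, position)], overall[position]):
            counter[0] += clicked
            counter[1] += 1

    def position_ctr(segment, position):
        clicks, shown = by_segment.get((segment, position), (0, 0))
        if shown < min_impressions:
            # Too little data for this segment: use the overall rate.
            clicks, shown = overall.get(position, (0, 0))
        return clicks / shown if shown else None

    return position_ctr

logs = [
    ("mobile", 1, 1), ("mobile", 1, 1), ("mobile", 2, 0), ("mobile", 2, 0),
    ("desktop", 1, 1), ("desktop", 1, 0), ("desktop", 2, 1),
]
position_ctr = build_segment_ctr(logs)
print(position_ctr("mobile", 1))   # segment-specific rate
print(position_ctr("desktop", 2))  # falls back to the overall rate
```

The fallback keeps the correction usable for small segments, at the cost of treating them like the average user.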

6. computational resource constraints:

Challenge: complex correction algorithms are computationally expensive and may be difficult to implement in resource-constrained environments.

Solution:
Adopt efficient algorithms: select computationally efficient algorithms and optimise performance.
Use cloud computing: utilise cloud services that can provide the required computing resources in a scalable manner.

Reference Information and Reference Books

For general machine learning algorithms, including search algorithms, see “Algorithms and Data Structures” or “General Machine Learning and Data Analysis”.

“Algorithms” and other reference books are also available.

“Recommender Systems: The Textbook” by Charu C. Aggarwal

“Learning to Rank for Information Retrieval and Natural Language Processing” by Hang Li

“Evaluating Learning Algorithms: A Classification Perspective” by Nathalie Japkowicz, Mohak Shah

“Deep Learning” by Ian Goodfellow, Yoshua Bengio, Aaron Courville

“Introduction to Information Retrieval” by Christopher D. Manning, Prabhakar Raghavan, Hinrich Schütze