Overview of metric MDS and examples of algorithms and implementations

Metric MDS (Metric Multidimensional Scaling) Overview

Metric Multidimensional Scaling (Metric MDS) is a method for embedding multidimensional data in a low-dimensional space and visualising the similarities and distances between data points. Given the distances (or similarities) between data points, it finds a configuration of points in a low-dimensional space whose mutual distances represent the given values as faithfully as possible.

The basic idea of metric MDS is to map high-dimensional or abstract data into a lower dimension (usually 2D or 3D) so that it can be understood visually. The input is a distance matrix between objects \( D = \{d_{ij}\} \), where \( d_{ij} \) is the distance between objects \( i \) and \( j \). The output is a point configuration \( X = \{x_{i}\} \) in a \( k \)-dimensional space (usually \( k = 2 \) or \( k = 3 \)) in which the Euclidean distance between the configured points approximates the original distance \( d_{ij} \).

The basic algorithm consists of the following steps.

  1. Preparing the distance matrix: from the input data, calculate the distances of all pairs and construct the distance matrix \( D \).
  2. Conversion from distance to inner product matrix: calculate the Gram matrix (inner product matrix) \( B \) from \( D \) using the following formula, where \( D^{(2)} \) is the matrix of squared distances \( d_{ij}^2 \) and \( J \) is the centring matrix.
    \[
    B = -\frac{1}{2} J D^{(2)} J, \qquad
    J = I - \frac{1}{n} \mathbf{1}\mathbf{1}^\top
    \]
  3. Eigenvalue decomposition: perform an eigenvalue decomposition of \( B \) to obtain its eigenvalues and eigenvectors, then select the \( k \) largest eigenvalues together with their corresponding eigenvectors.
  4. Embedding in low-dimensional space: the point configuration \( X \) in the \( k \)-dimensional space is calculated as follows, where \( V_k \) is the matrix whose columns are the selected eigenvectors and \( \Lambda_k \) is the diagonal matrix with the \( k \) selected eigenvalues on its diagonal.
    \[
    X = V_k \Lambda_k^{1/2}
    \]
  5. Interpreting the results: patterns and clusters are analysed by interpreting the arrangement of the embedded points.
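
The steps above can be sketched with NumPy alone. This is a minimal illustration that assumes the input is a symmetric distance matrix; the function name classical_mds and the toy rectangle data are hypothetical and not taken from any library.

```python
import numpy as np

def classical_mds(D, k=2):
    """Embed an n x n distance matrix D in k dimensions (steps 2 to 4)."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    # Step 2: double-centre the squared distances to obtain the Gram matrix B.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    # Step 3: eigendecomposition of the symmetric matrix B
    # (eigh returns eigenvalues in ascending order).
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:k]
    # Clip tiny negative eigenvalues caused by rounding or non-Euclidean input.
    lam = np.clip(eigvals[order], 0.0, None)
    V = eigvecs[:, order]
    # Step 4: X = V_k * Lambda_k^{1/2} (scale each eigenvector column).
    return V * np.sqrt(lam)

# Hypothetical toy data: the four corners of a 3 x 4 rectangle.
points = np.array([[0.0, 0.0], [3.0, 0.0], [3.0, 4.0], [0.0, 4.0]])
D = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

X = classical_mds(D, k=2)
D_hat = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
print(np.allclose(D, D_hat))  # True: the distances are reproduced
```

The recovered coordinates generally differ from the original points by a rotation, reflection or shift, which is why the check compares distances rather than coordinates.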

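Metric MDS is characterised by treating the entries of the distance matrix as exact numerical distances (e.g. Euclidean distances), whereas in non-metric MDS only the rank order of the distances matters. The objective function of metric MDS, called the stress, is the sum of squared differences between the original distances and the distances in the embedded space, and it is minimised over the configuration \( X \). The eigendecomposition procedure above reproduces the distances exactly when they are Euclidean and \( k \) is large enough; iterative methods such as SMACOF, which scikit-learn's MDS class uses, minimise the stress directly.
\[
\text{Stress} = \sum_{i < j} \left( d_{ij} - \|x_i - x_j\| \right)^2
\]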
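
As a small worked illustration of the stress, the sum over pairs \( i < j \) can be computed directly. The helper name raw_stress and the three-point example are assumptions made for this sketch.

```python
import numpy as np

def raw_stress(D, X):
    """Sum over i < j of (d_ij - ||x_i - x_j||)^2."""
    D = np.asarray(D, dtype=float)
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances between the embedded points.
    D_hat = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Use only the upper triangle so each pair is counted once.
    iu = np.triu_indices_from(D, k=1)
    return float(np.sum((D[iu] - D_hat[iu]) ** 2))

# Hypothetical example: three objects with distances 1, 3 and 2.
D = np.array([[0.0, 1.0, 3.0],
              [1.0, 0.0, 2.0],
              [3.0, 2.0, 0.0]])

exact = np.array([[0.0], [1.0], [3.0]])   # positions 0, 1, 3 on a line
rough = np.array([[0.0], [1.0], [2.0]])   # positions 0, 1, 2 on a line

print(raw_stress(D, exact))  # 0.0
print(raw_stress(D, rough))  # 2.0
```

In the second configuration the pairs (1, 3) and (2, 3) are each off by 1, so the stress is \( 1^2 + 1^2 = 2 \).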

The benefits and challenges are as follows.

Advantages:

  • Conversion of high-dimensional data into easily interpretable forms.
  • Non-linear relationships can be partially captured.

Challenges:

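  • Computational cost grows with the size of the distance matrix: storing it takes \( O(n^2) \) memory for \( n \) objects and a full eigenvalue decomposition takes \( O(n^3) \) time, so the load is high for large data.
  • Results are significantly affected if the distance data is imprecise.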
Implementation example

An example implementation of metric multidimensional scaling (Metric MDS) using Python is given below. The scikit-learn library is used here.

1. install the required libraries

pip install scikit-learn numpy matplotlib

2. implementation code

import numpy as np
from sklearn.manifold import MDS
import matplotlib.pyplot as plt

# Creation of distance matrices (e.g. Euclidean distance)
distance_matrix = np.array([
    [0, 2, 5, 9],
    [2, 0, 3, 8],
    [5, 3, 0, 6],
    [9, 8, 6, 0]
])

# Dimensionality reduction through metric MDS.
mds = MDS(n_components=2, dissimilarity='precomputed', random_state=42)
embedding = mds.fit_transform(distance_matrix)

# Display of embedded 2D coordinates.
print("2D Coordinates:")
print(embedding)

# Visualisation of results
plt.figure(figsize=(8, 6))
plt.scatter(embedding[:, 0], embedding[:, 1], color='blue')

# Label each point.
labels = ['A', 'B', 'C', 'D']
for i, label in enumerate(labels):
    plt.text(embedding[i, 0], embedding[i, 1], label, fontsize=12, ha='right')

plt.title('Metric MDS Visualization')
plt.xlabel('Dimension 1')
plt.ylabel('Dimension 2')
plt.grid()
plt.show()

3. code description

  • Preparing a distance matrix: as an example, a distance matrix between four data points was created.
  • Using MDS class:
    • n_components=2: embed the data in a 2D space.
    • dissimilarity='precomputed': tells MDS that the input is already a distance matrix rather than raw feature vectors.
  • fit_transform method: computes low-dimensional coordinates from the distance matrix.
  • Visualisation: plot the embedded 2D space using Matplotlib. Each point is labelled and identified.

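4. run results: the output ‘2D Coordinates’ shows where each data point is located in 2D space, and the plot visualises the relative distances between the data points. Because only the distances between points are fitted, the coordinates are determined only up to rotation, reflection and translation.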
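
One way to check the run results is to compare the distances between the embedded points with the original distance matrix. The sketch below reuses the toy matrix and the MDS settings from the code above; the variable names and the 90-degree rotation check are illustrative additions.

```python
import numpy as np
from sklearn.manifold import MDS

# Same toy distance matrix as in the implementation code.
distance_matrix = np.array([
    [0, 2, 5, 9],
    [2, 0, 3, 8],
    [5, 3, 0, 6],
    [9, 8, 6, 0],
], dtype=float)

mds = MDS(n_components=2, dissimilarity='precomputed', random_state=42)
embedding = mds.fit_transform(distance_matrix)

# Euclidean distances between the embedded points.
diff = embedding[:, None, :] - embedding[None, :, :]
embedded_distances = np.linalg.norm(diff, axis=-1)

# Per-pair absolute error between original and embedded distances.
abs_error = np.abs(distance_matrix - embedded_distances)
print(np.round(embedded_distances, 2))
print("Largest absolute error:", abs_error.max())

# Rotating the configuration by 90 degrees leaves all distances unchanged.
theta = np.pi / 2
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
rotated = embedding @ R.T
rotated_distances = np.linalg.norm(
    rotated[:, None, :] - rotated[None, :, :], axis=-1)
print(np.allclose(embedded_distances, rotated_distances))
```

The error is not expected to be zero here, since this toy matrix cannot be reproduced exactly by four points in a plane.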

5. application examples

Example using a dataset: metric MDS can also be applied to a sample dataset from scikit-learn, such as the Iris dataset.

import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.manifold import MDS
from sklearn.metrics import pairwise_distances

# Load the Iris dataset.
iris = load_iris()
data = iris.data
labels = iris.target

# Calculate distance matrix (Euclidean distance)
distance_matrix = pairwise_distances(data, metric='euclidean')

# Apply metric MDS
mds = MDS(n_components=2, dissimilarity='precomputed', random_state=42)
embedding = mds.fit_transform(distance_matrix)

# Plotting the results
plt.figure(figsize=(8, 6))
scatter = plt.scatter(embedding[:, 0], embedding[:, 1], c=labels, cmap='viridis', s=50)
plt.colorbar(scatter, label='Target Label')
plt.title('Metric MDS on Iris Dataset')
plt.xlabel('Dimension 1')
plt.ylabel('Dimension 2')
plt.grid()
plt.show()
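
As a possible follow-up to the Iris example, the fitted model exposes the final stress value through its stress_ attribute, which gives a rough indication of how well a given number of dimensions preserves the distances. The loop over k = 1, 2, 3 and the dictionary name stress_by_dim are assumptions made for this sketch.

```python
from sklearn.datasets import load_iris
from sklearn.manifold import MDS
from sklearn.metrics import pairwise_distances

# Distance matrix for the Iris measurements (Euclidean distance).
data = load_iris().data
distance_matrix = pairwise_distances(data, metric='euclidean')

# Fit metric MDS for several target dimensions and record the stress.
stress_by_dim = {}
for k in (1, 2, 3):
    mds = MDS(n_components=k, dissimilarity='precomputed', random_state=42)
    mds.fit(distance_matrix)
    stress_by_dim[k] = float(mds.stress_)

for k, stress in stress_by_dim.items():
    print(f"k={k}: stress={stress:.2f}")
```

A lower stress means the embedded distances are closer to the original ones; the values depend on the random initialisation and on the scikit-learn version.
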
Application examples

Metric Multidimensional Scaling (Metric MDS) is widely used to understand patterns and structures in high-dimensional data and has the following applications.

1. marketing and customer segmentation

Example: similarity between customers is calculated as a distance matrix based on customer purchase data, and the customers are visualised in 2D space using metric MDS to identify segments (clusters).

Application example: group customers with similar buying habits in supermarket and e-commerce data. Visualise the distribution of customers in graphs and develop targeting strategies for specific products.

2. analysis of similarities between products

Example: based on product functions and features, similarity is calculated as a distance matrix, embedded in a two-dimensional space using MDS to visualise the competitive relationship and positioning between products.

Application example: comparison of smartphone features (battery life, camera performance, price, etc.). Based on the results, the positioning of competing products is analysed.

3. questionnaire data analysis in psychology and social sciences

Example: calculation of distance matrices based on people’s answers (e.g. personality assessments, satisfaction surveys), visualisation in MDS to analyse similarities and patterns between respondents.

Application examples: assessment of personality traits (e.g. Big Five personalities). Visualisation of perceptions and cultural differences between groups.

4. analysis of genetic data

Example: calculating distances between genes (e.g. evolutionary similarity) and using MDS to display relationships between genes in low dimensions.

Application examples: creation of phylogenetic trees between species. Classification and diagnosis of diseases based on gene expression data.

5. social network analysis

Example: representation of relationships between users (message exchanges, number of follows, etc.) as distance matrices, using MDS to embed network structures in low-dimensional space.

Application examples: influencer analysis in social media. Visualisation and interpretation of community structures.

6. geographical information analysis

Example: distance matrices based on traffic distances and travel times between cities; 2D maps created with MDS to visualise relationships between cities.

Application examples: design of new transport networks (railways and roads). Analysis of logistical efficiency and proximity between cities.

7. clustering of documents and texts

Example: calculating the similarity between documents (TF-IDF or cosine similarity) and using it as a distance matrix; visualising the distribution of documents using MDS.

Application examples: grouping news articles and research papers by subject. Improvement of search engines and recommendation systems.
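
The document case can be sketched as follows: TF-IDF vectors are built, cosine distances (one minus cosine similarity) serve as the distance matrix, and MDS embeds the documents in 2D. The four-sentence corpus is hypothetical, and the choice of TfidfVectorizer and cosine_distances from scikit-learn is one possible way to do this, not the only one.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.manifold import MDS
from sklearn.metrics.pairwise import cosine_distances

# Hypothetical mini corpus: two finance texts and two sports texts.
documents = [
    "stock market shares fall",
    "stock market rally lifts shares",
    "team wins football match",
    "football team loses final match",
]

# TF-IDF vectors, then cosine distance = 1 - cosine similarity.
tfidf = TfidfVectorizer().fit_transform(documents)
dist = cosine_distances(tfidf)
# Symmetrise to remove any floating-point asymmetry before MDS.
dist = (dist + dist.T) / 2

mds = MDS(n_components=2, dissimilarity='precomputed', random_state=42)
doc_embedding = mds.fit_transform(dist)

print(np.round(dist, 2))
print(doc_embedding)
```

Documents that share vocabulary have a smaller cosine distance, so they tend to be placed closer together in the resulting 2D map.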

8. medical data visualisation

Example: calculation of similarity of symptoms and diagnoses between patients as a distance matrix; visualisation of the distribution of patient groups in MDS.

Application examples: identification of disease subtypes (e.g. cancer classification). Trend analysis of patient treatment outcomes.

9. music and media recommendation systems

Example: calculate similarities based on feature vectors of music and films and create distance matrices; generate 2D maps with MDS to visualise relationships between contents.

Application example: personalised recommendations based on user preferences.

10. citation relationships in academic literature

Example: citation relations between academic papers and co-authorship networks are converted into a distance matrix; visualise the distribution with MDS and explore topics in the research field.

Application example: trend analysis of academic fields. Understanding relationships in research communities.

Reference books

Reference books on Metric Multidimensional Scaling (Metric MDS) and related multidimensional data analysis methods are listed below.

1. an introduction to comprehensive statistical analysis and multidimensional scaling
An Introduction to Applied Multivariate Analysis with R

2. books specialising in practical analysis methods
A Handbook of Statistical Analyses Using R

3. for those who want to learn the mathematical background in depth
Introduction to Mathematical Statistics

4. specialist book on multidimensional data analysis
Multidimensional Data Analysis and Data Mining, Black Book

5. famous book
Modern Multidimensional Scaling: Theory and Applications
– Authors: Ingwer Borg, Patrick J.F. Groenen
– Publisher: Springer
– Description: A classic book that comprehensively explains MDS from theory to applications. It provides a wealth of mathematical background and application examples.
– Features: covers the latest research from a global perspective.

6. data visualisation practice books
Data Visualization: Principles and Practice
– Author: Alexandru C. Telea
– Publisher: CRC Press
– Abstract: Provides practical explanations of data visualisation methods, including MDS. Useful for learning about visualisation techniques in general.

7. multidimensional analysis using Python
Python Data Science Handbook
– Author: Jake VanderPlas
– Publisher: O’Reilly Media
– Summary: Provides an overview of data science using Python, including examples of MDS implementations using Scikit-learn for practical use.

8. application-focused books
Mastering Marketing Data Science: A Comprehensive Guide for Today’s Marketers

Online resources
Scikit-learn official documentation
– A good reference for implementing MDS.
