Overview of SMACOF (Scaling by Majorizing a Complicated Function)
SMACOF is a type of MDS, described in ‘Multidimensional Scaling (MDS)’. It is an algorithm for placing data in a low-dimensional space based on distance information, and is particularly effective when dealing with non-linear data or approximate distance information.
MDS is a visualisation method that embeds the distance information between high-dimensional data points in a low-dimensional space (usually 2D or 3D). The aim is to represent the structure of the data in fewer dimensions while preserving the distances as faithfully as possible. SMACOF is an iterative optimisation algorithm for computing such an MDS solution efficiently.
The main features of SMACOF include:
- Minimisation of the stress function: SMACOF iteratively minimises the stress function (a function representing the difference between the distance between the placed points and the original distance).
General form of stress function:
\[
\sigma(X) = \sum_{i<j} w_{ij} \left( d_{ij} - \delta_{ij} \right)^2
\]
– \( \delta_{ij} \): distance in the original higher-dimensional space (observed value)
– \( d_{ij} \): distance in the lower dimensional space after embedding
– \( w_{ij} \): weight indicating the importance of the distance
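- Majorisation instead of direct minimisation: SMACOF does not minimise the stress function directly. At each iteration it minimises a simpler quadratic function that majorises the stress (lies on or above it and touches it at the current placement). Like other iterative MDS methods, it can still end in a locally optimal solution rather than the global one.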
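- Monotone convergence: because of the majorisation step, the stress never increases from one iteration to the next, so the iteration converges reliably without step-size tuning, although progress can be slow near the solution.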
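The stress function defined above can be evaluated in a few lines of NumPy. The following is a minimal sketch, assuming Euclidean distances in the embedding; the helper names `pairwise_distances` and `raw_stress` are chosen for this illustration and do not come from any library.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distances d_ij between the rows of X (shape n x p)."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def raw_stress(X, delta, W=None):
    """sigma(X) = sum over i<j of w_ij * (d_ij - delta_ij)^2."""
    D = pairwise_distances(X)
    if W is None:
        W = np.ones_like(delta)            # assumption: unit weights by default
    iu = np.triu_indices_from(delta, k=1)  # index pairs with i < j
    return float(np.sum(W[iu] * (D[iu] - delta[iu]) ** 2))

# Target distances |i - j| between four points.
delta = np.abs(np.subtract.outer(np.arange(4.0), np.arange(4.0)))

# Four equally spaced points on a line reproduce these distances exactly.
X_exact = np.column_stack([np.arange(4.0), np.zeros(4)])
# Shrinking the placement by half distorts every distance.
X_squashed = 0.5 * X_exact

print(raw_stress(X_exact, delta))     # 0.0
print(raw_stress(X_squashed, delta))  # 5.0
```

A stress of zero means the placement reproduces every target distance; larger values indicate a poorer fit.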
The flow of the SMACOF algorithm is as follows:
- Initialisation: initial placement is set at random or in a predefined manner.
- Evaluate stress function: compute the value of the stress function at the current placement.
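- Update placement: update the placement with the Guttman transform, which minimises the majorising function at the current placement and therefore does not increase the stress.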
- Convergence decision: if the change in the stress function falls below a certain threshold, the process is finished.
- Result output: outputs the coordinates of the data in low-dimensional space.
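The flow above can be sketched in plain NumPy. This is a minimal illustration assuming unit weights (all w_ij = 1) and Euclidean distances, not the scikit-learn implementation. The function names (`pairwise_distances`, `stress`, `guttman_transform`, `smacof`) and the defaults for `max_iter`, `tol` and `seed` are assumptions made for this example. The update step is the Guttman transform, which for unit weights is X = (1/n) B(Z) Z.

```python
import numpy as np

def pairwise_distances(X):
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def stress(X, delta):
    """Raw stress with unit weights."""
    D = pairwise_distances(X)
    iu = np.triu_indices_from(delta, k=1)
    return float(np.sum((D[iu] - delta[iu]) ** 2))

def guttman_transform(Z, delta):
    """Unit-weight SMACOF update X = (1/n) B(Z) Z."""
    n = Z.shape[0]
    D = pairwise_distances(Z)
    ratio = np.zeros_like(D)
    # delta_ij / d_ij(Z); entries with d_ij = 0 (incl. the diagonal) stay 0
    np.divide(delta, D, out=ratio, where=D > 0)
    B = -ratio
    B[np.diag_indices(n)] = ratio.sum(axis=1)  # makes every row of B sum to 0
    return B @ Z / n

def smacof(delta, n_components=2, max_iter=300, tol=1e-9, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((delta.shape[0], n_components))  # initialisation
    history = [stress(X, delta)]                              # evaluate stress
    for _ in range(max_iter):
        X = guttman_transform(X, delta)                       # update placement
        history.append(stress(X, delta))
        if history[-2] - history[-1] < tol:                   # convergence decision
            break
    return X, history                                         # result output

delta = np.abs(np.subtract.outer(np.arange(4.0), np.arange(4.0)))
X, history = smacof(delta)
print(f"stress: {history[0]:.4f} -> {history[-1]:.6f} "
      f"after {len(history) - 1} iterations")
```

The `history` list makes the monotone behaviour visible: each recorded stress value is no larger than the one before it.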
The advantages and disadvantages of SMACOF are as follows:
- Advantages:
  - Can be used for non-linear data.
  - Highly accurate low-dimensional representation due to iterative optimisation.
  - Fidelity-oriented embedding of distances can be obtained.
- Disadvantages:
  - Computationally expensive (especially for large data), since every iteration works on all pairwise distances.
  - The result may depend on the initial placement.
  - Low-quality distance data do not yield a suitable placement.
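One common way to reduce the dependence on initial conditions is to start the iteration from a classical (Torgerson) MDS solution instead of a random placement. The sketch below shows only that starting configuration; it is a different technique from SMACOF itself and is swapped in here purely as an initialiser. The function name `classical_mds` is an assumption for this example.

```python
import numpy as np

def classical_mds(delta, n_components=2):
    """Classical (Torgerson) MDS via double centring and eigendecomposition."""
    n = delta.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n       # centring matrix
    B = -0.5 * J @ (delta ** 2) @ J           # double-centred squared distances
    eigvals, eigvecs = np.linalg.eigh(B)      # eigenvalues in ascending order
    idx = np.argsort(eigvals)[::-1][:n_components]
    lam = np.clip(eigvals[idx], 0.0, None)    # ignore negative eigenvalues
    return eigvecs[:, idx] * np.sqrt(lam)

delta = np.abs(np.subtract.outer(np.arange(4.0), np.arange(4.0)))
X0 = classical_mds(delta)

# Distances of the starting configuration, to compare with delta.
diff = X0[:, None, :] - X0[None, :, :]
D0 = np.sqrt((diff ** 2).sum(axis=-1))
print(np.round(D0, 6))
```

For this small example the target distances are exactly Euclidean, so the starting configuration already reproduces them; with noisy or non-Euclidean dissimilarities it would only be a starting point for the iteration.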
SMACOF is essentially based on MDS, but the following derivatives have been proposed:
- Weighted SMACOF: introduces a weighting for the importance of the distances.
- Multivariate SMACOF: a version that deals with several distance matrices simultaneously.
- Robust SMACOF: Improved tolerance to outliers and noise.
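To illustrate the weighted variant, the sketch below performs one weighted update step, X = V⁺ B(Z) Z, where V is built from the weights and V⁺ is its Moore-Penrose pseudo-inverse. It assumes a symmetric, non-negative weight matrix in which w_ij = 0 marks a missing or ignored distance. The names `weighted_stress` and `weighted_guttman_transform` are assumptions made for this example.

```python
import numpy as np

def pairwise_distances(X):
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def weighted_stress(X, delta, W):
    D = pairwise_distances(X)
    iu = np.triu_indices_from(delta, k=1)
    return float(np.sum(W[iu] * (D[iu] - delta[iu]) ** 2))

def weighted_guttman_transform(Z, delta, W):
    """One weighted SMACOF update X = pinv(V) @ B(Z) @ Z."""
    D = pairwise_distances(Z)
    # V: v_ij = -w_ij for i != j, v_ii = sum of w_ij over j != i
    V = -W.astype(float)
    np.fill_diagonal(V, 0.0)
    np.fill_diagonal(V, -V.sum(axis=1))
    # B(Z): b_ij = -w_ij * delta_ij / d_ij(Z) for i != j with d_ij > 0
    ratio = np.zeros_like(D)
    np.divide(W * delta, D, out=ratio, where=D > 0)
    B = -ratio
    np.fill_diagonal(B, ratio.sum(axis=1))
    return np.linalg.pinv(V) @ B @ Z

delta = np.abs(np.subtract.outer(np.arange(4.0), np.arange(4.0)))
W = np.ones((4, 4))
W[0, 3] = W[3, 0] = 0.0             # treat the distance between points 1 and 4 as missing

rng = np.random.default_rng(0)
Z = rng.standard_normal((4, 2))     # current placement
X = weighted_guttman_transform(Z, delta, W)

before = weighted_stress(Z, delta, W)
after = weighted_stress(X, delta, W)
print(f"weighted stress: {before:.4f} -> {after:.4f}")
```

With all weights equal to one, this update reduces to the unweighted Guttman transform (1/n) B(Z) Z.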
SMACOF is a powerful tool for dimensionality reduction and understanding the structure of data, and is particularly effective in data analysis where distance information is important.
Implementation example
Below is an example implementation of SMACOF in Python. SMACOF can be run through the manifold.MDS class of the scikit-learn library.
SMACOF implementation example
import numpy as np
from sklearn.manifold import MDS
import matplotlib.pyplot as plt

# Sample distance matrix (symmetric matrix)
distance_matrix = np.array([
    [0.0, 1.0, 2.0, 3.0],
    [1.0, 0.0, 1.0, 2.0],
    [2.0, 1.0, 0.0, 1.0],
    [3.0, 2.0, 1.0, 0.0]
])

# MDS using the SMACOF algorithm.
mds = MDS(
    n_components=2,               # Dimensions of embedding (2D)
    dissimilarity="precomputed",  # Direct use of distance matrices
    random_state=42               # Random seeding for reproducibility.
)

# Transformation to low-dimensional coordinates
embedded_coordinates = mds.fit_transform(distance_matrix)

# Get the coordinates of the result
x_coords, y_coords = embedded_coordinates[:, 0], embedded_coordinates[:, 1]

# Plotting the results
plt.figure(figsize=(8, 6))
plt.scatter(x_coords, y_coords, color='blue', s=100)

# Add labels to each point.
for i, (x, y) in enumerate(zip(x_coords, y_coords)):
    plt.text(x, y, f"Point {i+1}", fontsize=12, ha='right')

plt.title("SMACOF - 2D Embedding")
plt.xlabel("Dimension 1")
plt.ylabel("Dimension 2")
plt.grid(True)
plt.show()
Code description
- Input data:
  - distance_matrix is a symmetric distance matrix between the four data points.
  - To use the distance matrix directly, dissimilarity="precomputed" is specified.
- MDS object settings:
  - n_components=2 embeds the points in 2D space.
  - The SMACOF algorithm is automatically used internally.
- Coordinate computation:
  - The fit_transform method calculates the coordinates in low-dimensional space while preserving the original distance relationships as closely as possible.
- Result visualisation:
  - The embedded points are plotted in 2D space using matplotlib.
Output example
- Four data points are placed in 2D space and visualised in a way that reflects the original distance relationship.
- Each point is labelled, allowing the structure to be understood intuitively.
Applicable data.
- SMACOF works with any symmetric distance (dissimilarity) matrix, for example one built from Euclidean or Manhattan distances.
- If the data is given directly in coordinate information, the distance matrix needs to be pre-computed (e.g. with scipy.spatial.distance).
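When only coordinates are available, the pre-computation step can look like the sketch below. It uses plain NumPy broadcasting so that it stays self-contained; the helper name `distance_matrix_from_points` and the sample points are assumptions for this example. In practice, `pdist` and `squareform` from scipy.spatial.distance produce the same kind of matrix.

```python
import numpy as np

def distance_matrix_from_points(points, metric="euclidean"):
    """Full symmetric distance matrix between the rows of `points`."""
    diff = points[:, None, :] - points[None, :, :]
    if metric == "euclidean":
        return np.sqrt((diff ** 2).sum(axis=-1))
    if metric == "manhattan":
        return np.abs(diff).sum(axis=-1)
    raise ValueError(f"unsupported metric: {metric}")

# Hypothetical sample coordinates (three points in 2D).
points = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])

D_euclidean = distance_matrix_from_points(points)
D_manhattan = distance_matrix_from_points(points, metric="manhattan")

print(D_euclidean)  # rows: [0, 5, 10], [5, 0, 5], [10, 5, 0]
print(D_manhattan)  # rows: [0, 7, 14], [7, 0, 7], [14, 7, 0]
```

Either matrix can then be passed to an MDS routine that accepts precomputed dissimilarities.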
Application examples
SMACOF (Scaling by Majorizing a Complicated Function) is used in various fields as a method of dimensionality reduction. Specific applications are described below.
1. Cognitive mapping in psychology: SMACOF is widely used for cognitive mapping and rating scales in psychology, especially when visualising how individuals or groups perceive similarities between different objects or ideas.
Case studies:
– Brand ratings: based on consumers’ ratings of several brands, similarities between brands are calculated and visualised by mapping them in 2D or 3D space. This can be used to understand how consumers perceive each brand relative to the others.
– Visualisation of psychological distance: based on data collected in psychological experiments, the distance between different objects as perceived by the participants can be visualised in order to understand unconscious cognitive patterns.
2. Marketing and consumer behaviour analysis: SMACOF is also used in marketing to understand customer preferences and purchasing behaviour. It is used as a tool to visualise similarities between products and services in fewer dimensions and to identify factors that influence consumers’ purchasing decisions.
Case studies:
– Product positioning: creating a product positioning map based on consumers’ purchasing history and similarities between products. For example, it can visualise the characteristics, price range and quality of competing products, which can be used for strategic decision-making.
– Consumer segmentation: analysing customer behavioural and preference data to map the characteristics of different customer segments in a low-dimensional space for use in targeting strategies.
3. Visualisation of gene expression data (bioinformatics): in the field of bioinformatics, SMACOF is often used to embed gene expression data in lower dimensions. Based on the similarity of gene expression, different samples can be placed in a low-dimensional space to visually understand correlations and clusters between genes.
Case studies:
– Gene clustering: multiple gene samples are visualised using SMACOF and analysed for similarities between genes. This enables the identification of co-expression patterns of genes and the discovery of genes associated with diseases.
– Identification of disease subtypes: expression data of genes associated with diseases such as cancer are dimensionally reduced using SMACOF and used as a method to identify disease subtypes. Visualising differences in gene expression in different patient groups in a low-dimensional space may yield new disease classifications.
4. Social network analysis: in social network analysis, SMACOF is used to visualise the relationships between nodes in a network. To understand how people, organisations, ideas, etc. are related, distances on the network are visualised by dimensionality reduction.
Case studies:
– Social media analytics: based on data such as tweets and Facebook ‘likes’, relationships between users are visualised with dimensionality reduction. This supports analysis of how users exchange information with each other and which topics are relevant.
– Visualising cooperation: visualise the relationships between companies and between collaborators in the academic field with SMACOF. It can identify which companies cooperate most with which companies and which researchers interact most in which fields.
5. Image similarity analysis: SMACOF can be used as a method to group visually similar images by measuring the similarity between image data and representing it in a low-dimensional space.
Case studies:
– Image search engine: based on image features, similar images are embedded and visualised in a low-dimensional space and displayed as search results. When a user uploads a particular image, similar images can be returned quickly.
– Image clustering: large amounts of image data are dimensionally reduced using SMACOF to group visually similar images for use in image classification and recommendation systems.
Reference books
The following reference books discuss SMACOF (Scaling by Majorizing a Complicated Function) and MDS.
2. ‘Modern Multidimensional Scaling: Theory and Applications’ by Ingwer Borg and Patrick J. Groenen
– Description: a comprehensive description of MDS, from its fundamentals to its applications; details of different dimensionality reduction methods, including the SMACOF algorithm, are presented and practical approaches and challenges in numerical computation are also discussed.
3. “Nonmetric Multidimensional Scaling”
4. ‘Applied Multivariate Statistical Analysis’ by Richard A. Johnson and Dean W. Wichern
– Description: this book explains the fundamentals of multivariate analysis within statistics, detailing the use of MDS and its derived algorithms, and includes many applied examples to help understand SMACOF.
5. “Introduction to Multidimensional Scaling”
6. ‘Handbook of Multivariate Experimental Psychology’ edited by Raymond B. Cattell
– Description: this handbook covers a range of methods in multivariate experimental psychology and introduces some of the experimental designs using SMACOF. In particular, there are chapters on the application of the MDS algorithm in psychology and the social sciences.
7. “An Introduction to MDS”