Stick-breaking Process Overview, Algorithm and Implementation Example


Overview of Stick-breaking Process

The Stick-breaking Process is a standard construction for intuitively understanding the Dirichlet Process (DP) described in “Overview of the Dirichlet Process (DP), its algorithms, and examples of implementations“. It generates an infinite-dimensional probability distribution by repeatedly and randomly breaking a stick of length 1. This makes it a visually and mathematically elegant way to construct the discrete probability measure of a Dirichlet process.

The process flows as follows.

  1. Break off the first piece
    • Let the length of the entire stick be 1.
    • First, sample a random variable from a Beta distribution: \(v_1\sim Beta(1,\alpha)\)
    • Assign the first cluster its weight: \(\pi_1=v_1\)
  2. Break off the next piece
    • The next weight is a fraction of the remaining stick (length \(1-v_1\)): sample \(v_2\sim Beta(1,\alpha)\) and set \(\pi_2=(1-v_1)v_2\)
  3. Repeat this infinitely
    • The \(k\)th weight is given by \(\pi_k=v_k\prod_{j=1}^{k-1}(1-v_j)\)
    • All the \(v_k\) independently follow \(Beta(1,\alpha)\).
  4. Final distribution
    • The sum of the weights converges to 1. The weights \(\pi_k\) continue indefinitely but become smaller and smaller, so that in effect the large weights are concentrated in a finite number of clusters.

An intuitive understanding is as follows.

  • Dividing from largest to smallest: instead of splitting the whole stick into equal parts at once, a large piece is broken off first and the remainder is broken again. This naturally produces a few large clusters and many small ones.
  • Infinitely many pieces, but effectively a finite number: in practice most of the weight is concentrated in a few clusters, so only a finite number of clusters are visible in real data.

The relation to the Dirichlet process is as follows.

  • The stick-breaking process is one way to construct the Dirichlet process \(DP(\alpha,G_0)\).
  • Using the weights \(\pi_k\), the probability measure of the Dirichlet process is expressed as \[G=\sum_{k=1}^{\infty}\pi_k\delta_{\theta_k}\] where \(\theta_k\) are samples from the base distribution \(G_0\), and \(\delta_{\theta_k}\) is the Dirac delta measure at \(\theta_k\).
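The construction above can be sketched directly in code. The following is a minimal example (not from the original article) that truncates the infinite sum at \(K\) terms and assumes a standard normal base distribution \(G_0\) purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, K = 1.0, 100            # concentration parameter and truncation level

# Stick-breaking weights: pi_k = v_k * prod_{j<k} (1 - v_j)
v = rng.beta(1, alpha, size=K)
pi = v * np.cumprod(np.concatenate(([1.0], 1 - v[:-1])))

# Atom locations theta_k drawn i.i.d. from the base distribution G_0
# (here assumed to be N(0, 1))
theta = rng.normal(0.0, 1.0, size=K)

# G = sum_k pi_k * delta_{theta_k}; sampling from G means picking
# an atom theta_k with probability pi_k
draws = rng.choice(theta, size=5, p=pi / pi.sum())
print(draws)
```

Because the atoms are drawn once and then reused, repeated draws from \(G\) hit the same values again and again, which is exactly the discreteness property of the Dirichlet process.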

Application examples

  • Clustering: Probabilistically assigns data to clusters drawn from an infinite pool of cluster candidates.
  • Topic models: Automatically determine the number of topics to assign to a document from an infinite set of potential topics.
  • Hierarchical Dirichlet Process (HDP): Represents cluster structure shared across multiple groups (e.g., multiple documents). See “Hierarchical Dirichlet Process (HDP) Overview, Algorithm and Implementation Examples“.
Implementation Example

This section summarizes the Python implementation of the Stick-breaking Process. The following sections describe basic examples using numpy and visualization using matplotlib.

Implementation of the basic stick-breaking process

import numpy as np

# Parameter α (concentration parameter of the Dirichlet process)
alpha = 1.0

# The stick is broken infinitely in theory; in practice we truncate at about 100 pieces
n_clusters = 100

# Array to record the length of each piece
weights = np.zeros(n_clusters)

# Stick-breaking process
remaining_stick = 1.0
for k in range(n_clusters):
    # Sample v_k from a Beta(1, α) distribution
    vk = np.random.beta(1, alpha)
    weights[k] = remaining_stick * vk
    remaining_stick *= (1 - vk)

# Check the results
print("Cluster weights from the stick-breaking process:")
print(weights)
print("total:", np.sum(weights))  # Should be approximately 1

Visualize results

import matplotlib.pyplot as plt

# Cluster Weights Plot
plt.figure(figsize=(10, 6))
plt.bar(range(1, n_clusters + 1), weights)
plt.xlabel('Cluster index')
plt.ylabel('Weight')
plt.title('Stick-breaking Process')
plt.show()

Brief explanation

  • Sampling of \(v_k\): At each step, generate \(v_k\sim Beta(1,\alpha)\) and break off that fraction of the remaining stick.
  • Cluster weights: The weight of each cluster \(k\) is \(\pi_k=v_k\prod_{j=1}^{k-1}(1-v_j)\).
  • Truncation: The process is cut off at 100 clusters, since the dimension cannot be infinite in practice. A truncation of 100 is sufficient because the large weights are concentrated in the first few clusters.
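As a quick sanity check on the truncation (a small addition beyond the article's code, using the same \(Beta(1,\alpha)\) draws), the unassigned tail mass \(\prod_k(1-v_k)\) can be computed for several truncation levels. Each break removes an expected fraction \(1/(1+\alpha)\) of what remains, so the leftover shrinks geometrically:

```python
import numpy as np

rng = np.random.default_rng(42)
alpha = 1.0

for n_clusters in (10, 50, 100):
    # v_k ~ Beta(1, alpha); the stick mass still unassigned after n_clusters breaks
    v = rng.beta(1, alpha, size=n_clusters)
    leftover = np.prod(1 - v)
    print(f"truncation {n_clusters:3d}: unassigned mass {leftover:.3e}")
```

At 100 breaks the unassigned mass is vanishingly small, which is why a truncation of 100 behaves like the infinite process in practice.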

Application (sample generation for the Dirichlet process): By generating data based on the cluster weights \(\pi_k\), samples from the Dirichlet process can be obtained.

# sample data generation
n_samples = 1000
samples = np.random.choice(np.arange(n_clusters), size=n_samples, p=weights/weights.sum())

# histogram display
plt.figure(figsize=(10, 6))
plt.hist(samples, bins=n_clusters, density=True)
plt.title('Samples from Dirichlet Process via Stick-breaking')
plt.xlabel('Cluster index')
plt.ylabel('Frequency')
plt.show()

This implementation leads to the following applications:

  • Bayesian nonparametric clustering
  • Topic models (LDA, HDP)
  • Infinite mixture Gaussian models

Based on this code, combinations with Hierarchical Dirichlet Processes (HDP) and Gaussian processes can also be implemented.

Application Examples

This section describes actual applications where the Stick-breaking Process is used.

1. Topic model (HDP-LDA)

Application: Clustering of document sets

    • While conventional LDA requires the number of topics to be fixed in advance, the Hierarchical Dirichlet Process (HDP) does not.
    • The stick-breaking process automatically selects the required number of topics from an infinite pool of potential topics.

Example: It has been applied to clustering research papers and classifying news articles.

Specific example: In one study, using a dataset of scientific papers, research topics were automatically extracted by HDP-LDA, and topics such as “machine learning,” “computational biology,” and “natural language processing” appeared. As new research areas are added, new topics are automatically added.
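The two-level idea can be sketched with a truncated construction (a simplified illustration, not an implementation of HDP-LDA inference): global topic weights come from one stick-breaking draw, and each document reuses those shared topics with its own weights, here drawn from a Dirichlet centered on the global weights, a common finite approximation of \(DP(\alpha_0,\beta)\):

```python
import numpy as np

rng = np.random.default_rng(7)
gamma, alpha0, K = 1.0, 5.0, 30   # top-level / document-level concentrations, truncation

# Global topic weights beta (shared across all documents) via stick-breaking
u = rng.beta(1, gamma, size=K)
beta = u * np.cumprod(np.concatenate(([1.0], 1 - u[:-1])))
beta /= beta.sum()

# Each document draws its own topic weights around the shared global ones;
# Dirichlet(alpha0 * beta) is a standard finite approximation to DP(alpha0, beta)
n_docs = 3
doc_weights = rng.dirichlet(alpha0 * beta, size=n_docs)

for d, w in enumerate(doc_weights):
    print(f"doc {d}: dominant topics {np.argsort(w)[::-1][:3]}")
```

Because every document's weights are centered on the same global \(\beta\), the topics themselves are shared across the corpus while their prominence varies per document.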

2. Infinite Gaussian Mixture Model (IGMM)

Application: Data clustering with an unknown number of clusters

    • The number of clusters \(K\) must be set in advance in an ordinary Gaussian Mixture Model (GMM), but not in the IGMM.
    • The number of clusters can grow or shrink flexibly as needed through the stick-breaking process.

Specific example: In a study in which customer data was clustered by IGMM, clusters were automatically created according to customer purchase patterns, and “one-time buyers,” “repeaters,” “high-volume buyers,” etc. were discovered.

3. Image recognition (nonparametric Bayesian models)

Applications: Object classification in images

    • Nonparametric Bayesian methods, including the stick-breaking process, are used to recognize the number of object types present in an image without fixing that number in advance.
    • Particularly effective for image sets with complex backgrounds or unknown classes.

Specific example: In an object detection task, the classes of cars, pedestrians, signs, etc. in the image are automatically determined according to the data. If a new type of object appears, a new cluster (object class) is added accordingly.

4. Gene expression analysis

Application: Clustering of gene expression data

    • In biology, clustering is used to group gene expression patterns to find functional commonalities.
    • Using the stick-breaking process, the number of distinct gene expression patterns can be inferred from the data without being fixed a priori.

Specific example: In a study using HDP, the expression patterns of cells were automatically grouped by clustering time series data of expression levels. When a new gene cluster is discovered, a new cluster is automatically generated.

5. Recommendation systems

Application: Item recommendation based on user preferences

    • A nonparametric Bayesian model based on the stick-breaking process is used to cluster user preference patterns.
    • Since the number of clusters does not need to be determined in advance, the system can flexibly respond to new users and new items as they are added.

Specific example: In a study of video distribution services, clustering is performed based on users’ viewing history, and groups of users such as “action lovers,” “romance lovers,” and “documentary lovers” are automatically extracted. When a new genre was added, clusters matching that genre were dynamically generated.

In all of these cases, the strength of the stick-breaking process, namely finding appropriate clusters automatically without pre-determining their number, is exploited.

Reference books

This section lists reference books on the Stick-breaking Process and the related Dirichlet Process (DP) and Hierarchical Dirichlet Process (HDP).

Books for basic understanding

Bayesian Nonparametric Data Analysis

Foundations of Machine Learning (Adaptive Computation and Machine Learning series)

Authors: Mehryar Mohri, Afshin Rostamizadeh, Ameet Talwalkar
Contents: Covers the fundamentals and advanced topics of machine learning, including the foundations relevant to Dirichlet processes, infinite mixture models, and stick-breaking processes.
Recommendation: The book has a good balance between theory and implementation, so it is easy to get an idea of the applications.

A book for in-depth study of HDP and applications

Machine Learning: A Probabilistic Perspective

Author:Kevin P. Murphy
Contents: Detailed explanations of probabilistic graphical models and Bayesian nonparametric methods, with many concrete examples of HDPs, Dirichlet processes, and stick-breaking processes.
Recommendation: Ideal for those who want to learn hands-on with Python and other languages, with implementation examples!

Pattern Recognition and Machine Learning

Author:Christopher M. Bishop
Description: Explains mixture models, Dirichlet distributions, Dirichlet processes, stick-breaking processes, and more, with a wide range of application examples.
Recommendation: Recommended for those who want to understand the theory in depth because of the detailed mathematical explanations!

Strong book on implementation and applications

Bayesian Analysis with Python

Author: Osvaldo Martin
Description: Explains how to implement Bayesian statistics using Python, including examples of implementing Dirichlet processes using PyMC3 and modeling HDP.
Recommendation: If you want to learn the stick-breaking process while actually writing code, this is the one.
