Overview
One-shot learning is a learning approach designed for situations where only one training example is available per class. Its primary goal is to enable models to achieve high generalization performance even when data is scarce. This method focuses on effectively learning patterns from limited datasets, allowing models to identify novel classes with minimal prior information.
In many practical scenarios, collecting large amounts of labeled data is challenging or even impossible, making one-shot learning a powerful approach for applications like rare disease diagnosis, facial recognition, and new product classification.
One-Shot vs. Few-Shot vs. Zero-Shot Learning
Both One-Shot Learning and Few-Shot Learning, as described in “Few-Shot Learning: Overview, Algorithms, and Implementations,” aim to achieve high-precision classification and recognition from a small number of training examples. However, there are clear differences in their definitions:
- One-Shot Learning
  - Each class is represented by only one training example (1-shot).
  - The goal is to accurately classify a wide variety of unseen classes based on this single example, without relying on large-scale, data-intensive learning.
- Few-Shot Learning
  - Typically uses 2 to 10 training examples per class.
  - It aims to capture more detailed class features and provides more stable recognition than one-shot learning.
- Zero-Shot Learning
  - No actual training examples are provided.
  - Instead, the model learns to classify new classes based on descriptive attributes or external semantic information.
Each of these methods addresses different levels of data scarcity, providing flexible solutions for various application scenarios.
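To make the distinction concrete, the sketch below shows how an N-way K-shot evaluation episode is typically assembled from a labeled dataset: with k_shot=1 it is a one-shot episode, with k_shot=5 a few-shot episode. This is an illustrative sketch, not code from any particular library, and it assumes the dataset is an iterable of (example, label) pairs with enough examples per class.

import random
from collections import defaultdict

def sample_episode(dataset, n_way=5, k_shot=1, n_query=5):
    """Sample an N-way K-shot episode: a small support set plus held-out queries."""
    by_class = defaultdict(list)
    for example, label in dataset:
        by_class[label].append(example)

    classes = random.sample(list(by_class.keys()), n_way)
    support, query = [], []
    for cls in classes:
        # Assumes each class has at least k_shot + n_query examples
        examples = random.sample(by_class[cls], k_shot + n_query)
        support += [(x, cls) for x in examples[:k_shot]]   # K labeled examples per class
        query   += [(x, cls) for x in examples[k_shot:]]   # queries to classify
    return support, query

# k_shot=1 -> one-shot episode; k_shot=5 -> few-shot episode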
Key Approaches in One-Shot Learning
- Similarity-Based Learning (Siamese Networks)
  - Siamese Networks consist of two subnetworks that share weights and extract feature vectors from paired inputs.
  - The distance between these feature vectors is used to determine whether the inputs belong to the same class.
  - This approach allows the model to recognize new classes effectively, even with only one training sample per class.
- Memory-Based Learning (Matching Networks)
  - Matching Networks classify a given query sample based on its similarity to a small support set of known classes.
  - The model learns to match queries with the most similar examples in the support set, improving generalization to new classes.
  - Training uses a process called episodic learning, which strengthens the model's ability to generalize from few examples.
- Prototype-Based Learning (Prototypical Networks)
  - Prototypical Networks represent each class with a single prototype vector, calculated as the mean of that class's feature vectors.
  - New queries are classified based on their proximity to these prototypes, making this approach both simple and highly effective.
- Meta-Learning (MAML: Model-Agnostic Meta-Learning)
  - MAML optimizes the initial model parameters so that the model can quickly adapt to new tasks with a small amount of data.
  - It trains the model to learn how to learn, enabling rapid adaptation to novel classes.
Each of these approaches provides a unique perspective on the core challenge of one-shot learning and offers powerful solutions for recognizing new classes with minimal data.
Strengths and Challenges of One-Shot Learning
Strengths:
- Reduced Data Collection and Annotation Costs
  - Unlike conventional learning methods, one-shot learning requires only a single labeled example per class, significantly reducing the time and cost of data collection.
  - This is particularly valuable in fields like medical imaging, where data is often scarce.
- Robust Performance in Low-Data Scenarios
  - One-shot learning is highly effective in environments where labeled data is limited, making it ideal for new product identification, rare disease detection, and low-resource languages.
- Rapid Adaptation
  - Once pre-trained, these models can adapt to new classes quickly, making them suitable for real-time systems like customer support or security authentication.
Challenges:
- Defining Similarity
  - Accurately defining similarity is a critical challenge, as examples within the same class can vary widely in appearance or structure.
- Instability in Learning
  - While powerful with limited data, one-shot models are prone to overfitting, which reduces their generalization capability.
- Need for Meta-Learning
  - To achieve high precision, many one-shot learning approaches rely on sophisticated meta-learning algorithms, which add complexity to model design.
Practical Applications of One-Shot Learning
- Rapid Identification of New Products, Classes, or People
  - Useful in environments where new items or categories frequently emerge, such as product catalogs, fashion, or customer identification.
- Low-Data Scenarios
  - Ideal for applications where collecting large amounts of data is impractical or costly, such as rare disease diagnosis, planetary exploration, or specialized manufacturing.
- Facial Recognition Systems
  - In security or user authentication systems, each user typically registers only one facial image, making one-shot learning an effective approach for rapid and accurate identification.
These strengths and challenges highlight the critical role one-shot learning plays in modern machine learning, particularly as the demand for flexible, data-efficient AI systems continues to grow.
Algorithms for One-Shot Learning
Below are some of the most well-known algorithms used in One-Shot Learning:
1. Siamese Networks
Siamese Networks are a type of neural network designed to determine whether two inputs (e.g., pairs of images) belong to the same class based on distance calculations. Specifically, they consist of two subnetworks with shared weights that extract feature vectors from each input, followed by a distance calculation to evaluate the similarity between these vectors. If the distance is small, the inputs are considered to belong to the same class, while a larger distance indicates different classes.
Key Characteristics of Siamese Networks:
- Similarity-Based Learning: Instead of performing absolute classification for each class, they learn to assess the similarity between pairs of inputs, making them highly adaptable to new classes.
- Task-Agnostic: These networks can be applied to a wide range of data types, including images, text, and speech, making them versatile across different tasks.
- Effective for One-Shot Learning: They can learn similarities from only a few samples, making them particularly strong in one-shot scenarios.
Basic Structure of Siamese Networks:
- Shared-Weight Neural Networks: Two subnetworks with identical architectures and shared weights extract feature vectors from each input. Convolutional Neural Networks (CNNs) or other encoders are commonly used for this purpose.
- Distance Calculation: The distance between the feature vectors is computed, typically using L1 distance, L2 distance, or cosine similarity, to assess how similar the inputs are (a short snippet below illustrates these metrics).
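As a quick illustration, the snippet below shows how these distance measures would be computed between two embedding vectors in PyTorch. It is a minimal sketch with randomly generated vectors standing in for the outputs of the two subnetworks.

import torch
import torch.nn.functional as F

emb1 = torch.randn(1, 128)   # feature vector from subnetwork 1 (illustrative)
emb2 = torch.randn(1, 128)   # feature vector from subnetwork 2 (illustrative)

l1_distance = torch.sum(torch.abs(emb1 - emb2), dim=1)   # L1 (Manhattan) distance
l2_distance = F.pairwise_distance(emb1, emb2)            # L2 (Euclidean) distance
cos_sim     = F.cosine_similarity(emb1, emb2)            # cosine similarity in [-1, 1]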
The foundational architecture of Siamese Networks was introduced by Koch et al. in the paper “Siamese Neural Networks for One-shot Image Recognition” (2015), which demonstrated their effectiveness for high-accuracy classification even with limited training samples.
Applications of Siamese Networks:
- Facial Recognition: Security systems and personal device authentication
- Handwritten Character Recognition: Classification of new characters, as in the Omniglot dataset
- Signature Verification: Authenticating handwritten signatures
- Speech Recognition: Evaluating similarity between voice samples
2. Matching Networks
Matching Networks classify a query sample by evaluating its similarity to a set of support examples (support set). Unlike conventional neural networks, which pre-learn fixed class representations, Matching Networks dynamically learn an optimal distance function through episodic meta-learning, making them highly effective for few-shot learning.
Key Characteristics of Matching Networks:
- Attention-Based Similarity: They use an attention mechanism to calculate the similarity between the query and the support set, resulting in more flexible and accurate similarity assessments than simple distance metrics.
- Support-Dependent Output: The classification output depends strongly on the support set, allowing rapid adaptation to new classes by simply updating the support set.
- Episodic Learning: Matching Networks train on multiple “episodes,” each consisting of a support set and a corresponding query set, which improves the model's generalization to new tasks.
This approach was proposed by Vinyals et al. in the paper “Matching Networks for One Shot Learning” (2016), which introduced techniques like Long Short-Term Memory (LSTM) and contextual embedding to improve the mapping between queries and support sets.
Key Techniques in Matching Networks:
- Contextual Embeddings: LSTMs or other context-aware embeddings are used to more effectively model the relationship between queries and support examples.
- Attention Mechanism: Similarity is calculated in context, allowing for more nuanced distance measurement than simple Euclidean distances (a minimal sketch of this classification step follows below).
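The sketch below shows only the attention-based classification step: softmax over cosine similarities to the support set, followed by a weighted sum of one-hot support labels. The embedding network and the contextual LSTM embeddings from the paper are omitted, and all names are illustrative.

import torch
import torch.nn.functional as F

def matching_predict(query_emb, support_embs, support_labels, n_classes):
    """Attention over the support set, then a weighted sum of one-hot labels."""
    # query_emb: (d,), support_embs: (n_support, d), support_labels: (n_support,)
    sims = F.cosine_similarity(query_emb.unsqueeze(0), support_embs)   # (n_support,)
    attention = F.softmax(sims, dim=0)                                 # attention weights
    one_hot = F.one_hot(support_labels, n_classes).float()             # (n_support, n_classes)
    return attention @ one_hot                                         # class probabilities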
3. Prototypical Networks
Prototypical Networks classify query samples by comparing their feature vectors to “prototypes” – the mean vectors of each class in the support set. This approach is both simple and effective, offering a highly interpretable method for few-shot learning.
Key Characteristics of Prototypical Networks:
- Simplicity and Speed: Prototypes are calculated as the average feature vector of each class, making the approach computationally efficient and straightforward to implement.
- Distance-Based Classification: Uses straightforward distance metrics (e.g., Euclidean distance, cosine similarity) to determine class membership, improving interpretability.
- Strong for One-Shot and Few-Shot Learning: The ability to form prototypes from just a few examples makes this approach highly effective for one-shot and few-shot tasks.
This method was introduced by Snell et al. in the paper “Prototypical Networks for Few-shot Learning” (2017), demonstrating that prototype-based distance calculations can significantly improve classification accuracy in low-data regimes.
Common Distance Metrics:
- Euclidean Distance: d(x, c_k) = ||f(x) - c_k||, the straight-line distance between a query embedding f(x) and a class prototype c_k.
- Cosine Similarity: sim(x, c_k) = f(x) · c_k / (||f(x)|| ||c_k||), which measures the angle between the query embedding and the prototype (see the sketch below).
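The core computation is short enough to sketch directly. The snippet below assumes query and support embeddings have already been produced by an embedding network; the function name and arguments are illustrative.

import torch

def prototypical_predict(query_embs, support_embs, support_labels, n_classes):
    """Classify queries by Euclidean distance to per-class mean prototypes."""
    # support_embs: (n_support, d), support_labels: (n_support,), query_embs: (n_query, d)
    prototypes = torch.stack([
        support_embs[support_labels == c].mean(dim=0) for c in range(n_classes)
    ])                                                   # (n_classes, d)
    dists = torch.cdist(query_embs, prototypes)          # (n_query, n_classes)
    return (-dists).softmax(dim=1)                       # closer prototype -> higher probability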
4. Relation Networks
Relation Networks focus on learning the relationships between query samples and support examples, rather than merely calculating distances. This approach explicitly models the relation between each query-support pair using a separate neural network, allowing for more complex similarity learning.
Key Characteristics of Relation Networks:
- Learning Relationships Instead of Distances: Unlike traditional distance-based methods, Relation Networks learn the relationship itself, capturing non-linear similarities that simple distance metrics cannot.
- CNN-Based Scoring: Typically uses Convolutional Neural Networks (CNNs) to compute relation scores, making it possible to capture spatial and structural features.
- Flexible Data Formats: Can be applied to various data types, including images, audio, and text.
This approach was proposed by Sung et al. in the paper “Learning to Compare: Relation Network for Few-Shot Learning” (2018), which demonstrated substantial improvements over distance-based methods for few-shot tasks.
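A simplified relation module can be sketched as follows: a query embedding and a support embedding are concatenated and scored by a small learned network. For brevity this sketch uses an MLP rather than the CNN-based scorer in the paper, and all names are illustrative.

import torch
import torch.nn as nn

class RelationModule(nn.Module):
    """Scores how related a (query, support) embedding pair is, in [0, 1]."""
    def __init__(self, emb_dim=128, hidden_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * emb_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 1), nn.Sigmoid()
        )

    def forward(self, query_emb, support_emb):
        # Concatenate the pair and let the network learn the similarity itself
        return self.net(torch.cat([query_emb, support_emb], dim=-1))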
5. MAML (Model-Agnostic Meta-Learning)
MAML is a meta-learning algorithm designed to optimize initial model parameters so that the model can quickly adapt to new tasks with only a few steps of gradient descent. Unlike traditional meta-learning methods, MAML is model-agnostic, meaning it can be applied to a wide variety of neural networks, including those for image classification, natural language processing, and reinforcement learning.
Key Characteristics of MAML:
- Rapid Adaptation: Allows models to adapt to new tasks with minimal gradient steps, making it ideal for one-shot and few-shot learning.
- Optimized Initial Parameters: Trains a set of initial parameters that can quickly adapt to diverse tasks, significantly reducing training time for new tasks.
- General Framework: Not limited to specific tasks or data types, providing flexibility across various domains.
This method was introduced by Finn et al. in the paper “Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks” (2017), representing a major breakthrough in the field of meta-learning.
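The sketch below illustrates one meta-update using the simpler first-order approximation (FOMAML); full MAML also backpropagates through the inner-loop updates. It assumes each task is a tuple of (support_x, support_y, query_x, query_y) tensors and a suitable loss_fn; these names are illustrative.

import copy
import torch

def fomaml_step(model, loss_fn, tasks, inner_lr=0.01, outer_lr=0.001, inner_steps=1):
    """One first-order MAML meta-update over a batch of tasks (simplified sketch)."""
    meta_grads = [torch.zeros_like(p) for p in model.parameters()]

    for support_x, support_y, query_x, query_y in tasks:
        # Inner loop: adapt a copy of the model to the task's support set
        adapted = copy.deepcopy(model)
        inner_opt = torch.optim.SGD(adapted.parameters(), lr=inner_lr)
        for _ in range(inner_steps):
            inner_opt.zero_grad()
            loss_fn(adapted(support_x), support_y).backward()
            inner_opt.step()

        # Outer loop: evaluate the adapted model on the task's query set
        query_loss = loss_fn(adapted(query_x), query_y)
        grads = torch.autograd.grad(query_loss, list(adapted.parameters()))
        meta_grads = [mg + g for mg, g in zip(meta_grads, grads)]

    # Apply the averaged query-set gradients to the original (meta) parameters
    with torch.no_grad():
        for p, mg in zip(model.parameters(), meta_grads):
            p -= outer_lr * mg / len(tasks)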
6. Memory-Augmented Networks (External Memory Models)
Memory-Augmented Networks incorporate external memory into conventional neural networks, allowing the model to “remember” previously encountered data and retrieve this information when needed. This significantly enhances the model’s ability to adapt to new tasks with minimal data.
Key Characteristics of Memory-Augmented Networks:
- Experience Retention: Stores past observations in external memory, enabling rapid adaptation to new inputs based on previously encountered data.
- Enhanced One-Shot Classification: Uses past experiences to improve one-shot and few-shot classification accuracy.
- Task-Agnostic Flexibility: Applicable to a wide range of tasks, including image classification, natural language processing, and reinforcement learning.
Pioneering work in this area includes “Matching Networks for One Shot Learning” (2016) by Vinyals et al. and “Meta-Learning with Memory-Augmented Neural Networks” (2016) by Santoro et al., which demonstrated the power of external memory in improving few-shot learning performance.
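The core idea, content-based reading from an external memory, can be sketched as follows. Real models such as Santoro et al.'s also learn how to write to memory and manage slot usage; the names here are illustrative.

import torch
import torch.nn.functional as F

def read_memory(key, memory):
    """Content-based read: attend over memory slots by cosine similarity to the key."""
    # key: (d,), memory: (n_slots, d)
    sims = F.cosine_similarity(key.unsqueeze(0), memory)   # (n_slots,)
    weights = F.softmax(sims, dim=0)                        # read weights over slots
    return weights @ memory                                 # (d,) retrieved vector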
Implementation Example
The following is a PyTorch implementation of a Siamese Network, one of the most representative one-shot learning architectures.
The network learns whether two images belong to the same class, which makes it well suited to one-shot classification.
1. Required Libraries
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.datasets as dsets
import torchvision.transforms as transforms
from torch.utils.data import Dataset, DataLoader
import random
2. Dataset Definition (Creating Image Pairs)
class SiameseMNIST(Dataset):
    """Wraps MNIST and returns image pairs labeled 1 (same class) or 0 (different class)."""
    def __init__(self, train=True):
        self.mnist = dsets.MNIST(root='./data', train=train, download=True,
                                 transform=transforms.ToTensor())

    def __getitem__(self, index):
        img1, label1 = self.mnist[index]
        should_get_same_class = random.randint(0, 1)
        if should_get_same_class:
            # Sample a second image until its label matches the first
            while True:
                index2 = random.randint(0, len(self.mnist) - 1)
                img2, label2 = self.mnist[index2]
                if label1 == label2:
                    break
        else:
            # Sample a second image until its label differs from the first
            while True:
                index2 = random.randint(0, len(self.mnist) - 1)
                img2, label2 = self.mnist[index2]
                if label1 != label2:
                    break
        # Scalar label so the DataLoader yields shape (batch,) and broadcasts correctly in the loss
        label = torch.tensor(float(label1 == label2))
        return img1, img2, label

    def __len__(self):
        return len(self.mnist)
3. Siamese Network Model Definition
class SiameseNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 64, 3), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten()
        )
        self.fc = nn.Sequential(
            nn.Linear(128 * 5 * 5, 256),
            nn.ReLU(),
            nn.Linear(256, 128)
        )

    def forward_once(self, x):
        out = self.cnn(x)
        out = self.fc(out)
        return out

    def forward(self, x1, x2):
        emb1 = self.forward_once(x1)
        emb2 = self.forward_once(x2)
        return emb1, emb2
4. Loss Function (Contrastive Loss)
class ContrastiveLoss(nn.Module):
    def __init__(self, margin=1.0):
        super().__init__()
        self.margin = margin

    def forward(self, output1, output2, label):
        euclidean_distance = F.pairwise_distance(output1, output2)
        # label == 1: same class (pull embeddings together); label == 0: push apart up to the margin
        loss = (label * euclidean_distance.pow(2) +
                (1 - label) * F.relu(self.margin - euclidean_distance).pow(2))
        return loss.mean()
5. Training Loop (Simplified)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = SiameseNet().to(device)
criterion = ContrastiveLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
train_loader = DataLoader(SiameseMNIST(train=True), batch_size=64, shuffle=True)

for epoch in range(5):
    for img1, img2, label in train_loader:
        img1, img2, label = img1.to(device), img2.to(device), label.to(device)
        output1, output2 = model(img1, img2)
        loss = criterion(output1, output2, label)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    print(f"Epoch {epoch+1}, Loss: {loss.item():.4f}")
6. Inference Example (One-Shot Classification)
# Store a single image of each class as the "support set"
# Compute the distance between the query image and each support image, then assign the query to the class with the smallest distance (highest similarity); a minimal sketch follows
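The sketch below implements that step, assuming the trained model above and a dictionary support_images mapping each class label to its single registered example tensor (an illustrative name, not part of the training code).

def one_shot_classify(model, query_img, support_images, device):
    """Classify a query image as the support class with the smallest embedding distance."""
    model.eval()
    with torch.no_grad():
        query_emb = model.forward_once(query_img.unsqueeze(0).to(device))
        best_class, best_dist = None, float('inf')
        for cls, support_img in support_images.items():
            support_emb = model.forward_once(support_img.unsqueeze(0).to(device))
            dist = F.pairwise_distance(query_emb, support_emb).item()
            if dist < best_dist:
                best_class, best_dist = cls, dist
    return best_class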
Note:
- This example uses the MNIST dataset, but datasets designed for one/few-shot learning, such as Omniglot, are also available.
- Similarity is calculated with F.pairwise_distance() (Euclidean distance).
- Contrastive Loss is the key component of Siamese Network training.
Application Examples
1. Face Recognition Systems
Face recognition systems are technologies that use facial images to identify individual users. These systems commonly employ advanced deep learning models such as Siamese Networks, FaceNet, and ArcFace, which excel at measuring the similarity between different images. One-shot learning is particularly effective in this context, as users typically register only a single facial image, eliminating the need for large amounts of training data when recognizing new users.
These systems have been widely adopted in real-world applications, including Apple Face ID, Facebook’s photo tagging, and automated entry systems for security and access control. By leveraging one-shot learning, these technologies enable fast and accurate personal identification, significantly enhancing both convenience and security in everyday life and business operations.
2. Medical Imaging
Medical image diagnosis involves analyzing patient data from X-rays, CT scans, MRIs, and other medical imaging modalities to detect diseases. In particular, the diagnosis of rare diseases often faces the challenge of having extremely limited labeled training data, making one-shot learning highly effective in this field. Algorithms like Prototypical Networks, Matching Networks, and MAML (Model-Agnostic Meta-Learning) are frequently used to address these challenges.
These algorithms can effectively learn from small samples, making them well-suited for identifying rare disease patterns. Examples include ISIC datasets for skin cancer detection, pathology image classification, and MRI anomaly detection. By improving the early diagnosis of diseases and supporting rapid clinical decision-making, these techniques contribute significantly to patient outcomes and healthcare efficiency.
3. Handwriting Recognition
Handwriting recognition involves identifying specific characters or words based on their unique shapes and strokes, including new Chinese characters or the irregular handwriting of children. Given the limited number of samples (often just 1-2 per character), conventional data-intensive methods struggle to achieve high accuracy. In these cases, one-shot learning provides a powerful solution.
This field often relies on Siamese Networks and the Omniglot dataset. Siamese Networks are designed to learn similarities between paired images, allowing for effective classification even with limited samples. The Omniglot dataset, which contains over 1,600 distinct characters, is a standard benchmark for one-shot classification. These methods enable rapid and accurate recognition of previously unseen characters, making them essential for handwriting recognition systems.
4. New Product Classification in E-commerce
In e-commerce and retail, classifying new products into appropriate categories and recommending similar items is a critical task. This is particularly challenging when each new product is represented by only a single image, making traditional data-intensive approaches impractical. One-shot learning is highly effective in these cases.
Common approaches include Siamese Networks and Transformer-based embedding models. Siamese Networks learn to assess the similarity between different products, mapping new product images to existing categories quickly. Transformer-based models can further enhance this by integrating product descriptions and textual information, providing more precise similarity assessments.
These technologies are widely used by major e-commerce platforms like Amazon and Rakuten for search engines and recommendation systems, helping to improve product discovery and personalized recommendations.
5. Robotics and Reinforcement Learning
Reinforcement learning (RL) in robotics focuses on training robots to autonomously learn tasks in physical environments. However, real-world training often involves high time and cost constraints, making one-shot learning and meta-learning critical for efficient adaptation from minimal demonstrations.
Key methods include MAML (Model-Agnostic Meta-Learning), Meta-RL, and One-Shot Imitation Learning. MAML optimizes initial weights for rapid adaptation to new tasks, while Meta-RL extends this concept to reinforcement learning. One-Shot Imitation Learning allows robots to learn complex tasks from a single demonstration, making it particularly valuable for manipulation tasks.
These techniques have been widely adopted by research institutions like OpenAI and DeepMind, enabling robots to learn complex tasks and adapt to dynamic environments efficiently. This approach significantly broadens the scope of robotics applications in industrial automation and service robotics.
6. Natural Language Processing and Prompt AI (Zero-to-One-Shot Learning)
Prompt-based AI in natural language processing (NLP) refers to guiding a language model with minimal examples or instructions, without extensive fine-tuning. Large language models like GPT-3 and GPT-4 excel at this approach, leveraging their in-context learning capabilities to produce accurate outputs from minimal input.
Techniques like Few-Shot Prompting provide context to the model by presenting a few examples, enabling rapid adaptation to new tasks. Applications include custom chatbot responses, FAQ generation from single examples, and style imitation from a single translation.
This approach significantly reduces the need for large-scale labeled data, making it highly effective for domain-specific knowledge and task adaptation. It has been widely adopted across industries for customer support, automated content generation, and personalized digital assistants.
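As a simple illustration of few-shot prompting, a prompt can be assembled from a handful of labeled examples followed by the new input; the task, examples, and format below are invented for demonstration, and the resulting string would be passed to whichever language model API you use.

# Build a few-shot prompt: a handful of labeled examples, then the new input
examples = [
    ("The package arrived two days late.", "negative"),
    ("Setup took less than five minutes.", "positive"),
]
query = "The battery drains faster than advertised."

prompt = "Classify the sentiment of each review as positive or negative.\n\n"
for text, label in examples:
    prompt += f"Review: {text}\nSentiment: {label}\n\n"
prompt += f"Review: {query}\nSentiment:"

print(prompt)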
Recommended Books, Papers, and Resources for One-Shot Learning
Beginner-Friendly (Basic Understanding + Implementation)
- “Deep Learning with PyTorch” (Eli Stevens, Luca Antiga, Thomas Viehmann)
  - Comprehensive guide to PyTorch, including the fundamentals of deep learning and transfer learning.
  - Language: English
- “Deep Learning for Coders with fastai & PyTorch” (Jeremy Howard, Sylvain Gugger)
  - Practical approach to deep learning, covering few-shot and one-shot learning techniques.
  - Language: English
- “Hands-On One-Shot Learning with Python” (Shruti Jadon, Harveen Singh Chadha)
  - Focuses on one-shot learning with Python, including Siamese Networks and practical implementations.
  - Language: English
Advanced Theoretical and Algorithmic Books
- “Meta-Learning: Theory, Algorithms and Applications” (2024)
  - Comprehensive coverage of meta-learning, including one- and few-shot learning.
  - Language: English
- “Machine Learning Yearning” (Andrew Ng)
  - Insights into real-world challenges of one-shot learning and practical ML problem-solving.
  - Language: English (free PDF available)
Foundational Papers (Origins of One-Shot Learning)
- “Siamese Neural Networks for One-shot Image Recognition”
  - Algorithm: Siamese Networks
  - Authors: Koch et al., 2015
- “Matching Networks for One Shot Learning”
  - Algorithm: Matching Networks
  - Authors: Vinyals et al., 2016
- “Prototypical Networks for Few-shot Learning”
  - Algorithm: Prototypical Networks
  - Authors: Snell et al., 2017
- “Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks”
  - Algorithm: MAML (meta-learning method)
  - Authors: Finn et al., 2017
Video Courses and Online Lectures
- Platform: Coursera / Udacity
- Content: Practical and theoretical aspects of one-shot learning and meta-learning
Practical GitHub Resources
- Example Siamese Network implementation for facial similarity in PyTorch
- Official implementation of Prototypical Networks
- Standard dataset and experimental code for one-shot learning (Omniglot)