
Toloka Team

Oct 2, 2024

Essential ML Guide
Teaching machines with minimal data: one-shot learning

Humans can recognize objects, patterns, or faces after seeing them only once. However, machines learn differently. Traditional algorithms, particularly in deep learning, require thousands or even millions of examples to master a task. 

A branch of artificial intelligence called one-shot learning aims to train machines to learn from just one example, or a minimal number of examples. It seeks to mimic the human ability to recognize objects quickly by training models that can generalize from a single example or a very limited set of data.

By letting models learn from minimal input data, one-shot learning opens up a wide range of opportunities in areas where data collection is difficult, contributing to the efficiency and adaptability of AI. Here, we review the basic concepts, methods, applications, and challenges of one-shot learning.

What is one-shot learning?

One-shot learning is a machine learning-based object classification approach where a model learns to recognize objects, patterns, or tasks from only a single example or a minimal number of examples. This contrasts with traditional machine learning models trained through supervised learning, which typically require large amounts of training data to perform well.

For example, one-shot learning models assess the similarities and differences between two images. Unlike traditional methods that require large datasets, one-shot learning aims to classify objects with minimal examples, often just a single instance. Usually, there is only one image per class; if there is more than one, the approach is called few-shot learning.

One-shot learning is especially helpful in computer vision (CV). Tasks such as facial recognition or object detection demand accurate classification despite limited training data. By learning to compare features across images, this technique enables a machine learning model to generalize from one or a few examples, so even sparse data does not prevent computer vision models from doing their job successfully.

Why is one-shot learning important?

One-shot learning addresses a fundamental challenge in machine learning: data scarcity. Collecting and labeling large amounts of data can be expensive and time-consuming. Many real-world use cases involve limited data. For instance, in medical imaging, rare diseases may not have enough examples for traditional learning methods. One-shot learning enables effective classification and diagnosis in such scenarios.

One-shot learning algorithms also allow for adaptability in dynamic environments. New classes or categories often emerge over time in real-world applications. For instance, e-commerce platforms are constantly faced with new products, and social media platforms deal with new forms of content. One-shot learning makes it possible for models to adapt quickly without requiring complete re-training on new data.

How does one-shot learning work?

One-shot learning in computer vision shifts focus from the traditional object classification or detection tasks to a more straightforward comparison or matching task. In most standard computer vision models, the task is to classify images into predefined categories or detect objects within an image. These models require extensive training data to recognize patterns and learn the distinct features of each category.

In contrast, the one-shot learning model operates by simply comparing a new input image to a reference or database of images to determine if they match. The model doesn't need to explicitly classify the image into one of many categories. Instead, it focuses on answering the question: Is this image similar to another image I've seen before?

For instance, in face recognition, constantly adding new people to the system would require retraining the model frequently, which is highly impractical in border control or access systems. For border control tasks, the goal is not to classify people into predefined categories but rather to verify whether the person standing in front of the camera matches the image on their ID or passport.

Instead of classifying the image into a predefined category, the model’s job is to compare two images: the live image from the camera and the passport photo. The one-shot learning model then evaluates the degree of similarity between them.

Imagine a facial recognition system used for security purposes, where the model is given only a single image of a person, for example, an employee, and needs to recognize them later from different angles or lighting conditions. Traditional algorithms might struggle with only one training image, but a one-shot learning system would compare the new images to the original, assess how similar they are, and make a classification decision based on that comparison.

Practical application 

Input images. The system receives two inputs—one from the live camera feed and one from the passport or ID photo.

Feature extraction. One-shot learning techniques, such as siamese neural networks or triplet networks, are used to extract important features from both images. These features represent the key characteristics of each face, such as the distance between the eyes, the shape of the nose, and so on.

Similarity comparison. The features from both images are then compared by evaluating the distance between them in feature space. If the difference between the two feature sets is below a certain threshold, the system concludes that the two images show the same person.
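To make these three steps concrete, here is a minimal Python sketch of such a verification pipeline. The `encoder` model, the image tensors, and the similarity threshold are illustrative assumptions, not a description of any specific production system.

```python
# A minimal sketch of the verification pipeline described above.
# `encoder` stands in for any pretrained face-embedding network
# (e.g. one branch of a siamese model); the threshold is illustrative.
import torch
import torch.nn.functional as F

def verify(encoder: torch.nn.Module,
           live_image: torch.Tensor,      # (C, H, W) tensor from the camera
           passport_image: torch.Tensor,  # (C, H, W) tensor from the document
           threshold: float = 0.8) -> bool:
    """Return True if the two face images are judged to show the same person."""
    encoder.eval()
    with torch.no_grad():
        # Feature extraction: map each image to an embedding vector.
        live_emb = encoder(live_image.unsqueeze(0))        # shape (1, d)
        passport_emb = encoder(passport_image.unsqueeze(0))
    # Similarity comparison: cosine similarity in embedding space.
    similarity = F.cosine_similarity(live_emb, passport_emb).item()
    return similarity >= threshold
```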

Differences between zero-shot, one-shot, and few-shot learning

Zero-shot learning

Zero-shot learning (ZSL) allows models to make predictions on classes they have never encountered during training. To infer the properties of these new categories, the model uses auxiliary semantic information or metadata about the unseen classes, such as textual descriptions or relationships between classes. In ZSL, the model relies on this additional knowledge rather than learning directly from examples of the new class.

For example, if a model has been trained to recognize animals like dogs, cats, and horses, it may still be able to identify a zebra in a picture based on the fact that a zebra is a four-legged mammal with stripes, even if it has never seen an image of a zebra before. ZSL is widely used in natural language processing (NLP) and image recognition tasks, especially when there is a need to generalize across unseen categories without requiring new training data.
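As a rough illustration of how auxiliary information can stand in for training examples, the toy Python sketch below classifies an image by matching predicted attributes against hand-written class descriptions. The attribute vectors and the zebra example are made up purely for illustration; real systems use learned semantic embeddings.

```python
# A toy sketch of attribute-based zero-shot classification.
# The attribute vectors and the predicted attributes are illustrative;
# in practice they would come from class descriptions and a trained encoder.
import torch
import torch.nn.functional as F

# Semantic descriptions: [four_legged, has_stripes, has_mane, domesticated]
class_attributes = {
    "horse": torch.tensor([1.0, 0.0, 1.0, 1.0]),
    "zebra": torch.tensor([1.0, 1.0, 1.0, 0.0]),   # never seen during training
    "cat":   torch.tensor([1.0, 0.0, 0.0, 1.0]),
}

def zero_shot_classify(predicted_attrs: torch.Tensor) -> str:
    """Assign the class whose attribute vector best matches the predicted attributes."""
    best_class, best_sim = None, -float("inf")
    for name, attrs in class_attributes.items():
        sim = F.cosine_similarity(predicted_attrs, attrs, dim=0).item()
        if sim > best_sim:
            best_class, best_sim = name, sim
    return best_class

# Suppose an attribute predictor says the image shows a striped, four-legged animal.
print(zero_shot_classify(torch.tensor([1.0, 1.0, 0.9, 0.1])))  # -> "zebra"
```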

Few-shot learning

Few-shot learning (FSL) is when a model is trained with only a few examples (usually between 2 and 100) for each class but is still expected to perform well on unseen examples. It occupies a middle ground between one-shot learning and traditional supervised learning in terms of how much data it needs per class.

Few-shot learning relies on meta-learning techniques, where the model learns how to adapt quickly to new tasks with minimal examples by being trained on a variety of tasks with few data points. A good example of few-shot learning is a model tasked with identifying different types of flowers after seeing only a few images of each flower.

Key differences between zero-shot, one-shot, and few-shot learning

Zero-shot, one-shot, and few-shot learning are all machine learning techniques that address the challenge of generalizing to new classes with limited data. However, they differ in the amount of data they require for each class and how they learn to recognize new categories.

  • One-shot learning enables the model to classify based on one example per class by focusing on similarity comparisons;

  • Zero-shot learning generalizes to unseen classes using additional semantic information without seeing any examples from those classes;

  • Few-shot learning extends one-shot learning by allowing the model to learn from a small number of examples per class, using meta-learning techniques to generalize well with limited data.

Popular one-shot learning techniques

Siamese networks

Siamese networks are one of the earliest and most widely used architectures for one-shot learning. They consist of two identical neural networks that share weights and are trained to compare two inputs, usually images. In computer vision, each branch of a siamese network is typically a convolutional neural network (CNN).

A siamese neural network is designed specifically for comparison tasks rather than direct classification. It has two identical convolutional branches that share the same weights and are trained simultaneously. Both branches take separate inputs, and the core idea is to combine their outputs into a similarity measure between the two images rather than assigning a label to each image independently.

Given a pair of images, the siamese network computes feature vectors for both inputs. The model is then trained to output a similarity score between these vectors. During inference, the model checks whether a new input is similar to the reference image based on this similarity score.
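The sketch below shows this core structure in PyTorch: one shared branch, two inputs, and a similarity score as output. The layer sizes and dummy data are illustrative assumptions rather than a specific published model.

```python
# A minimal siamese network sketch (assumed architecture, not a production model):
# both inputs pass through the same CNN branch with shared weights, and the
# network outputs a similarity score for the pair.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SiameseNetwork(nn.Module):
    def __init__(self, embedding_dim: int = 128):
        super().__init__()
        # One shared convolutional branch; both inputs reuse these weights.
        self.branch = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(),
            nn.LazyLinear(embedding_dim),
        )

    def forward(self, img_a: torch.Tensor, img_b: torch.Tensor) -> torch.Tensor:
        emb_a = self.branch(img_a)   # shared weights for both inputs
        emb_b = self.branch(img_b)
        # Similarity score in [-1, 1]; training pushes matching pairs toward 1.
        return F.cosine_similarity(emb_a, emb_b)

model = SiameseNetwork()
a = torch.randn(4, 1, 64, 64)   # a batch of image pairs (dummy data)
b = torch.randn(4, 1, 64, 64)
print(model(a, b).shape)        # torch.Size([4])
```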

Memory-augmented neural networks

Memory-augmented neural networks (MANNs) are designed to store and retrieve information from an external memory module, allowing the model to recall past examples effectively. These networks can adapt to new tasks quickly using their memory, making them suitable for one-shot learning.

This is possible because memory-augmented neural networks extend recurrent neural networks (RNNs) with external memory structures, enabling them to store and retrieve information over long sequences, much like a computer's memory. RNNs retain information about past inputs in their hidden state and pass it along to subsequent steps of the sequence.

MANNs consist of a controller (e.g., a neural network) and an external memory bank. The controller reads from and writes to the memory during training, enabling the model to store and recall representations of training examples. In one-shot learning, the model stores information about the support examples and uses it to classify new examples.
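The simplified sketch below illustrates only the content-based memory read that such models rely on: a query embedding attends over stored support embeddings by similarity. The controller, the write logic, and the tensors themselves are omitted or replaced with dummy placeholders.

```python
# A highly simplified sketch of content-based memory access in a MANN.
# It shows only how a query ("key") retrieves a stored representation by
# similarity; the controller and memory-write logic are omitted.
import torch
import torch.nn.functional as F

def read_memory(memory: torch.Tensor, key: torch.Tensor) -> torch.Tensor:
    """memory: (num_slots, d) stored support embeddings; key: (d,) query embedding."""
    # Content-based addressing: attention weights from cosine similarity.
    similarities = F.cosine_similarity(memory, key.unsqueeze(0), dim=1)  # (num_slots,)
    weights = F.softmax(similarities, dim=0)
    # Read vector: similarity-weighted sum of memory slots.
    return weights @ memory

memory = torch.randn(5, 16)   # five stored support examples (dummy embeddings)
query = torch.randn(16)
print(read_memory(memory, query).shape)  # torch.Size([16])
```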

Metric learning

In one-shot learning, the goal is to correctly classify a new example using only a single example from each class during training. Since there's very limited data per class, traditional learning methods often don't work well. Metric learning helps address this problem by focusing on learning a distance metric between examples that can measure how "close" two examples are to each other. The idea is that examples from the same class should be close together in some feature space, while examples from different classes should be far apart.
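One common way to learn such a metric is a contrastive loss over pairs of embeddings (a triplet loss is another option). The sketch below assumes the embeddings come from some encoder; the margin value and dummy data are illustrative.

```python
# A sketch of metric learning with a contrastive loss: pull same-class pairs
# together in embedding space and push different-class pairs apart.
import torch
import torch.nn.functional as F

def contrastive_loss(emb_a: torch.Tensor,
                     emb_b: torch.Tensor,
                     same_class: torch.Tensor,
                     margin: float = 1.0) -> torch.Tensor:
    """same_class is 1.0 for pairs from the same class, 0.0 otherwise."""
    dist = F.pairwise_distance(emb_a, emb_b)  # Euclidean distance per pair
    # Same-class pairs are penalized for being far apart; different-class
    # pairs are penalized only if they are closer than `margin`.
    loss = same_class * dist.pow(2) + (1 - same_class) * F.relu(margin - dist).pow(2)
    return loss.mean()

emb_a = torch.randn(8, 64, requires_grad=True)  # dummy embeddings for 8 pairs
emb_b = torch.randn(8, 64)
labels = torch.randint(0, 2, (8,)).float()
print(contrastive_loss(emb_a, emb_b, labels))
```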

Prototype-based learning

Prototypical networks are a popular neural network architecture used for prototype-based learning, especially in the few-shot and one-shot learning paradigm. Here each class is represented by a prototype, typically computed as the mean of the feature embeddings of all examples in that class. Feature embeddings are a way to represent data as vectors of numbers that capture important features or characteristics of that data in a form that a machine-learning model can understand and process efficiently.

When the model sees something new, it checks which prototype the new item is closest to and labels it accordingly. For example, if we want to teach a model to recognize animals (dogs, cats, birds) with only 5 pictures of each, it will create a prototype for each group. When the model is shown a new picture of a dog, it compares this picture to the dog, cat, and bird prototypes. Since the new picture is closest to the dog prototype, it correctly guesses that it's a dog.

In the one-shot case, where there is only one picture per class (one picture of a dog, one of a cat, and one of a bird), the concept still works: instead of creating an "average" prototype from multiple pictures, the single picture itself becomes the prototype for that class.
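The sketch below shows the nearest-prototype logic with dummy embeddings: prototypes are the mean embeddings of each class's support examples, and a query is assigned to the closest prototype. With a single support example, the "mean" is simply that example's embedding.

```python
# A sketch of prototype-based classification with dummy embeddings.
import torch

def build_prototypes(support: dict) -> dict:
    """support maps class name -> (n_examples, d) embedding tensor."""
    # Each prototype is the mean embedding; with one example, it is that embedding.
    return {name: embs.mean(dim=0) for name, embs in support.items()}

def classify(query_embedding: torch.Tensor, prototypes: dict) -> str:
    """Assign the class whose prototype is nearest in Euclidean distance."""
    distances = {name: torch.dist(query_embedding, proto).item()
                 for name, proto in prototypes.items()}
    return min(distances, key=distances.get)

# Dummy embeddings: few-shot classes with 5 examples, a one-shot class with 1.
support = {"dog": torch.randn(5, 32), "cat": torch.randn(5, 32), "bird": torch.randn(1, 32)}
prototypes = build_prototypes(support)
print(classify(torch.randn(32), prototypes))
```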

Matching networks

Matching networks rely on memory and attention mechanisms to compare new examples to a set of labeled instances, called the support set, and then predict based on similarity. They focus on finding the best match between new data and previously learned examples, which is how matching networks get their name.

When a new example is presented to the model, the network tries to match it to the closest examples in the support set that it has already learned from. This is done by comparing the features of the new example to the features of the known examples.
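A minimal sketch of this matching step follows: attention weights come from the similarity between the query and each support example, and the prediction is the attention-weighted average of the support labels. The embeddings and labels here are dummy stand-ins for an encoder's output.

```python
# A sketch of the matching-network idea: attention over the support set.
import torch
import torch.nn.functional as F

def matching_predict(query_emb: torch.Tensor,
                     support_embs: torch.Tensor,
                     support_labels: torch.Tensor,
                     num_classes: int) -> torch.Tensor:
    """Return a probability distribution over classes for the query example."""
    # Attention weights from cosine similarity between the query and each support example.
    sims = F.cosine_similarity(support_embs, query_emb.unsqueeze(0), dim=1)
    attention = F.softmax(sims, dim=0)                        # (n_support,)
    one_hot = F.one_hot(support_labels, num_classes).float()  # (n_support, num_classes)
    return attention @ one_hot                                # weighted label average

support_embs = torch.randn(6, 32)            # six labeled support examples (dummy)
support_labels = torch.tensor([0, 0, 1, 1, 2, 2])
print(matching_predict(torch.randn(32), support_embs, support_labels, num_classes=3))
```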

Conclusion

One-shot learning represents a significant advancement in the field of machine learning, particularly for applications requiring rapid adaptation to new classes with minimal data. By allowing models to be trained on just one example, one-shot learning challenges the traditional notion that more data always means better performance.

Moreover, one-shot learning mirrors the innate human ability to recognize patterns and make judgments based on limited information, bridging the gap between human cognitive processes and artificial intelligence. As we continue to explore the possibilities of this technique, we are not only advancing the technology, but also moving closer to creating intelligent systems that can learn, adapt, and thrive in real time, just like humans.

 

Article written by:

Toloka Team

Updated:

Oct 2, 2024
