How to label images for machine learning

By Natalie Kudan

For those of you looking to learn more about how to perform image labeling to train a machine learning model, this article is for you. We’ve outlined what labeling images is, what it’s used for, what the typical process looks like, what the types and methods are, and how to optimize with crowdsourcing. Keep reading to find out more. 

What is image labeling?

Image labeling is a type of data labeling where the goal is to add meaningful information to images. In general, the task is to identify certain features or objects in an image and then use this information either to select those objects in the image or to classify the image according to the presence of these features. This type of labeling is often used to obtain training data for machine learning models, especially in the field of computer vision.

In other words, it’s when you annotate certain objects or features in an image. These labels teach a computer vision model how to identify a particular object. For example, in a series of high-angle images of a city, you could annotate all the skyscrapers. These labels help a model determine what a skyscraper is.

Image annotation is used to create datasets with different objects for computer vision models, which are split into training sets — for initial model training — and test/validation sets — to evaluate model performance. Data scientists use the dataset to train and evaluate their model. Then the model can automatically assign labels to unlabeled data.
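The split into training, validation, and test sets described above can be sketched in a few lines. This is a minimal illustration, not a prescribed method: the 70/15/15 proportions, the file names, and the `split_dataset` helper are all assumptions made for the example.

```python
import random


def split_dataset(items, train=0.7, val=0.15, seed=42):
    """Shuffle labeled items and split them into train/validation/test lists.

    The proportions are illustrative defaults; whatever is left after the
    train and validation slices becomes the test set.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_train = round(len(shuffled) * train)
    n_val = round(len(shuffled) * val)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],
    )


# Hypothetical labeled dataset: (file name, label) pairs.
labeled = [(f"img_{i:03d}.jpg", "skyscraper" if i % 2 else "other") for i in range(100)]
train_set, val_set, test_set = split_dataset(labeled)
print(len(train_set), len(val_set), len(test_set))
```

Shuffling before splitting helps ensure that each subset contains a similar mix of labels.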

By using the right image labeling strategy, you can create a high-quality dataset that will help a model better learn how to identify objects. Labeling images is a dynamic process which machine learning engineers are continuously adapting and improving upon.

How to label image data for machine learning

To label images for training a computer vision model, you need to follow these steps.

1. Define which kind of data you need for model training

The type of data labeling task you will do will depend on that. For example, in some cases you might need sets of images representing certain categories (image classification task), in other cases you might need images with certain types of objects identified and selected (object detection task).

2. Define the characteristics of labeled data your model needs

For an image classification task, you need to define the classes. For object detection tasks, you need to define the markup rules: do you need precise selection with polygons, or are bounding boxes enough?

3. Decide how much labeled data of each type you need

Before collecting and labeling data, you need to understand how much of each type of data you need to train a balanced and unbiased ML model. You wouldn't want to skew your model's performance due to imbalanced training data.
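One simple way to check balance is to count how many examples each class has. The sketch below assumes the labels are available as a plain list of class names; the class names, the counts, and the `class_balance` helper are made up for illustration.

```python
from collections import Counter


def class_balance(labels):
    """Return per-class counts, per-class shares, and the largest/smallest ratio."""
    counts = Counter(labels)
    total = sum(counts.values())
    shares = {cls: n / total for cls, n in counts.items()}
    # A ratio close to 1 means the classes are roughly balanced.
    imbalance_ratio = max(counts.values()) / min(counts.values())
    return counts, shares, imbalance_ratio


# Hypothetical label list for a road-scene dataset.
labels = ["car"] * 800 + ["pedestrian"] * 150 + ["traffic_light"] * 50
counts, shares, ratio = class_balance(labels)
for cls, n in counts.most_common():
    print(f"{cls}: {n} images ({shares[cls]:.0%})")
print(f"largest class is {ratio:.0f}x the smallest")
```

A large ratio suggests collecting or labeling more examples of the rare classes before training.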

4. Choose the optimal way to label training data

In general, there are two ways: human data labeling or automation. Human labeling is more time-consuming and expensive, but tends to be more reliable. If you decide you need human input, the options are in-house labeling, outsourcing, and crowdsourcing. For more information on how AI-powered businesses label data today, check out our blog post.

5. Decompose the labeling task

If you decide to employ human data labeling to ensure high-quality results, you'll need to break your image labeling task down into steps that are clear enough for anyone to handle. To ensure optimal labeling, break your task down into parts by replacing one large problem with a series of smaller, separate problems that are easier to solve.

6. Write clear instructions

The more straightforward and clear your labeling instructions are, the more reliable the whole process will be. Oftentimes, things that seem obvious to you might not be clear to everyone else. Write concise and comprehensive instructions, provide examples, and foresee common mistakes.

7. Set up quality control

Think in advance about what you will do to ensure the quality of labeled data, preferably automatically, without the need to check the results yourself. This usually means you need to create a pipeline: a series of labeling and verification steps for your image labeling process. For example, divide your object detection task among three groups of people: the first group determines whether the desired object is present in an image, the second group selects the object, and the third group checks whether the object has been selected correctly.
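A three-stage pipeline like the one above can be sketched as a chain of functions, one per group of annotators. In this illustration the three stages are stand-ins that read pre-filled fields from toy records; in a real project each stage would be a separate labeling task completed by people. All names here are hypothetical.

```python
def run_pipeline(images, has_object, select_object, verify_selection):
    """Pass each image through detection, selection, and verification stages."""
    accepted, rejected = [], []
    for image in images:
        if not has_object(image):              # stage 1: is the object present?
            continue
        region = select_object(image)          # stage 2: outline the object
        if verify_selection(image, region):    # stage 3: is the outline correct?
            accepted.append((image["name"], region))
        else:
            rejected.append(image["name"])     # candidates for relabeling
    return accepted, rejected


# Toy records standing in for annotator answers at each stage.
images = [
    {"name": "a.jpg", "contains_car": True, "drawn_box": (10, 10, 50, 40), "box_ok": True},
    {"name": "b.jpg", "contains_car": False, "drawn_box": None, "box_ok": False},
    {"name": "c.jpg", "contains_car": True, "drawn_box": (0, 0, 5, 5), "box_ok": False},
]

accepted, rejected = run_pipeline(
    images,
    has_object=lambda img: img["contains_car"],
    select_object=lambda img: img["drawn_box"],
    verify_selection=lambda img, region: img["box_ok"],
)
print(accepted)
print(rejected)
```

Keeping the stages separate means a failed verification only sends one image back to the selection step instead of invalidating the whole batch.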

Image labeling also plays a significant role in AI and ML as a key component of developing supervised models with computer vision capabilities. Image labeling helps train machine learning models to label images or identify classes of objects within an image. Training with labels helps these models identify patterns until they can recognize objects on their own.

What is image labeling and annotation used for?

Image annotation is a dynamic process that involves labeling digital images and is a vital part of training computer vision models that process image data for object detection, classification, segmentation, and more. Training an object detection model, for example, requires a dataset of images that have been labeled and annotated to identify and classify objects. Computer vision projects of this kind are increasingly important. For example, manufacturers of self-driving cars rely on millions of correctly labeled data points to ensure the safety and efficiency of their vehicles.

Image labeling is used across a wide variety of industries for various computer vision tasks such as:

Retail and e-commerce

  • Product recognition on store shelves
  • Virtual fitting rooms
  • People counting for retail stores

Transportation

  • Pedestrian detection
  • Traffic prediction
  • Parking occupancy detection
  • Road condition monitoring

Manufacturing

  • Personal protective equipment detection

Biometrics

  • Facial feature detection
  • Iris recognition

Agriculture

  • Plant disease detection
  • Object detection in agriculture

Marketing

  • Logo recognition

Methods used in image labeling

Image annotation sets a standard that computer vision algorithms try to learn from. Therefore, accurate labeling is essential in training neural networks. There are three methods for image labeling: manual, semi-automated, and synthetic.

Manual image annotation

This process involves manually defining labels for an entire image, or drawing regions in an image and adding textual descriptions for each region. Specialized computer vision annotation tools allow operators to step through multiple images, draw regions (bounding boxes or polygons) on an image, assign labels, and save this data in a standardized format that can be used for model training.

However, an in-house approach to manual image annotation has some drawbacks: labels can be inconsistent when multiple annotators are involved, and it’s time consuming, costly, and difficult to scale for large datasets. To ensure consistency, annotators must be provided with clear instructions and consideration needs to be given to quality control of the labeling.
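To make the idea of a standardized annotation format concrete, here is a minimal sketch of one record saved as JSON. The field names are hypothetical and do not correspond to any particular tool's export; the `[x, y, width, height]` box layout is borrowed from the convention used by the COCO dataset format.

```python
import json


def make_annotation(image_file, width, height, regions):
    """Build one annotation record; each region is (label, x, y, w, h) in pixels."""
    return {
        "image": {"file_name": image_file, "width": width, "height": height},
        "annotations": [
            {"label": label, "bbox": [x, y, w, h]}  # top-left corner plus size
            for label, x, y, w, h in regions
        ],
    }


# Hypothetical image with two labeled regions.
record = make_annotation(
    "street_001.jpg", 1280, 720,
    [("car", 100, 200, 300, 150), ("traffic_light", 640, 50, 40, 90)],
)
serialized = json.dumps(record, indent=2)  # what would be written to disk
restored = json.loads(serialized)          # what a training script would read back
print(serialized)
```

Agreeing on one such format up front helps keep labels from different annotators consistent and easy to merge.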

Semi-automated image annotation

An automated image annotation tool can help manual annotators by attempting to detect object boundaries in an image and providing a starting point for the annotator. The algorithms of image annotation software are not 100% accurate, but they can save time for human annotators by providing at least a partial map of objects in the image.

Synthetic image labeling

As an alternative to manual image annotation, synthetic image labeling is an accurate and cost-effective technique. It involves automatically generating images that resemble real-life objects or human faces. The main benefit of synthetic images is that the labels are known in advance, because the generator already knows what it placed in each image.
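The toy sketch below shows why synthetic labels come for free: the generator draws a bright rectangle on a dark grid and records where it drew it. Real synthetic pipelines render far more realistic images; the grid size, pixel values, and `make_synthetic_sample` helper are assumptions for illustration only.

```python
import random


def make_synthetic_sample(width=32, height=32, seed=None):
    """Generate a grayscale image (list of rows) with one rectangle and its label."""
    rng = random.Random(seed)
    box_w = rng.randint(4, width // 2)
    box_h = rng.randint(4, height // 2)
    x = rng.randint(0, width - box_w)   # keep the rectangle inside the image
    y = rng.randint(0, height - box_h)

    image = [[0] * width for _ in range(height)]
    for row in range(y, y + box_h):
        for col in range(x, x + box_w):
            image[row][col] = 255

    # The label is exact because it comes from the generator, not from an annotator.
    label = {"label": "rectangle", "bbox": [x, y, box_w, box_h]}
    return image, label


image, label = make_synthetic_sample(seed=7)
print(label)
```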

How to label images via crowdsourcing

Scaling data labeling from a few in-house labelers to an industrial solution would require large data labeling teams as well as dozens of managers to supervise this vast workforce. Driving quality for in-house data labeling means dramatically increasing the time involved in labeling that data, making the entire image labeling process slow and costly.

However, there are alternative ways to label data for ML models. One of them is crowdsourcing. Crowdsourcing refers to a specific process of labeling data that employs many annotators who have signed up on a particular platform. Simply put, teams working with artificial intelligence post unlabeled data and labeling tasks, and people choose and complete tasks they are interested in.

The main challenge lies in correctly formulating a small, simple task. You need a specialist to correctly configure the data annotation pipeline and quality control. Then you can scale annotation and get large volumes of marked-up data quickly, efficiently, and inexpensively.

Overlap is the key to crowdsourcing and is defined as the number of annotators who should complete each task in a pool. Most commonly, it’s set to three. Toloka assigns confidence to Toloker responses for complex image classification tasks. When confidence drops below a specified level, Toloka increases the overlap value until the confidence reaches the set value or the overlap reaches the predefined maximum.
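The idea of raising overlap until confidence is high enough can be sketched as follows. This is not Toloka's implementation: the article does not describe how confidence is computed, so the example assumes the simplest possible measure, the share of votes for the most popular label. The function names and thresholds are hypothetical.

```python
from collections import Counter


def aggregate(votes):
    """Majority vote; confidence is the share of votes for the winning label."""
    label, count = Counter(votes).most_common(1)[0]
    return label, count / len(votes)


def label_with_dynamic_overlap(get_vote, min_overlap=3, max_overlap=5,
                               min_confidence=0.8):
    """Collect votes until confidence is reached or the overlap cap is hit."""
    votes = [get_vote() for _ in range(min_overlap)]
    label, confidence = aggregate(votes)
    while confidence < min_confidence and len(votes) < max_overlap:
        votes.append(get_vote())  # ask one more annotator
        label, confidence = aggregate(votes)
    return label, confidence, len(votes)


# Simulated annotators: each call to next() returns the next person's answer.
simulated = iter(["cat", "dog", "cat", "cat", "cat"])
result = label_with_dynamic_overlap(lambda: next(simulated))
print(result)
```

With these simulated answers, the first three votes disagree, so the task is shown to additional annotators until the agreement threshold or the maximum overlap is reached.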

What does the typical image annotation process look like?

We’ve outlined the standard image annotation process below using an example of image markup via Toloka’s data labeling platform.

Step 1: Decompose the task and classify content

In any image classification task, start with the main question you want to ask. If it’s complex, you may want to break the job down into subtasks. After you’ve defined the question, ask yourself what classes you expect — which can help you define the answers, prepare a task interface, and write instructions for annotators.

Example: If you want to create a dataset with marked-up photos of cars, you can assign three consecutive tasks to three groups of annotators. The first task would be to select all the images showing a car, the second to highlight that required object (or multiple objects) with a polygon, and the third to check that the car is indeed highlighted correctly.

Step 2: Prepare instructions for annotators

The more complete and clear your instructions are for annotators, the better labeling quality you will get. Mix in some control tasks with the real ones. That way, you can compare annotator answers with the answers to the control tasks to get an idea of labeling accuracy.
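Comparing answers against control tasks amounts to computing each annotator's accuracy on the tasks whose correct answers are known. The sketch below assumes responses are stored as nested dictionaries; the task IDs, annotator names, and `control_accuracy` helper are hypothetical.

```python
def control_accuracy(responses, control_answers):
    """Return each annotator's share of correct answers on control tasks.

    responses: {annotator: {task_id: label}}
    control_answers: {task_id: correct_label}
    """
    accuracy = {}
    for annotator, answers in responses.items():
        checked = [task for task in answers if task in control_answers]
        if not checked:
            continue  # this annotator has not seen any control tasks yet
        correct = sum(answers[task] == control_answers[task] for task in checked)
        accuracy[annotator] = correct / len(checked)
    return accuracy


control_answers = {"t1": "car", "t2": "no_car"}
responses = {
    "annotator_a": {"t1": "car", "t2": "no_car", "t3": "car"},
    "annotator_b": {"t1": "car", "t2": "car", "t4": "no_car"},
}
accuracy = control_accuracy(responses, control_answers)
print(accuracy)
```

Tasks without a known answer (here `t3` and `t4`) are ignored, so the score reflects only the control tasks.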

To improve your instructions, you can run labeling for a small part of your dataset without control tasks first. Read through the results. This will reveal the most common errors and identify cases the instructions don’t cover.

Example: A potential error in the scenario mentioned above is if toy cars are also included in the images. This leads to the question: should toy cars be counted or not?

Step 3: Direct markup

The task interface defines what the job looks like for the annotators and the logic used to process their responses. If it’s simple and clean, annotators can work faster and on different devices. Adding automatic verifications improves labeling quality.

As another example, read our case study on image classification for self-driving cars.

Types of image labeling

There are several different image labeling approaches. We’ve outlined them below, along with use cases from our Toloka platform.

Image classification

Image classification algorithms receive images as input and automatically assign each one to one of several labels (also known as classes). To do this, a machine learning model must learn to recognize the relevant objects in the images on its own. To create a training dataset for image classification, you need to manually review images and annotate them with the labels used by the algorithm.

How to do it: Take a look at an image and see what’s being shown – for example, a cat or a dog. Then choose which class the object in the image belongs to.

On Toloka: Match visual content to predefined categories. Use for search relevance, recommendation systems, image moderation, and more.

Image comparison (side-by-side)

Image comparison is used to determine the opinion of a large group of people — what they like, what’s more convenient, and so on. For example, understanding which images in advertising banners people prefer, or which illustration prompts them to click on an article.

How to do it: When comparing two images, choose the one that is more suited to the given context, for example, when choosing which interface design is superior.

On Toloka: Compare two images and find out which one is better. Use for data verification or to test user preferences with ads or website designs.
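Side-by-side answers from many people can be summarized as a win rate per image: how often an image was chosen out of the times it was shown. The sketch below is one simple way to do this; the banner names and the `win_rates` helper are hypothetical.

```python
from collections import defaultdict


def win_rates(comparisons):
    """comparisons: list of (left, right, winner) tuples from side-by-side tasks."""
    wins = defaultdict(int)
    shown = defaultdict(int)
    for left, right, winner in comparisons:
        shown[left] += 1
        shown[right] += 1
        wins[winner] += 1
    # Share of comparisons each item won out of those it appeared in.
    return {item: wins[item] / shown[item] for item in shown}


comparisons = [
    ("banner_a", "banner_b", "banner_a"),
    ("banner_a", "banner_b", "banner_a"),
    ("banner_a", "banner_b", "banner_b"),
    ("banner_b", "banner_c", "banner_b"),
]
rates = win_rates(comparisons)
for item, rate in sorted(rates.items(), key=lambda pair: pair[1], reverse=True):
    print(f"{item}: {rate:.2f}")
```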

Object detection

An object detection algorithm detects an object in an image and its location in the image frame. The location can be marked with different shapes. For example, keypoints (dots placed on facial features) are used for models in facial recognition systems. A bounding box, the smallest rectangle that contains the entire object in the image, can also be used to define the location of objects.

How to do it: You may be given the task of finding and highlighting an object in an image. For example, traffic lights, pedestrians, or cars, which can be used to train models in self-driving cars. A machine learning engineer selects a shape (rectangle, polygon, or other) that is best suited to the given model. Then the engineer assigns a specific task to you (the annotator) such as “highlight all the traffic lights with rectangles”.

On Toloka: Supports bounding boxes, polygons, or keypoints, as well as image segmentation and tagging based on your own ontology. Use for computer vision applications.
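The relationship between polygons and bounding boxes can be shown in code: the bounding box is the smallest axis-aligned rectangle that contains every polygon vertex. The sketch also includes intersection over union (IoU), a common measure of how closely two boxes match; using it to compare two annotators' boxes is an assumption of this example, not something the article prescribes. Boxes here are `(x_min, y_min, x_max, y_max)` tuples.

```python
def bounding_box(polygon):
    """Smallest axis-aligned rectangle containing all (x, y) polygon vertices."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return min(xs), min(ys), max(xs), max(ys)


def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - intersection
    return intersection / union if union else 0.0


# One annotator drew a polygon, another drew a rectangle around the same object.
polygon = [(2, 1), (6, 2), (5, 5), (1, 4)]
box_from_polygon = bounding_box(polygon)
other_box = (2, 1, 6, 5)
overlap_score = iou(box_from_polygon, other_box)
print(box_from_polygon, overlap_score)
```

An IoU of 1.0 means the boxes coincide, and 0.0 means they do not overlap at all.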

Text recognition from images

Text recognition from images is used to train various text recognition systems: think of hovering your camera app over text written in a foreign language and getting a translation.

How to do it: Generally, these tasks include images that contain text (such images are selected in advance). Your task is to read the text and type what is written there.

On Toloka: Identify and transcribe text in images. Train text recognition algorithms or validate and fine-tune the output of your OCR models.
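One way to validate OCR output against a human transcription is the character error rate: the edit distance between the two strings divided by the length of the human reference. The article does not name a specific metric, so treat this as an illustrative choice; the sample strings are made up.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute or keep
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference, hypothesis):
    """Edit distance normalized by the length of the human reference."""
    if not reference:
        return 0.0 if not hypothesis else 1.0
    return edit_distance(reference, hypothesis) / len(reference)


human_transcription = "EXIT"
ocr_output = "EX1T"  # the model confused the letter I with the digit 1
cer = character_error_rate(human_transcription, ocr_output)
print(f"character error rate: {cer:.2f}")
```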

With Toloka, you can control data labeling accuracy to build a predictable pipeline of high-quality training data that impacts your CV algorithms. Our platform supports annotation for image classification, semantic segmentation, object detection and recognition, and instance segmentation. Labeling tools include bounding boxes, polygons, and keypoint annotation.

Moreover, Toloka offers a ready-to-use object detection pipeline to get a human-labeled dataset for your images.

To sum up: what’s next

Image labeling and annotation help improve computer vision accuracy by enabling object recognition. Annotated data is particularly important when the model is applied to a new field or domain. Training AI and machine learning with labels helps these models identify patterns until they can recognize objects on their own. This technology is increasingly being used across industries and is showing no signs of slowing down.

Moreover, Toloka provides a data labeling platform where millions of annotators from all around the world perform tasks posted by AI teams and companies. The platform brings these two audiences together, and its smart technologies transform the crowd into computing power. Toloka provides AI-powered businesses with the tools they need to manage the quality of data labeling and allows them to smoothly build a pipeline that delivers high-quality labeled data for machine learning.

The platform contains various annotation tools, including image labeling tools to train computer vision or image classification models. There, you can collect new data or label your own training data sets with relevant objects specifically for your project. You can make your own labeling instructions and set up the quality assurance process, or ask Toloka's engineers for help.

About Toloka

Toloka is a European company based in Amsterdam, the Netherlands that provides data for Generative AI development. Toloka empowers businesses to build high quality, safe, and responsible AI. We are the trusted data partner for all stages of AI development from training to evaluation. Toloka has over a decade of experience supporting clients with its unique methodology and optimal combination of machine learning technology and human expertise, offering the highest quality and scalability in the market.
