How Toloka helped train Yandex autonomous vehicles

by Toloka Team

About the company

Yandex began developing its autonomous vehicle system in early 2017. The fleet has since grown to 200 self-driving cars, which have driven more than 4 million autonomous miles on city streets in Russia, Israel, and the US.

Task

Yandex uses Toloka to label the tens of thousands of images needed to train a neural network to detect surrounding objects in Russian cities.

An important task for the creator of a self-driving vehicle is to train it to extract information about its surroundings from the data it receives from sensors. During the ride, the car records everything it sees around it. This data is uploaded to the cloud, where the preliminary analysis is completed, and then it goes to post-processing, which includes labeling the data. The labeled data is sent to the machine learning algorithms, the result is returned to the vehicle, and the cycle repeats, improving the quality of object detection through multiple iterations.

Many different objects are encountered in the city, and all of them need to be labeled. This task requires certain skills and takes a lot of time, and tens of thousands of images are needed to train the neural network. You can use open datasets, but those are created abroad, so the images don't match the reality of Russian roads. You can buy labeled images at a starting price of $4 each, but it's about 10 times cheaper to do the labeling in Toloka.
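
To make the price comparison concrete, here is a minimal worked example. The $4 starting price and the factor of about 10 come from the text above; the batch size of 10,000 images is only an illustrative assumption at the low end of "tens of thousands", and the function name is hypothetical.

```python
def labeling_costs(num_images, purchase_price=4.0, savings_factor=10.0):
    """Return (cost of buying labeled images, approximate crowdsourced cost).

    purchase_price: starting price per purchased labeled image, in dollars.
    savings_factor: how many times cheaper crowdsourced labeling is (about 10).
    """
    purchased = num_images * purchase_price
    crowdsourced = purchased / savings_factor
    return purchased, crowdsourced


# Assumed batch size: 10,000 images.
purchased, crowdsourced = labeling_costs(10_000)
print(f"purchased: ${purchased:,.0f}, crowdsourced: about ${crowdsourced:,.0f}")
# prints "purchased: $40,000, crowdsourced: about $4,000"
```

Because $4 is a starting price, the purchased figure is a lower bound, and the crowdsourced figure is only as precise as the "about 10 times" estimate.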

Because you can embed any interface in Toloka and send data via the API, the developers added their own visual editor, which has layers, transparency, selection, zoom, and classification. This significantly increased both the speed and the quality of the data labeling.

In addition, the API allows you to automatically split tasks into simpler ones and then piece the results together. For example, before an image is labeled, you can first ask which objects it contains. The answer makes it clear which classes to use for labeling the image.

Image
Example of Yandex task on Toloka
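
The split-and-merge idea described above can be sketched roughly as follows. This is not Yandex's pipeline and it makes no calls to the Toloka API: tasks and answers are plain dictionaries, and the class list, function names, and field names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical class list; a real project defines its own.
CLASSES = ("person", "car", "traffic_sign")


def make_presence_tasks(image_ids):
    """Stage 1: one simple question per image (which classes are present?)."""
    return [
        {"image": image_id, "question": "which_classes", "options": list(CLASSES)}
        for image_id in image_ids
    ]


def make_labeling_tasks(presence_answers):
    """Stage 2: one labeling task per (image, class) pair reported in stage 1."""
    tasks = []
    for image_id, present_classes in presence_answers.items():
        for cls in present_classes:
            tasks.append({"image": image_id, "label_class": cls})
    return tasks


def merge_results(labeling_results):
    """Piece the per-class results back together into one record per image."""
    merged = defaultdict(dict)
    for result in labeling_results:
        merged[result["image"]][result["label_class"]] = result["regions"]
    return dict(merged)


presence_tasks = make_presence_tasks(["img_001", "img_002"])
# Simulated stage-1 answers; in practice these would come from tolokers.
presence_answers = {"img_001": ["person", "car"], "img_002": ["traffic_sign"]}
labeling_tasks = make_labeling_tasks(presence_answers)
# Simulated stage-2 results with placeholder regions.
labeling_results = [
    dict(task, regions=[f"{task['label_class']}_region"]) for task in labeling_tasks
]
annotations = merge_results(labeling_results)
print(annotations)
```

Each stage-2 task covers a single class in a single image, so it is simpler than labeling the whole scene at once, and the merge step restores one combined annotation per image.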

After that, the objects in the image can be classified. For example, you can offer tolokers a selection of images of people and ask them to specify whether they see pedestrians, cyclists, motorcyclists, or someone else.

Image
Example of Yandex task on Toloka

Once a toloker has finished labeling an image, the result needs to be checked. Verification tasks are offered to other tolokers for that purpose.

Image
Example of Yandex task on Toloka
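
The article does not say how the verifiers' answers are combined. One common approach, assumed here purely for illustration, is a majority vote over the verdicts of several verifying tolokers. The function name and the threshold are hypothetical.

```python
def accept_labeling(verdicts, threshold=0.5):
    """Return True when the share of approving verifiers exceeds the threshold.

    verdicts: one boolean per verifying toloker (True means "labeling is correct").
    threshold: assumed cut-off; 0.5 corresponds to a simple majority.
    """
    if not verdicts:
        raise ValueError("at least one verdict is required")
    approvals = sum(1 for verdict in verdicts if verdict)
    return approvals / len(verdicts) > threshold


print(accept_labeling([True, True, False]))   # True: 2 of 3 verifiers approved
print(accept_labeling([True, False, False]))  # False: only 1 of 3 approved
```

A rejected labeling could then be sent back for relabeling, although that step is outside what the article describes.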

In addition to tolokers, neural networks can also be used to perform labeling. Some networks have already learned to do this task as well as people do, but the quality of their work still needs to be evaluated. That's why verification tasks contain a mix of images labeled by tolokers and images labeled by a neural network.
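
A minimal sketch of this mixing step, assuming the source of each labeled image is hidden from verifiers and only used afterwards to compare acceptance rates. The function names, field names, and verdicts are hypothetical, and nothing here calls the Toloka API.

```python
import random
from collections import defaultdict


def build_mixed_batch(toloker_items, model_items, seed=0):
    """Shuffle both sources into one batch; 'source' is kept for later analysis only."""
    batch = [{"item": item, "source": "toloker"} for item in toloker_items]
    batch += [{"item": item, "source": "model"} for item in model_items]
    random.Random(seed).shuffle(batch)  # fixed seed keeps the example reproducible
    return batch


def acceptance_by_source(batch, verdicts):
    """Compute the share of accepted labelings per source.

    verdicts[k] is True when verifiers accepted the labeling in batch[k].
    """
    totals = defaultdict(int)
    accepted = defaultdict(int)
    for entry, ok in zip(batch, verdicts):
        totals[entry["source"]] += 1
        accepted[entry["source"]] += int(ok)
    return {source: accepted[source] / totals[source] for source in totals}


batch = build_mixed_batch(["h1", "h2"], ["m1", "m2"])
# Simulated verdicts: every labeling is accepted except the model's "m2".
verdicts = [entry["item"] != "m2" for entry in batch]
rates = acceptance_by_source(batch, verdicts)
print(rates["toloker"], rates["model"])  # prints "1.0 0.5"
```

Comparing the two rates on the same batch is one way to judge whether the network labels as well as people do.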

This way, Toloka is integrated directly into the training of neural networks and becomes part of the general machine learning pipeline.
