Tutorial at CVPR 2020




In this tutorial, leading Yandex researchers and engineers share their industry experience in efficient data annotation (labeling) for self-driving cars (SDC). We present the data processing pipeline that cars need in order to learn to drive autonomously, and we show why data annotation is crucial to making that learning effective. We then introduce public crowdsourcing marketplaces and the key crowdsourcing techniques for efficient annotation: task decomposition, quality control methods, aggregation, incremental relabeling, and others.

In the practice session, participants choose one real label collection task, experiment with selecting settings for the labeling process, and launch their own labeling project on Toloka, one of the world’s largest crowdsourcing marketplaces. During the tutorial, all projects run on the real Toloka crowd. Participants also receive feedback and practical advice on making their projects more efficient.


  • Using crowdsourcing to collect and label data for SDC: pros & cons
  • Key components of crowdsourcing for efficient data labeling
  • Decomposition approach
  • Selecting and training performers
  • 2D and 3D object segmentation demo
  • Hands-on practice session: object segmentation pipeline
  • Advanced crowdsourcing techniques: aggregation, incremental relabeling & pricing


Alexey Drutsa
Toloka, Head of Efficiency & Growth Division
Denis Rogachevsky
Yandex Self-Driving Group, Analyst
Olga Megorskaya
Daria Baidakova
Toloka, Director of Educational Programs
Ivan Semchuk
Yandex Self-Driving Group, Analyst


— The concept of crowdsourcing
— Crowdsourcing task examples
— Crowdsourcing platforms
— Yandex crowdsourcing experience

— Reasons for crowdsourcing
— The kind of data we collect and label
— Most common tasks and their applications

10:00 - 10:15

Coffee Break

— Decomposition for an effective pipeline
— Task instruction & interface: best practices
— Quality control techniques

— Project: creation & configuration
— Pool: creation & configuration
— Tasks: uploading & golden set creation
— In-flight statistics and downloading results

11:00 - 12:00

Lunch Break

— Demos of 2D and 3D object segmentation tasks
— Performer training and selection for complex tasks
— Q&A

— Dataset and required labels
— Discussion: how to collect labels?
— Data labeling pipeline for implementation

› create
› configure
› run data labeling projects with real performers in real time

15:00 - 15:15

Coffee Break

— Aggregation models
— Incremental relabeling
— Performance-based pricing

— Project results
— Ideas for further work and research
— References to literature and other tutorials


Part 1: Crowdsourcing for SDC
Part 2: Main components of data collection via crowdsourcing
Part 3: Introduction to Toloka for requesters
Part 4: Data labeling demos for SDC
Part 5: Brainstorming the pipeline for object segmentation
Part 6: Setting up and running label collection projects
Part 7: Theory on efficient aggregation
Part 8: Results and conclusions
