2D object detection
3D object detection
Moving object tracking


In this tutorial, leading Yandex researchers and engineers share their unique industry experience in efficient data annotation (labeling) for self-driving cars. We present the data processing pipeline required for the cars to learn how to behave autonomously on the roads, and we also demonstrate the crucial role of data annotation in making the learning process effective. This is followed by an introduction to public crowdsourcing marketplaces and key crowdsourcing techniques for efficient annotation: task decomposition, quality control methods, aggregation, incremental relabeling, and others.
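As an illustration of two of the techniques named above, quality control and aggregation, here is a minimal Python sketch. It assumes each answer is a `(task, performer, label)` tuple and that a few tasks have known "golden" answers. The function names, the data, and the simple majority-vote rule are hypothetical choices for illustration, not the method used in the tutorial or on Toloka.

```python
from collections import Counter, defaultdict


def golden_accuracy(answers, golden):
    """Share of golden (known-answer) tasks each performer labeled correctly."""
    correct, total = defaultdict(int), defaultdict(int)
    for task, performer, label in answers:
        if task in golden:
            total[performer] += 1
            correct[performer] += int(label == golden[task])
    return {p: correct[p] / total[p] for p in total}


def majority_vote(answers):
    """Pick the most frequent label per task (ties go to the label seen first)."""
    votes = defaultdict(Counter)
    for task, _performer, label in answers:
        votes[task][label] += 1
    return {task: counter.most_common(1)[0][0] for task, counter in votes.items()}


# Hypothetical answers: three performers label two images and one golden task.
answers = [
    ("img1", "w1", "car"), ("img1", "w2", "car"), ("img1", "w3", "truck"),
    ("img2", "w1", "pedestrian"), ("img2", "w2", "cyclist"), ("img2", "w3", "pedestrian"),
    ("gold1", "w1", "car"), ("gold1", "w2", "car"), ("gold1", "w3", "truck"),
]
golden = {"gold1": "car"}

print(golden_accuracy(answers, golden))
print(majority_vote(answers))
```

In practice, the per-performer accuracy could be used to filter out or down-weight unreliable performers before aggregating.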

In the practice session, participants choose one real label collection task, experiment with selecting settings for the labeling process, and launch their own labeling project on Toloka, one of the world’s largest crowdsourcing marketplaces. During the tutorial, all projects run on the real Toloka crowd. Participants also receive feedback and practical advice on making their projects more efficient.

  • Using crowdsourcing to collect and label data for SDC: pros & cons
  • Key components of crowdsourcing for efficient data labeling
  • Decomposition approach
  • Selecting and training performers
  • 2D and 3D object segmentation demo
  • Hands-on practice session: object segmentation pipeline
  • Advanced crowdsourcing techniques: aggregation, incremental relabeling & pricing
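The idea behind incremental relabeling in the last item can be sketched as a stopping rule: keep requesting labels for a task only while the collected answers still disagree too much. The snippet below is a simplified illustration under assumed parameters (`min_overlap`, `max_overlap`, `threshold` are hypothetical names and values), not the algorithm presented in the tutorial.

```python
from collections import Counter


def needs_more_labels(labels, min_overlap=3, max_overlap=5, threshold=0.8):
    """Decide whether a task should be sent to one more performer.

    labels: list of labels collected so far for a single task.
    """
    if len(labels) < min_overlap:
        return True   # too few answers to judge agreement
    if len(labels) >= max_overlap:
        return False  # labeling budget for this task is spent
    top_count = Counter(labels).most_common(1)[0][1]
    # Stop once the leading label's share reaches the threshold.
    return top_count / len(labels) < threshold


print(needs_more_labels(["car", "car", "car"]))    # full agreement
print(needs_more_labels(["car", "car", "truck"]))  # still ambiguous
```

A rule like this spends extra labels only on ambiguous tasks, which is the cost-saving motivation for relabeling incrementally instead of using a fixed overlap for every task.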


Alexey Drutsa
Head of Efficiency & Growth Division
Denis Rogachevsky
Yandex Self-Driving Group
Olga Megorskaya
Daria Baidakova
Education & Customer Success Team Lead
Ivan Semchuk
Yandex Self-Driving Group


09:00 - 09:30
Part 0: Introduction
— The concept of crowdsourcing
— Crowdsourcing task examples
— Crowdsourcing platforms
— Yandex crowdsourcing experience
09:30 - 10:00
Part I: Crowdsourcing for SDC
— Reasons for crowdsourcing
— The kind of data we collect and label
— Most common tasks and their applications
10:00 - 10:15
Coffee Break
10:15 - 10:50
Part II: Main components of data collection via crowdsourcing
— Decomposition for an effective pipeline
— Task instruction & interface: best practices
— Quality control techniques
10:50 - 11:00
Part III: Introduction to Toloka for requesters
— Project: creation & configuration
— Pool: creation & configuration
— Tasks: uploading & golden set creation
— Statistics in flight and results downloading
11:00 - 12:00
Lunch Break
12:00 - 12:40
Part IV: Data labeling demos for SDC
— Demos of 2D and 3D object segmentation tasks
— Performer training and selection for complex tasks
— Q&A
12:40 - 13:00
Part V: Brainstorming the pipeline for object segmentation (practice session)
— Dataset and required labels
— Discussion: how to collect labels?
— Data labeling pipeline for implementation
13:00 - 15:00
Part VI: Setting up & running label collection projects (practice session)
› Create
› Configure
› Run data labeling projects with real performers in real time
15:00 - 15:15
Coffee Break
15:15 - 16:15
Part VII: Theory on efficient aggregation
— Aggregation models
— Incremental relabeling
— Performance-based pricing
16:15 - 16:30
Part VIII: Discussion of results and conclusions
— Project results
— Ideas for further work and research
— References to literature and other tutorials
Mon Aug 02 2021 12:28:17 GMT+0300 (Moscow Standard Time)