In this tutorial, leading Yandex researchers and engineers share their industry experience in efficient data annotation (labeling) for self-driving cars. We present the data processing pipeline that self-driving cars need in order to learn to drive autonomously on public roads, and we demonstrate the crucial role data annotation plays in making this learning process effective. We then introduce public crowdsourcing marketplaces and the key crowdsourcing techniques for efficient annotation: task decomposition, quality control, answer aggregation, incremental relabeling, and others.
In the practice session, participants choose a real label collection task, experiment with different settings for the labeling process, and launch their own labeling project on Toloka, one of the world's largest crowdsourcing marketplaces. All projects run on the real Toloka crowd during the tutorial, and participants receive feedback and practical advice on making their projects more efficient.