
Tutorial at CVPR 2020


Jun 15, 2020, 10:00 UTC

Overview

In this tutorial, leading Yandex researchers and engineers share their unique industry experience in efficient data annotation (labeling) for self-driving cars. We present the data processing pipeline required for the cars to learn how to behave autonomously on the roads, and we also demonstrate the crucial role of data annotation in making the learning process effective. This is followed by an introduction to public crowdsourcing marketplaces and key crowdsourcing techniques for efficient annotation: task decomposition, quality control methods, aggregation, incremental relabeling, and others.

In the practice session, participants choose a real label collection task, experiment with settings for the labeling process, and launch their own labeling project on Toloka, one of the world’s largest crowdsourcing marketplaces. During the tutorial, all projects run on the real Toloka crowd. Participants also receive feedback and practical advice on making their projects more efficient.

Topics

Using crowdsourcing to collect and label data for self-driving cars (SDC): pros & cons
Key components of crowdsourcing for efficient data labeling
Decomposition approach
Selecting and training performers
2D and 3D object segmentation demo
Hands-on practice session: object segmentation pipeline
Advanced crowdsourcing techniques: aggregation, incremental relabeling & pricing

Speakers

Alexey Drutsa

Toloka, Head of Efficiency & Growth Division

Denis Rogachevsky

Yandex Self-Driving Group, Analyst

Olga Megorskaya

Toloka, CEO

Daria Baidakova

Toloka, Director of Educational Programs

Ivan Semchuk

Yandex Self-Driving Group, Analyst

Schedule

Introduction

— The concept of crowdsourcing
— Crowdsourcing task examples
— Crowdsourcing platforms
— Yandex crowdsourcing experience

Part I: Crowdsourcing for SDC

— Reasons for crowdsourcing
— The kind of data we collect and label
— Most common tasks and their applications

10:00 - 10:15

Coffee Break

Part II: Main components of data collection via crowdsourcing

— Decomposition for an effective pipeline
— Task instruction & interface: best practices
— Quality control techniques

Part III: Introduction to Toloka for requesters

— Project: creation & configuration
— Pool: creation & configuration
— Tasks: uploading & golden set creation
— Statistics in flight and results downloading
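Golden tasks (tasks with known correct answers mixed into a pool) are a common way to estimate how reliable each performer is. The sketch below is a minimal illustration, not Toloka's implementation: it assumes responses are available as hypothetical (performer, task, label) tuples, golden answers as a dict, and it uses made-up thresholds (80% accuracy on at least 3 golden tasks).

```python
from collections import defaultdict


def golden_accuracy(responses, golden):
    """Return {performer: (correct, answered)} computed on golden tasks only."""
    stats = defaultdict(lambda: [0, 0])
    for performer, task, label in responses:
        if task in golden:  # ordinary tasks have no known answer, so skip them
            stats[performer][1] += 1
            if label == golden[task]:
                stats[performer][0] += 1
    return {p: (c, n) for p, (c, n) in stats.items()}


def trusted_performers(responses, golden, min_accuracy=0.8, min_answered=3):
    """Keep performers who saw enough golden tasks and met the accuracy bar.
    Both thresholds are hypothetical defaults chosen for illustration."""
    trusted = set()
    for performer, (correct, answered) in golden_accuracy(responses, golden).items():
        if answered >= min_answered and correct / answered >= min_accuracy:
            trusted.add(performer)
    return trusted


# Hypothetical data: g* are golden tasks, t1 is an ordinary task.
golden = {"g1": "car", "g2": "pedestrian", "g3": "car"}
responses = [
    ("alice", "g1", "car"), ("alice", "g2", "pedestrian"), ("alice", "g3", "car"),
    ("bob", "g1", "car"), ("bob", "g2", "car"), ("bob", "g3", "truck"),
    ("carol", "g1", "car"),  # too few golden answers to judge yet
    ("alice", "t1", "car"), ("bob", "t1", "truck"),
]
print(trusted_performers(responses, golden))  # {'alice'}
```

Requiring a minimum number of golden answers keeps a single lucky (or unlucky) response from deciding whether a performer is trusted.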

11:00 - 12:00

Lunch Break

Part IV: Data labeling demos for SDC

— Demos of 2D and 3D object segmentation tasks
— Performer training and selection for complex tasks
— Q&A

Part V: Brainstorming the pipeline for object segmentation (practice session)

— Dataset and required labels
— Discussion: how to collect labels?
— Data labeling pipeline for implementation

Part VI: Setting up & running label collection projects (practice session)

Participants create, configure, and run data labeling projects with real performers in real time.

15:00 - 15:15

Coffee Break

Part VII: Theory of efficient aggregation

— Aggregation models
— Incremental relabeling
— Performance-based pricing
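To make aggregation and incremental relabeling concrete, here is a minimal sketch using plain majority vote, the simplest aggregation model (more advanced models, such as Dawid-Skene, weight each performer by an estimated skill). Incremental relabeling is modeled as a hypothetical stopping rule: request another label for a task only while there are too few labels or the majority is weak, up to a fixed budget. The function names and thresholds are assumptions for illustration, not the rules used in Toloka.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most frequent label and the share of votes it received."""
    label, votes = Counter(labels).most_common(1)[0]
    return label, votes / len(labels)


def needs_more_labels(labels, min_labels=3, agreement=0.7, max_labels=7):
    """Hypothetical stopping rule: keep asking for labels while there are too
    few of them or the majority is weak, but never exceed max_labels."""
    if len(labels) < min_labels:
        return True
    if len(labels) >= max_labels:
        return False
    _, share = majority_vote(labels)
    return share < agreement


def label_incrementally(label_stream, **rule):
    """Take labels one at a time (e.g. as performers submit them) and stop
    as soon as the stopping rule is satisfied."""
    collected = []
    for label in label_stream:
        collected.append(label)
        if not needs_more_labels(collected, **rule):
            break
    return collected


easy = label_incrementally(iter(["car", "car", "car", "truck", "car"]))
hard = label_incrementally(iter(["car", "truck", "car", "truck", "car", "car", "car"]))
print(len(easy), majority_vote(easy))  # 3 ('car', 1.0)
print(len(hard), majority_vote(hard)[0])  # 7 car
```

The unanimous task stops after three labels, while the contested one uses the full budget of seven; spending extra labels only where performers disagree is the saving incremental relabeling aims for.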