Overview

In this tutorial, we introduce data labeling via public crowdsourcing marketplaces and present some key techniques for efficiently collecting labeled data, including aggregation, incremental relabeling, and dynamic pricing.


A practice session follows, in which participants choose a real label collection task, experiment with settings for the labeling process, and launch their own label collection project on one of the largest crowdsourcing marketplaces. During the tutorial, all projects run on the real Toloka crowd. While crowd performers annotate the participants’ projects, we present the major theoretical results on efficient aggregation, incremental relabeling, and dynamic pricing. We also discuss the strengths and weaknesses of crowdsourcing and its applicability to real-world tasks, summarizing our five years of research and industrial experience in crowdsourcing. All participants receive feedback on their projects and practical advice.

Speakers

Alexey Drutsa
Toloka
Head of Efficiency & Growth Division
Valentina Fedorova
Toloka
Analyst
Dmitry Ustalov
Toloka
Analyst / Software Developer
Olga Megorskaya
Toloka
CEO
Evfrosiniya Zerminova
Toloka
Technical Product Manager
Daria Baidakova
Toloka
Education & Customer Success Team Lead

Schedule

08:00 - 08:15
Part 0: Introduction
— The concept of crowdsourcing
— Crowdsourcing task examples
— Crowdsourcing platforms
— Yandex crowdsourcing experience
08:15 - 08:35
Part I: Main components of data collection via crowdsourcing
— Decomposition for an effective pipeline
— Task instruction & interface: best practices
— Quality control techniques
08:35 - 08:45
Part II: Introduction to Toloka for requesters
— Project: creation & configuration
— Pool: creation & configuration
— Tasks: uploading & golden set creation
— In-flight statistics and downloading results
08:45 - 09:00
Part III: Brainstorming the pipeline
— Dataset and required labels
— Discussion: how to collect labels?
— Data labeling pipeline for implementation
09:00 - 10:00
Part IV: Practical Session
Participants:
— create
— configure
— run data labeling projects with real performers in real time
10:00 - 10:30
Part V: Theory on efficient aggregation
— Aggregation models
— Incremental relabeling
— Dynamic pricing
10:30 - 11:00
Break
11:00 - 11:20
Part VI: Practical Session
— Completing the label collection process
11:20 - 11:30
Part VII: Discussion of results and conclusions
— Project results
— Ideas for further work and research
— References to literature and other tutorials
Wed Apr 28 2021 16:41:41 GMT+0300 (Moscow Standard Time)