In this tutorial, we introduce data labeling via public crowdsourcing marketplaces and present some key techniques for efficiently collecting labeled data, including aggregation, incremental relabeling, and dynamic pricing.
This is followed by a practice session, where participants choose a real label collection task, experiment with settings for the labeling process, and launch their own label collection project on one of the largest crowdsourcing marketplaces. During the tutorial, all projects are run on the real Toloka crowd. While the crowd performers annotate participants' projects, we present the major theoretical results in efficient aggregation, incremental relabeling, and dynamic pricing. We also discuss the strengths and weaknesses of crowdsourcing, as well as its applicability to real-world tasks, summarizing our five years of research and industrial expertise in crowdsourcing. All participants receive feedback on their projects along with practical advice.
— The concept of crowdsourcing
— Crowdsourcing task examples
— Crowdsourcing platforms
— Yandex crowdsourcing experience
— Decomposition for an effective pipeline
— Task instruction & interface: best practices
— Quality control techniques
— How Toloka works
— Types of tasks in Toloka
— Creating a project in Toloka
— Dataset and required labels
— Discussion: how to collect labels?
— Data labeling pipeline for implementation
Participants:
— create, configure, and run data labeling projects on real performers in real time
— Aggregation models
— Incremental relabeling
— Dynamic pricing
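To give a flavor of the first two topics, the sketch below shows the simplest form of aggregation (majority vote with a confidence score) and an incremental relabeling rule that requests another label only while the aggregate is not yet confident. This is an illustrative toy example, not Toloka's actual algorithms; the function names, threshold, and budget are our own choices.

```python
from collections import Counter

def aggregate(labels):
    """Majority vote over the labels collected so far.

    Returns the winning label and a naive confidence score:
    the fraction of votes the winner received.
    """
    counts = Counter(labels)
    winner, votes = counts.most_common(1)[0]
    return winner, votes / len(labels)

def needs_relabeling(labels, threshold=0.8, max_labels=5):
    """Incremental relabeling decision.

    Request one more label only while the current aggregate is below
    the confidence threshold and the per-task label budget is not spent.
    Threshold and budget here are arbitrary illustrative values.
    """
    if len(labels) >= max_labels:
        return False
    _, confidence = aggregate(labels)
    return confidence < threshold
```

For example, with votes `["cat", "cat", "dog"]` the aggregate is `("cat", 2/3)` and another label would be requested; with four unanimous `"cat"` votes, confidence reaches 1.0 and relabeling stops. Production systems typically replace majority vote with worker-skill-aware models (e.g., Dawid-Skene), which the tutorial covers.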
Break
— Completing the label collection process
— Project results
— Ideas for further work and research
— References to literature and other tutorials