Overview
In this tutorial, leading researchers and engineers from Toloka share their unique industry experience in achieving efficient natural language annotation with crowdsourcing. We will introduce data labeling via public crowdsourcing marketplaces and present the key components of efficient label collection. Then, in the practice session, participants will choose one real language resource production task, experiment with settings for the labeling process, and launch their label collection project on Toloka, one of the world's largest crowdsourcing marketplaces. During the tutorial session, all projects will be run on the real Toloka crowd. We will also present useful quality control techniques and give attendees an opportunity to discuss their own annotation ideas.
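One common quality control technique in crowd annotation is aggregating the labels of several independent workers per item. Below is a minimal sketch of majority-vote aggregation; the function name and the sentiment labels are illustrative and do not reflect Toloka's actual API.

```python
from collections import Counter

def majority_vote(annotations):
    """Aggregate per-item crowd labels by majority vote.

    annotations: dict mapping an item id to the list of labels
    collected from different workers for that item.
    Ties are broken by the label encountered first.
    """
    return {item: Counter(labels).most_common(1)[0][0]
            for item, labels in annotations.items()}

# Hypothetical example: three workers label two sentences for sentiment.
crowd_labels = {
    "sent-1": ["pos", "pos", "neg"],
    "sent-2": ["neg", "neg", "neg"],
}
print(majority_vote(crowd_labels))  # {'sent-1': 'pos', 'sent-2': 'neg'}
```

In practice, marketplaces also weight workers by measured skill or use probabilistic models such as Dawid-Skene, but the overlap-and-aggregate pattern above is the basic building block.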
Topics
Reasons for collecting and labeling data via crowdsourcing for NLP: pros & cons
Key components of crowdsourcing for efficient data labeling