Most ML projects require training data, and often this data can only be obtained through human labeling. As new applications of AI emerge, there is ever-growing demand for human-labeled data collected for nontrivial tasks. Producing such data at scale requires a technical pipeline that manages quality control and distributes tasks intelligently among performers.
In this tutorial, we introduce data labeling via public crowdsourcing marketplaces and present the key techniques for collecting labeled data efficiently. In the practice session, participants choose one real label collection task, experiment with settings for the labeling process, and launch their own labeling project on Toloka, one of the world’s largest crowdsourcing marketplaces. During the tutorial, all projects run on the real Toloka crowd. Participants also receive feedback and practical advice on making their projects more efficient.