In this introductory tutorial, we present a portion of our six years of unique industry experience in efficient natural language data annotation via crowdsourcing. We will introduce data labeling on public crowdsourcing marketplaces and present the key components of efficient label collection: task design and decomposition, quality control, and annotator selection. This will be followed by a practical session in which participants address a real-world language resource production task, experiment with settings for the labeling process, and launch their label collection project on one of the largest crowdsourcing marketplaces. The projects will be run on real crowds within the tutorial session. We will present useful mathematical foundations as well as quality control techniques and tricks, and give attendees an opportunity to discuss their own annotation ideas.
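As a small taste of the quality control techniques covered, the sketch below aggregates redundant worker labels by majority vote, one of the simplest aggregation methods used in crowdsourcing pipelines. This is an illustrative example only; the `annotations` triple layout and function name are our own assumptions, not anything prescribed by the tutorial, which also covers more sophisticated methods.

```python
from collections import Counter, defaultdict

def majority_vote(annotations):
    """Aggregate redundant labels per task by majority vote.

    `annotations` is assumed (hypothetically) to be an iterable of
    (task_id, worker_id, label) triples, i.e. each task is labeled by
    several workers. Ties are broken arbitrarily by Counter.most_common.
    """
    labels_by_task = defaultdict(list)
    for task_id, _worker_id, label in annotations:
        labels_by_task[task_id].append(label)
    # Pick the most frequent label for each task.
    return {task: Counter(labels).most_common(1)[0][0]
            for task, labels in labels_by_task.items()}

# Example: three workers label task "t1"; two of three say "positive".
votes = [("t1", "w1", "positive"), ("t1", "w2", "positive"),
         ("t1", "w3", "negative"), ("t2", "w1", "negative")]
print(majority_vote(votes))  # {'t1': 'positive', 't2': 'negative'}
```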