Data collection is a challenging task: it requires human labor, takes time, and is expensive. Crowdsourcing addresses most of these issues: data can be collected in a short period of time and at lower cost. However, assuring the quality of the collected data poses many challenges. Unless a proper quality control mechanism is devised, the collected data will be useless, misleading, and ultimately more costly. In this talk, I will present the different quality control mechanisms considered for the collection of different NLP datasets using the well-known MTurk crowdsourcing platform and an "in-house" built social media-based crowdsourcing tool.


Seid Muhie Yimam
University of Hamburg
Postdoctoral researcher at the Language Technology group
Mon Aug 02 2021 12:24:51 GMT+0300 (Moscow Standard Time)