In this talk, we present the quality control mechanisms considered for collecting NLP datasets using the well-known MTurk crowdsourcing platform and a crowdsourcing tool built in-house.
Data collection is a challenging task: it requires human labor, takes a long time, and is expensive. Crowdsourcing addresses most of these issues: data can be collected in a short period of time and at lower cost. However, assuring the quality of the collected data poses many challenges. Unless a proper quality control mechanism is devised, the collected data will be useless or misleading, and ultimately more costly. In this talk, I will present the different quality control mechanisms considered for the collection of different NLP datasets using the well-known MTurk crowdsourcing platform and a social media-based crowdsourcing tool built "in-house".