In our tutorial, we will share more than six years of our crowdsourcing experience and bridge the gap between the human-in-the-loop and information retrieval communities by showing how one can incorporate a human-in-the-loop component into a retrieval system to gather real human feedback on model predictions. Most of the tutorial time is devoted to hands-on practice, during which the attendees will, under our guidance, implement an end-to-end information retrieval process, from problem statement and data labeling to machine learning model training and evaluation.
The tutorial program consists of the following parts:
Introduction
Crowdsourcing essentials
Hands-on practice session
Learning from crowds
Conclusion
In this part, we will discuss quality control techniques. We will cover approaches applicable before task performance (selection of annotators, training of annotators, and exam tasks), during task performance (golden tasks, motivation of annotators, and tricks to remove bots and cheaters), and after task performance (post-verification/acceptance and inter-annotator agreement). We will share best practices, including critical aspects and pitfalls in designing instructions and interfaces for annotators, vital settings in different types of templates, training and examination for annotator selection, and pipelines for evaluating the labeling process.
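To make these checks concrete, the sketch below (our own illustration, not part of the tutorial materials) computes two of the signals mentioned above from a toy set of raw labels: per-annotator accuracy on golden tasks and mean pairwise inter-annotator agreement. The column names and data are assumptions for the example.

```python
# A minimal sketch of two quality control signals: accuracy on golden tasks
# and simple pairwise inter-annotator agreement. Toy data; column names
# are illustrative.
import pandas as pd
from itertools import combinations

# Raw labels: one row per (task, annotator, label) triple.
answers = pd.DataFrame([
    ("q1", "w1", "travel"), ("q1", "w2", "travel"), ("q1", "w3", "weather"),
    ("q2", "w1", "banking"), ("q2", "w2", "banking"), ("q2", "w3", "banking"),
], columns=["task", "annotator", "label"])

# Golden tasks: items with a known correct answer, mixed into the task pool.
golden = {"q2": "banking"}

# 1. Accuracy on golden tasks -> a per-annotator quality score.
on_golden = answers[answers["task"].isin(golden.keys())]
accuracy = (
    on_golden
    .assign(correct=lambda df: df.apply(
        lambda r: r["label"] == golden[r["task"]], axis=1))
    .groupby("annotator")["correct"].mean()
)
print(accuracy)

# 2. Mean pairwise agreement: how often two annotators of the same task agree.
def pairwise_agreement(df):
    agree, total = 0, 0
    for _, group in df.groupby("task"):
        for a, b in combinations(group["label"].tolist(), 2):
            agree += int(a == b)
            total += 1
    return agree / total if total else float("nan")

print(pairwise_agreement(answers))
```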
We will conduct a hands-on practice session, which is the most important and the longest part of our tutorial. We will encourage the attendees to apply the techniques and best practices learned during the first part of the tutorial. For this purpose, the attendees will run their own crowdsourcing project on intent classification for conversational agents with real crowd annotators. As input, the attendees will receive search queries from the Web; the output of the project will be an intent class for each query. We will use the CLINC150 dataset for the practice.
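As an illustration of how such input data could be prepared, the sketch below loads CLINC150 (assuming the copy published on the Hugging Face Hub under the id clinc_oos) and writes the queries into a TSV of annotation tasks plus a small pool of golden tasks. The INPUT:/GOLDEN: column prefixes follow a common marketplace convention and may differ on your platform; the split and slice sizes are arbitrary.

```python
# A minimal sketch of preparing CLINC150 queries as crowdsourcing tasks.
# Assumes the "clinc_oos" copy of CLINC150 on the Hugging Face Hub.
from datasets import load_dataset
import pandas as pd

clinc = load_dataset("clinc_oos", "plus", split="test")
label_names = clinc.features["intent"].names

# One task per query: annotators will pick the intent class for each one.
tasks = pd.DataFrame({"INPUT:query": clinc["text"][:500]})
tasks.to_csv("tasks.tsv", sep="\t", index=False)

# A handful of items with known intents serve as golden tasks for the
# quality control techniques described earlier.
golden = pd.DataFrame({
    "INPUT:query": clinc["text"][500:520],
    "GOLDEN:intent": [label_names[i] for i in clinc["intent"][500:520]],
})
golden.to_csv("golden.tsv", sep="\t", index=False)
```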
In this part, we will describe how to process the raw labels obtained from the crowdsourcing marketplace and transform them into knowledge suitable for a downstream human-in-the-loop application. Since in crowdsourcing every object has multiple labels provided by different annotators, we will consider several popular answer aggregation models, including methods for aggregating categorical responses (Dawid-Skene, GLAD, etc.) and recent methods for deep learning from crowds (CrowdLayer, CoNAL, etc.).
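For intuition, the sketch below shows the simplest aggregation baseline, per-task majority voting over the raw labels; model-based methods such as Dawid-Skene and GLAD refine this idea by estimating annotator skill (e.g., per-annotator confusion matrices) with expectation-maximization. The data frame is a toy illustration with assumed column names.

```python
# A minimal sketch of the majority-vote aggregation baseline. Model-based
# aggregators (Dawid-Skene, GLAD, ...) replace the uniform vote with
# annotator-skill estimates learned by EM.
import pandas as pd

answers = pd.DataFrame([
    ("q1", "w1", "travel"), ("q1", "w2", "travel"), ("q1", "w3", "weather"),
    ("q2", "w1", "banking"), ("q2", "w2", "banking"), ("q2", "w3", "banking"),
], columns=["task", "annotator", "label"])

# For every task, pick the most frequent label among its annotators.
majority = (
    answers.groupby("task")["label"]
    .agg(lambda labels: labels.mode().iloc[0])  # ties resolved arbitrarily
)
print(majority)
```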
We will present Crowd-Kit, an open-source Python library implementing all these methods. We will pay special attention to the Crowd-Kit Learning module for training a complete deep learning model from crowdsourced data without an extra aggregation step.
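A minimal usage sketch is shown below; the input column names (task, worker, label) and the DawidSkene interface follow recent Crowd-Kit releases and should be checked against the documentation of the installed version.

```python
# A minimal sketch of aggregating crowd labels with Crowd-Kit.
# Column names follow the library's recent input format (task, worker, label).
import pandas as pd
from crowdkit.aggregation import DawidSkene

answers = pd.DataFrame([
    ("q1", "w1", "travel"), ("q1", "w2", "travel"), ("q1", "w3", "weather"),
    ("q2", "w1", "banking"), ("q2", "w2", "banking"), ("q2", "w3", "banking"),
], columns=["task", "worker", "label"])

# Dawid-Skene estimates a confusion matrix per worker with EM and returns
# one aggregated label per task.
aggregated = DawidSkene(n_iter=100).fit_predict(answers)
print(aggregated)
```

Other categorical aggregators in Crowd-Kit, such as majority voting or GLAD, expose a similar fit-and-predict interface and can be swapped in for comparison.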