ECIR 2023

Crowdsourcing for Information Retrieval

In our tutorial, we will share more than six years of crowdsourcing experience and bridge the gap between the human-in-the-loop and information retrieval communities by showing how to incorporate human-in-the-loop steps into a retrieval system to gather real human feedback on model predictions. Most of the tutorial is devoted to hands-on practice, during which the attendees will, under our guidance, implement an end-to-end information retrieval process, from problem statement and data labeling to machine learning model training and evaluation.

Our learning objectives are:
  • to teach how to apply human-in-the-loop pipelines with crowdsourcing to address information retrieval problems
  • to improve the attendees' data labeling and machine learning skills on a real information retrieval problem, namely a query intent classification task, by annotating data and training a machine learning model (see the sketch after this list)
  • to introduce mathematical methods and their open-source implementations that increase annotation quality and the accuracy of the learned model without additional labeling
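
As a minimal sketch of the modeling step in the second objective, the example below trains a TF-IDF plus logistic regression intent classifier on labeled queries; the file name and column names are hypothetical placeholders for the labeling output.

```python
# A minimal sketch of the model-training step: fit a TF-IDF + logistic
# regression intent classifier on (query, intent) pairs. The file name
# and columns are hypothetical placeholders for the labeled output.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

labeled = pd.read_csv("aggregated_labels.csv")  # columns: query, intent
train_q, test_q, train_y, test_y = train_test_split(
    labeled["query"], labeled["intent"], test_size=0.2, random_state=0
)

model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
model.fit(train_q, train_y)
print(f"accuracy: {model.score(test_q, test_y):.3f}")
```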

Agenda

* All times are in Irish Standard Time (IST, UTC+01:00)

09:00 - 09:15  Introduction
09:15 - 10:00  Crowdsourcing essentials
10:00 - 11:00  Hands-on practice session
11:00 - 11:45  Learning from crowds
11:45 - 12:00  Conclusion

Crowdsourcing essentials

In this part, we will discuss quality control techniques: approaches applicable before task performance (annotator selection, annotator training, and exam tasks), during task performance (golden tasks, annotator motivation, and tricks to remove bots and cheaters), and after task performance (post-verification/acceptance and inter-annotator agreement). We will share best practices, including critical aspects and pitfalls in designing instructions and interfaces for annotators, vital settings in different types of templates, training and examination for annotator selection, and pipelines for evaluating the labeling process.
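
As a small, self-contained illustration of the inter-annotator agreement check, the sketch below computes pairwise Cohen's kappa with scikit-learn; the annotators, tasks, and labels are toy data made up for this example.

```python
# A minimal sketch of an inter-annotator agreement check (pairwise Cohen's
# kappa) on categorical labels; the annotators and labels are hypothetical.
from itertools import combinations

from sklearn.metrics import cohen_kappa_score

# task -> label, one dict per annotator
annotations = {
    "ann1": {"q1": "weather", "q2": "music", "q3": "alarm", "q4": "music"},
    "ann2": {"q1": "weather", "q2": "music", "q3": "music", "q4": "music"},
    "ann3": {"q1": "weather", "q2": "alarm", "q3": "alarm", "q4": "music"},
}

for a, b in combinations(annotations, 2):
    shared = sorted(set(annotations[a]) & set(annotations[b]))  # tasks both labeled
    kappa = cohen_kappa_score(
        [annotations[a][t] for t in shared],
        [annotations[b][t] for t in shared],
    )
    print(f"{a} vs {b}: kappa = {kappa:.2f}")
```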

Alisa Smirnova
Head of Research, Toloka

Hands-on practice session

We will conduct a hands-on practice session, the longest and most vital part of our tutorial. We will encourage the attendees to apply the techniques and best practices learned during the first part of the tutorial. For this purpose, the attendees will run their own crowdsourcing project for intent classification for conversational agents with real crowd annotators. The input to the crowdsourcing project is a set of search queries from the Web; the output is an intent class for each query. We will use the CLINC150 dataset for the practice.
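
As a preview of the setup, the sketch below samples CLINC150 queries into a task file for annotators. It assumes the Hugging Face datasets package and its clinc_oos dataset card; the CSV layout is our illustrative choice, not a fixed requirement of the tutorial.

```python
# A minimal sketch: sample CLINC150 queries to send to crowd annotators.
# Assumes the Hugging Face `datasets` package; the CSV format is illustrative.
import csv

from datasets import load_dataset

dataset = load_dataset("clinc_oos", "small")  # CLINC150, small configuration
label_names = dataset["train"].features["intent"].names

# Take a sample of test queries; in the tutorial these would be posted as
# crowdsourcing tasks, each asking for the query's intent class.
sample = dataset["test"].shuffle(seed=0).select(range(100))

with open("tasks.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["query", "gold_intent"])  # gold intent kept for golden tasks
    for row in sample:
        writer.writerow([row["text"], label_names[row["intent"]]])
```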

Daniil Likhobaba
Researcher, Toloka

Learning from crowds

In this part, we will describe how to process the raw labels obtained from a crowdsourcing marketplace and transform them into knowledge suitable for a downstream human-in-the-loop application. Since in crowdsourcing every object receives multiple labels from different annotators, we will consider a few popular answer aggregation models, including methods for aggregating categorical responses (Dawid-Skene, GLAD, etc.) and recent methods for deep learning from crowds (CrowdLayer, CoNAL, etc.).
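
For example, aggregating categorical responses with Crowd-Kit's Dawid-Skene implementation takes a DataFrame with task, worker, and label columns; the queries and workers below are toy data.

```python
# Aggregate noisy crowd labels with Crowd-Kit: each row is one
# (task, worker, label) response; the data below is a toy example.
import pandas as pd
from crowdkit.aggregation import DawidSkene, MajorityVote

df = pd.DataFrame(
    [
        ["q1", "w1", "weather"], ["q1", "w2", "weather"], ["q1", "w3", "alarm"],
        ["q2", "w1", "music"],   ["q2", "w2", "music"],   ["q2", "w3", "music"],
    ],
    columns=["task", "worker", "label"],
)

majority = MajorityVote().fit_predict(df)             # simple baseline
dawid_skene = DawidSkene(n_iter=100).fit_predict(df)  # EM-based confusion-matrix model
print(dawid_skene)  # pandas Series: task -> aggregated label
```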

We will present Crowd-Kit, an open-source Python library implementing all these methods. We will pay special attention to the Crowd-Kit Learning module for training a complete deep learning model on crowdsourced data without an extra aggregation step.
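
To make the deep-learning-from-crowds idea concrete, here is a from-scratch PyTorch sketch of the CrowdLayer approach: per-worker confusion matrices stacked on top of a base classifier. It illustrates the technique only; Crowd-Kit ships its own implementation in the crowdkit.learning module.

```python
# A from-scratch PyTorch sketch of the CrowdLayer idea: the base classifier
# predicts a latent true-label distribution, and a per-worker matrix maps it
# to that worker's (possibly noisy) answer distribution. Illustrative only;
# Crowd-Kit provides a ready-made implementation.
import torch
import torch.nn as nn

class CrowdLayerModel(nn.Module):
    def __init__(self, base_model: nn.Module, num_labels: int, num_workers: int):
        super().__init__()
        self.base_model = base_model
        # One confusion matrix per worker, initialized to identity.
        self.worker_matrices = nn.Parameter(
            torch.eye(num_labels).repeat(num_workers, 1, 1)
        )

    def forward(self, inputs: torch.Tensor, workers: torch.Tensor) -> torch.Tensor:
        logits = self.base_model(inputs)          # (batch, num_labels)
        matrices = self.worker_matrices[workers]  # (batch, num_labels, num_labels)
        # Map latent class scores through each worker's confusion matrix.
        return torch.bmm(matrices, logits.unsqueeze(-1)).squeeze(-1)

# Training uses cross-entropy against each worker's raw label; at inference
# time, the crowd layer is dropped and base_model is used alone.
```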

Dmitry Ustalov
Head of Ecosystem Development, Toloka

Materials & slides

Introduction
Crowdsourcing essentials
Hands-on practice session
Hands-on practice manual
Learning from crowds
Learning from crowds: Code examples
