ECIR 2023

Conference

Crowdsourcing for Information Retrieval

Apr 6, 2023, 10:00 GMT+2

Daniil Likhobaba
Dmitry Ustalov

Where: Online

Date: Apr 6, 2023, 10:00 GMT+2

Crowdsourcing for Information Retrieval

In our tutorial, we will share more than six years of crowdsourcing experience and bridge the gap between the human-in-the-loop and information retrieval communities by showing how to incorporate a human-in-the-loop component into a retrieval system to gather real human feedback on model predictions. Most of the tutorial is devoted to hands-on practice, in which the attendees will, under our guidance, implement an end-to-end information retrieval process: from problem statement and data labeling to machine learning model training and evaluation.

Our learning objectives are:

  • to teach how to apply human-in-the-loop pipelines with crowdsourcing to address information retrieval problems

  • to improve the attendees' data labeling and machine learning skills on a real information retrieval problem, namely query intent classification, by annotating data and training a machine learning model

  • to introduce mathematical methods, and their open-source implementations, that increase annotation quality and the accuracy of the learned model without additional labeling


Agenda

* The time is indicated in Irish Standard Time (IST, UTC+01:00)

Crowdsourcing essentials

In this part, we will discuss quality control techniques. We will cover approaches applicable before task performance (annotator selection, annotator training, and exam tasks), during task performance (golden tasks, annotator motivation, and tricks to filter out bots and cheaters), and after task performance (post-verification/acceptance and inter-annotator agreement). We will also share best practices, including critical aspects and pitfalls in designing instructions and interfaces for annotators, vital settings in different types of templates, training and examination for annotator selection, and pipelines for evaluating the labeling process.
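As a minimal illustration of golden-task quality control, the sketch below computes per-annotator accuracy on control tasks and filters out low-accuracy annotators. The DataFrame columns, the toy data, and the 0.75 threshold are our own hypothetical choices, not part of the tutorial materials or the Toloka API:

```python
import pandas as pd

# Hypothetical raw annotations: one row per (worker, task, label).
answers = pd.DataFrame(
    [
        ("w1", "t1", "weather"), ("w1", "t2", "music"),
        ("w2", "t1", "weather"), ("w2", "t2", "travel"),
        ("w3", "t1", "travel"),  ("w3", "t2", "music"),
    ],
    columns=["worker", "task", "label"],
)

# Hypothetical golden (control) tasks with known correct labels.
golden = {"t1": "weather", "t2": "music"}

# Per-worker accuracy on the golden tasks.
on_golden = answers[answers["task"].isin(golden)].copy()
on_golden["correct"] = on_golden.apply(
    lambda row: row["label"] == golden[row["task"]], axis=1
)
accuracy = on_golden.groupby("worker")["correct"].mean()

# Keep only answers from workers above a chosen accuracy threshold.
trusted = accuracy[accuracy >= 0.75].index
filtered = answers[answers["worker"].isin(trusted)]
print(accuracy, filtered, sep="\n\n")
```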

Hands-on practice session

We will conduct a hands-on practice session, which is the central and longest part of our tutorial. We will encourage the attendees to apply the techniques and best practices learned during the first part. For this purpose, the attendees will run their own crowdsourcing project for intent classification for conversational agents, using real crowd annotators. As input, the project will take search queries from the Web; its output will be an intent class for each query. We will use the CLINC150 dataset for the practice.
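CLINC150 is publicly available; one convenient way to obtain it is through its Hugging Face Hub mirror, as in the sketch below. The `clinc_oos` identifier, configuration names, and field names refer to that mirror and are our assumption, not part of the tutorial materials:

```python
from datasets import load_dataset

# CLINC150 as mirrored on the Hugging Face Hub ("plus" adds extra
# out-of-scope examples; "small" and "imbalanced" configs also exist).
clinc = load_dataset("clinc_oos", "plus")

train = clinc["train"]
intent_names = train.features["intent"].names  # 150 intents + "oos"

# Peek at a query/intent pair of the kind the annotators will classify.
example = train[0]
print(example["text"], "->", intent_names[example["intent"]])
```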

Learning from crowds

In this part, we will describe how to process the raw labels obtained from a crowdsourcing marketplace and transform them into knowledge suitable for a downstream human-in-the-loop application. Since in crowdsourcing every object receives multiple labels from different annotators, we will consider a few popular answer aggregation models, including methods for aggregating categorical responses (Dawid-Skene, GLAD, etc.) and recent methods for deep learning from crowds (CrowdLayer, CoNAL, etc.).
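For instance, aggregating categorical answers takes only a few lines with the open-source Crowd-Kit library presented in the next paragraph. In this sketch the toy DataFrame is our own, while the `task`/`worker`/`label` column layout follows Crowd-Kit's documented input format:

```python
import pandas as pd
from crowdkit.aggregation import DawidSkene, MajorityVote

# Toy responses: each task is labeled by several workers.
answers = pd.DataFrame(
    [
        ("t1", "w1", "weather"), ("t1", "w2", "weather"), ("t1", "w3", "travel"),
        ("t2", "w1", "music"),   ("t2", "w2", "travel"),  ("t2", "w3", "music"),
    ],
    columns=["task", "worker", "label"],
)

# Simple baseline: majority vote per task.
mv = MajorityVote().fit_predict(answers)

# Dawid-Skene EM: jointly estimates true labels and per-worker
# confusion matrices, down-weighting unreliable annotators.
ds = DawidSkene(n_iter=100).fit_predict(answers)
print(mv, ds, sep="\n\n")
```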

We will present Crowd-Kit, an open-source Python library implementing all these methods. We will pay special attention to the Crowd-Kit Learning module for training a complete deep learning model on crowdsourced data without a separate aggregation step.
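To convey the idea behind such architectures, here is a from-scratch PyTorch sketch of a CrowdLayer-style module (per-worker confusion-like matrices on top of a base classifier, following Rodrigues & Pereira, 2018). It illustrates the concept only and is not Crowd-Kit's actual API:

```python
import torch
import torch.nn as nn

class CrowdLayerSketch(nn.Module):
    """Illustrative CrowdLayer-style module: a per-worker transformation
    on top of a base classifier, trained directly on noisy crowd labels."""

    def __init__(self, base: nn.Module, n_workers: int, n_classes: int):
        super().__init__()
        self.base = base
        # One confusion-like matrix per worker, initialized to identity.
        self.worker_mats = nn.Parameter(
            torch.eye(n_classes).repeat(n_workers, 1, 1)
        )

    def forward(self, x: torch.Tensor, worker_ids: torch.Tensor) -> torch.Tensor:
        probs = self.base(x).softmax(dim=-1)   # (batch, n_classes)
        mats = self.worker_mats[worker_ids]    # (batch, n_classes, n_classes)
        # Worker-specific class scores; normalize and take the log to
        # train with an NLL loss against each worker's noisy label.
        scores = torch.bmm(mats, probs.unsqueeze(-1)).squeeze(-1)
        return scores.clamp_min(1e-12)

# At inference time, the worker matrices are dropped and the trained
# base classifier is used on its own.
```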

Materials & slides

  • Introduction

  • Crowdsourcing essentials

  • Hands-on practice session

  • Hands-on practice manual

  • Learning from crowds

  • Learning from crowds: Code examples

Toloka representatives

Alisa Smirnova
Toloka AI

Daniil Likhobaba
Researcher

Dmitry Ustalov
Head of Ecosystem Development
