
ICWE 2022: web engineering with human-in-the-loop

In this tutorial, we present part of our six years of experience in solving real-world tasks that combine the efforts of humans and machines.



Modern Web applications use sophisticated Machine Learning models to rank news, posts, products, and other items that are shown to users or contributed by them. Keeping these models useful requires continuously training, evaluating, and monitoring them on freshly annotated data, and crowdsourcing is one way to obtain that data.

In this tutorial, we present part of our six years of experience in solving real-world tasks with human-in-the-loop pipelines that combine the efforts of humans and machines. We introduce data labeling via public crowdsourcing marketplaces and present the critical components of efficient data labeling. Then we run a practical session in which participants tackle a challenging real-world Information Retrieval task for e-commerce, experiment with settings for the labeling process, and launch their own label collection project on real crowds during the tutorial. We present useful quality control techniques and give attendees an opportunity to discuss their annotation ideas. The methods and techniques described in this tutorial can be applied to any crowdsourced data and are not bound to any specific crowdsourcing platform.
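The abstract mentions quality control without naming specific techniques. One widely used technique is to mix control (golden) tasks with known answers into the workload and trust only annotators who answer them accurately. Below is a minimal Python sketch of that idea, assuming responses arrive as (worker, task, label) tuples; the function names, the 0.8 threshold, and the toy data are hypothetical and are not taken from the tutorial materials or from any platform API.

```python
from collections import defaultdict


def control_accuracy(responses, golden):
    """Share of correct answers per worker, counted on control tasks only.

    responses: iterable of (worker_id, task_id, label) tuples (hypothetical format)
    golden: dict mapping control task_id -> known correct label
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for worker, task, label in responses:
        if task in golden:  # ordinary tasks carry no known answer, so skip them
            total[worker] += 1
            correct[worker] += int(label == golden[task])
    return {w: correct[w] / total[w] for w in total}


def trusted_workers(responses, golden, min_accuracy=0.8):
    """Workers whose control-task accuracy is at least min_accuracy (assumed threshold)."""
    acc = control_accuracy(responses, golden)
    return {w for w, a in acc.items() if a >= min_accuracy}


# Toy data: c1 and c2 are control tasks, t1 is an ordinary task.
responses = [
    ("w1", "c1", "relevant"), ("w1", "c2", "irrelevant"), ("w1", "t1", "relevant"),
    ("w2", "c1", "irrelevant"), ("w2", "c2", "irrelevant"), ("w2", "t1", "irrelevant"),
]
golden = {"c1": "relevant", "c2": "irrelevant"}

print(control_accuracy(responses, golden))  # {'w1': 1.0, 'w2': 0.5}
print(trusted_workers(responses, golden))   # {'w1'}
```

In practice, the threshold and the share of control tasks are tuning choices; a stricter threshold removes more noise but also discards more of the paid-for labels.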


09:00 - 09:45

The role of Human-in-the-Loop in building Search engines

09:45 - 10:30

Ranking and Quality Metrics

10:30 - 11:00


11:00 - 11:45

Human-in-the-Loop Essentials

11:45 - 12:30

Hands-On Practice Session

12:30 - 14:00

Lunch Break

14:00 - 14:45

Results aggregation and implementation into the ML pipeline

14:45 - 15:30

Metric computation

15:30 - 16:00


16:00 - 17:30

Results discussion and Conclusion


Alexey Drutsa
Toloka, Deputy CEO and COO
Dmitry Ustalov
Toloka, Head of Research
Nikita Pavlichenko
Toloka, Machine Learning Researcher
Daria Baidakova
Toloka, Director of Educational Programs
Boris Tseytlin
Toloka, Machine Learning Researcher


Part I: The role of Human-in-the-Loop in building Search engines
Part II: Ranking and Quality Metrics
Part III: Human-in-the-Loop Essentials
Part IV: Hands-On Practice Session
Part V: Results aggregation and implementation into the ML pipeline
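Part V covers aggregating the labels that several annotators gave to the same item. This page does not say which aggregation methods the tutorial uses. The simplest is majority vote, while more elaborate methods such as Dawid-Skene also weight annotators by their estimated skill. The following is a minimal majority-vote sketch in Python, assuming responses are (worker, task, label) tuples. The function name, the optional worker filter, and the toy data are hypothetical.

```python
from collections import Counter, defaultdict


def majority_vote(responses, allowed_workers=None):
    """Aggregate overlapping crowd labels into one label per task.

    responses: iterable of (worker_id, task_id, label) tuples (hypothetical format)
    allowed_workers: optional set of workers whose votes count,
        e.g. those who passed a quality check; None counts everyone.
    Ties go to the label seen first for that task (Counter.most_common order).
    """
    votes = defaultdict(Counter)
    for worker, task, label in responses:
        if allowed_workers is None or worker in allowed_workers:
            votes[task][label] += 1
    return {task: counts.most_common(1)[0][0] for task, counts in votes.items()}


responses = [
    ("w1", "q1", "relevant"), ("w2", "q1", "relevant"), ("w3", "q1", "irrelevant"),
    ("w1", "q2", "irrelevant"), ("w2", "q2", "relevant"), ("w3", "q2", "irrelevant"),
]

print(majority_vote(responses))  # {'q1': 'relevant', 'q2': 'irrelevant'}
# With only w1 and w3 counted, q1 is a 1-1 tie, resolved to the first label seen.
print(majority_vote(responses, allowed_workers={"w1", "w3"}))  # {'q1': 'relevant', 'q2': 'irrelevant'}
```

The aggregated dictionary is the kind of output that could then be fed into an ML pipeline as training or evaluation labels.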


Hands-On Practice Session: tutorial guide
Hands-On Practice Session: instructions for annotators
Hands-On Practice Session: interface code
Hands-On Practice Session: data input code
Hands-On Practice Session: training dataset
Hands-On Practice Session: main dataset
Metric computation: practice Colab notebook
Metric computation: example crowd-labeled dataset
Metric computation: website screenshots from the dataset
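The Metric computation session evaluates rankings against a crowd-labeled dataset, but this page does not list which metrics are used. One common choice is NDCG@k, which scores a ranked list from graded relevance labels, such as labels aggregated from several annotators. The sketch below assumes linear gains (some definitions use 2^rel - 1 instead). The function names and the example labels are hypothetical.

```python
import math


def dcg(gains, k):
    """Discounted cumulative gain of the first k graded relevance labels."""
    # Position i (0-based) is discounted by log2(i + 2), so the top result is undiscounted.
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))


def ndcg(gains, k):
    """DCG divided by the DCG of the ideal (descending) ordering; 0.0 if nothing is relevant."""
    ideal = dcg(sorted(gains, reverse=True), k)
    return dcg(gains, k) / ideal if ideal > 0 else 0.0


# Relevance labels (0-3) of the results in the order the ranker returned them.
ranking = [3, 2, 3, 0, 1]
print(round(ndcg(ranking, k=5), 3))  # → 0.972
```

A score of 1.0 means the ranker ordered the results exactly as the relevance labels suggest.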