High-Quality Data Labeling at Scale with Toloka

This ICML 2021 workshop by Toloka aims to provide a comprehensive picture of how crowdsourcing can be applied to real-life AI production.

Past conference

Overview

AI development today rests on three pillars: algorithms, hardware, and data. Ironically, the further AI moves towards new application areas, the more it depends on human effort: increasingly, data for training and validating AI models can only be collected by humans.

AI solutions need training and validation data that is not only high-quality and able to scale with growing industry needs, but also flexible enough to support a wide variety of use cases and data collection scenarios.

Toloka's mission is to create an environment for AI data production that is fully aligned with industry needs: quality, scalability, and flexibility.

As a result, Toloka is a multifaceted solution with:
  • a global pool of 9 million Tolokers with around 200,000 active on the platform every month
  • multiple methods and mechanisms for advanced automated quality control at scale, available for any platform using the Crowd-Kit library for Python
  • instruments for integrating the crowd into the ML production process using the Toloka-Kit library for Python
  • academic research and education initiatives in the field of Crowd Science for ML specialists

The Toloka workshop aims to cover these aspects and provide a comprehensive picture of how crowdsourcing can be applied to real-life AI production.

Agenda

The workshop will feature:

Keynotes:
  • Olga Megorskaya, Toloka’s CEO, will give a talk “Evolution of data production paradigm in AI.” Olga will discuss the creation of an environment for AI data production that is fully aligned with industry needs: quality, scalability, flexibility.
  • Omar Alonso, Senior Engineering Manager at Instacart, will give a talk “The Practice of Crowdsourcing.” Omar will discuss the practical considerations for designing and implementing tasks that require the use of humans and machines in combination with the goal of producing high-quality labels.
  • Daria Baidakova, Director of Educational Programs at Toloka, will give a talk “Data Annotation at Scale: a Core Expertise of Modern ML.” Daria will provide insights into what it takes to become a Crowd Solutions Architect and touch upon the Toloka research grants program and the Crowd Science initiative.
  • Saiph Savage, Assistant Professor at Northeastern University and co-director of the Civic Innovation Lab at UNAM, will give a talk “The Future of Work for Performers: Empowering the People behind AI.”

Demo: Automated Pipeline for E-Commerce Item Retrieval and Ranking

Dmitry Ustalov, Vladimir Losev, and Oleg Pavlov will provide a hands-on demonstration of how crowdsourcing can help address an e-commerce item retrieval and ranking task. In particular, they will show the attendees how to build a human-in-the-loop pipeline that combines both crowdsourced data and ML models to obtain a reliable ground-truth dataset on the Toloka platform.

The Toloka team will demonstrate how interdependent data labeling processes can be programmatically combined using the Toloka-Kit Python library, and how the final annotation results can be obtained using the Crowd-Kit computational quality control library.
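
To make the aggregation step more concrete, here is a minimal sketch of majority voting, the simplest way to turn several crowd answers per task into one final label. It is written in plain Python and is not the Crowd-Kit API or the code shown in the demo; the function name `majority_vote`, the `(task, worker, label)` input layout, and the sample relevance judgments are assumptions made for illustration only.

```python
from collections import Counter, defaultdict


def majority_vote(annotations):
    """Aggregate (task, worker, label) triples into one label per task.

    Hypothetical helper for illustration; not part of Crowd-Kit or Toloka-Kit.
    """
    votes = defaultdict(Counter)
    for task, _worker, label in annotations:
        votes[task][label] += 1

    result = {}
    for task, counts in votes.items():
        # Pick the most frequent label; ties are broken alphabetically
        # so the outcome is deterministic.
        label, _count = min(counts.items(), key=lambda kv: (-kv[1], str(kv[0])))
        result[task] = label
    return result


# Made-up relevance judgments for query-item pairs (not real workshop data).
annotations = [
    ("query1-itemA", "w1", "relevant"),
    ("query1-itemA", "w2", "relevant"),
    ("query1-itemA", "w3", "irrelevant"),
    ("query1-itemB", "w1", "irrelevant"),
    ("query1-itemB", "w2", "irrelevant"),
]

aggregated = majority_vote(annotations)
print(aggregated)
```

Majority voting treats every worker as equally reliable. In practice, methods that also estimate worker quality are often preferred when some annotators are noisier than others.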

Speakers

  • Olga Megorskaya: Toloka, CEO
  • Omar Alonso: Instacart, Sr. Engineering Manager, and part-time Faculty at Northeastern University
  • Saiph Savage: Northeastern University, Assistant Professor, and co-director of the Civic Innovation Lab at UNAM
  • Daria Baidakova: Toloka, Director of Educational Programs
  • Dmitry Ustalov: Toloka, Analyst / Software Developer
  • Vladimir Losev: Toloka, Analyst / Software Developer
  • Oleg Pavlov: Toloka, Crowd Solutions Architect

Schedule

15:00 - 15:30

Evolution of data production paradigm in AI

15:30 - 16:00

The Practice of Crowdsourcing

16:00 - 16:30

Data Annotation at Scale: a Core Expertise of Modern ML

16:30 - 17:00

The Future of Work for Performers: Empowering the People behind AI

17:00 - 18:00

Automated Pipeline for E-Commerce Item Retrieval and Ranking
Demo by Dmitry Ustalov, Vladimir Losev, Oleg Pavlov

17:00 - 18:00

Q/A Session
