Practical Crowdsourcing for Efficient Machine Learning
Our online course at the Conférence universitaire de Suisse occidentale will show you how to take control of your data labeling.

7th June – 25th June 2021
Tuesdays and Thursdays | 10:00 (UTC+2)

Why master crowdsourcing?
  • Data is essential
    Large amounts of reliable data are essential for creating and training ML-based algorithms and models, both in industry and in scientific research. Accurate data makes it possible to train efficient models and to evaluate their quality. This is why efficient data labeling is an essential, in-demand skill for professionals and researchers working with ML.

  • Optimize quality, speed and cost
    Crowdsourcing is a popular way to collect and label large datasets with faster turnaround and lower costs than relying on a small group of experts for data collection and annotation. Our ten years of experience in industry and research show that building top-quality datasets requires a strict methodology.
Why this course?
  • 3-week online course, lectures twice a week
  • Taught by field experts
  • Receive grants for research
  • Mix of lectures and seminars
Grants for research
Toloka supports the use of crowdsourcing for research purposes. Course organizers will offer research grants of up to $500 to course participants. To apply for a grant, describe your research project and your data labeling needs in your course application. Grants will be awarded in Week 3 of the course, when participants start working on their personal projects. Grant recipients should commit to the following:

  • Any publication that relies on the data collected using the awarded funds should acknowledge that the study was supported by the Toloka research grant.

  • The dataset collected in the experiment should be released publicly in the Toloka repository of datasets within 6 months after data collection ends.
What will I gain?
This course will introduce you to crowdsourcing as a practical methodology and help you master the essential steps and techniques to ensure top-quality data. More importantly, you will be able to practice the crowdsourcing approach straight away by designing and running your own data labeling projects. By the end of this course, you will:
  • Understand the benefits and limitations of crowdsourcing across applications ranging from training computer vision algorithms to creative copywriting.
  • Integrate an on-demand workforce directly into your business and data processes.
  • Know the current state of research related to crowdsourcing.
  • Understand how crowdsourcing can be applied to various research challenges.
  • Be able to design a pipeline for a data labeling task.
Course topics
  • Crowdsourcing basics
  • Crowdsourcing for NLP tasks
  • Crowdsourcing for CV tasks
  • Research challenges related to data labeling
  • Application of crowdsourcing to participants' research needs
Learn from field experts
  • ML engineers, researchers and crowd solution architects from Yandex have pooled their expertise for this course.
  • Our team has presented shorter versions of this course as tutorials and a workshop at several leading data analysis conferences: KDD 2019, CVPR 2020, WSDM 2020, SIGMOD 2020, NeurIPS 2020, WWW 2021 and NAACL 2021.
  • In addition to working at Yandex, course instructors are engaged in research and teach at prestigious universities and the Yandex School of Data Analysis.
Head of Efficiency and Growth Division
Analyst / Software Developer
Register today