Don't miss this
Tutorial at NAACL 2021
In this introductory tutorial, we share some of the unique insights we have gained from six years of industry experience in efficient natural language data annotation via crowdsourcing.
Events
Workshop at VLDB 2021
Crowd Science: Trust, Ethics, and Excellence in Crowdsourced Data Management at Scale. This workshop focuses on the best practices for efficient and trustworthy crowdsourcing.
Crowd Science Seminar #11
Weighted Majority Vote is a workhorse of crowdsourcing. But how do we assign these weights to different performers? In this talk, we will discuss a NeurIPS 2020 paper that answers this question theoretically.
Joint session at NVIDIA GTC
Olga Megorskaya (Head of Toloka) and Igor Kuralenok (Head of ML Department, Yandex) will hold a joint session at NVIDIA GTC. In their talk, they will dive deep into the speech recognition pipeline, including GPU usage and the data annotation process. Register now to hear from them and other incredible speakers and discover the advanced technologies that are transforming today’s industries.
Tutorial at TheWebConf 2021 (WWW '21)
In our hands-on tutorial, we present a systematic view of using Human-in-the-Loop to obtain scalable offline evaluation processes and, in particular, high-quality relevance judgements. We will introduce the ranking problem to the attendees, discuss the commonly used ranking quality metrics, and then focus on a Human-in-the-Loop-based approach to obtaining relevance judgements at scale.
Crowd Science Seminar #10
In this talk, we will reduce the skill-estimation problem to a statistical problem of matrix completion. Next, we will establish necessary and sufficient conditions under which efficient skill recovery is possible. Finally, we will propose practical algorithms that carefully estimate the performers' skills given their responses.
Crowd Science Seminar #9
This talk presents the Social Media Mining for Health Applications (#SMM4H) shared tasks, and the first Russian adverse drug reaction corpus of tweets. We focus on current challenges and the imbalanced nature of datasets.
Crowd Science Seminar #8
In this talk, Olga Megorskaya discusses several cases where advanced crowdsourcing, ranging from offline data collection to software testing, helps various business products, from search engines to self-driving cars.
Crowd Science Seminar #7
In this talk, we discuss practical considerations for designing and implementing tasks that combine humans and machines, with the goal of producing high-quality labels.
Crowd Science Seminar #6
We discuss several major challenges researchers face when conducting interactive studies online, such as controlling for attention and drop-offs. We report on our experience running behavioral studies on the Toloka crowdsourcing platform.
Pie & AI Suisse Event
Will BERT kill data labeling? In recent years, large-scale unsupervised models like BERT and GPT-3 have made impressive progress with textual data, from text generation to question answering and more.
In this talk, we discuss how crowdsourcing helps these algorithms achieve their spectacular performance.
Crowd Science Seminar #5
This talk presents principled approaches towards designing and evaluating large human decision-making systems. We focus on how to control sources of noisy data, like individual mistakes, dishonesty, and bias.
Crowd Science Seminar #4
In this talk, we address the general problem of algorithmic and AI bias and identify known solutions, with a targeted discussion of the challenges inherent to crowdsourcing.
Crowd Science Seminar #3
Top experts from the Toloka team discuss selected papers on crowd science from the NeurIPS conference and share their perspective on crowd research trends.
Crowd Science Seminar #2
This talk introduces RuBQ, the first Russian-language Knowledge Base Question Answering (KBQA) dataset. Vladislav Korablinov describes the efficient data-collection pipeline designed in his work and shares valuable insights.
Crowd Science Seminar #1
This talk focuses on an essential function for conversational agents: evaluating humor. We present an evaluation of 30 dialogue jokes to demonstrate that crowdsourcing is a viable option for humor evaluation and examine variation in humor judgments.
Y-DATA Tel Aviv #14
During this meetup, we discussed how the new generation of methods and tools allows the collection of high-quality human-labeled data on a large scale and why every ML specialist should know how to use crowdsourcing.
Workshop at MLConfEU 2020
In this tutorial, “Crowdsourcing Practice for Efficient Data Labeling,” leading researchers and engineers share their industry expertise and research insights on key components of crowdsourcing, including efficient aggregation, incremental relabeling, and dynamic pricing.
Workshop at NeurIPS 2020
Some of the world’s top experts discuss key issues in preparing labeled data for machine learning, with a focus on remoteness, fairness, and mechanisms in the context of crowdsourcing for data collection and labeling.
Tutorial at SIGMOD/PODS 2020
We explore the practical aspects of how crowdsourcing can be applied to information retrieval. Learn how to create a dataset with relevant products.
Tutorial at CVPR 2020
We present a data processing pipeline used for training self-driving cars. Gain practical experience launching an annotation project in Toloka.
Paper accepted to KDD 2020
The paper “Prediction of Hourly Earnings and Completion Time on a Crowdsourcing Platform” was accepted to the Conference on Knowledge Discovery and Data Mining (KDD 2020).