Crowd science
Our methodology, based on five years of research and unique industry expertise, can help you successfully tap into the wisdom of the crowd on a large scale.
Keys to clean and accurate training data
Data quality depends on a strong business process more than on performers' individual expertise
Crowdsourcing is a popular way to collect and label large datasets for training, tuning, and evaluating machine learning algorithms. It offers faster turnaround and lower costs than relying on a limited group of experts for data collection and annotation.

Machine learning models can only be as good as the data they are trained on. No matter how robust your models are, you won’t get reliable results if your data is inaccurate or irrelevant. If you want to efficiently use the knowledge of thousands of people to get clean and accurate data for your ML needs, follow our tips for each of these essential steps.
Decomposition
Break your task down into steps until each step is clear enough for any performer to handle.
Instructions
The more comprehensive the instructions, the more accurate the results.
Interface
A good interface makes it easy for users to perform the same repeated actions quickly and correctly.

Quality control
Carefully plan and configure a quality control system to ensure high-quality results. 
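One common way to implement quality control is to mix in control tasks with known answers and track each performer's accuracy on them. The sketch below is an illustration under that assumption, not a description of the platform's built-in rules. The function names, the tuple layout of `responses`, and the 0.7 accuracy threshold are all hypothetical.

```python
from collections import defaultdict


def performer_accuracy(responses, golden):
    """Compute each performer's accuracy on control tasks.

    responses: iterable of (task_id, performer_id, label) tuples.
    golden: dict mapping control task_id -> known correct label.
    Tasks that are not in `golden` are ignored.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for task_id, performer_id, label in responses:
        if task_id in golden:
            total[performer_id] += 1
            if label == golden[task_id]:
                correct[performer_id] += 1
    return {p: correct[p] / total[p] for p in total}


def flag_low_quality(accuracy, min_accuracy=0.7):
    """Return performers whose control-task accuracy is below the threshold."""
    return sorted(p for p, acc in accuracy.items() if acc < min_accuracy)


# Hypothetical data: g1 and g2 are control tasks, t9 is a regular task.
golden = {"g1": "cat", "g2": "dog"}
responses = [
    ("g1", "p1", "cat"), ("g2", "p1", "dog"),
    ("g1", "p2", "dog"), ("g2", "p2", "dog"),
    ("t9", "p2", "cat"),
]
accuracy = performer_accuracy(responses, golden)
flagged = flag_low_quality(accuracy)
print(accuracy, flagged)
```

Responses from flagged performers could then be excluded or given less weight before aggregation. The right threshold depends on the task and should be tuned on real data.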
Pricing
Find the optimal price based on speed and quality.

Results
After the pool is finished, aggregate the results and check statistics.
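As a minimal sketch of aggregation, the example below uses a simple majority vote over overlapping responses and reports the share of votes behind the winning label. Majority vote is only one possible aggregation method, chosen here for illustration. The function name and the `(task_id, performer_id, label)` layout are assumptions, not a platform export format.

```python
from collections import Counter, defaultdict


def aggregate_majority(responses):
    """Aggregate overlapping labels per task by majority vote.

    responses: iterable of (task_id, performer_id, label) tuples.
    Returns a dict: task_id -> (winning_label, share_of_votes).
    On a tie, the label that was seen first wins.
    """
    by_task = defaultdict(list)
    for task_id, _performer_id, label in responses:
        by_task[task_id].append(label)

    results = {}
    for task_id, labels in by_task.items():
        label, votes = Counter(labels).most_common(1)[0]
        results[task_id] = (label, votes / len(labels))
    return results


# Hypothetical responses with an overlap of three performers per task.
responses = [
    ("t1", "p1", "cat"), ("t1", "p2", "cat"), ("t1", "p3", "dog"),
    ("t2", "p1", "dog"), ("t2", "p2", "dog"), ("t2", "p3", "dog"),
]
aggregated = aggregate_majority(responses)
for task_id, (label, share) in aggregated.items():
    print(task_id, label, round(share, 2))
```

A low vote share for a task is a useful statistic to check: it can point to an ambiguous task or unclear instructions.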

Tools that support quality
The task pool has additional settings like dynamic overlap and dynamic pricing, which you can use to obtain better quality without overspending your budget.
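The idea behind dynamic overlap can be sketched as a stopping rule: start with a small number of responses per task and request more only while the answers disagree. The code below is a simplified illustration, not the platform's actual algorithm. The function name, the vote-share confidence measure, and the default values are assumptions.

```python
from collections import Counter


def needs_more_answers(labels, min_overlap=3, max_overlap=5, threshold=0.8):
    """Decide whether a task should be shown to one more performer.

    labels: list of labels collected for the task so far.
    Collect at least `min_overlap` answers, stop at `max_overlap`,
    and in between stop as soon as the top label's share of votes
    reaches `threshold`.
    """
    n = len(labels)
    if n < min_overlap:
        return True
    if n >= max_overlap:
        return False
    top_votes = Counter(labels).most_common(1)[0][1]
    return top_votes / n < threshold


# Unanimous answers stop at the minimum overlap; disputed ones go on.
print(needs_more_answers(["cat", "cat", "cat"]))  # agreement reached
print(needs_more_answers(["cat", "cat", "dog"]))  # still disputed
```

With a rule like this, the budget goes mostly to tasks where performers disagree, while easy tasks close at the minimum overlap.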
Don't miss this
Workshop at NeurIPS 2020
We will gather the world's best experts to discuss the key issues of preparing labelled data for machine learning. We will focus on remoteness, fairness, and mechanisms in the context of crowdsourcing for data collection and labelling.
Tutorial at SIGMOD/PODS 2020
We explore the practical aspects of how crowdsourcing can be applied to information retrieval. Learn how to create a dataset with relevant products.
Tutorial at CVPR 2020
We present a data processing pipeline used for training self-driving cars. Gain practical experience launching an annotation project in Yandex.Toloka.
Paper accepted to KDD 2020
The paper "Prediction of Hourly Earnings and Completion Time on a Crowdsourcing Platform" was accepted for this year's Conference on Knowledge Discovery and Data Mining (KDD 2020).
Webinars
Dmitry Brazhenko, data analyst from Yandex, shared his experience in working with Side-by-Side (SBS) comparisons of images, audio and video, as well as application interfaces. He also covered classification for UX testing. In the second part of the meetup, Dmitry configured and launched an SBS project in real time and answered participants' questions along the way.
Application for corporate training
We are offering corporate training to help you solve existing challenges and develop an internal team of Crowd Science Architects (CSA). 
Public datasets
Yandex.Toloka has accumulated large sets of quality data that can be applied to a variety of machine learning challenges. We share these datasets publicly as a way to support academic research and encourage innovation. Watch for new additions on the Public datasets page.
Yandex.Toloka News
Receive information about platform updates, partners, training materials, and other news.