Hands-on tutorials
Launch projects successfully by following our step-by-step instructions. Use our datasets for practice projects, or run projects on your own datasets and adapt the settings to your needs. You can also run the projects programmatically using Toloka-Kit.
Image classification
The challenge: classify a set of photographs according to which animal is shown in each picture – a cat or a dog.

There’s a set of real-life photos of cats and dogs. Performers are asked to look at the photos and decide what animal is in the picture.

Object detection
The challenge: get a set of contours, each defined by an array of points, that outline the road signs in each photo.

There’s a set of real-life photos of roads. Performers are asked to outline every traffic sign in each photo.

Side-by-side comparison
The challenge: find out which icon people prefer and determine the top icon out of the set.

There’s a set of 3 icons in 3 pairs. Performers are asked to look at the icons and choose the one they prefer. The results are then aggregated to obtain the top icon.
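
The aggregation step described above can be sketched in a few lines. This is a minimal illustration that assumes plain win counting over pairwise votes; it is not necessarily the method used in the tutorial, and the icon names, the vote tuples, and the `rank_by_wins` helper are all hypothetical.

```python
from collections import Counter


def rank_by_wins(votes):
    """Rank items by how many pairwise comparisons they won.

    votes: iterable of (left, right, chosen) tuples, one per performer answer.
    Returns a list of (item, wins) sorted by wins, ties broken by name.
    """
    wins = Counter()
    for left, right, chosen in votes:
        if chosen not in (left, right):
            raise ValueError(f"{chosen!r} is not part of the pair ({left!r}, {right!r})")
        # Register both items so an item with zero wins still appears in the ranking.
        wins[left] += 0
        wins[right] += 0
        wins[chosen] += 1
    return sorted(wins.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical data: 3 icons shown in 3 pairs, 3 answers per pair.
votes = [
    ("A", "B", "A"), ("A", "B", "A"), ("A", "B", "B"),
    ("A", "C", "A"), ("A", "C", "C"), ("A", "C", "A"),
    ("B", "C", "C"), ("B", "C", "B"), ("B", "C", "B"),
]
ranking = rank_by_wins(votes)
top_icon = ranking[0][0]
```

Win counting is the simplest option. When pairs receive uneven numbers of answers, a probabilistic model such as Bradley-Terry may give a fairer ranking.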

Image collection
The challenge: collect a set of categorized, real-life photos of pets – cats and dogs.

Performers are asked to send a picture of their cat or dog (or a random photo if they don’t have a cat or a dog) and select the appropriate label for their picture.

Video collection
The challenge: collect a set of video recordings of people imitating popular emoji gestures.

There’s a set of emoji combinations, and performers are asked to record videos showing these emojis with their hands. Each video must meet certain quality criteria.

Text classification
The challenge: classify news article headlines according to whether they are clickbait or not.

There’s a set of news article headlines. Performers are asked to read a headline and decide whether it’s clickbait.

Sentiment analysis
The challenge: classify reviews by putting them into two categories – “Positive” or “Negative”.

There’s a set of customer reviews, and performers are asked to read a review and decide whether it’s positive or negative.

Survey
The challenge: collect information about how people manage stress and whether they are prepared to get a meditation app to do so.

There’s a survey where performers answer questions about their stress levels and stress management techniques, meditation practices, and habits relating to paid apps.

Intent classification
The challenge: define which class a search query belongs to and distribute the queries between several categories inside the class.

There’s a list of queries (related to travel and dining), each with an unknown class and category. Performers are asked to first select the search query’s class and then define the category it belongs to within this class.
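
The two-step labeling described above (class first, then a category within that class) can be checked with a small nested mapping. This is only a sketch: the `TAXONOMY` contents and both helper functions are hypothetical and stand in for whatever classes and categories a real project defines.

```python
# Hypothetical taxonomy: each class maps to the categories allowed inside it.
TAXONOMY = {
    "travel": {"flights", "hotels", "car rental"},
    "dining": {"restaurants", "delivery", "recipes"},
}


def categories_for(query_class, taxonomy=TAXONOMY):
    """Return the categories a performer may pick after choosing a class."""
    return sorted(taxonomy.get(query_class, ()))


def is_valid_label(query_class, category, taxonomy=TAXONOMY):
    """Check that the category belongs to the class chosen in the first step."""
    return query_class in taxonomy and category in taxonomy[query_class]
```

A check like this can reject inconsistent answers, such as a dining class paired with a travel category, before the results are aggregated.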

Text recognition
The challenge: obtain water meter readings from images.

There’s a set of water meter images. Performers are asked to look at the images and write down the digits on each water meter.

Search relevance evaluation
The challenge: determine how relevant each product on the website is to the corresponding search query.

There’s a set of search queries and products on a website. Performers are asked to rate the relevance level.

Audio classification
The challenge: classify audio recordings according to the speaker’s gender.

There’s a set of audio recordings from different people. Performers are asked to listen to the recordings and decide whether the speaker is a man or a woman.

Audio collection
The challenge: collect audio recordings of texts.

There’s a set of texts, and performers are asked to read the texts aloud and record themselves. Recordings like these are used for training voice assistants.

Audio transcription
The challenge: obtain a transcription of audio recordings.

There’s a set of audio recordings. Performers are asked to listen to the recordings and type what they hear.

Field task
The challenge: collect a set of real-life photos of metro station entrances.

Performers are asked to visit a specific metro entrance and take a few pictures of it. These pictures can be used to decide which entrances need cleaning.

Keys to clean and accurate training data
Our methodology, based on years of research and unique industry expertise, can help you successfully tap into the wisdom of the crowd on a large scale. If you want to efficiently use the knowledge of thousands of people to get clean and accurate data for your ML needs, follow our tips for each of these essential steps.
1. Decomposition
Break your task down into steps until each separate level is clear enough for any performer to handle.

2. Instructions
The more comprehensive the instructions, the more accurate the results. 

3. Interfaces
A good interface makes it easy for performers to carry out the same repeated actions quickly and correctly.

4. Quality control
Carefully plan and configure a quality control system to ensure high-quality results.

5. Pricing
Find the optimal price based on speed and quality.

6. Results
After the pool is finished, aggregate the results and check statistics.
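
Steps 4 and 6 can be illustrated together with a short sketch. It assumes that quality is estimated from tasks with known answers and that results are aggregated by majority vote. Both are common choices, but neither is stated here as the prescribed approach, and the `answers` and `golden` data and both function names are hypothetical.

```python
from collections import Counter, defaultdict


def control_accuracy(answers, golden):
    """Share of correct answers per performer on tasks with known labels.

    answers: list of (performer, task, label) tuples.
    golden: dict mapping task -> known correct label.
    """
    correct, total = Counter(), Counter()
    for performer, task, label in answers:
        if task in golden:
            total[performer] += 1
            correct[performer] += int(label == golden[task])
    return {p: correct[p] / total[p] for p in total}


def majority_vote(answers, trusted=None):
    """Pick the most frequent label per task, optionally using only trusted performers.

    Ties are broken alphabetically so the result is deterministic.
    """
    votes = defaultdict(Counter)
    for performer, task, label in answers:
        if trusted is None or performer in trusted:
            votes[task][label] += 1
    result = {}
    for task, counts in votes.items():
        best = max(counts.values())
        result[task] = min(label for label, n in counts.items() if n == best)
    return result


# Hypothetical answers from three performers; task "t1" has a known answer.
golden = {"t1": "cat"}
answers = [
    ("p1", "t1", "cat"), ("p2", "t1", "cat"), ("p3", "t1", "dog"),
    ("p1", "t2", "dog"), ("p2", "t2", "dog"), ("p3", "t2", "cat"),
    ("p1", "t3", "dog"), ("p2", "t3", "dog"), ("p3", "t3", "cat"),
]
accuracy = control_accuracy(answers, golden)
trusted = {p for p, acc in accuracy.items() if acc >= 0.5}
labels = majority_vote(answers, trusted)
```

Filtering by accuracy before voting keeps consistently wrong performers from swaying the result. The 0.5 threshold is arbitrary and would need tuning for a real project.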

Research benchmarks
Quality control lies at the heart of crowdsourcing. Use our examples as benchmarks to achieve the described levels of quality on popular research datasets.
Useful resources
Integrate an on-demand global crowdforce and build fully automated ML pipelines.
Python library
Our open-source library provides a client that covers all API functionality.
Public datasets
Use our datasets for your projects or collect your own data that meets your needs.
Get started now
Take advantage of Toloka technologies, including millions of performers available for your projects 24/7.
Tue Apr 05 2022 16:40:03 GMT+0300 (Moscow Standard Time)