Find success with our helpful guides and step-by-step instructions.
Hands-on tutorials
Launch projects successfully by following our step-by-step instructions. Use our datasets for practice projects, or run projects on your own datasets and adapt the settings to your needs. You can also run the projects programmatically using Toloka-Kit.
1. Image & Video Data
2. Text Data
3. Audio Data
4. Field Data Collection
Image classification
The challenge: classify a set of photographs by the animal shown in the picture: a cat or a dog.
There’s a set of real-life photos of cats and dogs. Performers are asked to look at the photos and decide what animal is in the picture.
The challenge: find out which icon people prefer and determine the top icon out of the set
There’s a set of 3 icons arranged into 3 pairs. Performers are asked to look at each pair and choose the icon they prefer. The results are then aggregated to obtain the top icon.
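One simple way to aggregate pairwise choices into a ranking is to count how often each icon was chosen. The sketch below is only an illustration under that assumption: the function name `rank_by_wins`, the icon identifiers, and the sample responses are hypothetical, and the tutorial itself may use a different aggregation method.

```python
def rank_by_wins(comparisons):
    """Rank items by how many pairwise comparisons they won.

    comparisons: iterable of (left, right, chosen) tuples, where
    `chosen` is the item the performer preferred in that pair.
    Returns a list of (item, wins), best first; ties are ordered by name.
    """
    wins = {}
    for left, right, chosen in comparisons:
        if chosen not in (left, right):
            raise ValueError(f"{chosen!r} is not part of the pair ({left!r}, {right!r})")
        # Register both items so that an icon with zero wins still appears.
        wins.setdefault(left, 0)
        wins.setdefault(right, 0)
        wins[chosen] += 1
    return sorted(wins.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical responses for 3 icons shown in 3 pairs.
comparisons = [
    ("icon_a", "icon_b", "icon_a"),
    ("icon_a", "icon_b", "icon_a"),
    ("icon_a", "icon_b", "icon_b"),
    ("icon_a", "icon_c", "icon_c"),
    ("icon_a", "icon_c", "icon_a"),
    ("icon_b", "icon_c", "icon_c"),
    ("icon_b", "icon_c", "icon_b"),
]

ranking = rank_by_wins(comparisons)
top_icon = ranking[0][0]
print(ranking)
print("Top icon:", top_icon)
```

Win counting treats every comparison equally. If pairs receive uneven numbers of responses, a win rate per icon or a statistical pairwise model may be a better fit.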
The challenge: collect a set of categorized, real-life photos of pets: cats and dogs.
Performers are asked to send a picture of their cat or dog (or a random photo if they don’t have a cat or a dog) and select the appropriate label for their picture.
The challenge: collect a set of video recordings of people imitating popular emoji gestures.
There’s a set of emoji combinations, and performers are asked to record videos showing these emojis with their hands. Each video must meet certain quality criteria.
The challenge: collect information about how people manage stress and whether they are prepared to get a meditation app for that purpose.
There’s a survey where performers answer questions about their stress levels and stress management techniques, meditation practices, and habits relating to paid apps.
The challenge: define which class a search query belongs to and distribute the queries between several categories inside the class.
There’s a list of queries (related to travel and dining), each with an unknown class and category. Performers are asked to first select the search query’s class and then define the category it belongs to within this class.
The challenge: classify audio recordings according to the speaker’s gender
There’s a set of audio recordings from different people. Performers are asked to listen to the recordings and decide whether the speaker is a man or a woman.
There’s a set of texts, and performers are asked to read the texts aloud and record themselves. Recordings like these are used for training voice assistants.
The challenge: collect a set of real-life photos of metro station entrances
Performers are asked to visit a specific metro entrance and take a few pictures of it. These pictures can be used to decide which entrances need cleaning.
Our methodology, based on years of research and unique industry expertise, can help you successfully tap into the wisdom of the crowd on a large scale. If you want to efficiently use the knowledge of thousands of people to get clean and accurate data for your ML needs, follow our tips for each of these essential steps.
1. Decomposition
Break your task down into smaller steps until each one is clear enough for any performer to handle.
2. Instructions
The more comprehensive the instructions, the more accurate the results.
3. Interfaces
A good interface makes it easy for performers to carry out the same repeated actions quickly and correctly.
4. Quality control
Carefully plan and configure a quality control system to ensure high-quality results.
5. Pricing
Find the optimal price based on speed and quality.
6. Results
After the pool is finished, aggregate the results and check statistics.
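As an illustration of the final step, the sketch below aggregates overlapping responses with a plain majority vote and reports the share of performers who agreed with the winning label. It assumes the results have already been exported as (task, performer, label) tuples; the function name `majority_vote` and the sample data are hypothetical, and other aggregation methods may suit a given project better.

```python
from collections import Counter, defaultdict


def majority_vote(responses):
    """Aggregate labels per task by majority vote.

    responses: iterable of (task_id, performer_id, label) tuples.
    Returns {task_id: (winning_label, agreement)}, where agreement is
    the fraction of responses for that task that match the winner.
    If labels tie, the one seen first for that task wins.
    """
    votes = defaultdict(Counter)
    for task_id, _performer_id, label in responses:
        votes[task_id][label] += 1

    aggregated = {}
    for task_id, counter in votes.items():
        label, count = counter.most_common(1)[0]
        aggregated[task_id] = (label, count / sum(counter.values()))
    return aggregated


# Hypothetical export: three performers labeled each photo.
responses = [
    ("photo_1", "p1", "cat"),
    ("photo_1", "p2", "cat"),
    ("photo_1", "p3", "dog"),
    ("photo_2", "p1", "dog"),
    ("photo_2", "p2", "dog"),
    ("photo_2", "p3", "dog"),
]

aggregated = majority_vote(responses)
for task_id, (label, agreement) in aggregated.items():
    print(f"{task_id}: {label} (agreement {agreement:.2f})")
```

Tasks with low agreement are natural candidates for a closer look when checking statistics, since they may point to unclear instructions or ambiguous items.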
Research Benchmarks
Quality control lies at the heart of crowdsourcing. Use our examples as benchmarks to achieve the described levels of quality on popular research datasets.