Get to the heart of crowdsourcing

Decomposition means breaking a task down into parts: replacing one large problem with a series of smaller, separate problems that are easier to solve and can be completed by different performers. Decomposition lies at the core of the crowdsourcing concept. It is the first step to take when planning a new crowdsourcing project.

Benefits of decomposition

  • Lowers the entry threshold for performers: the easier the task, the more people will complete it correctly and quickly
  • Reduces the number of errors
  • Makes it easier to review completed tasks
  • Saves money: you spend less on the project overall

How is this possible?

It may seem counterintuitive, but splitting one task into several smaller ones helps lower the project budget. This happens because smaller tasks are easier to complete without mistakes, so we don’t need to re-evaluate them as often as complex ones.

                          Cost    Tasks for re-evaluation    Active users
  One complex task        100%    15-25%                     6,000
  Several smaller tasks   70%     10-15%                     11,000
How do I know that a task needs to be decomposed?

There is one fairly easy rule: if your task offers a choice of 3-5 answers and the instructions fit on one page without scrolling, then most likely your task doesn’t need to be decomposed. In all other cases, you should probably try breaking the task down.
You can also discover when to decompose by running short experiments. If your task is picked up very slowly, or performers are filtered out due to low skill levels even though you don’t see any problems with your control tasks, you can assume that the task is too complex.

Ways to decompose a task

Decomposing a complex challenge
If your task is aimed at getting an answer to a complex question, try dividing it into a series of simple ones that are easy to answer and independent of each other. For example, instead of asking whether a tech support specialist gave a “good” or “bad” answer, ask if the response was detailed, friendly and grammatically correct.
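The support-quality example can be sketched as follows. This is a hypothetical illustration: the criteria and the way they combine into a verdict are assumptions, not a prescribed aggregation rule.

```python
# Illustrative sketch: three independent yes/no sub-questions replace one
# vague "good or bad?" judgment. The combination rule is an assumption.

def overall_quality(detailed: bool, friendly: bool, grammatical: bool) -> str:
    """Combine the simple sub-answers into one final verdict."""
    score = sum([detailed, friendly, grammatical])
    if score == 3:
        return "good"
    if score == 2:
        return "needs review"
    return "bad"

print(overall_quality(True, True, True))
print(overall_quality(True, False, True))
```

Each sub-question is easy to answer and to verify on its own, which is exactly what makes the decomposed version cheaper to control.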
Decomposing a multi-task
If your task involves answering a series of questions at once, try asking them successively, one question at a time. If you have a set of pictures and you need to outline traffic lights on them, first ask if the picture contains a traffic light and then (if yes) ask to outline it. Best practice here is to use two different projects for collecting data.
Decomposing a multitude of options
Sometimes there’s only one question that needs to be answered, but there are too many possible answers and it’s difficult to remember all the rules about them at once. If there are more than 10 options to choose from, we recommend grouping them thematically, so that a performer first makes a general choice of theme and then chooses within a smaller variety of answers. Best practice here is to support the successive classification with a clean interface that displays only the necessary options.
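A minimal sketch of such successive classification, assuming invented theme groups: the performer first picks a theme, and the interface then shows only the options inside it.

```python
# Hypothetical two-stage classification: the theme names and labels below
# are made up for illustration.
THEMES = {
    "food": ["restaurant", "cafe", "bakery", "bar"],
    "retail": ["grocery", "clothing", "electronics", "pharmacy"],
    "services": ["bank", "salon", "repair shop", "gym"],
}

def options_for(theme: str) -> list[str]:
    """Return only the labels the interface should display at step two."""
    return THEMES.get(theme, [])

# After choosing "food", the performer sees 4 options instead of 12:
print(options_for("food"))
```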
Decomposing a crowdsourcing project itself
Collecting crowd data involves more than just setting up a task for performers. You also need to set up control mechanisms to maintain good quality. If the best control method is human evaluation, try adding a post-verification project where performers will check tasks completed by other performers.
Real-life crowdsourcing projects normally demand a combination of various decomposition techniques.
Here are some examples:

Let’s say your task is to regularly update information about local businesses in order to keep an up-to-date list. You give performers an offline task to find a particular business, check the address and opening hours, and provide a photo. After the task is complete, you find out that some answers are only partly correct. Some performers didn’t provide a quality photo, while others got the opening times wrong. How can you clean up this data? Do you need to pay the performers who were only partly correct? Where do you get an extra budget to re-label the objects with missing data? This task can be decomposed by splitting it into three independent projects where one simple piece of information is collected, and performers don’t get confused with multi-tasking:

  • An entrance photo
  • Address
  • Opening hours

This allows you to use simple quality control mechanisms, choose performers who are better at each individual task, and save money on relabeling incorrect data.
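Recombining the three projects back into one listing might look like this minimal sketch; the identifiers and field names are invented for illustration, not an export format.

```python
# Hypothetical results exported from the three independent projects,
# keyed by a made-up business id.
photos = {"biz-1": "entrance.jpg"}
addresses = {"biz-1": "12 Main St"}
hours = {"biz-1": "9:00-18:00"}

def merge_business(biz_id: str) -> dict:
    """Join per-field answers back into one record for the listing."""
    return {
        "id": biz_id,
        "photo": photos.get(biz_id),
        "address": addresses.get(biz_id),
        "opening_hours": hours.get(biz_id),
    }

print(merge_business("biz-1"))
```

Because each field comes from its own project, a wrong opening time only sends one small task back for relabeling, not the whole visit.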

Let’s say you want to improve a computer vision algorithm that is used for self-driving cars. You give performers a set of photos of streets and ask them to outline every traffic light with a bounding box. After the labeling is finished, you receive noisy labels. Why? Because the task didn’t make allowances for details such as photos without any traffic lights and you didn’t consider how to check the quality of results provided by thousands of anonymous performers.

You can decompose this project and try again with a pipeline like this:

  1. Check whether the image contains traffic lights or not (classification)
  2. Outline each traffic light with a bounding box (object detection)
  3. Check whether traffic lights are outlined correctly (classification)

To ensure quality, the second and third projects are completed by different performers. The third project checks whether the results of the second project are correct, and performers are only rewarded for correct results.
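The three-step pipeline above can be sketched in code. This is a toy model, assuming the three projects behave like the three callables below; the image names and label structures are invented.

```python
# Hedged sketch of the classify -> outline -> verify pipeline.

def run_pipeline(images, classify, outline, verify):
    """classify, outline, and verify stand in for the three crowd projects."""
    results = {}
    for img in images:
        if not classify(img):          # step 1: does the image contain traffic lights?
            continue                   # images without them skip the expensive step
        boxes = outline(img)           # step 2: draw bounding boxes
        if verify(img, boxes):         # step 3: a different group checks the boxes
            results[img] = boxes       # only verified work is accepted and rewarded
    return results

# Toy stand-ins for the three projects:
accepted = run_pipeline(
    ["a.jpg", "b.jpg"],
    classify=lambda img: img == "a.jpg",
    outline=lambda img: [(10, 20, 50, 80)],
    verify=lambda img, boxes: len(boxes) > 0,
)
print(accepted)
```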
Let’s imagine you’re managing an online marketing team that constantly needs to generate small ads for new campaigns. You give performers a task to write short ads. Even with a trained group of performers, the collected ads may be of poor quality or contain minor mistakes, such as irrelevant content or typos. It’s impossible to check quality on the fly with methods based on horizontal comparison, like majority vote, because each text is unique. It’s also expensive to rewrite defective ads.


Projects like this can be decomposed as follows:

  1. Ask performers to write an ad based on standard introductory information.
  2. Run some basic validation checks: number of characters, use of forbidden words, capitalization, etc.
  3. Ask another group of performers to verify the ads by using a list of simple criteria: informativeness, format compliance, grammar.
  4. If an ad features only minor mistakes that can easily be corrected, ask another team of performers for the corrections and verify them once more as in step three. If an ad is just bad, send the task to step one.
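Step two, the automatic validation, is the part that translates most directly into code. A minimal sketch, assuming an illustrative character limit and forbidden-word list:

```python
# Basic automatic checks that run before any human verification.
# MAX_CHARS and FORBIDDEN are illustrative assumptions, not platform rules.
MAX_CHARS = 90
FORBIDDEN = {"free!!!", "guaranteed"}

def basic_checks(ad: str) -> list[str]:
    """Return a list of detected problems; an empty list means the ad passes."""
    problems = []
    if len(ad) > MAX_CHARS:
        problems.append("too long")
    if any(word in ad.lower() for word in FORBIDDEN):
        problems.append("forbidden word")
    if ad and not ad[0].isupper():
        problems.append("capitalization")
    return problems

print(basic_checks("Fresh pastries baked daily near you"))
print(basic_checks("guaranteed best deal in town"))
```

Filtering out such defects automatically means the human verifiers in step three only spend time on ads that could plausibly pass.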

Tasks for generating and correcting ads should only be rewarded after verification, and only if they were completed correctly.

Decomposition and…

When we say that decomposition is the key, we mean it. Here's how decomposition is connected to other components of crowdsourcing:

  • Instructions
    A well-decomposed task is easily explained using simple instructions. To learn more about clear instructions, see our special page.
  • Interface
    Each single step of a decomposed task should be supported by the task interface: clean and simple, with no unnecessary elements. To learn more about transparent task interfaces, see our “Interfaces” section.
  • Quality control
    A set of simple tasks is easy to check with basic quality control methods, such as majority vote or golden sets. Learn more about establishing quality control.
  • Pricing
    The simpler a task is, the quicker it can be submitted, and the cheaper it is. Here’s a section on pricing principles and mechanisms.
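Majority vote, mentioned above as one of the basic quality control methods, works precisely because decomposed tasks have a small, comparable answer space. A minimal sketch, assuming an overlap of three performers per task:

```python
# Minimal majority-vote sketch; the overlap of 3 answers is an assumption.
from collections import Counter

def majority_vote(answers: list[str]) -> str:
    """Pick the most common answer among overlapping performers."""
    return Counter(answers).most_common(1)[0][0]

print(majority_vote(["cat", "cat", "dog"]))
```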