Control quality
If anyone can become a performer, how do you achieve adequate labeling quality?
Carefully plan and configure a quality control system to ensure high-quality results.
Get quality results
  • Train performers

    Ask performers to take training tasks before beginning the main tasks. Training helps performers understand the instructions, learn what is expected, and answer questions correctly. Include disputable and rare cases in the training tasks. If the answer is incorrect, the performer sees a hint explaining which answer is correct and why.

  • Control the results of training

    Create an exam with control tasks. Unlike training tasks, exam tasks don’t have any hints or explanations. A good exam should be:

    1. Passable
    2. Regularly updated
    3. Short
  • Control quality in real time within the production pool

    You can't get high-quality results in crowdsourcing without a well-thought-out quality control system. In Toloka, you can choose and configure rules to match the type of task: overly fast responses, majority vote, captcha, and, of course, control tasks (also called golden sets or honeypots). Control tasks are tasks with known correct answers that are mixed into a page of main tasks to evaluate the quality of the performer's work in real time. A performer who gives too many incorrect responses in the control tasks loses access to the project.
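
    The control-task rule described above can be sketched roughly as follows. This is an illustration only, not Toloka's implementation: the names `ControlStats` and `should_block`, the minimum of 5 control answers, and the 70% accuracy threshold are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class ControlStats:
    """Running tally of one performer's answers on control tasks (hypothetical helper)."""
    answered: int = 0
    correct: int = 0

    def record(self, given: str, expected: str) -> None:
        """Count one control-task answer and whether it matched the known answer."""
        self.answered += 1
        if given == expected:
            self.correct += 1

    @property
    def accuracy(self) -> float:
        # A performer with no control answers yet is not penalized.
        return self.correct / self.answered if self.answered else 1.0


def should_block(stats: ControlStats, min_answers: int = 5, min_accuracy: float = 0.7) -> bool:
    """Revoke access only once enough control answers exist to judge fairly.

    Both thresholds are assumed values for this sketch.
    """
    return stats.answered >= min_answers and stats.accuracy < min_accuracy


stats = ControlStats()
for given, expected in [("cat", "cat"), ("dog", "cat"), ("dog", "cat"), ("cat", "cat"), ("dog", "cat")]:
    stats.record(given, expected)

print(should_block(stats))  # True: 2 of 5 correct is below the assumed 70% threshold
```

    Waiting for a minimum number of answers keeps a single early mistake from blocking a performer who would otherwise do well.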

  • Optionally, create a retry pool

    If the labeling of your pool was fast at first and then slowed down, it may mean that too many performers have dropped below the required skill level on control tasks and lost access. Perhaps they were just a little unlucky and deserve a second chance. You can create a retry pool as a way to gain back more performers.

  • Optionally, use post-verification

    You can check tasks manually or create a separate project and allow another group of performers to verify the results of the initial project. You can reject the results if the task was performed poorly and accept the ones that were done correctly. Rejected tasks are not paid for. This type of quality control works well for tasks with results that can’t be evaluated using control tasks or consensus (majority vote), like generation of unique content, object segmentation, or voice recording.

  • Best practices for control tasks

    If a task pool consists of several hundred tasks, then 10% of the pool should be control tasks. If the pool contains thousands of tasks, just 1% is enough.
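
    As a rough illustration of this rule of thumb, the number of control tasks could be computed as below. The function name `control_task_count` and the cutoff of 1000 tasks between "several hundred" and "thousands" are assumptions; the text above does not give an exact boundary.

```python
def control_task_count(pool_size: int, large_pool_threshold: int = 1000) -> int:
    """Suggest how many control tasks to prepare for a pool.

    Uses 10% for pools below the (assumed) threshold and 1% at or above it.
    Integer arithmetic with ceiling division avoids floating-point rounding.
    """
    percent = 10 if pool_size < large_pool_threshold else 1
    return (pool_size * percent + 99) // 100


print(control_task_count(300))   # 30
print(control_task_count(5000))  # 50
```

    Note that a hard cutoff makes the suggested count drop sharply right at the threshold, so treat the result as a starting point rather than a strict requirement.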

    If possible, classes should be represented in the set of control tasks in proportions similar to the classes in the general pool of tasks. These proportions are set by the requester. Suppose you need to determine the type of accommodations on a hotel aggregator website: family, business, casual, or luxury. If luxury accommodations make up just 10% of the main pool but appear in every second control task, you will not be able to check whether performers mark the other types of accommodations correctly, and you risk getting poor results with noisy data.
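
    One way to spot this kind of imbalance is to compare class shares in the control set with class shares in the main pool. The sketch below assumes you already have an estimate of the pool's class distribution (for example, from a labeled sample); the function names and the 15-percentage-point tolerance are hypothetical.

```python
from collections import Counter


def class_shares(labels):
    """Return the fraction of items belonging to each class."""
    total = len(labels)
    if total == 0:
        return {}
    return {cls: n / total for cls, n in Counter(labels).items()}


def skewed_classes(pool_labels, control_labels, tolerance=0.15):
    """Report classes whose share in the control set differs from the pool by more than `tolerance`.

    Positive values mean the class is over-represented in the control tasks.
    """
    pool = class_shares(pool_labels)
    control = class_shares(control_labels)
    report = {}
    for cls in sorted(set(pool) | set(control)):
        gap = control.get(cls, 0.0) - pool.get(cls, 0.0)
        if abs(gap) > tolerance:
            report[cls] = gap
    return report


# Estimated pool: luxury is only 10% of tasks.
pool = ["family"] * 40 + ["business"] * 30 + ["casual"] * 20 + ["luxury"] * 10
# Control set: luxury appears in every second control task.
control = ["family"] * 2 + ["business"] * 2 + ["casual"] * 1 + ["luxury"] * 5

for cls, gap in skewed_classes(pool, control).items():
    print(f"{cls}: {gap:+.0%}")
```

    In this example the report flags `luxury` as heavily over-represented and `family` as under-represented in the control set.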

    Tip: If the quality or labeling speed is low, check the control tasks. Perhaps they contain incorrect, outdated or unclear examples.

    Convenient ways to create control tasks
  • Send the tasks to a "trusted" crowd. Select performers who are reliable in delivering high-quality results and assign them skills (a skill is a variable assigned to a performer). Then create a separate project for them to generate correct responses for tasks and launch it with high overlap. Don’t forget to use additional quality control methods, just in case.

  • Choose specialists in your company who can label the data well. Have them produce control tasks to use for checking the quality of performers' responses. In other words, 10% of the tasks are completed by an internal team and used for controlling 90% of the entire labeling process.
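
    When control tasks come from a trusted crowd with high overlap, the individual responses still have to be merged into one correct answer per task. A minimal sketch using majority vote is shown below; the function name `golden_answers` and the 80% agreement threshold are assumptions, and other aggregation methods could be used instead.

```python
from collections import Counter


def golden_answers(responses, min_agreement=0.8):
    """Pick a control answer for each task by majority vote among trusted performers.

    `responses` maps a task ID to the list of labels it received.
    Tasks where the top label falls short of `min_agreement` are skipped,
    since disputed answers make unreliable control tasks.
    """
    golden = {}
    for task_id, labels in responses.items():
        if not labels:
            continue
        label, votes = Counter(labels).most_common(1)[0]
        if votes / len(labels) >= min_agreement:
            golden[task_id] = label
    return golden


responses = {
    "t1": ["luxury"] * 5,
    "t2": ["family", "family", "business", "casual", "family"],
    "t3": ["business"] * 4 + ["casual"],
}
print(golden_answers(responses))  # {'t1': 'luxury', 't3': 'business'}
```

    Task `t2` is dropped because only 3 of 5 trusted performers agreed on its label.
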
  • Apply skills to control quality

    Use skills to automatically calculate:

    • Rate of correct responses (with control tasks, majority vote, or post-verification)
    • Behavioral features like fast responses
    • Binary information on whether a performer has completed particular projects
    • Any combinations of these and other features

    You can also use skills in automatic decision making to control access to projects and tasks (for instance, automatically revoke access to tasks if a performer's skill level drops too low).

    When assigning a skill, the best practice is to combine several signals so that the skill is robust.
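
    To illustrate combining signals, the sketch below blends three of the signals mentioned above into a single skill value and uses it for an access decision. The weights, the 0 to 100 scale, the access threshold of 70, and the function names are all assumptions made for the example, not platform defaults.

```python
def combined_skill(control_accuracy, majority_agreement, fast_response_rate,
                   weights=(0.5, 0.3, 0.2)):
    """Blend three signals, each in the range 0..1, into a skill on an assumed 0..100 scale.

    A high rate of overly fast responses lowers the skill, so it enters as (1 - rate).
    """
    w_control, w_majority, w_speed = weights
    score = (
        w_control * control_accuracy
        + w_majority * majority_agreement
        + w_speed * (1.0 - fast_response_rate)
    )
    return round(100 * score / sum(weights))


def has_access(skill, min_skill=70):
    """Grant access only while the skill stays at or above the assumed threshold."""
    return skill >= min_skill


careful = combined_skill(control_accuracy=0.9, majority_agreement=0.8, fast_response_rate=0.1)
hasty = combined_skill(control_accuracy=0.5, majority_agreement=0.6, fast_response_rate=0.5)

print(careful, has_access(careful))  # 87 True
print(hasty, has_access(hasty))      # 53 False
```

    Because no single signal dominates, one noisy measurement (for example, a few unlucky control tasks) is less likely to revoke access on its own.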

    Wed Oct 21 2020 22:16:20 GMT+0300 (Moscow Standard Time)