In this tutorial, you will learn how to run content moderation in Toloka. We will use a project preset designed specifically for this type of data labeling.
Content moderation is a type of data labeling task with a text and a number of response options. Tolokers read the text and choose one of the given answer options. After that, they can specify their answer using an additional question with checkboxes.
Use this preset when you need to:
Moderate comments and nicknames on a forum.
Check ads on a site, product reviews in a store, or messages in social networks.
Check for the presence of a brand or company name.
Before you begin:
Make sure you are registered in Toloka as a requester.
Top up your Toloka account. If you are unsure about the budget, you can do that later in this tutorial. Toloka will display the budget estimate for your project.
We recommend starting with a project preset for easier configuration and better results.
Follow this link, or create a project manually:
Click Create a project.
Click Do it myself.
Select the Sentiment analysis & Content moderation preset.
Click Choose this preset in the pop-up tab.
In the General information section, add the project name and description:
Name to show Tolokers: In 2–5 words, state the general idea of the project.
Description for Tolokers: In a couple of sentences, explain what you expect Tolokers to do. This is just an overview. You will write instructions later.
In the Task interface section, set up what your tasks will look like. This preset has a task template with validation, keyboard shortcuts, and task layout pre-configured.
This tutorial uses Template Builder, but you can use the HTML/JS/CSS editor for the same purpose.
Write the first question Tolokers will see in your task. All tasks in a project use the same question.
Set answer options. In the options list, replace the sample answers with your values in the following properties:
label: This is the label that Tolokers will see. Make sure it is clear and correct.
value: This is the value you will see in the file with the labeling results.
When a Toloker selects an option that requires additional information, a second question with checkboxes appears. You can change the condition under which the additional question becomes visible. To do that, replace the value of the to property with one of the values you’ve already specified in the value properties in the previous step:
Configure the text and the answer options for the additional question:
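The link between the answer options and the visibility condition can be checked with a small plain-Python model. This is only an illustrative sketch: the dictionary layout, the sample labels and values, and the names `options`, `condition_to`, `check_condition`, and `additional_question_visible` are hypothetical. It is not the Template Builder configuration format.

```python
# Hypothetical, simplified model of the first question's answer options.
# `label` is what Tolokers see; `value` is what appears in the results file.
options = [
    {"label": "Acceptable", "value": "ok"},
    {"label": "Violates the rules", "value": "violation"},
]

# Hypothetical stand-in for the `to` property: the answer value that
# makes the additional checkbox question visible.
condition_to = "violation"


def check_condition(options, condition_to):
    """Return True if `condition_to` matches one of the option values."""
    values = [option["value"] for option in options]
    return condition_to in values


def additional_question_visible(selected_value, condition_to):
    """The additional question appears only when the selected value equals `to`."""
    return selected_value == condition_to
```

If `check_condition` returns False, the condition refers to a value that no answer option produces, so the additional question would never appear.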
In the Input data example section, add a sample text. It is only used to display the task interface preview on the right.
Raw task data is stored in the XLSX, TSV, or JSON format. The labeling results are presented in a TSV file. The Data specification section determines which parameters these files might contain.
Click Show specifications and check the values:
Input data and Output data match the task interface you set up in Template Builder. Check that there are fields for all data types you use for your tasks, and for the ones you want to see in the results file.
In the Instructions for Tolokers editor, enter the instructions Tolokers will see when they start doing your tasks. You can add text, tables, and images to your instructions.
Check the sample text of the instructions, and update it to fit your project.
When writing instructions, remember that most Tolokers don’t know anything about your tasks beforehand. Make sure your instructions are as clear as possible, but not too wordy. For successful data labeling, try to strike a balance between covering all the essentials and keeping it short. Learn more in our knowledge base.
In the upper-right corner, click Save.
Learn more about working with the project in the Project section.
A pool is a set of tasks sent out to Tolokers at the same time. One project can have many pools. When creating a pool, you set up pricing, audience filters for Tolokers, and quality control.
Click Create new pool on the project page.
Select a value in the Pool type drop-down list.
If the price per task suite is zero, you must select the pool type.
Fill in the Pool name (visible only to you) field. Only you will see this name on the project page.
Specify the pool description which will be displayed instead of the project description in the task list for Tolokers. By default, Tolokers see the description from the project settings. To use a different description, uncheck the Use project description box and set Public description. If necessary, click + Private comment to add a private project description that only you will see.
Click Create.
At the Select the audience for your task step, set up filters to select Tolokers for your pool.
Clear the My tasks may contain shocking or pornographic content checkbox if your project has no such content.
To select Tolokers based on their language, location, age, gender, and other parameters, click the Add filter button.
For example, add the Languages filter:
Tasks in pools will automatically be available in the web version of Toloka and the mobile app. If you want to change the default settings and limit the visibility of the task for any of the versions, add the Client filter and select the desired value: Toloka web version or Toloka for mobile.
Use the Speed/quality balance slider to change the number of Tolokers who can see your tasks. Move the slider to the right to exclude Tolokers with lower ratings from participating in your project.
At the Setup quality control step, set quality control rules for more accurate results:
To filter out Tolokers who complete tasks too fast, edit the pre-configured Fast responses rule. Specify the following values:
These settings mean that a Toloker who completes a task suite in less than 20 seconds will be suspended and won't be able to access your tasks for 10 days.
A task suite is a page with a number of tasks. It can contain one or several tasks. If the tasks are simple, you can add 10–20 tasks per suite.
To determine the Minimum time per task suite value, complete your task suite yourself and record the time. If you ban Tolokers after a single fast response, set a low minimum time. If you ban them only after several fast responses, you can set the minimum time slightly higher.
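The logic of the Fast responses rule can be sketched in a few lines of Python. This is an illustration of the rule described above, not Toloka's implementation: the function names and the `fast_responses_to_ban` parameter are hypothetical, and the 20-second threshold comes from the example values in this step.

```python
MIN_SECONDS_PER_SUITE = 20  # example threshold used in this tutorial


def count_fast_responses(suite_durations, min_seconds=MIN_SECONDS_PER_SUITE):
    """Count task suites completed in less than `min_seconds`."""
    return sum(1 for seconds in suite_durations if seconds < min_seconds)


def should_suspend(suite_durations, fast_responses_to_ban=1,
                   min_seconds=MIN_SECONDS_PER_SUITE):
    """Suspend once the number of fast responses reaches the ban threshold."""
    fast = count_fast_responses(suite_durations, min_seconds)
    return fast >= fast_responses_to_ban
```

With `fast_responses_to_ban=1`, one suite finished in 19 seconds is enough for a suspension. Raising the threshold to 2 tolerates a single fast suite.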
Keep the pre-configured Majority vote rule as is. It accepts the most popular response as the correct one, and allows you to filter out Tolokers who answer incorrectly. The default settings mean that Tolokers who give correct responses to less than 40% of tasks are banned from the project for 1 day. Accept as majority set to 2 means that 2 similar responses out of all responses given to a single task will be considered as the correct answer.
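To make the Majority vote settings concrete, here is a minimal sketch of how such a rule could be computed. It is an assumption-laden illustration, not Toloka's algorithm: the function names are hypothetical, and ties are resolved in favor of the response seen first, which is a simplification.

```python
from collections import Counter


def majority_answer(responses, accept_as_majority=2):
    """Return the most popular response if at least `accept_as_majority`
    Tolokers gave it; otherwise return None (no majority yet)."""
    if not responses:
        return None
    answer, votes = Counter(responses).most_common(1)[0]
    return answer if votes >= accept_as_majority else None


def share_matching_majority(toloker_answers, majority_by_task):
    """Share of a Toloker's answers that match the majority answer,
    counting only tasks where a majority exists."""
    scored = [
        (task, answer) for task, answer in toloker_answers.items()
        if majority_by_task.get(task) is not None
    ]
    if not scored:
        return None
    correct = sum(1 for task, answer in scored if majority_by_task[task] == answer)
    return correct / len(scored)
```

Under the default settings described above, a Toloker whose share falls below 0.4 would be banned from the project for 1 day.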
Click Add a quality control rule → Control tasks, and enter the following values:
This means that a Toloker who gives more than 40% of incorrect responses will be blocked and won't be able to complete tasks in this project for 10 days.
At the Add optional pool settings step, specify the Time per task suite, sec.
It should be long enough to read the instructions and wait for task data to download (for example, 150 seconds).
At the Set the task price and overlap step, set how much a single task will cost you.
In Price per task suite, $, set the amount to pay for one page of tasks completed by one Toloker.
In the Overlap field, define how many Tolokers must do each task.
For content moderation tasks, it is usually 3–5. The default value (3) means that each task will have 3 responses.
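Price and overlap multiply, so a back-of-the-envelope estimate helps before you top up your account. The sketch below is a rough calculation under stated assumptions: the function name is hypothetical, and it ignores any platform fee or other charges. The budget estimate that Toloka displays remains the authoritative figure.

```python
import math


def estimate_pool_cost(general_tasks, tasks_per_suite, overlap, price_per_suite):
    """Rough cost: number of suites x overlap x price per suite (fees excluded)."""
    suites = math.ceil(general_tasks / tasks_per_suite)
    return suites * overlap * price_per_suite


# Example: 900 texts, 9 general tasks per suite, overlap 3, $0.05 per suite.
cost = estimate_pool_cost(900, 9, 3, 0.05)
print(f"Estimated cost before fees: ${cost:.2f}")
```

Raising the overlap from 3 to 5 raises this estimate by the same ratio.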
At the Prepare and upload data step, upload your task data.
Create the tasks for Tolokers:
To download a template, click one of the buttons:
For this type of project, the file with tasks must have one parameter. Its name is INPUT:comment, and its values are the texts to moderate.
INPUT:comment
This movie is terrible with only a few funny scenes.
Hard to beat Cinnamon Toast Crunch
The trigger stopped working after 5 uses. Really
Open the downloaded file, and replace the sample comments with your texts.
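If your texts already live in a script or database, you can generate the file instead of editing the template by hand. The following is a minimal sketch using Python's standard `csv` module. The function name `build_tasks_tsv` is hypothetical, and how texts containing tabs or line breaks should be escaped for Toloka is not covered here.

```python
import csv
import io


def build_tasks_tsv(comments):
    """Return TSV text with a single INPUT:comment column, one comment per row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["INPUT:comment"])
    for comment in comments:
        writer.writerow([comment])
    return buffer.getvalue()


comments = [
    "This movie is terrible with only a few funny scenes.",
    "Hard to beat Cinnamon Toast Crunch",
]
tsv_text = build_tasks_tsv(comments)
```

Save `tsv_text` to a file with UTF-8 encoding, and upload that file in the next step.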
Click Drop file here or select, and upload the file you’ve just made.
Click Continue.
Tasks are shown to Tolokers in suites. A suite is a single page with multiple tasks. Define how many tasks to include per suite:
General tasks: These are tasks for Tolokers to label.
Control tasks: These are tasks with predefined answers used to control the quality of responses. You will create them in the next step.
Training tasks: These are tasks with predefined answers and explanations for Tolokers. Normally you use training tasks in separate training pools. You don’t have to include them.
For example, you can add 9 general tasks and 1 control task per suite:
Click Combine tasks into suites.
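The composition of a suite can be pictured with a short sketch. This is a conceptual illustration only: how Toloka actually mixes tasks is not described in this tutorial, and the function name, the random selection of control tasks, and the `seed` parameter are all assumptions.

```python
import random


def combine_into_suites(general_tasks, control_tasks,
                        general_per_suite=9, control_per_suite=1, seed=0):
    """Split general tasks into pages and add randomly chosen control tasks to each."""
    rng = random.Random(seed)
    suites = []
    for start in range(0, len(general_tasks), general_per_suite):
        page = list(general_tasks[start:start + general_per_suite])
        page += rng.sample(control_tasks, control_per_suite)
        rng.shuffle(page)  # so that control tasks are not always last on the page
        suites.append(page)
    return suites
```

With 20 general tasks and the 9 + 1 layout, this produces three suites: two full pages of 10 tasks and a final page with 2 general tasks and 1 control task.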
Create control tasks at the Add control tasks for checking performance step. To do it, add correct answers to some of your tasks.
Select all the checkboxes, and specify the correct answer for a task. Then, click the Save and go to next button. Add several control tasks this way.
Note the Distribution of correct responses for control tasks graph on the right side of the page. It shows how many control tasks of each type you have. We recommend adding an equal quantity of each correct response.
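You can run the same balance check on your own list of control answers before uploading them. The sketch below is illustrative: the function names and the `tolerance` parameter are hypothetical, and "balanced" here means only that each answer's share is close to an equal split.

```python
from collections import Counter


def control_answer_distribution(control_answers, answer_values):
    """Count control tasks per correct answer, including answers with zero tasks."""
    counts = Counter({value: 0 for value in answer_values})
    counts.update(control_answers)
    return counts


def is_balanced(distribution, tolerance=0.1):
    """True if every answer's share is within `tolerance` of an equal split."""
    total = sum(distribution.values())
    if total == 0:
        return False
    equal_share = 1 / len(distribution)
    return all(abs(count / total - equal_share) <= tolerance
               for count in distribution.values())
```

Passing the full list of answer values matters: an answer option with no control tasks at all then shows up with a count of zero instead of being silently omitted.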
At the Double-check your project and try out tasks step, check how the task will look from the Toloker's point of view.
This step will be enabled after you complete the previous steps. You can skip this step by clicking Do it later.
After all the steps, you'll see the Set up is finished and your pool is ready for labeling tip on the pool page.
Make sure you have topped up your account.
To send the tasks to Tolokers and begin the labeling process, click Start labeling.
In the pop-up panel, review the budget and click Launch.
You can see the labeling progress on the pool page. Wait until the labeling is completed. Refresh the page to check the progress.
When the labeling is complete, click the arrow next to the Download results button and choose Run Dawid-Skene model from the drop-down menu. Click Yes in the pop-up window.
Open the same drop-down menu again, and click View aggregations list.
Wait until the aggregation is complete, and click Download. You will get a TSV file with the labeling results:
INPUT: The data you uploaded for labeling.
OUTPUT: The results of labeling (answers given by Tolokers).
CONFIDENCE: The response significance according to the Dawid-Skene model.
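Once you have the results file, a common next step is to pull out the rows with low confidence for manual review. The sketch below makes several assumptions that you should verify against your own file: the column name `CONFIDENCE:result`, the sample column names in the usage example, and the confidence format (a decimal such as 0.97 or a percentage such as 97%). The function names and the 0.8 threshold are hypothetical.

```python
import csv
import io


def parse_confidence(text):
    """Parse a confidence written either as a decimal (0.97) or a percentage (97%)."""
    text = text.strip()
    if text.endswith("%"):
        return float(text[:-1]) / 100
    return float(text)


def low_confidence_rows(tsv_text, threshold=0.8,
                        confidence_column="CONFIDENCE:result"):
    """Return the rows whose confidence is below `threshold`."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [row for row in reader
            if parse_confidence(row[confidence_column]) < threshold]


# Usage with made-up sample data (column names are assumptions).
sample = (
    "INPUT:comment\tOUTPUT:result\tCONFIDENCE:result\n"
    "Great product\tok\t0.99\n"
    "Borderline text\tviolation\t55%\n"
)
to_review = low_confidence_rows(sample)
```

Rows returned by `low_confidence_rows` are candidates for a second look or for relabeling with a higher overlap.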
Last updated: March 10, 2023