How to boost content moderation efficiency with Toloka

by Toloka Team


Content moderation is the job of monitoring user-generated web content to make sure that nothing inappropriate, offensive, or disturbing gets published online. Things like false advertising, fraud, or sexually explicit images can seriously harm the reputation of an online business, whether it’s an e-commerce marketplace, an AdTech platform, or a social media site.

As online content grows exponentially, keeping the web safe is no easy task. Companies today use Machine Learning models to build their automated moderation systems. These systems rely heavily on high-quality, accurately labeled data for training and validating their models — and this is where crowdsourcing can play an invaluable role.

Use case: Yandex Zen

Yandex Zen is a popular recommendation service that delivers user-generated content and uses ML algorithms to support automated moderation. The service takes the form of a personalized news feed that shows articles and videos from external digital media sources, as well as original pieces from its own blogging platform. The website has 20 million daily active users and attracts 5,000 new authors every day, the most popular of whom can monetize their contributions.

Image: Examples of content on Yandex Zen

Since its launch in 2015, Yandex Zen has faced two major issues:

  • Even relevant, personalized content on the platform sometimes crosses the line between the acceptable and the inappropriate, going from informative to provocative, sensational, and sometimes even potentially dangerous advice.
  • Profit-seeking authors try to cheat the website’s algorithm, creating questionable or substandard content in an attempt to increase engagement.

The platform relies on human moderators to cover the grey areas that their ML algorithms can’t handle reliably, as well as to monitor trending content in real time. The team found it was a challenge to keep enough moderators online 24/7, label enough content to train and validate their ML models, and maintain quality of moderation.

Challenge

Toloka was asked to utilize the power of the crowd and help improve moderation at Yandex Zen. Moderators were expected to identify the following types of violations:

  • Clickbaiting
  • Hate speech
  • Poor (and potentially hazardous) medical advice
  • Spam
  • Violent content
  • COVID-19 misinformation
  • Adult content
  • Illegal content and scams (information about illegal products and services, and attempts to defraud other users)
Image: Example of a content moderation project

Toloka’s Crowd Solution Architects (CSAs) identified three classification formats for the project: text, image, and video.

Normally, majority vote is used to improve quality on classification tasks: each item is classified by several Tolokers, and the results are automatically aggregated so that the most popular verdict wins. However, this moderation project was a long-running, continuous process, and over time majority vote revealed socio-demographic bias that degraded quality.
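The majority-vote aggregation described above can be sketched as follows. This is an illustrative snippet, not Toloka's actual aggregation code; the label names and verdicts are hypothetical.

```python
from collections import Counter

def majority_vote(labels):
    """Return the most frequent label among several performers' verdicts.

    Ties are resolved arbitrarily by Counter.most_common; a production
    pipeline would typically escalate ties to additional performers
    or to an expert moderator instead.
    """
    if not labels:
        raise ValueError("no labels to aggregate")
    return Counter(labels).most_common(1)[0][0]

# Hypothetical example: three Tolokers classify the same article.
verdicts = ["acceptable", "clickbait", "clickbait"]
print(majority_vote(verdicts))  # clickbait
```

More robust aggregation methods (for example, weighting each performer's vote by their accuracy on control tasks) can reduce the kind of systematic bias that plain majority vote exhibits on long-running projects.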

In addition, the website’s fast-paced environment, with new content published every hour, meant that moderation and control tasks had to be regular and frequent, while all trending content required real-time moderation.

Image: Example of a classification task in which an article about traveling to Egypt was labeled “COVID-19 conspiracy”

Toloka created 10 projects for articles and videos, as well as 4 projects for the comment section. Each project contained around 10,000 daily crowdsourcing tasks.

Solution

The Yandex Zen team designed a new infrastructure for moderating content efficiently. Initially, the majority of content is handled by automatic classifiers. Anything that falls into a gray area is passed on to the Toloka crowd for classification. The crowd is supported by in-house moderators, who verify the performers’ work and help control quality. A handful of expert moderators create control tasks with known true labels, which are used in daily secret exams to monitor quality and reward those who perform well. In-house moderators can then focus on trending content that demands careful consideration, while the crowd handles the majority of tasks.
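The routing logic of this pipeline can be sketched as a simple decision function. The thresholds, label names, and structure below are illustrative assumptions, not Yandex Zen's actual implementation.

```python
AUTO_THRESHOLD = 0.95   # assumed confidence above which the classifier's verdict stands
GRAY_ZONE_LOW = 0.40    # assumed lower bound of the gray area escalated to the crowd

def route(item_id: str, confidence: float, is_trending: bool) -> str:
    """Decide who moderates an item, based on model confidence and trendiness."""
    if is_trending:
        return "in_house"       # trending content gets expert attention in real time
    if confidence >= AUTO_THRESHOLD:
        return "auto"           # automatic classifier handles clear-cut cases
    if confidence >= GRAY_ZONE_LOW:
        return "crowd"          # gray area goes to Toloka performers
    return "in_house"           # very low confidence: expert review

print(route("article-1", 0.98, False))  # auto
print(route("article-2", 0.60, False))  # crowd
print(route("article-3", 0.98, True))   # in_house
```

The key design choice is that the expensive resource (in-house experts) only sees items the cheaper tiers cannot handle confidently, which is what makes the pipeline scalable.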

Image: The pipeline enabled effective quality control, transparency, and scalability, and also reduced the need for full-time moderators.

Here’s what the new collaborative process achieves:

  • Successful decomposition of a large-scale labeling project into numerous simple tasks, which makes the workload manageable and increases the overall quality of moderation while reducing the burden on in-house moderators.
  • A new quality-based payment scheme that awards bonuses for accuracy and motivates both Tolokers and the in-house staff.
  • An interactive model that establishes an effective feedback loop among the moderators and the crowd, resulting in higher accuracy and improved task pool management.
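The quality-based payment scheme can be illustrated with a short sketch: a performer's accuracy is measured on control tasks with known true labels, and a bonus is paid when accuracy clears a threshold. The threshold, bonus rate, and task data below are hypothetical.

```python
def control_accuracy(answers: dict, true_labels: dict) -> float:
    """Share of control ("secret exam") tasks a performer answered correctly."""
    correct = sum(answers[task] == label for task, label in true_labels.items())
    return correct / len(true_labels)

def bonus(accuracy: float, base_pay: float, threshold: float = 0.9, rate: float = 0.5) -> float:
    """Pay a bonus proportional to base pay once accuracy clears the threshold."""
    return base_pay * rate if accuracy >= threshold else 0.0

# Hypothetical performer: answered "ok" on all 10 control tasks,
# but the true label for one of them was "spam".
answers = {f"task{i}": "ok" for i in range(10)}
truth = {f"task{i}": "ok" for i in range(9)}
truth["task9"] = "spam"

acc = control_accuracy(answers, truth)
print(acc, bonus(acc, base_pay=1.0))  # 0.9 0.5
```

Tying pay to measured accuracy rather than raw volume is what aligns performer incentives with moderation quality.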


Results

As a result, moderation accuracy has risen significantly, reaching up to 98 percent. Content classification quality also improved within the first 2.5 months of working with Toloka. If your business needs an affordable team of labelers for large-scale projects, including but not limited to classification tasks, be sure to get in touch with us – we'll be glad to help you build a robust data-labeling pipeline.