Product moderation for e-commerce: a case study

Toloka Team

AliExpress CIS, part of the Alibaba Group, is a Chinese e-commerce platform serving the CIS region (former republics of the Soviet Union). The Alibaba Group serves small businesses in China and other locations like Singapore that sell products to international online shoppers. Using Toloka's scalable data labeling solution, AliExpress was able to set up their product moderation system for users located in CIS countries.

About the client

AliExpress CIS is a joint venture between Alibaba and multiple local investors across the CIS. With more than 400,000 Chinese retailers on the platform, AliExpress CIS has over 200 million products on sale, which amounts to more than three billion SKUs. That's a massive amount of data, and AliExpress knew they needed a highly scalable and effective moderation solution to process it.


Challenges of multicultural operations

AliExpress CIS sells goods to people in ten different countries, which means they must comply with local restrictions and regulations on what can and can't be sold in each particular country. Furthermore, the legal framework differs from country to country and new laws appear constantly.

But differences in legislation are only part of the issue. With cross-cultural e-commerce sites like AliExpress, where sellers are from one culture and buyers are from another, cultural differences are fraught with risk. The marketplace must limit or restrict the sale of items that are illegal, culturally unacceptable, or perceived negatively.


Small Chinese businesses featured on the platform are unaware of CIS laws and cultural differences. Their job is to provide goods and delivery services. The moderation of products is the platform's responsibility.


That being said, Alibaba's moderation process is global. They check whether items comply with Chinese laws and don't account for local legislation. The goal of AliExpress CIS was to transition from global to local moderation by building their own moderation process from scratch.

Difficulties addressed

The AliExpress team, in collaboration with Toloka, began building the moderation process by defining what items should be moderated and how they should be classified.

As a result, they came up with five main classes of products for building a pipeline for further moderation:

  1. OK: items that can be sold and displayed without any restrictions.
  2. Restricted: products that are displayed only upon direct request because they contain culturally inappropriate, offensive, disgusting, or insulting content. This class is the most difficult to formalize and therefore to moderate.
  3. Adult: sex-related products that are only displayed upon request and following a process of age verification.
  4. Prohibited: items that are not allowed to be sold in CIS countries, such as illegal drugs, firearms, or goods that infringe intellectual property.
  5. Suspect to fraud: products with unreasonably high or low prices, as well as those without clear descriptions. This class isn't used to determine instances of fraud on the platform but was included for the convenience of annotation.

Setting up a product moderation pipeline

The next step was to create a moderation pipeline, specifically targeting items that fit into one of these five categories. Given the size of the entire catalog, which contains over 200 million products, the solution was to prioritize items for moderation rather than check everything on the site.

Moderation priority is based on how often a product is viewed, along with other criteria including analytical and threshold-based inputs. This approach reduces the overall volume of what needs to be moderated, resulting in a much more manageable quantity.
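
As a rough illustration of this kind of prioritization, the sketch below keeps only items above a view threshold and ranks them by view count. The field names, the threshold, and the batch size are assumptions made for the example; the case study does not disclose the actual criteria.

```python
from dataclasses import dataclass

@dataclass
class Item:
    item_id: str
    views: int  # how often the product was viewed (hypothetical field)

def select_for_moderation(items, min_views=100, batch_size=2):
    """Return the most-viewed items above a view threshold, highest first."""
    eligible = [it for it in items if it.views >= min_views]
    eligible.sort(key=lambda it: it.views, reverse=True)
    return eligible[:batch_size]

# Toy catalog: only "a" and "c" make it into the moderation queue.
catalog = [Item("a", 5000), Item("b", 40), Item("c", 900), Item("d", 150)]
queue = select_for_moderation(catalog)
print([it.item_id for it in queue])
```

A real system would likely combine several signals rather than views alone, as the text notes.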

The set of selected items is then sent for moderation to a pre-trained ML model. The model predicts whether each item is acceptable or should be banned from the website, and also provides a confidence score that indicates how reliable the verdict is.

Items with a high confidence score are added directly to the database and then passed down the line to the product service. Items that the model is uncertain about are routed to the gray zone and sent to the Toloka crowd.
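
The routing rule described here can be sketched in a few lines. The threshold value and the destination names are assumptions for illustration only; the case study does not report the actual cutoff.

```python
def route(label, confidence, threshold=0.9):
    """Decide where a model verdict goes next.

    `threshold` is an assumed value, not one reported in the case study.
    """
    if confidence >= threshold:
        return ("database", label)  # trusted verdict, stored directly
    return ("gray_zone", None)      # uncertain, sent to human annotators

# Hypothetical model outputs: (item id, predicted label, confidence score).
predictions = [("item-1", "OK", 0.98), ("item-2", "Prohibited", 0.55)]
routed = {item_id: route(label, conf) for item_id, label, conf in predictions}
```

Raising the threshold sends more items to the crowd and fewer straight to the database, which trades cost for reliability.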

After human labeling in Toloka, these items are added to the training set so that the ML model can be retrained and its confidence threshold tuned for the next rounds of moderation. As a result, the process is continuously optimized and increasingly automated.


Moderation handled by Toloka

Toloka's crowdsourcing process can be broken down into three steps: annotators are trained, take an exam, and only then are assigned real-world production tasks.

The production tasks are classified as either first level or second level. All items from the gray zone are assigned to level one tasks and are evaluated independently by three annotators. If the answers are consistent, meaning there is a majority vote (three out of three or two out of three), the item advances directly to result aggregation and then to the database. If the annotators provide inconsistent answers, the item is routed to the second task level, where the verdict is decided by a seven-vote majority.
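
A minimal sketch of this two-level aggregation is shown below. It assumes that level two collects seven votes and requires a strict majority (at least four); the function names and the callback for requesting extra votes are hypothetical.

```python
from collections import Counter

def aggregate(votes):
    """Return the label chosen by a strict majority of votes, or None if there is none."""
    label, count = Counter(votes).most_common(1)[0]
    return label if count > len(votes) / 2 else None

def moderate(level_one_votes, get_level_two_votes):
    """Use three level-one votes; escalate to level two (seven votes) on disagreement."""
    verdict = aggregate(level_one_votes)
    if verdict is not None:
        return verdict, 1
    return aggregate(get_level_two_votes()), 2

# Two of three annotators agree, so level one is enough.
easy = moderate(["OK", "OK", "Adult"], lambda: [])

# All three disagree, so seven more votes are requested (hypothetical answers).
hard = moderate(
    ["OK", "Adult", "Restricted"],
    lambda: ["Adult"] * 4 + ["OK"] * 2 + ["Restricted"],
)
```

With five classes, seven votes can still fail to produce a strict majority; in this sketch the verdict is then `None`, and how such cases are handled in production is not described in the case study.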

Alongside items from the gray zone, Tolokers solve tasks from the "golden set," which are automatically verified without their knowledge. This means that each Toloker's accuracy is continually monitored. If the accuracy isn't high enough, new Tolokers can be added to achieve more consistent results.
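
Golden-set monitoring amounts to comparing each annotator's answers with known correct labels. The sketch below is one way to compute it; the data layout and the 0.8 accuracy cutoff are assumptions, not values from the case study.

```python
def golden_accuracy(answers, golden):
    """Share of golden-set tasks one annotator answered correctly.

    answers: {task_id: label} given by the annotator
    golden:  {task_id: label} known correct labels
    Tasks outside the golden set are ignored.
    """
    checked = [t for t in answers if t in golden]
    if not checked:
        return None  # no control tasks seen yet
    correct = sum(answers[t] == golden[t] for t in checked)
    return correct / len(checked)

golden = {"g1": "OK", "g2": "Adult", "g3": "Prohibited", "g4": "OK"}
answers = {"g1": "OK", "g2": "Adult", "g3": "OK", "g4": "OK", "x9": "Restricted"}

acc = golden_accuracy(answers, golden)           # 3 of 4 control tasks correct
needs_attention = acc is not None and acc < 0.8  # 0.8 is an assumed cutoff
```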


AliExpress CIS launched the process in June 2022 with a team of five. Since then, moderation has scaled at a rate that couldn't have been achieved without human data labeling and crowdsourcing.

  • Moderation efficiency increased 500-fold: they went from verifying only 200 items per day to 100,000 items per day.
  • The ML-optimized process, combined with human verification, reduced the cost per verified item by about 40%, from $0.017 to $0.01.
  • It now takes less than 15 seconds to verify one item.
  • The quality of labels is now 98.7%.
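
The ratios behind the throughput and cost figures can be checked directly from the reported numbers. The variable names below are illustrative.

```python
items_before, items_after = 200, 100_000  # items verified per day
cost_before, cost_after = 0.017, 0.01     # USD per verified item

throughput_gain = items_after / items_before   # 500-fold increase
cost_reduction = 1 - cost_after / cost_before  # fraction saved per item

print(f"throughput: x{throughput_gain:.0f}, cost reduction: {cost_reduction:.0%}")
```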

Key takeaways

The moderation process developed by AliExpress CIS and Toloka isn't just a data labeling process, but rather a complex pipeline that combines general classification and human evaluation. When it comes to big data tasks, general ML classification aided by humans becomes a powerful tool.

Using effective and scalable crowdsourcing techniques, it is now possible to evaluate difficult-to-formalize domains of data, such as cultural differences, in a way that is sensitive to the current context.

And, perhaps most importantly, human judgments serve as a double-action trigger because they are used to arrive at final conclusions while also optimizing ML performance. As a result, they help not only improve the process but also reduce costs.

The case was presented at the Data-Driven AI meetup by Elena Gruntova, Product Director, AliExpress. The full video is available here.
