Product moderation for e-commerce: a case study

Toloka Team

AliExpress CIS, part of the Alibaba Group, is a Chinese e-commerce platform serving the CIS region (former republics of the Soviet Union). The Alibaba Group serves small businesses in China and other locations like Singapore that sell products to international online shoppers. Using Toloka's scalable data labeling solution, AliExpress was able to set up their product moderation system for users located in CIS countries.

About the client

AliExpress CIS is a joint venture between Alibaba and multiple local investors across the CIS. With more than 400,000 Chinese retailers on the platform, AliExpress CIS has over 200 million products on sale, which amounts to more than three billion SKUs. That's a massive amount of data, and AliExpress knew they needed a highly scalable and effective moderation solution to process it.

Challenges of multicultural operations

AliExpress CIS sells goods to people in ten different countries, which means they must comply with local restrictions and regulations on what can and can't be sold in each particular country. Furthermore, the legal framework differs from country to country and new laws appear constantly.

But differences in legislation are only part of the issue. With cross-cultural e-commerce sites like AliExpress, where sellers are from one culture and buyers are from another, cultural differences are fraught with risk. The marketplace must limit or restrict the sale of items that are illegal, culturally unacceptable, or perceived negatively.

Small Chinese businesses featured on the platform are unaware of CIS laws and cultural differences. Their job is to provide goods and delivery services. The moderation of products is the platform's responsibility.

That being said, Alibaba's moderation process is global. They check whether items comply with Chinese laws and don't account for local legislation. The goal of AliExpress CIS was to transition from global to local moderation by building their own moderation process from scratch.

Difficulties addressed

The AliExpress team, in collaboration with Toloka, began building the moderation process by defining what items should be moderated and how they should be classified.

As a result, they came up with five main classes of products for building a pipeline for further moderation:

  1. OK: items that can be sold and displayed without any restrictions.
  2. Restricted: products with culturally inappropriate, offensive, disgusting, or insulting content, which are displayed only upon direct request. This class is the most difficult to formalize and therefore to moderate.
  3. Adult: sex-related products that are only displayed upon request and following a process of age verification.
  4. Prohibited: items that are not allowed to be sold in CIS countries, such as illegal drugs, firearms, or goods that infringe intellectual property.
  5. Suspect to fraud: products with unreasonably high or low prices, as well as those without clear descriptions. This class isn't used to determine instances of fraud on the platform but was included for the convenience of annotation.
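
The five classes and their display rules can be written down as a small data structure. The following is a minimal sketch, not the platform's actual code: the names `ProductClass` and `can_display` are hypothetical, and hiding "Suspect to fraud" items is an assumption, since the article does not say how that class is displayed.

```python
from enum import Enum


class ProductClass(Enum):
    OK = "ok"
    RESTRICTED = "restricted"
    ADULT = "adult"
    PROHIBITED = "prohibited"
    SUSPECT_FRAUD = "suspect_to_fraud"


def can_display(label: ProductClass, direct_request: bool = False,
                age_verified: bool = False) -> bool:
    """Return True if an item with this label may be shown to the shopper."""
    if label is ProductClass.OK:
        return True  # no restrictions
    if label is ProductClass.RESTRICTED:
        return direct_request  # shown only upon direct request
    if label is ProductClass.ADULT:
        # shown only upon request and after age verification
        return direct_request and age_verified
    # PROHIBITED items are never shown. SUSPECT_FRAUD is hidden here only as
    # a placeholder assumption; the article does not state its display rule.
    return False
```

For example, `can_display(ProductClass.ADULT, direct_request=True)` is false until `age_verified=True` is also passed.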

Setting up a product moderation pipeline

The next step was to create a moderation pipeline, specifically targeting items that fit into one of these five categories. Given the size of the entire catalog, which contains over 200 million products, the solution was to prioritize items for moderation rather than check everything on the site.

Moderation priority is based on how often a product is viewed, along with other criteria including analytical and threshold-based inputs. This approach reduces the overall volume of what needs to be moderated, resulting in a much more manageable quantity.
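
View-based prioritization could look roughly like the sketch below. It covers only the view-count criterion; the `Product` type, the `min_views` threshold, and the `limit` parameter are illustrative assumptions, and the other analytical inputs mentioned above are not modeled.

```python
from dataclasses import dataclass


@dataclass
class Product:
    product_id: str
    daily_views: int


def select_for_moderation(products, min_views=100, limit=None):
    """Keep products at or above the view threshold, most viewed first."""
    candidates = [p for p in products if p.daily_views >= min_views]
    candidates.sort(key=lambda p: p.daily_views, reverse=True)
    return candidates if limit is None else candidates[:limit]
```

Products that fall below the threshold are simply left out of the current round, which is how the volume to moderate shrinks.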

The set of selected items is then sent for moderation to a pre-trained ML model. The model predicts whether each item is acceptable or should be banned from the website, and also provides a confidence score that indicates how reliable the verdict is.

Items with a high confidence score are added directly to the database and then passed down the line to the product service. Items that the model is uncertain about are routed to the gray zone and sent to the Toloka crowd.

After human labeling in Toloka, these items are added to the training set so that the ML model can learn from them and the confidence threshold can be tuned for subsequent rounds of moderation. As a result, the process is continuously optimized and increasingly automated.
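
The routing step described above can be sketched as a split on the confidence score. This is an illustration under stated assumptions: the `Prediction` type, the `route` function, and the 0.9 threshold are hypothetical, and the article does not give the real threshold value.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    item_id: str
    verdict: str       # "acceptable" or "banned"
    confidence: float  # 0.0 to 1.0


def route(predictions, threshold=0.9):
    """Split predictions into auto-accepted verdicts and the gray zone."""
    auto, gray_zone = [], []
    for p in predictions:
        # Confident verdicts go straight to the database;
        # uncertain ones are sent to human annotators.
        (auto if p.confidence >= threshold else gray_zone).append(p)
    return auto, gray_zone
```

Labels collected for the gray-zone items would then be appended to the model's training data before the next round.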

Moderation handled by Toloka

Toloka's crowdsourcing process can be broken down into three steps: training, an exam, and real-world production tasks. Annotators are trained and must pass the exam before they are assigned production tasks.

The production tasks are classified as either first level or second level. All items from the gray zone are assigned to first-level tasks and are evaluated independently by three annotators. If a majority of the annotators agree (three out of three or two out of three), the item advances directly to result aggregation and then to the database. If the annotators give inconsistent answers, the item is routed to the second task level, where the verdict is decided by a seven-vote majority.
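
First-level aggregation amounts to a majority vote with an escalation path. The sketch below assumes a strict majority (more than half of the votes) and uses a hypothetical `majority_label` helper; it is not Toloka's aggregation code.

```python
from collections import Counter


def majority_label(answers):
    """Return the label chosen by more than half of the annotators, else None."""
    if not answers:
        return None
    label, votes = Counter(answers).most_common(1)[0]
    return label if votes * 2 > len(answers) else None
```

A result of `None` for three first-level answers would mean the item is escalated to the second level. The same helper could be applied to seven answers there, although the article does not spell out how the seven-vote majority is computed.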

Alongside items from the gray zone, Tolokers solve tasks from the "golden set," which are automatically verified without their knowledge. This means that each Toloker's accuracy is continually monitored. If the accuracy isn't high enough, new Tolokers can be added to achieve more consistent results.
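
Golden-set monitoring can be illustrated as an accuracy calculation over only those tasks whose correct answer is known. The function name and the dictionary layout are assumptions made for this sketch.

```python
def golden_accuracy(responses, golden_answers):
    """Share of golden-set tasks that one annotator answered correctly.

    responses: {task_id: label} submitted by the annotator.
    golden_answers: {task_id: correct label} for the hidden control tasks.
    Returns None if the annotator has not seen any golden task yet.
    """
    checked = [tid for tid in responses if tid in golden_answers]
    if not checked:
        return None
    correct = sum(responses[tid] == golden_answers[tid] for tid in checked)
    return correct / len(checked)
```

Regular gray-zone tasks are ignored by the calculation because they have no known answer to compare against.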

Results

A five-person team at AliExpress CIS launched the process in June 2022, and it has since scaled at a rate that couldn't have been achieved without human data labeling and crowdsourcing.

  • Moderation efficiency increased 500-fold: they went from verifying only 200 items per day to 100,000 items per day.
  • The ML-optimized process, combined with human verification, reduced the price per verified item by about 40%, from $0.017 to $0.01.
  • It now takes less than 15 seconds to verify one item.
  • The quality of labels is now 98.7%.
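
The throughput and price figures above can be checked with simple arithmetic. The variable names are illustrative, and the inputs are the numbers reported in the list.

```python
items_before, items_after = 200, 100_000   # items verified per day
price_before, price_after = 0.017, 0.010   # US dollars per verified item

throughput_gain = items_after / items_before
price_reduction = (price_before - price_after) / price_before

print(f"Throughput gain: {throughput_gain:.0f}x")
print(f"Price reduction per item: {price_reduction:.1%}")
```

The throughput gain works out to 500x, and the per-item price falls by a little over 41%.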

Key takeaways

The moderation process developed by AliExpress CIS and Toloka isn't just a data labeling process, but rather a complex pipeline that combines general classification and human evaluation. When it comes to big data tasks, general ML classification aided by humans becomes a powerful tool.

Using effective and scalable crowdsourcing techniques, it is now possible to evaluate difficult-to-formalize domains of data, such as cultural differences, in a way that is sensitive to the current context.

And, perhaps most importantly, human judgments serve as a double-action trigger because they are used to arrive at final conclusions while also optimizing ML performance. As a result, they help not only improve the process but also reduce costs.

The case was presented at the Data-Driven AI meetup by Elena Gruntova, Product Director, AliExpress. The full video is available here.
