Translating through Crowdsourcing: The AliExpress Story

by Toloka Team

AliExpress has the highest traffic of any e-commerce website in Russia, with almost 9 million daily visitors and more than 100 million items for sale. When AliExpress Russia opened its doors in 2019, the company quickly ran into localization problems: many catalog products had inaccurate translations, and early attempts to fix them were lengthy and ultimately unsuccessful. Looking for a new approach, AliExpress turned to Toloka for a crowdsourcing-based solution, which proved to be the right decision.

Replacing outdated methodology

Virtually all automated translation follows the same logic, executed step by step (a rough code sketch follows the diagram below):

  • Separation of the written content into smaller pieces
  • Translation and validation of the individual lexical components, bit by bit
  • Subsequent reassembly of the original message
[Image: Traditional pipeline]
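As a rough illustration, here is a minimal Python sketch of that loop. The `translate` and `validate` callables stand in for the two separate crowdsourcing projects; they are hypothetical placeholders for this sketch, not real Toloka API calls.

```python
from typing import Callable

def split_into_segments(text: str) -> list[str]:
    """Break the source text into small translatable pieces (naive sentence split)."""
    return [s.strip() for s in text.split(".") if s.strip()]

def translate_pipeline(
    text: str,
    translate: Callable[[str], str],       # stand-in for the "translation" project
    validate: Callable[[str, str], bool],  # stand-in for the "validation" project
    max_rounds: int = 3,
) -> str:
    """Traditional loop: split, translate each piece, validate, retry if rejected,
    then reassemble the original message."""
    translated = []
    for segment in split_into_segments(text):
        candidate = translate(segment)
        for _ in range(max_rounds):
            if validate(segment, candidate):
                break
            candidate = translate(segment)  # rejected: send back for another attempt
        translated.append(candidate)
    return ". ".join(translated) + "."
```

Each pass through `validate` corresponds to a separate round of crowd work, which is exactly where the time and cost accumulate.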

All attempts at improving translation quality have so far run along the same lines. Originally, Toloka was no different and followed the same tried-and-true pattern, going back and forth between translating and validating until an adequate result was achieved.

The problem is that this method is quite inefficient. It relies heavily on going through multiple stages (or "projects") and therefore requires a lot of time to execute. This issue of impracticality becomes even more pronounced when many projects are run in parallel. From a business perspective, this sort of setup is far from ideal.

New translation algorithm

Toloka programmers, led by Andrey Olkhovik, decided to skip the validation stage altogether, at least in the traditional form known to most in the auto-translation field.

[Image: New experimental approach]

Now, Tolokers could choose one of the answers provided in a multiple-choice box containing up to 4 options, or tick "none of the options fits" and offer their own answer.

[Image: Task example]

As a result, the new versions provided by Tolokers became fixed options in a multiple-choice box for other contributors to choose from, and the whole "choose the right option" process started over.

Different users either chose those answers, thereby verifying them, or offered their own newer versions. The cycle continued until there were no "better" answers left to offer and the best answers had been confirmed by most Tolokers.
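A minimal sketch of this choose-or-propose cycle is shown below. The names (`ask_crowd`, `run_choice_round`) and the specific convergence rule (a consensus share threshold) are illustrative assumptions, not Toloka's actual implementation; `ask_crowd` is an assumed callable that posts the task and returns the Tolokers' answers.

```python
from collections import Counter

NONE_OF_THE_ABOVE = "none of the options fits"

def run_choice_round(options, answers):
    """One round: each answer is either a vote for an existing option or a new
    translation proposed by a Toloker who ticked 'none of the options fits'."""
    votes = Counter(a for a in answers if a in options)
    proposals = [a for a in answers if a not in options and a != NONE_OF_THE_ABOVE]
    return votes, proposals

def iterate_until_consensus(initial_options, ask_crowd,
                            consensus_share=0.75, max_rounds=10):
    """Repeat the 'choose the right option' step, folding Tolokers' own versions
    back in as new options, until one answer is confirmed by most contributors."""
    options = list(initial_options)[:4]        # the task shows up to 4 options
    best = None
    for _ in range(max_rounds):
        answers = ask_crowd(options)           # hypothetical: post task, collect answers
        votes, proposals = run_choice_round(options, answers)
        if votes and answers:
            best, count = votes.most_common(1)[0]
            if count / len(answers) >= consensus_share:
                return best                    # confirmed by most Tolokers -> stop
        # no consensus yet: keep the voted-for options plus the new proposals
        options = ([o for o, _ in votes.most_common()] + proposals)[:4]
    return best                                # fall back to the latest leader
```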

[Image: Task example]

Results

Although this crowdsourcing-based method requires a much more careful selection of Tolokers, it provides a significant improvement in translation accuracy and offers the following advantages:

  • 1 task – 1 project. Both the metrics and the budget are focused, not scattered across numerous Toloka projects
  • Less error-prone automation. With less automation, the potential for mistranslations also decreases
  • Quick start. Less preparation is needed to begin the whole process
  • Smaller budget. Not 2 projects for $0.01 + 50% commission, but 1 project for $0.02 + 25% commission (a rough comparison follows this list)
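Taking those figures at face value and assuming they are per-item rates (an assumption made only for illustration), the comparison works out roughly as follows:

```python
# Per-item cost comparison, assuming the quoted prices are per-item rates.
old_cost = 2 * 0.01 * (1 + 0.50)   # two projects at $0.01 each, 50% commission -> $0.030
new_cost = 1 * 0.02 * (1 + 0.25)   # one project at $0.02, 25% commission       -> $0.025
savings = (old_cost - new_cost) / old_cost
print(f"old: ${old_cost:.3f}, new: ${new_cost:.3f}, saved: {savings:.0%}")  # saved: 17%
```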