How Toloka Helps Rate the Quality of Search Results in the Ozon Online Store

by Toloka Team on Dec 3rd, 2020
About Ozon
Ozon is Russia's leading multi-category e-commerce platform, which offers more than 9 million SKUs across 24 different categories.

Use Case

Ozon uses Toloka to create reference (labeled) samples, which serve several purposes:

  • To evaluate the quality of the new search engine.
  • To determine the most effective ranking model.
  • To improve the quality of the search algorithm using machine learning.

Test run

Ozon employees created the first test sample manually: they took 100 search queries and labeled the results themselves. Even this small sample helped identify problems in the search engine and define the evaluation criteria. The company considered building its own tool for evaluating search quality and hiring and training assessors, but this would have taken too long, so it chose a ready-made crowdsourcing platform instead.

Training turned out to be the hardest stage for performers: even Ozon employees failed the first version of the test task. Based on the team's feedback, the company developed a new test in which training progressed from simple to complex tasks and focused on the performer qualities that mattered to the company.

To catch errors before scaling up, Ozon did a test run. The task consisted of three blocks: training, a control set that required at least 60% correct answers, and the main task that required at least 80% correct answers. To improve the quality of the sample, each task was given to five tolokers.
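
The three-block gating can be sketched in Python. This is a minimal illustration, not Ozon's or Toloka's actual configuration: it assumes the 80% threshold is checked against control answers mixed into the main task, and the Performer class and stage function are hypothetical names, not part of any Toloka API. Only the 60% and 80% thresholds come from the test run described above.

```python
from dataclasses import dataclass

# Thresholds taken from the test run; everything else is a hypothetical model.
CONTROL_THRESHOLD = 0.60  # minimum share of correct answers on the control set
MAIN_THRESHOLD = 0.80     # minimum share of correct answers in the main task


@dataclass
class Performer:
    # Hypothetical record of one performer's progress (not a Toloka object).
    name: str
    passed_training: bool
    control_correct: int
    control_total: int
    main_correct: int = 0  # assumed: correct answers on control items inside the main task
    main_total: int = 0


def accuracy(correct: int, total: int) -> float:
    """Share of correct answers; 0.0 when nothing has been answered yet."""
    return correct / total if total else 0.0


def stage(p: Performer) -> str:
    """Return the furthest stage a performer is allowed to reach."""
    if not p.passed_training:
        return "training"
    if accuracy(p.control_correct, p.control_total) < CONTROL_THRESHOLD:
        return "rejected_at_control"
    # Performers with no main-task answers yet are admitted; those below 80% are removed.
    if p.main_total and accuracy(p.main_correct, p.main_total) < MAIN_THRESHOLD:
        return "removed_from_main"
    return "main"
```

For example, a performer who passed training and answered 7 of 10 control questions correctly would be admitted to the main task, but would be removed again after answering only 7 of 10 main-task control items correctly.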


Test run statistics

Performers: 147 joined the first stage, 77 completed training, and 12 were assigned the skill and did the main task.
Tasks: 350 tasks in 40 minutes
Budget: $12

Main launch

The main launch was more complex: it involved both new tolokers and those who had received the necessary skill during the test run. Newcomers went through the standard procedure, while experienced tolokers were admitted to the main tasks straight away. Two additional skills were introduced for the main launch: the percentage of correct answers in the main sample and a majority vote score. As before, each task was given to five tolokers.
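
The majority vote score can be approximated in Python. This is a minimal sketch under stated assumptions, not Toloka's implementation: it assumes the majority label is the most frequent answer among the five tolokers, that tasks where the top labels tie are skipped, and that a performer's score is the share of their answers that match the majority. The helpers majority_vote and agreement_scores are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional


def majority_vote(labels: List[str]) -> Optional[str]:
    """Return the most frequent label, or None if there is no answer or the top labels tie."""
    counts = Counter(labels).most_common()
    if not counts:
        return None
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None  # tie: no single winning label
    return counts[0][0]


def agreement_scores(answers: Dict[str, Dict[str, str]]) -> Dict[str, float]:
    """answers[task_id][performer] = label.

    Returns each performer's share of answers that match the task's
    majority label, counting only tasks that have a clear majority.
    """
    matched: Counter = Counter()
    counted: Counter = Counter()
    for per_task in answers.values():
        winner = majority_vote(list(per_task.values()))
        if winner is None:
            continue  # skip tasks without a clear majority
        for performer, label in per_task.items():
            counted[performer] += 1
            if label == winner:
                matched[performer] += 1
    return {p: matched[p] / counted[p] for p in counted}
```

Skipping tied tasks is a design choice: it avoids penalizing performers on queries where the crowd itself disagrees, at the cost of scoring each performer on fewer tasks.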

Main launch statistics

Tolokers: 1,117 joined the project, 18 received the skills, and 6 got access to the largest main pool and are currently labeling it.
Tasks: 40,000 tasks in a month
Budget: $1,150

Now the Ozon task on Toloka looks like this:

The toloker sees a search query and nine products from its search results.
They rate the results by choosing one of the following labels:

  • "perfect"
  • "good"
  • "might fit"
  • "doesn't fit"
  • "page not found"
The last option helps identify technical problems on the website. To reproduce the user experience as accurately as possible, the developers recreated the online store's interface inside an iframe.

In parallel with the Toloka launch, search queries were also labeled using rules. This effort focused on popular queries so that their search results could be improved first.

Rule-based labeling produced data quickly for a small number of queries, and the results for top queries were good. But it also had drawbacks: ambiguous queries can't be evaluated with rules, many cases are debatable, and the method proved rather expensive in the long run.

Manual labeling doesn't have these drawbacks. In Toloka, you can collect the opinions of a large number of tolokers and get more granular evaluations, which lets you analyze search results in greater depth. After the initial setup, the platform runs reliably and handles large volumes of data.

Manual labor and AI aren't mutually exclusive. The more AI develops, the more manual labor is needed to train it. On the other hand, the better trained neural networks become, the more routine tasks can be automated, so people don't have to do them.

Almost any task, even a large one, can be divided into many small ones and completed through crowdsourcing. Most tasks solved in Toloka are a first step toward training models and automating processes with manually collected data.
