Toloka Team
Error reduction in a recommendation system for e-commerce: a case study
Yandex.Market is an e-commerce marketplace with over one million items available in its product catalog. When the company needed to tune their product recommendation engine, they developed a data labeling pipeline in Toloka to get the data they needed.
Challenge
Recommender (or recommendation) systems are algorithms used by e-commerce companies to suggest related products when customers are choosing products online. The ultimate goal is to increase average purchase value and CTR in the online store. An effective recommender system needs vast amounts of labeled data to support its ML model.
Yandex.Market started out by using automated solutions to train their recommendation model, but the algorithm was not performing well enough. They developed a new strategy using the Toloka platform:
Label products with matching accessories and related items.
Train a gradient boosting model to apply filters based on the labeled dataset.
Measure precision and recall and then retrain the model until satisfied.
Solution
The product pairs were pre-selected by analyzing the content of numerous shopping carts. If any two products were purchased together at least five times, that was a good indicator that the two were either related products or one of them was an accessory to the other.
Initially, the team merged related products and accessories into one labeling project but soon decided to separate the two. They discovered that most accessories are associated with a single product, whereas related items tend to be connected to multiple products.
For example, for the iPhone 13, a phone case would be considered an accessory, while AirPods would be considered related products (listed under "Frequently bought together" on the site).
After realizing that related products and accessories needed separate labeling processes, the company used pre-labeling to prepare separate datasets. Then they set up quality control and launched the two labeling projects with trained and tested performers.
The labeled data was sent back to Toloka in a new project to check the quality of results. If there were too many mistakes or the engineers disagreed with Tolokers’ judgments, they improved the task instructions and added more examples to the training tasks, then continued the process. High-quality labels were also collected and reused as control tasks in subsequent projects.
The pipeline for both projects looked like this:
Once the labeled dataset was ready, it was fed into a gradient boosting framework to train and improve the performance of the recommendation model. Finally, accuracy and recall were measured and the model was retrained as much as necessary.
Unexpected issues
One of the problems that impeded labeling was that some items had unclear descriptions and Tolokers didn’t have enough information to label them correctly.
The team tried labeling with in-house experts to get around this issue, but the process was not effective enough. Now they have switched to using Tolokers to search for missing information.
Results
After integrating Toloka, the accuracy of the Yandex.Market recommender system went from a modest 40% to 90% overall, while recall rose from 20% to a solid 74% for accessories and 90% for related items. Toloka provides much faster results than alternative solutions, with the average labeling speed on this project of 24 product pairs per minute. Recommendations stay up to date — now that the pipeline is in place, it is quick and easy to get new labels and retrain the system whenever a new category of products is introduced. The marketplace is currently looking into using Toloka to boost other aspects of their business.
Special thanks to Maxim Eliseev for providing the data for this post.
Article written by:
Toloka Team
Updated:
Jan 24, 2022