Search relevance 

Optimize your search engine performance with fast solutions for collecting human data 
to improve search relevance.


How Toloka can meet your business needs

  • Use the Toloka data labeling platform to collect explicit human judgments about the quality of your search engine
  • Understand how well your search algorithms work and compare different versions before A/B tests
  • Define your scale for graded search relevance to fit your model
  • Make important search product decisions based on human validation
  • Improve search relevance instead of optimizing for clicks

Data labeling cases to measure search relevance

Toloka supports a wide range of use cases for search relevance evaluation with human annotators

Use Toloka to get accurate offline metrics to complement your online metrics


Success stories

10+ years of industry experience solving real-world problems.

Improved in-context predictions of product categories
growth in GMV with enhanced search relevance
  • Product catalog
  • E-commerce


Improved search relevance
200% quality
speed with 2,000+ objects labeled per day
  • Search relevance
  • E-commerce


Improved product search results for online customers
of final data and completed 60% quicker
  • Search relevance
  • E-commerce

Why Toloka

  • ML technologies
    • One platform to manage human labeling & ML
    • Prebuilt scalable infrastructure for training and real-time inference
    • Flexible foundation models pre-trained on large datasets
    • Automatic retraining and monitoring out of the box
  • Diverse global crowd
    • 100+ countries
    • 40+ languages
    • 200k+ monthly active Tolokers
    • 800+ daily active projects
    • 24/7 continuous data labeling
  • Crowdsourcing technologies
    • Advanced quality control and adaptive crowd selection
    • Smart matching mechanisms
    • 10 years of industry experience and proven methodology
    • Open-source Python library for aggregation methods
  • Robust secure infrastructure
    • Privacy-first, GDPR-compliant focus on data protection
    • ISO 27001-certified
    • Multiple data storage options, Microsoft Azure cloud
    • Automatic scaling to handle any volume
    • API and open-source libraries for seamless integration

For developers

  • API
    Our open API gives you the freedom 
    to integrate directly into any pipeline
  • Python SDK
    Our Python toolkit covers all API 
    functionality to give you the full 
    power of Toloka
  • Java SDK
    Our Java client library provides a lightweight 
    interface to the Toloka API that works 
    in any Java environment

Boost your search algorithms

Let's talk about how to get the search relevance data you need.


  • There is more than one way to get the high-quality training data needed to improve a search algorithm. Our experience and our clients' success stories show that crowdsourced relevance evaluation performs on par with in-house experts. At the same time, crowdsourcing scales to very large amounts of data across multiple languages and domains.
  • Toloka provides search evaluation services in over 40 languages with a crowd of annotators around the globe. Our support for multilanguage projects means that you can run multiple languages in parallel with different groups of people evaluating your data. Experiment with a small single-language project to start, and easily scale to as many languages as you need. Toloka integrates state-of-the-art technologies for quality management. You can be confident in the accuracy of search relevance evaluations in any language.
  • Online metrics from logs and A/B testing show you the model's actual performance with real users. Offline metrics are calculated from sets of labeled data, usually human evaluations of search results or product recommendations. Both online and offline metrics can tell you how effective your search engine is. Online experiments are more expensive to run than crowdsourced offline evaluation, so companies typically evaluate offline before going online. In addition, offline metrics are more scalable and they don't affect users.
  • The search relevance scale is the set of values annotators use to rate how relevant a result is. Possible types:
    • Binary scale (relevant or non-relevant)
    • Graded relevance with multiple levels like exact match, mostly relevant and somewhat relevant (for product recommendations, custom categories could be similar item, substitute, accessory)
    • Side-by-side comparison to verify which response is more relevant to the user's search terms
  • Toloka experts rely on state-of-the-art technologies and best practices to obtain high-quality data. See the article Evaluating search relevance on demand with crowdsourcing for tips on how to obtain consistently accurate human-sourced labels for search relevance projects. With subjective evaluations like side-by-side comparison, it's important to obtain aggregated responses to establish ground truth. Toloka integrates a variety of aggregation methods that are appropriate in different scenarios. To learn about the technical aspects of data labeling for side-by-side evaluations, see this article by Dmitry Ustalov.
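
To make the idea of aggregating side-by-side responses concrete, here is a minimal sketch in plain Python. It assumes the simplest possible method, a per-query majority vote, which is not necessarily the method Toloka applies. The sample judgments, the version labels "A" and "B", and the function names majority_vote and win_rate are all hypothetical and exist only for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical raw judgments: (query, annotator_id, preferred_version).
# "A" and "B" stand for two versions of a search algorithm being compared.
judgments = [
    ("red shoes", "w1", "A"),
    ("red shoes", "w2", "B"),
    ("red shoes", "w3", "B"),
    ("usb cable", "w1", "A"),
    ("usb cable", "w2", "A"),
    ("usb cable", "w3", "B"),
    ("desk lamp", "w1", "B"),
    ("desk lamp", "w2", "B"),
    ("desk lamp", "w3", "B"),
]


def majority_vote(rows):
    """Return {query: winning version}; a tie is reported as None."""
    votes = defaultdict(Counter)
    for query, _annotator, choice in rows:
        votes[query][choice] += 1

    result = {}
    for query, counter in votes.items():
        ranked = counter.most_common(2)
        tied = len(ranked) == 2 and ranked[0][1] == ranked[1][1]
        result[query] = None if tied else ranked[0][0]
    return result


def win_rate(aggregated, version):
    """Share of decided (non-tied) queries on which `version` was preferred."""
    decided = [v for v in aggregated.values() if v is not None]
    if not decided:
        return 0.0
    return sum(v == version for v in decided) / len(decided)


aggregated = majority_vote(judgments)
print(aggregated)
print(f"B preferred on {win_rate(aggregated, 'B'):.0%} of decided queries")
```

A win rate like this is one possible offline metric for comparing two versions before an A/B test. Majority vote treats every annotator as equally reliable; methods that weight annotators by estimated skill may be a better fit when response quality varies.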