Question 1

Do I need to collect expert opinions to improve search relevance?

Accepted Answer

There is more than one way to get the high-quality training data needed to improve a search algorithm. Our experience and our clients' success stories show that crowdsourced relevance evaluation performs on par with in-house experts. At the same time, crowdsourcing scales to very large amounts of data across multiple languages and domains.

Question 2

What if I need to improve search relevance in multiple languages?

Accepted Answer

Toloka provides search evaluation services in over 40 languages with a crowd of annotators around the globe. Our support for multilanguage projects means that you can run multiple languages in parallel, with different groups of people evaluating your data. Start by experimenting with a small single-language project, then scale to as many languages as you need.

Toloka integrates state-of-the-art technologies for quality management, so you can be confident in the accuracy of search relevance evaluations in any language.

Question 3

How do you measure the quality of search results?

Accepted Answer

Online metrics from logs and A/B testing show you the model's actual performance with real users. Offline metrics are calculated from sets of labeled data, usually human evaluations of search results or product recommendations.

Both online and offline metrics can tell you how effective your search engine is. Online evaluation does not use crowdsourcing and is more expensive to run, so companies typically run offline evaluation first. In addition, offline metrics are more scalable and they don't affect users.
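To make the idea of an offline metric concrete, here is a minimal sketch that computes two common ones, precision@k and NDCG@k, from human labels for a single query. The label values and the choice of metrics are illustrative assumptions, not something the answer above prescribes.

```python
import math

def precision_at_k(labels, k):
    """Share of the top-k results judged relevant (labels are 0 or 1)."""
    top = labels[:k]
    return sum(top) / len(top) if top else 0.0

def dcg_at_k(gains, k):
    """Discounted cumulative gain: the gain at each rank is divided by log2(rank + 1)."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains[:k], start=1))

def ndcg_at_k(gains, k):
    """DCG divided by the DCG of the ideal (descending) ordering of the same gains."""
    ideal = dcg_at_k(sorted(gains, reverse=True), k)
    return dcg_at_k(gains, k) / ideal if ideal > 0 else 0.0

# Hypothetical human labels for one query, listed in the order the engine ranked the results.
binary_labels = [1, 0, 1, 1, 0]   # binary scale: relevant / non-relevant
graded_gains = [3, 0, 2, 1, 0]    # graded scale mapped to numbers (assumed mapping)

print("precision@3:", precision_at_k(binary_labels, 3))
print("NDCG@5:", round(ndcg_at_k(graded_gains, 5), 3))
```

In practice these per-query values are averaged over a sample of queries, and the averages are compared between two versions of the ranking algorithm.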

Question 4

What is meant by a scale of relevance?

Accepted Answer

A search relevance scale is the set of values used to rate how relevant a result is. Possible types:

Binary scale (relevant or non-relevant)
Graded relevance with multiple levels like exact match, mostly relevant and somewhat relevant (for product recommendations, custom categories could be similar item, substitute, accessory)
Side-by-side comparison to verify which response is more relevant to the user's search terms
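As a rough sketch of how the first two scale types relate, a graded scale can be stored as ordered numeric levels and collapsed to a binary scale with a threshold. The `Grade` enum, its numeric values, the added `NOT_RELEVANT` level, and the default threshold are all hypothetical choices made for illustration.

```python
from enum import IntEnum

class Grade(IntEnum):
    """Hypothetical graded scale; a higher value means a more relevant result."""
    NOT_RELEVANT = 0        # assumed lowest level, not named in the list above
    SOMEWHAT_RELEVANT = 1
    MOSTLY_RELEVANT = 2
    EXACT_MATCH = 3

def to_binary(grade, threshold=Grade.MOSTLY_RELEVANT):
    """Collapse a graded label to binary: 1 if the grade reaches the threshold, else 0."""
    return int(grade >= threshold)

# Graded labels for four results of one query (made-up values).
labels = [Grade.EXACT_MATCH, Grade.SOMEWHAT_RELEVANT, Grade.MOSTLY_RELEVANT, Grade.NOT_RELEVANT]

print([int(g) for g in labels])        # numeric gains for graded metrics
print([to_binary(g) for g in labels])  # binary labels for metrics like precision
```

Where to put the threshold is a product decision: a stricter threshold counts only exact matches as relevant, while a looser one also accepts partial matches.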


Question 5

Can I trust the accuracy of human evaluations?

Accepted Answer

Toloka experts rely on state-of-the-art technologies and best practices to obtain high-quality data. See the article "Evaluating search relevance on demand with crowdsourcing" for tips on how to obtain consistently accurate human-sourced labels for search relevance projects.

With subjective evaluations like side-by-side comparison, it's important to aggregate responses from several annotators to establish ground truth. Toloka integrates a variety of aggregation methods that are appropriate in different scenarios. To learn about the technical aspects of data labeling for side-by-side evaluations, see this article by Dmitry Ustalov.
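The simplest way to aggregate side-by-side judgments is a majority vote over the annotators who saw the same pair. The sketch below illustrates that idea only; it is not a description of Toloka's own aggregation methods, and the task IDs and votes are made up. Pairwise-comparison models such as Bradley-Terry are a common alternative when a full ranking is needed.

```python
from collections import Counter

def majority_vote(votes):
    """Return (winner, share of votes for the top option); a tie gives winner None."""
    if not votes:
        raise ValueError("at least one vote is required")
    ranked = Counter(votes).most_common()
    top_label, top_count = ranked[0]
    share = top_count / len(votes)
    # If the runner-up has as many votes as the leader, there is no clear winner.
    if len(ranked) > 1 and ranked[1][1] == top_count:
        return None, share
    return top_label, share

# Hypothetical judgments: each annotator picked the more relevant of two result lists.
judgments = {
    "query-1": ["left", "left", "right"],
    "query-2": ["right", "right", "right"],
    "query-3": ["left", "right"],
}

aggregated = {task: majority_vote(votes) for task, votes in judgments.items()}
for task, (winner, share) in aggregated.items():
    print(task, winner, round(share, 2))
```

Using an odd number of annotators per pair avoids most ties, and the vote share gives a rough indication of how much the annotators agreed.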

Search relevance evaluation
