A top voice assistant developer needed accurate training data in several languages for expansion into new markets.
Directly translating the existing voice assistant requests and responses yielded very low accuracy (about 12%), so the next step was to collect language-specific datasets for training the models.
The Toloka crowd provided data in the target languages: speech recordings, audio transcription for speech recognition, request classification, and answer relevance evaluations.
The voice assistant's accuracy in the new languages hit ~62% (~2 correct responses for every 3 requests), a major step up from the baseline of ~12%. With Toloka's contribution, this result was reached in less than a year.