A top voice assistant developer needed accurate training data in several languages for expansion into new markets.
Challenge
Directly translating the existing voice assistant requests and responses yielded very low accuracy (about 12%), so the next step was to collect language-specific datasets for training the models.
Solution
The Toloka crowd provided data in the target languages: speech recordings, audio transcription for speech recognition, request classification, and answer relevance evaluations.
Business impact
The voice assistant's accuracy in the new languages reached ~62% (nearly 2 correct responses for every 3 requests), a major step up from the ~12% baseline. With Toloka's contribution, this result was reached in less than a year.
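
The accuracy figures above can be put in perspective with a little arithmetic. The sketch below is illustrative only: the `accuracy` helper and the variable names are hypothetical, and the inputs are the approximate percentages quoted in this case study, not raw evaluation data.

```python
def accuracy(correct: int, total: int) -> float:
    """Share of requests answered correctly, as a fraction in [0, 1]."""
    if total <= 0:
        raise ValueError("total must be positive")
    return correct / total


# Approximate figures quoted in the case study (assumed rounded).
baseline = 0.12  # accuracy with directly translated data
improved = 0.62  # accuracy after training on language-specific data

# Absolute gain in percentage points and relative improvement factor.
gain_points = round((improved - baseline) * 100)
gain_ratio = improved / baseline

print(f"Absolute gain: {gain_points} percentage points")
print(f"Relative gain: {gain_ratio:.1f}x")
print(f"2 correct out of 3 requests: {accuracy(2, 3):.0%}")
```

On these rounded inputs, the gain is about 50 percentage points, or roughly a fivefold improvement over the baseline. Two correct responses out of three corresponds to about 67%, so the "2 out of 3" description is a slightly rounded-up reading of ~62%.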