Research Benchmarks

Quality control lies at the heart of crowdsourcing. Below are a few examples that you can use as benchmarks. Each one reports the quality reached on a popular research dataset with a given aggregation method, and the Reference column points to the materials for reproducing that result.

Task | Dataset | Aggregation | Quality | Reference
Image Classification | CINIC-10 | Dawid-Skene | Accuracy on test = 88% | GitHub, Colab
Text Classification | Large Movie Review Dataset | Dawid-Skene | Accuracy on test = 89% | GitHub, Colab
Audio Transcription | CrowdSpeech | Fine-Tuned T5 | Word error rate on test-clean = 5.22 | GitHub, Paper, Colab
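Both classification rows aggregate crowd labels with Dawid-Skene, which treats each annotator's reliability as an unknown confusion matrix and estimates it jointly with the true labels by expectation-maximization. The following is a minimal pure-Python sketch of that EM loop, assuming labels arrive as (task, annotator, label) tuples. The function name `dawid_skene`, the `n_iter` and `smoothing` parameters, and the majority-vote initialization are illustrative choices, not the code behind the reported numbers.

```python
import math
from collections import defaultdict


def dawid_skene(annotations, n_iter=20, smoothing=0.01):
    """Aggregate crowd labels with Dawid-Skene expectation-maximization.

    annotations: iterable of (task, annotator, label) tuples.
    Returns a dict mapping each task to its most probable label.
    """
    annotations = list(annotations)
    tasks = sorted({t for t, _, _ in annotations})
    annotators = sorted({a for _, a, _ in annotations})
    labels = sorted({lab for _, _, lab in annotations})

    by_task = defaultdict(list)
    for t, a, lab in annotations:
        by_task[t].append((a, lab))

    # Initialize the posterior over true labels with a soft majority vote.
    post = {}
    for t in tasks:
        votes = {k: 0.0 for k in labels}
        for _, lab in by_task[t]:
            votes[lab] += 1.0
        total = sum(votes.values())
        post[t] = {k: v / total for k, v in votes.items()}

    for _ in range(n_iter):
        # M-step, part 1: smoothed class priors.
        prior = {
            k: (sum(post[t][k] for t in tasks) + smoothing)
            / (len(tasks) + smoothing * len(labels))
            for k in labels
        }
        # M-step, part 2: one confusion matrix per annotator,
        # confusion[a][true][given] = P(a answers `given` | true label is `true`).
        confusion = {
            a: {k: {g: smoothing for g in labels} for k in labels}
            for a in annotators
        }
        for t in tasks:
            for a, lab in by_task[t]:
                for k in labels:
                    confusion[a][k][lab] += post[t][k]
        for a in annotators:
            for k in labels:
                row_total = sum(confusion[a][k].values())
                for g in labels:
                    confusion[a][k][g] /= row_total

        # E-step: recompute each task's posterior in log space to avoid underflow.
        for t in tasks:
            log_score = {}
            for k in labels:
                s = math.log(prior[k])
                for a, lab in by_task[t]:
                    s += math.log(confusion[a][k][lab])
                log_score[k] = s
            top = max(log_score.values())
            unnorm = {k: math.exp(s - top) for k, s in log_score.items()}
            z = sum(unnorm.values())
            post[t] = {k: u / z for k, u in unnorm.items()}

    return {t: max(post[t], key=post[t].get) for t in tasks}


# Usage: three annotators label four images; "a3" disagrees with the others on half of them.
crowd_labels = [
    ("t1", "a1", "cat"), ("t1", "a2", "cat"), ("t1", "a3", "dog"),
    ("t2", "a1", "dog"), ("t2", "a2", "dog"), ("t2", "a3", "dog"),
    ("t3", "a1", "cat"), ("t3", "a2", "cat"), ("t3", "a3", "cat"),
    ("t4", "a1", "dog"), ("t4", "a2", "dog"), ("t4", "a3", "cat"),
]
print(dawid_skene(crowd_labels))
# {'t1': 'cat', 't2': 'dog', 't3': 'cat', 't4': 'dog'}
```

On this toy input the result matches a plain majority vote. The benefit of Dawid-Skene shows up on larger pools, where the learned confusion matrices let consistently accurate annotators outweigh noisy ones.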
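The CrowdSpeech row is scored by word error rate (WER): the minimum number of word substitutions, deletions, and insertions needed to turn the system's transcription into the reference, divided by the number of reference words. WER is commonly reported in percent. The sketch below computes corpus-level WER with a word-level Levenshtein distance. The names `word_edit_distance` and `corpus_wer` are hypothetical. The code also skips the text normalization (casing, punctuation) that a real evaluation would likely apply, so its output is not directly comparable to the figure in the table.

```python
def word_edit_distance(ref, hyp):
    """Minimum number of word substitutions, deletions, and insertions turning ref into hyp."""
    # prev[j] is the distance between the current prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete reference word r
                curr[j - 1] + 1,         # insert hypothesis word h
                prev[j - 1] + (r != h),  # substitute, or match when r == h
            )
        prev = curr
    return prev[-1]


def corpus_wer(pairs):
    """Corpus-level WER in percent: total edits divided by total reference words."""
    edits = 0
    ref_words = 0
    for reference, hypothesis in pairs:
        ref, hyp = reference.split(), hypothesis.split()
        edits += word_edit_distance(ref, hyp)
        ref_words += len(ref)
    if ref_words == 0:
        raise ValueError("references must contain at least one word")
    return 100.0 * edits / ref_words


pairs = [
    ("the cat sat on the mat", "the cat sit on mat"),  # 1 substitution + 1 deletion
    ("hello world", "hello world"),                     # exact match
]
print(corpus_wer(pairs))  # 25.0
```

Summing edits and reference words over the whole test set, rather than averaging per-utterance WER, keeps short utterances from dominating the score.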