Research Benchmarks


Quality control lies at the heart of crowdsourcing. The examples below show the quality levels reached on popular research datasets; you can use them as benchmarks when aiming for the same levels in your own projects.

Task	Dataset	Aggregation	Quality	References
Image Classification	CINIC-10	Dawid-Skene	Accuracy on Test = 88%	GitHub, Colab
Text Classification	Large Movie Review Dataset	Dawid-Skene	Accuracy on Test = 89%	GitHub, Colab
Audio Transcription	CrowdSpeech	Fine-Tuned T5	Word Error Rate on test-clean = 5.22	GitHub, Paper, Colab
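
Two of the benchmarks above aggregate crowd labels with Dawid-Skene. As a rough illustration of what that aggregation does, here is a minimal pure-Python sketch of the textbook EM procedure; it is not the code behind the linked notebooks. The function name `dawid_skene`, the `(task, worker, label)` input format, and the `n_iter` and `smoothing` parameters are assumptions made for this example.

```python
def dawid_skene(votes, n_iter=20, smoothing=0.01):
    """Aggregate crowd labels with a simple Dawid-Skene EM loop.

    votes: list of (task, worker, label) triples (hypothetical input format).
    Returns a dict mapping each task to its most probable label.
    """
    tasks = sorted({t for t, _, _ in votes})
    workers = sorted({w for _, w, _ in votes})
    labels = sorted({l for _, _, l in votes})

    # Initialise task posteriors with per-task vote shares (soft majority vote).
    post = {t: {l: 0.0 for l in labels} for t in tasks}
    for t, _, l in votes:
        post[t][l] += 1.0
    for t in tasks:
        total = sum(post[t].values())
        for l in labels:
            post[t][l] /= total

    for _ in range(n_iter):
        # M-step: class priors and one confusion matrix per worker,
        # conf[w][true_label][observed_label], with additive smoothing.
        prior = {l: sum(post[t][l] for t in tasks) / len(tasks) for l in labels}
        conf = {w: {z: {l: smoothing for l in labels} for z in labels}
                for w in workers}
        for t, w, l in votes:
            for z in labels:
                conf[w][z][l] += post[t][z]
        for w in workers:
            for z in labels:
                row_sum = sum(conf[w][z].values())
                for l in labels:
                    conf[w][z][l] /= row_sum

        # E-step: posterior over the true label of each task, given the votes.
        # Plain products are fine for small inputs; a production version
        # would work in log space to avoid underflow.
        new_post = {t: dict(prior) for t in tasks}
        for t, w, l in votes:
            for z in labels:
                new_post[t][z] *= conf[w][z][l]
        for t in tasks:
            total = sum(new_post[t].values())
            for z in labels:
                new_post[t][z] /= total
        post = new_post

    return {t: max(labels, key=lambda l: post[t][l]) for t in tasks}


# Toy usage: workers "a" and "b" agree with each other, "c" always says "dog".
votes = [
    ("t1", "a", "cat"), ("t1", "b", "cat"), ("t1", "c", "dog"),
    ("t2", "a", "dog"), ("t2", "b", "dog"), ("t2", "c", "dog"),
    ("t3", "a", "cat"), ("t3", "b", "cat"), ("t3", "c", "dog"),
    ("t4", "a", "dog"), ("t4", "b", "dog"), ("t4", "c", "dog"),
]
print(dawid_skene(votes))
```

Unlike a plain majority vote, the per-worker confusion matrices let the model discount workers whose answers carry little information, such as worker "c" in the toy data. For real projects an established implementation is likely a better choice than this sketch.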