With thousands of annotators making millions of evaluations across hundreds of tasks every day, Toloka is a major source
of human-labeled training data. Toloka supports academic research and innovation by sharing large amounts of
accurate data applicable to machine learning in a variety of areas.
Please note: These public datasets are available only for non-commercial use, with a clear reference to Toloka as the source of the data.
If you plan to use any of these datasets for commercial purposes, please contact us to obtain our consent.
ZIP archive, 10.8 GB
Labels: texts.tsv
Photos: images/

ZIP archive, 19.5 GB
Labels: data.tsv
Photos: photos/

ZIP archive, 981 MB
Photos: images/
Masks: masks/
Collages: collage/
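
The first two archives above pair a tab-separated labels file with a folder of photos. The following is a minimal sketch of how such an archive might be read once it has been extracted. It assumes that the TSV file starts with a header row and is UTF-8 encoded, neither of which is stated on this page. The function name `load_archive` and its parameters are hypothetical and are not part of any Toloka tooling.

```python
import csv
from pathlib import Path


def load_archive(root, labels_name, images_dir):
    """Read a TSV labels file and list the files in an image folder.

    Assumptions (not confirmed by the dataset page): the archive has
    already been extracted to `root`, the TSV has a header row, and it
    is encoded as UTF-8.
    """
    root = Path(root)

    # Each row becomes a dict keyed by the column names in the header row.
    with open(root / labels_name, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f, delimiter="\t"))

    # Collect file names only; subdirectories are skipped.
    images = sorted(p.name for p in (root / images_dir).iterdir() if p.is_file())
    return rows, images
```

For the first archive, a call might look like `rows, images = load_archive("dataset", "texts.tsv", "images")`, where `"dataset"` is a placeholder for wherever the ZIP was extracted. Printing `rows[0].keys()` is a quick way to discover the actual column names before relying on them.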
Have a dataset that you are ready to share? Submit it for publication on this page.