At Toloka, we are committed to unlocking AI opportunities. Every day, our researchers tackle pressing AI and ML challenges,
make appearances at prominent global events, and publish their findings in scientific journals. Scroll down to learn more.
Browse through some of our latest work.
Our dataset has balanced age and gender distributions, built using the well-known IMDB-WIKI dataset as ground truth. We describe how the dataset was constructed and compare several baseline methods, demonstrating its suitability for model evaluation.
Domain-specific data is the crux of successfully transferring machine learning systems from benchmarks to real life. Crowdsourcing has become one of the standard tools for cheap and time-efficient data collection for simple problems such as image classification, thanks in large part to advances in research on aggregation methods.
In this paper, we demonstrate Crowd-Kit, a general-purpose computational quality control toolkit for crowdsourcing. It provides efficient Python implementations of quality control algorithms, including uncertainty measures and crowd consensus methods.
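To illustrate what a crowd consensus method does, here is a minimal, stdlib-only sketch of plurality (majority) voting over (task, worker, label) triples. This is an illustration of the general technique, not Crowd-Kit's actual API; the data and function names are hypothetical.

```python
from collections import Counter, defaultdict

def majority_vote(annotations):
    """Aggregate (task, worker, label) triples into one label per task
    by simple plurality; ties break toward the first-seen label."""
    by_task = defaultdict(list)
    for task, worker, label in annotations:
        by_task[task].append(label)
    return {task: Counter(labels).most_common(1)[0][0]
            for task, labels in by_task.items()}

# Three workers label two images; one worker disagrees on img1.
annotations = [
    ("img1", "w1", "cat"), ("img1", "w2", "cat"), ("img1", "w3", "dog"),
    ("img2", "w1", "dog"), ("img2", "w2", "dog"),
]
print(majority_vote(annotations))  # {'img1': 'cat', 'img2': 'dog'}
```

Production toolkits such as Crowd-Kit go beyond plurality voting with probabilistic models that also estimate per-worker reliability.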
This paper reviews the crowdsourced audio transcription shared task devoted to this problem and co-organized with the Crowd Science Workshop at VLDB 2021.
We study the problem of predicting future hourly earnings and task completion time for a crowdsourcing platform user who sees the list of available tasks and wants to select one of them to execute.
In this paper, we address the problem of labeling text images via CAPTCHA, where user identification is generally impossible. We propose a new algorithm to aggregate multiple guesses collected through CAPTCHA.
If you are using Toloka for research, please cite our work as follows: Anna Lioznova, Alexey Drutsa, Vladimir Kukushkin, and Anastasia Bezzubtseva. 2020. Prediction of Hourly Earnings and Completion Time on a Crowdsourcing Platform. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD '20). Association for Computing Machinery, New York, NY, USA, 3172–3182. https://doi.org/10.1145/3394486.3403369
We regularly hold tutorials and lead workshops at some of the biggest AI conferences around the globe.
This crowd science workshop explores how a reimagined perspective on crowdsourcing platforms could provide a more equitable, fair, and rewarding experience.
This crowd science workshop focuses on the best practices for efficient and trustworthy crowdsourcing.
The High-Quality Data Labeling at Scale with Toloka workshop aims to provide a comprehensive picture of how crowdsourcing can be applied to real-life AI production.
We shared some of the unique insights we have gained from six years of industry experience in efficient natural language data annotation.
In this tutorial, we presented a systematic view of using Human-in-the-Loop approaches to obtain scalable offline evaluation processes and high-quality relevance judgements.
At this crowd science workshop, we discussed key issues in preparing labeled data for machine learning, with a focus on remoteness, fairness, and mechanisms in the context of crowdsourcing for data collection and labeling.
We presented a data processing pipeline used for training self-driving cars. Participants gained practical experience launching an annotation project in Toloka.
We explored the practical aspects of applying crowdsourcing to information retrieval. Participants learned how to create a dataset of relevant products.
We explored the practice of efficient data collection via crowdsourcing: aggregation, incremental relabeling, and pricing.
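One of the techniques named above, incremental relabeling, collects labels for a task one at a time and stops once the answer is confident enough, rather than requesting a fixed number of labels up front. The sketch below is a hypothetical illustration of the idea (the threshold rule and names are assumptions, not the workshop's exact method): keep asking for labels until the leading answer is ahead of the runner-up by a set margin.

```python
from collections import Counter

def needs_more_labels(labels, min_labels=3, margin=2):
    """Decide whether a task needs another annotation.

    Stop once we have at least `min_labels` answers and the leading
    answer beats the runner-up by at least `margin` votes.
    """
    if len(labels) < min_labels:
        return True
    counts = Counter(labels).most_common(2)
    top = counts[0][1]
    second = counts[1][1] if len(counts) > 1 else 0
    return top - second < margin

# Simulate collecting labels for one task from a stream of workers.
stream = iter(["cat", "dog", "cat", "cat"])
labels = []
while needs_more_labels(labels):
    labels.append(next(stream))
print(labels)  # stopped after 4 labels: 'cat' leads 'dog' 3 to 1
```

Confident tasks stop early under this rule, while contested ones keep accumulating labels, which is what makes incremental relabeling cheaper than a fixed overlap.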
We thrive on continuous improvement and international cooperation. Contact us on LinkedIn if you’d like to collaborate.
Toloka partners with universities across the world to incorporate crowd science techniques.