Just as we believe in the power of crowdsourcing for data labeling, we also have confidence in the developer community to improve data processing technologies that can propel the AI industry forward. Toloka supports the development of open source solutions, and we make our tools available to everyone under the Apache License, Version 2.0 for collaborative development.
Browse through our projects and find code to use or modify for your needs. We welcome your contributions!
Get the power of the Toloka API from Jupyter notebooks for automating data labeling pipelines.
Aggregation methods, datasets, and other tools to simplify working with crowdsourced data.
An easy-to-use tool for generating Python stubs.
A lightweight interface to the Toloka API that works in any Java environment.
Launch a data labeling project with plain Python code — no expertise needed.
Integrations and datasets
Integrate Toloka with Apache Airflow for managing pipelines.
Integrate Toloka with Prefect for managing pipelines.
An evaluation dataset for crowdsourced pairwise comparisons.
A benchmark dataset for crowdsourced audio transcription.
Toloka Global Community is a space for exchanging ideas, code, and expertise. Together we can shape a framework of excellence across all stages of the machine learning lifecycle.
Learn more about our community and follow our updates on open source projects, or jump right in on GitHub and Slack.
Keep an eye on our events page for meetups and speakers on open source projects and other hot topics.