To support research and integration projects, we have developed several open-source libraries. Anyone is welcome to use them and contribute to the projects.

Toloka Kit

Toloka Kit is a Python library that lets ML engineers and data scientists scale the data labeling process and control it programmatically.

Benefits of Toloka Kit:
  • Make your data labeling processes reproducible.
  • Integrate data labeling processes with your ML environment.

If you want to try it out, use our sample labeling pipeline for selecting road signs in images, which we presented at CVPR 2020. Even though it's very simple, this pipeline plays an essential role in image segmentation for self-driving vehicles.

Open on GitHub Documentation

Crowd Kit

Crowd Kit is a Python library that implements most of the popular crowdsourcing algorithms.

Crowd Kit includes:
  • A variety of methods for aggregating Toloker responses.
  • Metrics for evaluating the quality of assignments and Tolokers.
  • Quality control methods.

The library has an easy-to-use interface and works with pandas DataFrames.
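To give a feel for what response aggregation and a Toloker quality metric look like on a DataFrame, here is a minimal sketch written with plain pandas. It is not Crowd Kit's own code: the column names (`task`, `worker`, `label`), the toy data, and the helper functions `majority_vote` and `worker_agreement` are all assumptions made for illustration. Crowd Kit ships its own aggregators (for example a majority vote class in `crowdkit.aggregation`), and the exact column names it expects may differ between versions, so check the documentation before relying on them.

```python
import pandas as pd

# Toy responses: one row per answer a worker gave to a task.
# Column names and data are assumptions made for this sketch.
responses = pd.DataFrame(
    [
        ("t1", "w1", "cat"),
        ("t1", "w2", "cat"),
        ("t1", "w3", "dog"),
        ("t2", "w1", "dog"),
        ("t2", "w2", "dog"),
        ("t2", "w3", "dog"),
        ("t3", "w1", "cat"),
        ("t3", "w2", "dog"),
        ("t3", "w3", "dog"),
    ],
    columns=["task", "worker", "label"],
)


def majority_vote(df: pd.DataFrame) -> pd.Series:
    """Return the most frequent label per task (hypothetical helper).

    Ties are broken alphabetically by label so the result is deterministic.
    """
    counts = df.groupby(["task", "label"]).size().reset_index(name="votes")
    counts = counts.sort_values(
        ["task", "votes", "label"], ascending=[True, False, True]
    )
    # After sorting, the first row of each task holds its winning label.
    return counts.drop_duplicates("task").set_index("task")["label"]


def worker_agreement(df: pd.DataFrame, aggregated: pd.Series) -> pd.Series:
    """Share of each worker's answers that match the aggregated label."""
    agreed = df["label"] == df["task"].map(aggregated)
    return agreed.groupby(df["worker"]).mean()


aggregated = majority_vote(responses)
agreement = worker_agreement(responses, aggregated)

print(aggregated)
print(agreement)
```

Agreement with the aggregated label is only a rough proxy for quality, because it treats the majority answer as correct. The more advanced aggregation methods weight workers by estimated skill instead of counting every vote equally.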

Open on GitHub