The Toloka self-service platform just got better

on June 2, 2026

We launched the Toloka self-service platform earlier to give AI and ML teams faster access to high-quality human data without the delays and operational overhead that often slow projects down. You can get started when you need to, instead of waiting for procurement processes or extra layers of coordination.

Since its launch, teams have been using the platform for increasingly advanced evaluation and data collection work. The latest updates to the Toloka self-service platform are designed to give you more control and significantly expanded capabilities: you can launch full multi-stage pipelines, not just single projects, and watch your data flow through every stage in real time.

Here’s what’s new.

A smarter way to build pipelines with agent builder

The biggest update is how pipelines get built. Instead of configuring one project at a time, you describe your full data goal once, and the agent builds your entire collection and annotation pipeline automatically.

When you submit your project brief, you land in a unified project space where Pipelines, Datasets, and Secrets are organized in one place. The agent gets to work immediately, asking a short set of clarifying questions before it builds: how you want your pipeline structured, how the workforce should be split, which consent flows you need, and where data should be stored regionally. These aren't cosmetic choices. They're real architectural decisions, and the agent uses your answers to build exactly what you need rather than a generic starting point you'd have to rework.

Once the pipeline is ready, you review it before anything goes live. Each node in the pipeline comes fully configured with its data structure, task UI, quality criteria, contributor guidelines, audience filters, pricing, and QA method. You walk through it, make any changes, and deploy when it looks right. If something needs adjusting after the fact, describe the change in plain language and the agent updates that node without touching the rest of the pipeline.
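To make the review step more concrete, here is a minimal sketch of what one node's configuration could look like as a data structure. It is purely illustrative: the `PipelineNode` class, its field names, the `unconfigured_fields` helper, and every value shown are assumptions made for this example, not the platform's actual schema or API. The fields simply mirror the per-node settings listed above.

```python
from dataclasses import dataclass


@dataclass
class PipelineNode:
    """Hypothetical shape of one pipeline node (not the platform's real schema)."""
    name: str
    data_structure: dict          # input/output fields for each task
    task_ui: str                  # which task interface contributors see
    quality_criteria: list        # what a good answer must satisfy
    contributor_guidelines: str   # instructions shown to contributors
    audience_filters: dict        # who is allowed to take the task
    price_per_task_usd: float     # illustrative pricing field
    qa_method: str                # e.g. "majority_vote" or "expert_review"


def unconfigured_fields(node):
    """Return the names of fields left empty, so gaps are visible before deploying."""
    return [name for name, value in vars(node).items() if value in ("", None, [], {})]


# Illustrative values only.
node = PipelineNode(
    name="validate_answers",
    data_structure={"question": "str", "answer": "str"},
    task_ui="side_by_side",
    quality_criteria=["answer is supported by the source"],
    contributor_guidelines="",
    audience_filters={"language": "fr"},
    price_per_task_usd=0.05,
    qa_method="majority_vote",
)
print(unconfigured_fields(node))
```

A check like this is one way to think about the walkthrough: every node should have each setting filled in before the pipeline is deployed.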

The platform also supports workflows where multiple contributors validate the same task through majority voting.
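For readers new to the idea, majority voting can be sketched in a few lines. This is a generic illustration of the technique, not the platform's implementation: the `majority_vote` function, its `min_agreement` parameter, and the example labels are all assumptions made for this sketch.

```python
from collections import Counter


def majority_vote(labels, min_agreement=0.5):
    """Aggregate several contributors' labels for one task.

    Returns (label, agreement) when the most common label is chosen by
    more than `min_agreement` of contributors, otherwise (None, agreement).
    """
    if not labels:
        return None, 0.0
    top_label, top_count = Counter(labels).most_common(1)[0]
    agreement = top_count / len(labels)
    if agreement > min_agreement:
        return top_label, agreement
    return None, agreement


# Three contributors judged the same task; two of them agree.
label, agreement = majority_vote(["valid", "valid", "invalid"])
print(label, round(agreement, 2))
```

Tasks that come back with no majority, such as an even split, are natural candidates for an extra vote or an expert review.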

We put together a full walkthrough.

[insert video]

What teams are building on the platform

From a French institutional finance benchmark to a large-scale robotics dataset, the platform is already being used for highly specialized workflows. Recent projects have included evaluating AI-generated museum audioguides with native Korean speakers, validating legal answers against source documents, and collecting multimodal training data at global scale.

Building a finance benchmark that frontier models can’t shortcut

One team used the platform to build a French institutional finance benchmark entirely through self-serve workflows. The dataset covered everything from MiFID II and EMIR through to Basel III/IV and quantitative markets, with Q&A pairs written from the perspective of real practitioners working in the field. The questions needed to go beyond surface-level knowledge and test the kind of reasoning used in real institutional finance work. Domain experts on the platform took care of both creation and validation from end to end.

Validating legal answers down to the page citation

Another project focused on legal validation for an international RAG benchmark, where every answer had to be grounded in DIFC source documents and cite the exact pages sufficient to support it. Human expert reviewers ran validation throughout the process, with every decision tied back to specific source text.

Collecting robotics training data at global scale

For HomER, an open-source egocentric robotics dataset now available on Hugging Face, the platform was used to collect head-mounted video across 17 household task categories from more than 1,800 contributors worldwide. The entire workflow was managed through the self-service platform. Automated QA checks enforced strict capture requirements throughout collection, including perspective consistency and hand visibility thresholds.
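As a rough illustration of what a threshold-based capture check can look like, here is a minimal sketch. It assumes each clip has already been scored upstream for the share of frames that are first-person and the share in which hands are visible. The `ClipMetrics` fields, the `qa_failures` function, and the threshold values are hypothetical and are not the checks or numbers used for HomER.

```python
from dataclasses import dataclass


@dataclass
class ClipMetrics:
    """Hypothetical per-clip scores produced by an upstream analysis step."""
    clip_id: str
    egocentric_frame_ratio: float    # share of frames judged first-person
    hand_visible_frame_ratio: float  # share of frames with hands in view


def qa_failures(clip, min_egocentric=0.95, min_hand_visible=0.80):
    """Return the capture requirements this clip fails (empty list means it passes).

    The default thresholds are illustrative assumptions, not real project values.
    """
    failures = []
    if clip.egocentric_frame_ratio < min_egocentric:
        failures.append("perspective_consistency")
    if clip.hand_visible_frame_ratio < min_hand_visible:
        failures.append("hand_visibility")
    return failures


clip = ClipMetrics("clip_001", egocentric_frame_ratio=0.99, hand_visible_frame_ratio=0.50)
print(qa_failures(clip))
```

Returning the list of failed requirements, rather than a single pass/fail flag, makes it easier to tell a contributor exactly what to fix when a clip needs to be recaptured.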

Evaluating AI-generated audioguides with native speakers

When an AI startup needed to evaluate Korean-language museum audioguides, they turned to the platform to gather feedback from native Korean speakers. The project was set up through self-service workflows and completed within hours. Human feedback helped the team understand how the content would be received by the people it was designed for, providing insight that automated evaluation alone couldn't deliver.

Built for the most ambitious workflows

The platform is evolving beyond annotation tooling into infrastructure for complex evaluation and multimodal data collection workflows. The latest updates reduce much of the operational coordination behind those projects, helping you move from setup to production far more efficiently.

Self-serve access is live.

Get started with your first project
