Test before you run, automate via API, pause anytime: what's new on Toloka
Since we launched multi-stage pipeline support, teams have been using the Toloka self-service platform for progressively more complex data collection and evaluation work. The feedback has been consistent: the pipeline builder is powerful, but the steps around it (setting up, validating, monitoring, and automating) still take more manual effort than they should.
This update addresses exactly that. Every feature below is designed to narrow the gap between having a good idea for a pipeline and running it confidently at scale.
Here's what's new.
Automate pipelines end-to-end with the Pipeline API
The platform now has a public API in beta, so everything you build in the interface can be operated programmatically.
The workflow is designed to keep the two sides cleanly separated: you design your pipeline once in the app, then use the API to copy that pipeline, feed it data, launch runs, monitor progress, and pull results, without ever touching the visual editor again. This is the right split. The visual builder is where you make architectural decisions; the API is where you run them reliably at scale.
The API is in beta, meaning endpoints and response shapes are still subject to change. It's available to all external users now. See the full reference in the documentation.
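To make the copy, feed, launch, monitor, and pull sequence concrete, here is a minimal Python sketch of that loop. It does not use the real Pipeline API: the `PipelineClient` interface, its method names, and the status strings are all hypothetical placeholders, so check the documentation for the actual endpoints and response shapes before adapting it.

```python
import time
from typing import Protocol


class PipelineClient(Protocol):
    """Hypothetical interface, NOT the real API; names and shapes are assumed."""

    def copy_pipeline(self, template_id: str) -> str: ...
    def upload_items(self, pipeline_id: str, items: list) -> None: ...
    def start_run(self, pipeline_id: str) -> str: ...
    def get_run_status(self, run_id: str) -> str: ...
    def fetch_results(self, run_id: str) -> list: ...


# Assumed status names, chosen for illustration only.
TERMINAL_STATUSES = {"completed", "failed", "canceled"}


def run_from_template(client, template_id, items, poll_seconds=30.0, sleep=time.sleep):
    """Copy a pipeline designed in the app, run it on `items`, return results."""
    pipeline_id = client.copy_pipeline(template_id)  # copy the designed pipeline
    client.upload_items(pipeline_id, items)          # feed it data
    run_id = client.start_run(pipeline_id)           # launch a run

    # Monitor progress by polling until the run reaches a terminal status.
    # Any other status (for example a paused run) simply keeps the loop waiting.
    while True:
        status = client.get_run_status(run_id)
        if status in TERMINAL_STATUSES:
            break
        sleep(poll_seconds)

    if status != "completed":
        raise RuntimeError(f"run {run_id} ended with status {status!r}")
    return client.fetch_results(run_id)              # pull results
```

Keeping the client behind an interface means the same loop can be exercised against an in-memory fake first and only later wired to real HTTP calls, which is useful while the API is in beta and its shapes may still change.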
Agent Plan Mode: review before anything is built
Agent Plan Mode adds a checkpoint between brief and build: you see the full plan before anything gets configured.
When you submit a project brief, the agent studies the project and produces a structured plan with a checklist before touching a single node. You review it, approve each section, or leave comments on parts you want reworked. Only then does the agent start building. If something doesn't look right, describe the change and the agent revises that section.
Quorums: quality through consensus
Quorums allow the same item to be labeled by multiple contributors, with the platform automatically calculating consensus across their responses.
Quality is maintained through majority vote: a contributor who disagrees with the majority four times in a row is automatically offboarded from the project.
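The rule is easy to picture in code. The sketch below is an illustration of majority voting with a four-in-a-row disagreement streak, not the platform's implementation. The names `majority_label` and `DisagreementTracker` are made up, and two behaviors are assumptions: consensus requires a strict majority, and an item with no majority leaves every streak unchanged.

```python
from collections import Counter, defaultdict
from typing import Dict, Optional

MAX_CONSECUTIVE_DISAGREEMENTS = 4  # "four times in a row", per the post


def majority_label(responses: Dict[str, str]) -> Optional[str]:
    """Return the label picked by more than half of contributors, else None."""
    if not responses:
        return None
    label, votes = Counter(responses.values()).most_common(1)[0]
    return label if votes * 2 > len(responses) else None


class DisagreementTracker:
    """Tracks each contributor's run of consecutive disagreements."""

    def __init__(self, limit: int = MAX_CONSECUTIVE_DISAGREEMENTS):
        self.limit = limit
        self.streaks = defaultdict(int)
        self.offboarded = set()

    def record(self, responses: Dict[str, str]) -> Optional[str]:
        """Score one item's responses (contributor id -> label)."""
        consensus = majority_label(responses)
        if consensus is None:
            return None  # assumption: no majority means nobody is penalized
        for contributor, label in responses.items():
            if label == consensus:
                self.streaks[contributor] = 0  # agreement resets the streak
            else:
                self.streaks[contributor] += 1
                if self.streaks[contributor] >= self.limit:
                    self.offboarded.add(contributor)
        return consensus
```

Because agreement resets the streak, an occasional honest disagreement does not accumulate toward offboarding; only a sustained run of them does.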
Self-check: test a node before the whole pipeline runs
Every node can now be tested individually on a single data sample, without launching the full pipeline.
Three check types are available: UI to see the task as a contributor would, Code to run the node's Python logic, and LLM QA to evaluate output against quality criteria with a pass/fail result. Each check runs with one click from inside the component, and results appear right next to it. The agent can also generate checks automatically, catching misconfigured UI, bad quality criteria, or broken code before a single item goes through.
Cost estimation and flexible billing
Every pipeline run now shows a cost estimate before it launches: an expected cost and a maximum, broken down by component.
Billing has moved to pay-as-you-go. Top up your account with the estimated cost to launch a run, and funds are charged as work progresses. LLM QA is billed on actual checks performed, and failed or canceled components aren't charged. If your balance runs out mid-run, the run pauses rather than failing, so you can top up and pick back up.
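As a rough illustration of how a per-component estimate rolls up into a run total and a top-up amount, consider the sketch below. The component names and prices are invented, `ComponentEstimate`, `run_totals`, and `top_up_needed` are hypothetical helpers, and which figure the platform requires before launch is for the documentation to confirm; the function just takes a `required` amount.

```python
from decimal import Decimal
from typing import Iterable, NamedTuple, Tuple


class ComponentEstimate(NamedTuple):
    name: str
    expected: Decimal  # expected cost for this component
    maximum: Decimal   # upper bound for this component


def run_totals(components: Iterable[ComponentEstimate]) -> Tuple[Decimal, Decimal]:
    """Sum the per-component figures into (expected, maximum) for the run."""
    components = list(components)
    expected = sum((c.expected for c in components), Decimal("0"))
    maximum = sum((c.maximum for c in components), Decimal("0"))
    return expected, maximum


def top_up_needed(balance: Decimal, required: Decimal) -> Decimal:
    """How much to add so that the balance covers `required` (never negative)."""
    return max(required - balance, Decimal("0"))


# Made-up numbers, for illustration only.
estimate = [
    ComponentEstimate("labeling", Decimal("12.50"), Decimal("20.00")),
    ComponentEstimate("llm_qa", Decimal("3.25"), Decimal("5.00")),
]
expected_total, maximum_total = run_totals(estimate)
shortfall = top_up_needed(Decimal("10.00"), expected_total)
```

`Decimal` is used instead of floats so that currency amounts add up exactly.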
Pause and resume a run
Running pipelines can now be paused and resumed without losing progress.
The pause is graceful: the current step finishes, then the remaining steps wait. Nothing is lost. Resuming a run picks up exactly where it stopped. You can still cancel a paused run (irreversible, clears unfinished items), but pausing keeps everything intact.
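A toy model helps show what "graceful" means here. The class below is not how the platform is built; `ToyRun`, its fields, and the `"paused"` and `"completed"` return values are invented for illustration. The key idea is that a pause request is only honored between steps, so a step that is already in progress always finishes.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class ToyRun:
    """Illustrative stand-in for a pipeline run with graceful pause."""

    steps: List[str]
    next_index: int = 0
    pause_requested: bool = False
    completed: List[str] = field(default_factory=list)

    def request_pause(self) -> None:
        # Only sets a flag; whatever step is currently running is not interrupted.
        self.pause_requested = True

    def advance(self, on_step: Optional[Callable[["ToyRun", str], None]] = None) -> str:
        """Run remaining steps, checking for a pause request between steps."""
        while self.next_index < len(self.steps):
            if self.pause_requested:
                return "paused"  # remaining steps wait; progress is kept
            step = self.steps[self.next_index]
            if on_step is not None:
                on_step(self, step)  # stands in for the step's real work
            self.completed.append(step)
            self.next_index += 1
        return "completed"

    def resume(self) -> str:
        """Clear the pause flag and continue from the first unfinished step."""
        self.pause_requested = False
        return self.advance()
```

For example, if a pause is requested while the second of three steps is running, `advance` finishes that step, returns `"paused"` with two steps completed, and a later `resume()` runs only the third.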
Larger items: up to 1 MB per field
The item size limit has increased from 32 KB to 1 MB per field, removing a ceiling that was blocking projects with long texts, detailed JSON, or complex structured answers.
Two independent limits apply: 1 MB per field, and 1 MB on a node's total input. Exceeding the node input limit fails that node's run, so it's worth checking the total for items that are large across multiple fields.
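A pre-flight check along these lines can catch oversized items before a run. This is a sketch under assumptions the post does not state: that 1 MB means 1,048,576 bytes, that size is measured on the UTF-8 encoding (JSON-serialized for non-string values), and that a node's total input is the sum of its field sizes. `field_size` and `check_item` are hypothetical helper names.

```python
import json
from typing import Any, Dict, List

# Assumption: 1 MB is taken as 1024 * 1024 bytes; the platform may count differently.
FIELD_LIMIT_BYTES = 1024 * 1024
NODE_INPUT_LIMIT_BYTES = 1024 * 1024


def field_size(value: Any) -> int:
    """Approximate a field's size as the byte length of its UTF-8 encoding."""
    text = value if isinstance(value, str) else json.dumps(value, ensure_ascii=False)
    return len(text.encode("utf-8"))


def check_item(item: Dict[str, Any]) -> List[str]:
    """Return human-readable problems; an empty list means the item fits."""
    problems = []
    sizes = {name: field_size(value) for name, value in item.items()}

    # Limit 1: each field on its own.
    for name, size in sizes.items():
        if size > FIELD_LIMIT_BYTES:
            problems.append(
                f"field {name!r} is {size} bytes (limit {FIELD_LIMIT_BYTES})"
            )

    # Limit 2: all fields together, assumed here to form the node's total input.
    total = sum(sizes.values())
    if total > NODE_INPUT_LIMIT_BYTES:
        problems.append(
            f"total input is {total} bytes (limit {NODE_INPUT_LIMIT_BYTES})"
        )
    return problems
```

The second check is the one that is easy to miss: two fields of 600,000 bytes each both pass the per-field limit, yet together they exceed the assumed total.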
Step performance metrics in run exports
Run results exports now include per-step metrics alongside input and output data: handle time, LLM QA time, number of attempts, and annotator IDs for generation and quorum steps. These metrics are included automatically; no toggle is required.
Get started
All of these features are live. Try the platform now and see what it can do for your project. If you'd like to discuss your use case or need help getting started, we're happy to talk.