Success Stories
Learn how companies around the world are pushing the boundaries of AI with LLM post-training and evaluation.

Frontier Models can win at IMO, but they still can't check their own assumptions.

The human difference in high-stakes AI evaluation

HomER: Building an open-source egocentric robotics dataset with Toloka

Building Shopify's Product Catalog at AI Speed

Supporting the launch of JetBrains’ Developer Productivity AI Arena
How Toloka helped poolside define and measure AI quality for developers
From Word docs to data analysis: Evaluating AI agent performance across everyday apps
Creating domain-ready datasets: how Toloka's hybrid approach generates realistic and high-quality data
Detecting hidden harm in long contexts: how Toloka built an advanced safety dataset