Blog

Explore our updates, case studies, technology articles, and insights.

Beyond Next-Token Prediction: How Post-Training Teaches LLMs to Reason

Jul 1, 2025

Agent Evaluation: Why Simulated Environments are the New Frontier for Data

Jun 17, 2025

Toxicity detection: Why we still need human-labeled data

Jun 2, 2025

Evaluating Model Reasoning with Rubrics: Building a Domain-Specific Evaluation Dataset

May 27, 2025

Human-powered evaluation: Actionable feedback for next‑gen video diffusion models

May 20, 2025

Standardizing AI safety with MLCommons

May 15, 2025

Toloka Fuels Next Stage of Growth with Investment Led by Bezos Expeditions

May 6, 2025

AI agents under attack: A case study on advanced agent red-teaming

Apr 28, 2025

Introducing JEEM: A new benchmark for evaluating low-resource Arabic dialects

Apr 14, 2025

The personality paradox: Teaching AI agents to act like real people

Apr 10, 2025

Fixing SWE-bench: A Smarter Way to Evaluate Coding AI

Mar 17, 2025

LLM Evaluation In Action: Should You Trust Automated Metrics or Human Judgment?

Mar 3, 2025

Toloka’s Commitment to Responsible AI: How We Prioritize Ethics, Safety, and Excellence

Feb 27, 2025

Introducing Toloka’s Bug Bounty Program: Strengthening Security with Ethical Hacking

Feb 12, 2025

R1 is not on par with o1, and the difference is qualitative, not quantitative

Feb 12, 2025

Subscribe to Toloka News

Case studies, product news, and other articles straight to your inbox.
