Restaurant operations 

Multi-brand QSR internal store operations support: equipment incidents, food safety, delivery discrepancies and credits, time correction workflows, network/drive-through incidents, approval routing, notifications, and case/status coordination.

100 Test cases available OTS
27 Agent tools

Domain agentic intelligence index

We test models on private, non-contaminated tasks.
Here's what we found.

Composite pass^5 score (%)
Last updated: June 23

Composite pass^5 score across Tool use evaluations (higher is better).
Error bars show 95% confidence intervals.
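The page does not say how the 95% confidence intervals are computed. As a rough sketch only, assuming the score is treated as a simple proportion of passed tasks, a Wilson score interval is one common choice. The function name `wilson_interval` and the counts in the usage line are illustrative and are not taken from the benchmark.

```python
from math import sqrt


def wilson_interval(passed, total, z=1.96):
    """Wilson score interval for a pass rate (z=1.96 gives roughly 95%).

    Assumption: each task is an independent pass/fail trial. The benchmark's
    actual interval method is not stated in the source.
    """
    if total <= 0 or not 0 <= passed <= total:
        raise ValueError("need total > 0 and 0 <= passed <= total")
    p = passed / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half_width = z * sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    # Clamp to [0, 1] to absorb floating-point drift at the extremes.
    return max(0.0, center - half_width), min(1.0, center + half_width)


# Illustrative counts only: 50 of 100 tasks passed.
low, high = wilson_interval(50, 100)
```

With only 100 tasks the interval is wide (about ten percentage points on either side of a 50% score), which is worth keeping in mind when comparing models whose error bars overlap.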

Scaling curves

K = 1…5 runs

pass^k — Consistency

Percentage of tasks passed in every one of k runs.
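To make the metric concrete, here is a minimal sketch of one way to compute pass^k from per-task run outcomes. It assumes the combinatorial estimator often used for this metric, which averages C(c, k) / C(n, k) over tasks, where c is the number of passing runs out of n. The source does not confirm that this exact estimator is used, and the task IDs and outcomes below are made up.

```python
from math import comb


def pass_hat_k(results, k):
    """Return pass^k as a percentage.

    results maps a task ID to a list of booleans, one per run.
    For each task, C(c, k) / C(n, k) is the chance that k runs drawn
    without replacement from its n runs all passed.
    """
    if not results:
        raise ValueError("results must not be empty")
    total = 0.0
    for runs in results.values():
        n = len(runs)
        if not 1 <= k <= n:
            raise ValueError("k must be between 1 and the number of runs")
        c = sum(runs)  # number of passing runs for this task
        total += comb(c, k) / comb(n, k)
    return 100.0 * total / len(results)


# Hypothetical outcomes for four tasks, five runs each.
results = {
    "task-1": [True, True, True, True, True],
    "task-2": [True, True, True, False, True],
    "task-3": [False, False, False, False, False],
    "task-4": [True, False, True, True, False],
}
curve = [pass_hat_k(results, k) for k in range(1, 6)]
```

When k equals the number of runs, the estimator reduces to the share of tasks that passed every run, so a single flaky run removes a task from the pass^5 score entirely. The curve can only stay flat or fall as k grows.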

[Chart: pass^k (%) on the vertical axis, 0% to 90%; k = 1 to k = 5 on the horizontal axis]

Task difficulty distribution

Tasks bucketed by aggregate success rate

Buckets show difficulty tiers based on the aggregate of all models' results on the benchmarking subset.
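The bucketing described above could be sketched as follows. The tier names and cutoffs in `TIERS` are hypothetical, since the page does not state the real boundaries, and the sample outcomes are invented for illustration.

```python
from collections import Counter

# Hypothetical tier boundaries (inclusive upper bounds on success rate).
# The source does not give the actual cutoffs.
TIERS = [(0.25, "hard"), (0.75, "medium"), (1.0, "easy")]


def aggregate_success_rate(outcomes):
    """Fraction of passing attempts for one task, pooled over all models and runs."""
    if not outcomes:
        raise ValueError("outcomes must not be empty")
    return sum(outcomes) / len(outcomes)


def difficulty_tier(success_rate):
    """Map a success rate in [0, 1] to the first tier whose bound covers it."""
    if not 0.0 <= success_rate <= 1.0:
        raise ValueError("success_rate must be in [0, 1]")
    for upper_bound, name in TIERS:
        if success_rate <= upper_bound:
            return name


def difficulty_distribution(task_outcomes):
    """Count tasks per tier; task_outcomes maps task ID to a list of booleans."""
    return Counter(
        difficulty_tier(aggregate_success_rate(outcomes))
        for outcomes in task_outcomes.values()
    )


# Invented outcomes: each list pools every model's runs on that task.
sample = {
    "task-1": [True, True, True, True],
    "task-2": [True, False, True, False],
    "task-3": [False, False, False, True],
}
distribution = difficulty_distribution(sample)
```

Because the tiers depend on the pooled results of whichever models were evaluated, a task's bucket can shift when the model set changes.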

Example task

User Request

Correct Agent Solution

What Is Tested
