Restaurant operations
Multi-brand QSR internal store operations support: equipment incidents, food safety, delivery discrepancies and credits, time correction workflows, network/drive-through incidents, approval routing, notifications, and case/status coordination.
Test cases: 140
Agent tools: 27
Domain agentic intelligence index
We test models on private, non-contaminated tasks.
Here's what we found.
Composite pass^5 score across tool-use evaluations (higher is better).
Error bars show 95% confidence intervals.
Scaling curves
k = 1 to 5 runs
pass^k (consistency): percentage of tasks passed in every one of k runs.
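As a rough illustration of the pass^k metric described above, the sketch below computes a pass^k curve from per-task success counts. The page does not say which estimator it uses, so the combinatorial estimator here (the chance that k runs drawn without replacement from the n recorded runs all pass, averaged over tasks) is an assumption. The function name `pass_hat_k` and the sample data are hypothetical. At k = n it reduces to the share of tasks passed in every run.

```python
from math import comb


def pass_hat_k(successes_per_task, n_runs, k):
    """Estimate pass^k from per-task success counts (assumed estimator).

    successes_per_task: number of passing runs for each task, out of n_runs.
    Returns the average over tasks of C(c, k) / C(n_runs, k).
    """
    if not 1 <= k <= n_runs:
        raise ValueError("k must be between 1 and n_runs")
    if not successes_per_task:
        raise ValueError("need at least one task")
    total = comb(n_runs, k)
    return sum(comb(c, k) for c in successes_per_task) / (total * len(successes_per_task))


# Hypothetical data: passing runs out of 5 for four tasks.
successes = [5, 4, 2, 0]
curve = {k: pass_hat_k(successes, n_runs=5, k=k) for k in range(1, 6)}

for k, value in curve.items():
    print(f"pass^{k} = {value:.3f}")
```

The curve can only stay flat or fall as k grows, which is why pass^5 is a stricter consistency measure than a single-run pass rate.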
Task difficulty distribution
Tasks are bucketed by aggregate success rate across all models' results; each bucket is a difficulty tier.
100%: 0 of 110 tasks (0%)
75%+: 0 of 110 tasks (0%)
50%+: 38 of 110 tasks (35%)
25%+: 69 of 110 tasks (63%)
0%: 3 of 110 tasks (3%)
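The bucketing above could be sketched as follows. The tier labels and counts come from the page. The exact boundary rules are not stated, so treating each label as a lower bound, and the "0%" tier as everything below 25%, is an assumption. The function name `difficulty_tier` is hypothetical.

```python
# Tier labels from the page, paired with assumed lower bounds on the
# aggregate success rate. The catch-all "0%" tier is an assumption.
TIERS = [("100%", 1.00), ("75%+", 0.75), ("50%+", 0.50), ("25%+", 0.25), ("0%", 0.00)]


def difficulty_tier(rate):
    """Map an aggregate success rate in [0, 1] to the first tier it reaches."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    for label, lower_bound in TIERS:
        if rate >= lower_bound:
            return label
    return TIERS[-1][0]


# Counts as reported in the distribution above.
reported_counts = {"100%": 0, "75%+": 0, "50%+": 38, "25%+": 69, "0%": 3}
total_tasks = sum(reported_counts.values())

# Recompute each tier's share of tasks, rounded to a whole percent.
shares = {label: round(100 * n / total_tasks) for label, n in reported_counts.items()}

for label, n in reported_counts.items():
    print(f"{label}: {n} of {total_tasks} tasks ({shares[label]}%)")
```

Recomputing the shares from the counts reproduces the percentages shown in the distribution.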