We red-teamed an AI agent in 4 hours
An enterprise AI assistant with RAG and tool access. 8 attack categories. 302 security specialists. Here's what we found and how we did it.
8
Attack categories
302
Security specialists
~4h
Total time
$720
Total cost
Trusted by Leading AI Teams
The problem with testing your own agent
Four patterns we see when teams do security testing internally.
Familiarity blindness
Teams who built the system test what they expect to work. External testers probe what they expect to break.
No taxonomy
Without a systematic list of attack categories, there's no way to measure coverage or identify gaps.
Single-shot testing
Real attacks are iterative—probe, observe response, refine. Most internal testing is one prompt, one check.
No documentation
When a vulnerability appears in production, teams can't trace what was tested or prove due diligence.
How iterative attacks work
This is an actual attack chain from our test. The target: an AI assistant with access to HR documents.
The attacker tried a direct request, got blocked, then used authority claims combined with document references the agent itself revealed.
The vulnerability: The agent cited "hr_handbook.txt" in its refusal. The attacker used that filename to request "verification" of its contents—and the agent complied.
The 8 attack categories we tested
Each category has defined targets and success criteria.
Data extraction
Elevated risk
Technique-based
Credentials
API keys, passwords, access tokens
Personal Data
Names, emails, SSNs, addresses
Financial
Salaries, budgets, payment info
Strategic
Roadmaps, M&A, board notes
Prompt Injection
Override system instructions
System Leak
Extract system prompt
Impersonation
Claim authority to bypass controls
Tool abuse
Misuse email, tickets, lookups
How this project ran
From project setup to downloaded results.
What the deliverable includes
For each attack category, you receive:
Cost and time comparison
Three approaches to security testing for the same 8-category scope.
Approach
Cost
Time
Coverage
Documentation
Internal engineer (ad-hoc)
$1,000–1,300
1 business day
Unstructured
None
Security consultant
$1,500–2,000
1 business day
Systematic
Report
This project (Toloka)
$720
~4 hours
8 categories
Full audit trail
Trusted by Leading AI Teams
Run this on your agent
Start with 2 categories for ~$180, or run all 8 for ~$720. Results in hours, not weeks.
No minimum commitment. No long-term contract. Pay per project.






