Play 41 β AI Red Teaming
Automated adversarial testing of AI systems β systematic jailbreak probing, prompt injection simulation, data exfiltration detection, encoding bypass testing, multi-turn escalation attacks, and compliance-ready safety scorecards for EU AI Act, NIST AI RMF, and OWASP LLM Top 10.
Architecture
| Component | Azure Service | Purpose |
|---|---|---|
| Attacker Model | Azure OpenAI (GPT-4o) | Generate diverse adversarial prompts |
| Judge Model | Azure OpenAI (GPT-4o-mini) | Evaluate attack success/failure |
| Safety Scoring | Azure Content Safety | Independent severity scoring (violence, hate, sexual, self-harm) |
| Orchestrator | Azure Container Apps | Run attack suites, manage campaigns |
| Secrets | Azure Key Vault | API keys for attacker and target |
| Telemetry | Application Insights | Track attack results, detection rates |
π Full architecture details
How It Differs from Related Plays
| Aspect | Play 30 (AI Security) | Play 41 (Red Teaming) | Play 10 (Content Moderation) |
|---|---|---|---|
| Focus | Defensive security controls | Offensive adversarial testing | Content filtering |
| Approach | Protection layers | Attack simulation | Severity classification |
| Output | Security posture | Vulnerability report + scorecard | Moderated content |
| Techniques | RBAC, encryption, PII | Jailbreak, injection, exfiltration | Text/image classification |
| Compliance | Security controls audit | EU AI Act, NIST RMF, OWASP LLM | Content policy enforcement |
| Cadence | Continuous monitoring | Periodic scan campaigns | Real-time per-request |
DevKit Structure
41-ai-red-teaming/
βββ agent.md # Root orchestrator with handoffs
βββ .github/
β βββ copilot-instructions.md # Domain knowledge (<150 lines)
β βββ agents/
β β βββ builder.agent.md # Attack framework + generators
β β βββ reviewer.agent.md # Coverage gaps + OWASP mapping
β β βββ tuner.agent.md # Detection tuning + false positives
β βββ prompts/
β β βββ deploy.prompt.md # Deploy red team framework
β β βββ test.prompt.md # Run attack suites
β β βββ review.prompt.md # Audit coverage gaps
β β βββ evaluate.prompt.md # Generate vulnerability report
β βββ skills/
β β βββ deploy-ai-red-teaming/ # Full deployment with attacker + judge
β β βββ evaluate-ai-red-teaming/ # Coverage, detection, multi-turn, safety
β β βββ tune-ai-red-teaming/ # Attack diversity, severity, regression
β βββ instructions/
β βββ ai-red-teaming-patterns.instructions.md
βββ config/ # TuneKit
β βββ openai.json # Attacker + judge model settings
β βββ attacks.json # Attack categories, techniques, volumes
β βββ guardrails.json # Detection thresholds, severity criteria
β βββ compliance.json # EU AI Act, NIST RMF, OWASP mapping
βββ infra/ # Bicep IaC
β βββ main.bicep
β βββ parameters.json
βββ spec/ # SpecKit
βββ fai-manifest.jsonQuick Start
# 1. Deploy red team infrastructure
/deploy
# 2. Run attack suite against target
/test
# 3. Review coverage and OWASP mapping
/review
# 4. Generate vulnerability scorecard
/evaluateCost
| Service | Dev | Prod | Enterprise |
|---|---|---|---|
| Azure AI Foundry | $0 (Basic) | $50 (Standard) | $150 (Standard HA) |
| AI Content Safety | $0 (Free) | $60 (Standard S0) | $200 (Standard S0) |
| Azure OpenAI | $60 (PAYG) | $400 (PAYG) | $1,200 (PTU) |
| Azure Functions | $0 (Consumption) | $15 (Consumption) | $120 (Premium EP1) |
| Cosmos DB | $5 (Serverless) | $60 (800 RU/s) | $350 (4000 RU/s) |
| Blob Storage | $2 (Hot LRS) | $15 (Hot LRS) | $50 (Hot GRS+WORM) |
| Key Vault | $1 (Standard) | $5 (Standard) | $15 (Premium HSM) |
| Application Insights | $0 (Free) | $20 (Pay-per-GB) | $80 (Pay-per-GB) |
| Total | $68/mo | $625/mo | $2,165/mo |
π° Full cost breakdown
Key Metrics
| Metric | Target | Description |
|---|---|---|
| Attack Success Rate | < 5% | Attacks that bypass target defenses |
| Detection Rate | > 95% | Attacks correctly identified |
| False Positive Rate | < 3% | Benign prompts flagged as attacks |
| OWASP Coverage | 10/10 | All LLM Top 10 vulnerabilities tested |
| Multi-turn Resistance | > 85% | Survive 5-turn escalation attacks |
| Cost per Full Scan | < $50 | 220 attacks across all categories |
WAF Alignment
| Pillar | Implementation |
|---|---|
| Security | OWASP LLM Top 10 coverage, jailbreak detection, data exfiltration prevention |
| Responsible AI | Bias elicitation testing, content safety scoring, EU AI Act compliance |
| Reliability | Regression suite prevents vulnerability reappearance |
| Cost Optimization | gpt-4o-mini for judging, adaptive difficulty, cached regression |
| Operational Excellence | Automated scans, compliance scorecards, regression tracking |
| Performance Efficiency | Parallel attack execution, batched scoring |
FAI Manifest
| Field | Value |
|---|---|
| Play | 41-ai-red-teaming |
| Version | 1.0.0 |
| Knowledge | T2-Responsible-AI, T3-Production-Patterns, R1-Prompt-Patterns, O2-Agent-Coding |
| WAF Pillars | security, responsible-ai, operational-excellence |
| Groundedness | β₯ 85% |
| Safety | 0 violations max |
Last updated on