Skip to Content
Solution PlaysPlay 41: Play 41 β€” AI Red Teaming

Play 41 β€” AI Red Teaming

Automated adversarial testing of AI systems β€” systematic jailbreak probing, prompt injection simulation, data exfiltration detection, encoding bypass testing, multi-turn escalation attacks, and compliance-ready safety scorecards for EU AI Act, NIST AI RMF, and OWASP LLM Top 10.

Architecture

ComponentAzure ServicePurpose
Attacker ModelAzure OpenAI (GPT-4o)Generate diverse adversarial prompts
Judge ModelAzure OpenAI (GPT-4o-mini)Evaluate attack success/failure
Safety ScoringAzure Content SafetyIndependent severity scoring (violence, hate, sexual, self-harm)
OrchestratorAzure Container AppsRun attack suites, manage campaigns
SecretsAzure Key VaultAPI keys for attacker and target
TelemetryApplication InsightsTrack attack results, detection rates

πŸ“ Full architecture details

AspectPlay 30 (AI Security)Play 41 (Red Teaming)Play 10 (Content Moderation)
FocusDefensive security controlsOffensive adversarial testingContent filtering
ApproachProtection layersAttack simulationSeverity classification
OutputSecurity postureVulnerability report + scorecardModerated content
TechniquesRBAC, encryption, PIIJailbreak, injection, exfiltrationText/image classification
ComplianceSecurity controls auditEU AI Act, NIST RMF, OWASP LLMContent policy enforcement
CadenceContinuous monitoringPeriodic scan campaignsReal-time per-request

DevKit Structure

41-ai-red-teaming/ β”œβ”€β”€ agent.md # Root orchestrator with handoffs β”œβ”€β”€ .github/ β”‚ β”œβ”€β”€ copilot-instructions.md # Domain knowledge (<150 lines) β”‚ β”œβ”€β”€ agents/ β”‚ β”‚ β”œβ”€β”€ builder.agent.md # Attack framework + generators β”‚ β”‚ β”œβ”€β”€ reviewer.agent.md # Coverage gaps + OWASP mapping β”‚ β”‚ └── tuner.agent.md # Detection tuning + false positives β”‚ β”œβ”€β”€ prompts/ β”‚ β”‚ β”œβ”€β”€ deploy.prompt.md # Deploy red team framework β”‚ β”‚ β”œβ”€β”€ test.prompt.md # Run attack suites β”‚ β”‚ β”œβ”€β”€ review.prompt.md # Audit coverage gaps β”‚ β”‚ └── evaluate.prompt.md # Generate vulnerability report β”‚ β”œβ”€β”€ skills/ β”‚ β”‚ β”œβ”€β”€ deploy-ai-red-teaming/ # Full deployment with attacker + judge β”‚ β”‚ β”œβ”€β”€ evaluate-ai-red-teaming/ # Coverage, detection, multi-turn, safety β”‚ β”‚ └── tune-ai-red-teaming/ # Attack diversity, severity, regression β”‚ └── instructions/ β”‚ └── ai-red-teaming-patterns.instructions.md β”œβ”€β”€ config/ # TuneKit β”‚ β”œβ”€β”€ openai.json # Attacker + judge model settings β”‚ β”œβ”€β”€ attacks.json # Attack categories, techniques, volumes β”‚ β”œβ”€β”€ guardrails.json # Detection thresholds, severity criteria β”‚ └── compliance.json # EU AI Act, NIST RMF, OWASP mapping β”œβ”€β”€ infra/ # Bicep IaC β”‚ β”œβ”€β”€ main.bicep β”‚ └── parameters.json └── spec/ # SpecKit └── fai-manifest.json

Quick Start

# 1. Deploy red team infrastructure /deploy # 2. Run attack suite against target /test # 3. Review coverage and OWASP mapping /review # 4. Generate vulnerability scorecard /evaluate

Cost

ServiceDevProdEnterprise
Azure AI Foundry$0 (Basic)$50 (Standard)$150 (Standard HA)
AI Content Safety$0 (Free)$60 (Standard S0)$200 (Standard S0)
Azure OpenAI$60 (PAYG)$400 (PAYG)$1,200 (PTU)
Azure Functions$0 (Consumption)$15 (Consumption)$120 (Premium EP1)
Cosmos DB$5 (Serverless)$60 (800 RU/s)$350 (4000 RU/s)
Blob Storage$2 (Hot LRS)$15 (Hot LRS)$50 (Hot GRS+WORM)
Key Vault$1 (Standard)$5 (Standard)$15 (Premium HSM)
Application Insights$0 (Free)$20 (Pay-per-GB)$80 (Pay-per-GB)
Total$68/mo$625/mo$2,165/mo

πŸ’° Full cost breakdown

Key Metrics

MetricTargetDescription
Attack Success Rate< 5%Attacks that bypass target defenses
Detection Rate> 95%Attacks correctly identified
False Positive Rate< 3%Benign prompts flagged as attacks
OWASP Coverage10/10All LLM Top 10 vulnerabilities tested
Multi-turn Resistance> 85%Survive 5-turn escalation attacks
Cost per Full Scan< $50220 attacks across all categories

WAF Alignment

PillarImplementation
SecurityOWASP LLM Top 10 coverage, jailbreak detection, data exfiltration prevention
Responsible AIBias elicitation testing, content safety scoring, EU AI Act compliance
ReliabilityRegression suite prevents vulnerability reappearance
Cost Optimizationgpt-4o-mini for judging, adaptive difficulty, cached regression
Operational ExcellenceAutomated scans, compliance scorecards, regression tracking
Performance EfficiencyParallel attack execution, batched scoring

FAI Manifest

FieldValue
Play41-ai-red-teaming
Version1.0.0
KnowledgeT2-Responsible-AI, T3-Production-Patterns, R1-Prompt-Patterns, O2-Agent-Coding
WAF Pillarssecurity, responsible-ai, operational-excellence
Groundednessβ‰₯ 85%
Safety0 violations max
Last updated on