Play 52 β AI API Gateway V2
Intelligent AI API gateway β multi-provider routing (Azure OpenAI, Anthropic, Google) with priority-based failover, semantic caching via Redis (embedding similarity), circuit breakers, complexity-based model routing (simpleβmini, complexβ4o), per-consumer token metering, rate limiting tiers, and cost attribution dashboards.
Architecture
Full architecture details:
architecture.md
How It Differs from Related Plays
| Aspect | Play 14 (Cost-Optimized Gateway) | Play 52 (AI Gateway V2) |
|---|---|---|
| Providers | Azure OpenAI only | Multi-provider (OpenAI + Anthropic + Google) |
| Caching | Exact-match | Semantic caching (embedding similarity) |
| Routing | Cost-based model selection | Complexity-based + priority failover |
| Resilience | Basic retry | Circuit breakers with half-open recovery |
| Metering | Basic token counting | Per-consumer with cost attribution |
| Rate Limiting | Simple RPM | Tiered (Free/Dev/Pro/Enterprise) + burst |
DevKit Structure
52-ai-api-gateway-v2/
βββ agent.md # Root orchestrator with handoffs
βββ .github/
β βββ copilot-instructions.md # Domain knowledge (<150 lines)
β βββ agents/
β β βββ builder.agent.md # Gateway + routing + caching
β β βββ reviewer.agent.md # Failover + security + rate limits
β β βββ tuner.agent.md # Cache TTL + routing + cost
β βββ prompts/
β β βββ deploy.prompt.md # Deploy gateway + providers
β β βββ test.prompt.md # Test failover + cache
β β βββ review.prompt.md # Audit security + circuits
β β βββ evaluate.prompt.md # Measure cost savings
β βββ skills/
β β βββ deploy-ai-api-gateway-v2/ # APIM + Redis + multi-provider
β β βββ evaluate-ai-api-gateway-v2/ # Cache hit, failover, cost, latency
β β βββ tune-ai-api-gateway-v2/ # Provider priority, cache, circuits
β βββ instructions/
β βββ ai-api-gateway-v2-patterns.instructions.md
βββ config/ # TuneKit
β βββ openai.json # Provider endpoints, model costs
β βββ guardrails.json # Cache, circuit breaker, rate limits
β βββ model-comparison.json # Cost/quality/latency per provider
βββ infra/ # Bicep IaC
β βββ main.bicep
β βββ parameters.json
βββ spec/ # SpecKit
βββ fai-manifest.jsonQuick Start
# 1. Deploy gateway with providers
/deploy
# 2. Test failover and caching
/test
# 3. Audit security and circuit breakers
/review
# 4. Measure cost savings and cache hit rate
/evaluateKey Metrics
| Metric | Target | Description |
|---|---|---|
| Failover Success | > 99% | Automatic provider switch on failure |
| Cache Hit Rate | > 30% | Semantic cache responses served |
| Cost Reduction | > 50% | vs single-provider no-cache baseline |
| P95 Latency (cached) | < 500ms | Cached response delivery |
| Error Rate | < 1% | 4xx + 5xx responses |
| Rate Limit Accuracy | 100% | Quota enforcement per consumer |
Cost Estimate
| Service | Dev | Prod | Enterprise |
|---|---|---|---|
| Azure API Management | $50 | $280 | $1,400 |
| Azure OpenAI | $60 | $600 | $3,000 |
| Azure Cache for Redis | $40 | $160 | $700 |
| Azure Monitor | $0 | $50 | $150 |
| Azure App Configuration | $0 | $35 | $70 |
| Cosmos DB | $5 | $75 | $350 |
| Key Vault | $1 | $5 | $15 |
| Application Insights | $0 | $30 | $100 |
| Total | $156 | $1,235 | $5,785 |
Detailed breakdown with SKUs and optimization tips:
cost.jsonΒ· Azure Pricing CalculatorΒ
WAF Alignment
| Pillar | Implementation |
|---|---|
| Reliability | Multi-provider failover, circuit breakers, half-open recovery |
| Security | Provider keys in Key Vault, per-consumer API keys, APIM policies |
| Cost Optimization | Complexity routing (mini for simple), semantic caching, provider arbitrage |
| Performance Efficiency | <10ms cache lookup, parallel provider health checks |
| Operational Excellence | Per-consumer metering, cost dashboards, usage analytics |
| Responsible AI | Rate limiting prevents abuse, content safety at gateway level |
FAI Manifest
| Field | Value |
|---|---|
| Play | 52-ai-api-gateway-v2 |
| Version | 1.0.0 |
| Knowledge | T3-Production-Patterns, F2-LLM-Selection, R3-Deterministic-AI |
| WAF Pillars | security, performance-efficiency, cost-optimization, reliability |
| Groundedness | β₯ 85% |
| Safety | 0 violations max |
Last updated on