# T2: Responsible AI
You are part of the trust chain. Every infrastructure decision you make, from region selection to content filtering configuration, directly impacts whether your AI system is safe, fair, and trustworthy. Responsible AI isn't a checkbox; it's a design discipline woven into every layer. For content safety implementation patterns, see T3: Production Patterns. For grounding and accuracy, see R3: Deterministic AI.
## Microsoft's 6 Responsible AI Principles
| Principle | What It Means | Your Responsibility |
|---|---|---|
| Fairness | AI treats all people equitably | Test across demographics, monitor for bias |
| Reliability & Safety | AI performs as intended | Retry logic, fallbacks, circuit breakers |
| Privacy & Security | AI protects data and access | Managed Identity, Key Vault, RBAC, encryption |
| Inclusiveness | AI is accessible to everyone | Multi-language, accessibility, diverse testing |
| Transparency | People understand how AI works | Source citations, confidence scores, AI labels |
| Accountability | People are accountable for AI | Audit logs, human-in-the-loop, incident response |
## Infrastructure Decisions That Impact Safety
Every "infrastructure" choice is actually a safety decision:
| Decision | Safety Impact | Recommendation |
|---|---|---|
| Region selection | Data residency, compliance | Match to user geography + regulations |
| Content filtering | Blocks harmful outputs | Enable on ALL endpoints; never disable |
| Logging strategy | Audit trail for incidents | Log all AI interactions (without PII) |
| Rate limiting | Prevents abuse and cost explosion | Per-user + per-tenant limits |
| Key management | Prevents unauthorized access | Key Vault + Managed Identity, never hardcode |
| RBAC | Least-privilege access | Separate roles for dev/deploy/admin |
| Private endpoints | Network isolation | Required for production PaaS services |
| Model selection | Capability vs risk tradeoff | Smaller models for narrow tasks (less hallucination) |
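The per-user plus per-tenant rate limits recommended above can be enforced with two token buckets that a request must pass in sequence. Below is a minimal in-process sketch under assumed rates; the class names and parameters are illustrative, and a production gateway would keep bucket state in a shared store such as Redis rather than in memory:

```python
import time
from dataclasses import dataclass, field


@dataclass
class TokenBucket:
    """Refills at `rate` tokens per second up to `capacity`; starts full."""
    capacity: float
    rate: float
    tokens: float = 0.0
    updated: float = field(default_factory=time.monotonic)

    def __post_init__(self):
        self.tokens = self.capacity

    def allow(self, cost: float = 1.0) -> bool:
        # Refill based on elapsed time, then try to spend `cost` tokens.
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class RateLimiter:
    """A request must fit within both its user's and its tenant's budget."""

    def __init__(self, user_rps: float = 1.0, tenant_rps: float = 10.0):
        self.user_rps, self.tenant_rps = user_rps, tenant_rps
        self.users: dict[str, TokenBucket] = {}
        self.tenants: dict[str, TokenBucket] = {}

    def allow(self, user_id: str, tenant_id: str) -> bool:
        user = self.users.setdefault(user_id, TokenBucket(self.user_rps, self.user_rps))
        tenant = self.tenants.setdefault(tenant_id, TokenBucket(self.tenant_rps, self.tenant_rps))
        # Short-circuit: a blocked user never dips into the tenant budget.
        return user.allow() and tenant.allow()
```

The two-level check means one noisy user exhausts only their own budget, while the tenant bucket caps aggregate spend across all of that tenant's users.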
## Azure AI Content Safety
Azure AI Content Safety provides real-time detection across four harm categories:
```text
User Input ───▶ [Input Filter] ───▶ [Model] ───▶ [Output Filter] ───▶ Response
                      │                                  │
                      ▼                                  ▼
                 Block/Flag                         Block/Flag
         if severity ≥ threshold            if severity ≥ threshold
```
| Category | Severity Scale | Default Block | Description |
|---|---|---|---|
| Hate | 0-6 | ≥ 2 | Discrimination, slurs, dehumanization |
| Self-Harm | 0-6 | ≥ 2 | Instructions or encouragement of self-harm |
| Sexual | 0-6 | ≥ 2 | Explicit sexual content |
| Violence | 0-6 | ≥ 2 | Graphic violence, weapons instructions |
Additional protections:
- Prompt Shields: detect jailbreak and indirect prompt injection attempts
- Groundedness detection: flag ungrounded claims in model outputs
- Protected material detection: identify copyrighted text in outputs
:::info Content Safety Implementation
Configure content filtering in your guardrails.json:
```json
{
  "content_safety": {
    "hate": { "threshold": 2, "action": "block" },
    "self_harm": { "threshold": 2, "action": "block" },
    "sexual": { "threshold": 2, "action": "block" },
    "violence": { "threshold": 2, "action": "block" }
  },
  "prompt_shields": { "enabled": true },
  "groundedness": { "enabled": true, "threshold": 4.0 }
}
```
:::
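To make the configuration concrete, here is a sketch of how a gateway might apply those thresholds to per-category severity scores returned by a content safety analysis. The `enforce` helper and the inline config dict are illustrative, not part of any Azure SDK:

```python
# Mirrors the guardrails.json shape above (illustrative, not an SDK type)
GUARDRAILS = {
    "content_safety": {
        "hate": {"threshold": 2, "action": "block"},
        "self_harm": {"threshold": 2, "action": "block"},
        "sexual": {"threshold": 2, "action": "block"},
        "violence": {"threshold": 2, "action": "block"},
    }
}


def enforce(severities: dict[str, int], config: dict = GUARDRAILS) -> tuple[bool, list[str]]:
    """Return (allowed, violated_categories) for per-category severity scores."""
    rules = config["content_safety"]
    violations = [
        category
        for category, rule in rules.items()
        if rule["action"] == "block" and severities.get(category, 0) >= rule["threshold"]
    ]
    return (not violations, violations)
```

Keeping thresholds in config rather than code means safety teams can tighten them per environment without a redeploy.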
## OWASP LLM Top 10 Risks
The OWASP Top 10 for LLM Applications identifies the most critical security risks:
| # | Risk | Mitigation |
|---|---|---|
| 1 | Prompt Injection | Input validation, Prompt Shields, system prompt isolation |
| 2 | Insecure Output Handling | Sanitize AI output before rendering, never exec AI output |
| 3 | Training Data Poisoning | Curate data sources, validate training sets |
| 4 | Model Denial of Service | Rate limiting, token budgets, timeout enforcement |
| 5 | Supply Chain Vulnerabilities | Pin model versions, audit dependencies |
| 6 | Sensitive Information Disclosure | PII detection, output filtering, data minimization |
| 7 | Insecure Plugin Design | Least-privilege tool access, input validation |
| 8 | Excessive Agency | Human-in-the-loop for critical actions, action confirmation |
| 9 | Overreliance | Confidence scores, source citations, user education |
| 10 | Model Theft | Private endpoints, access controls, monitoring |
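For risk #2 (Insecure Output Handling), the core rule is to treat model output as untrusted user input: encode it for the destination context before rendering, and never pass it to `eval`, `exec`, or a shell. A minimal sketch for the HTML case using Python's standard `html` module:

```python
import html


def render_ai_output(raw: str) -> str:
    """Escape model output before embedding it in an HTML page so any
    injected markup renders as inert text instead of executing."""
    return html.escape(raw)
```

The same principle applies to SQL, shell commands, and templates: each destination needs its own encoding, and parameterized APIs beat string concatenation.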
## EU AI Act Overview
:::warning EU AI Act: Know Your Risk Classification
The EU AI Act entered into force in August 2024 with phased enforcement. If your AI system operates in the EU or serves EU users, you must classify it. High-risk systems face mandatory conformity assessments, transparency obligations, and human oversight requirements. Non-compliance penalties reach up to €35M or 7% of global turnover.
:::
| Risk Level | Examples | Requirements |
|---|---|---|
| Unacceptable | Social scoring, real-time biometric surveillance | Banned |
| High-Risk | Hiring, credit scoring, medical diagnosis, law enforcement | Conformity assessment, logging, human oversight |
| Limited Risk | Chatbots, deepfake generation | Transparency obligations (label as AI) |
| Minimal Risk | Spam filters, game AI | No specific requirements |
For most enterprise AI applications (RAG chatbots, document processing, IT assistants), you fall under limited risk, which requires transparency labels. If your system influences decisions about people (hiring, lending, medical), it's likely high-risk.
## Content Safety Pipeline
A production content safety pipeline has four stages:
```text
1. INPUT FILTERING                  2. MODEL GENERATION
   ├─ Prompt Shields                   ├─ Content filter (built-in)
   ├─ PII detection                    ├─ Token budget enforcement
   ├─ Input sanitization               └─ System prompt guardrails
   └─ Rate limiting

3. OUTPUT FILTERING                 4. LOGGING & MONITORING
   ├─ Content Safety API               ├─ Log interaction (no PII)
   ├─ Groundedness check               ├─ Correlation ID tracking
   ├─ Citation verification            ├─ Alert on blocked content
   └─ PII redaction                    └─ Audit trail retention
```
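The four stages can be wired together as a single request handler. In the sketch below the check functions are stand-in stubs (real implementations would call Prompt Shields, the model, and the Content Safety API); the part to note is that every request, allowed or blocked, gets a correlation ID for the audit trail:

```python
import uuid


def check_input(text: str) -> bool:
    """Stage 1 stub: prompt-shield / PII / sanitization checks."""
    return "ignore previous instructions" not in text.lower()


def generate(text: str) -> str:
    """Stage 2 stub: replace with a real model client call."""
    return f"Echo: {text}"


def check_output(text: str) -> bool:
    """Stage 3 stub: content safety / groundedness checks."""
    return "<script" not in text.lower()


def handle_request(text: str) -> dict:
    """Run all four stages; the correlation ID ties logs to this request."""
    correlation_id = str(uuid.uuid4())
    if not check_input(text):
        return {"id": correlation_id, "status": "blocked_input"}
    answer = generate(text)
    if not check_output(answer):
        return {"id": correlation_id, "status": "blocked_output"}
    # Stage 4: log the interaction (no PII) keyed by correlation_id here.
    return {"id": correlation_id, "status": "ok", "answer": answer}
```

Blocking before generation (stage 1) is cheaper than blocking after (stage 3), so the input filter should catch as much as it reliably can.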
## Evaluation for Trust
Responsible AI requires continuous evaluation, not one-time checks:
| Metric | Target | What It Measures |
|---|---|---|
| Groundedness | โฅ 4.0 / 5.0 | Are claims supported by provided context? |
| Relevance | โฅ 4.0 / 5.0 | Does the response address the question? |
| Coherence | โฅ 4.0 / 5.0 | Is the response logically consistent? |
| Safety | 0 violations | Are harmful content filters effective? |
| Fairness | < 5% variance | Do responses vary by demographic? |
```python
# Evaluation pipeline example
from azure.ai.evaluation import GroundednessEvaluator, ContentSafetyEvaluator

# model_config, credential, and azure_ai_project come from your project setup
groundedness = GroundednessEvaluator(model_config)
safety = ContentSafetyEvaluator(credential=credential, azure_ai_project=azure_ai_project)

result = groundedness(
    response="The contract requires 30-day payment terms.",
    context="Section 4.2: Payment shall be made within 30 days...",
    query="What are the payment terms?",
)
assert result["groundedness"] >= 4.0
```
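One way to operationalize the fairness target in the table is to run the same quality evaluator across demographic test groups and alert when the relative spread exceeds 5%. The spread formula below is an illustrative choice, not a standard definition:

```python
def fairness_variance(scores_by_group: dict[str, float]) -> float:
    """Relative spread of a quality metric (e.g. groundedness) across groups."""
    values = list(scores_by_group.values())
    top = max(values)
    return (top - min(values)) / top if top else 0.0


# Example: mean groundedness per demographic group from an eval run
groups = {"group_a": 4.2, "group_b": 4.1}
within_target = fairness_variance(groups) < 0.05
```

The key discipline is maintaining demographically diverse test sets so the comparison is possible at all; the exact spread metric matters less than measuring it continuously.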
## Key Takeaways
- You are the trust chain: infrastructure choices are safety choices
- Enable content filtering everywhere: never disable it, even in dev
- Know your OWASP LLM risks: prompt injection is #1 for a reason
- Classify under the EU AI Act: know your obligations before deployment
- Evaluate continuously: groundedness ≥ 4.0, zero safety violations
Next: T3: Production Patterns, which takes AI from prototype to production with resilience, cost control, and monitoring.