Play 27 β AI Data Pipeline π
ETL/ELT with LLM enrichment β classify, extract, and summarize at scale.
Ingest raw data, clean and validate, then enrich with GPT-4o-mini for classification, entity extraction, and summarization. Data Factory orchestrates the pipeline, data flows through lake zones (raw β staging β enriched β serving), with data quality checks and PII detection at every stage.
Quick Start
cd solution-plays/27-ai-data-pipeline
az deployment group create -g $RG -f infra/main.bicep -p infra/parameters.json
code . # Use @builder for ETL/enrichment, @reviewer for data quality audit, @tuner for costArchitecture
π See architecture.md for full data flow, service roles, security architecture, and scaling tables.
Enrichment Types
| Type | Model | Cost/1K Records |
|---|---|---|
| Classification | gpt-4o-mini | ~$0.15 |
| Entity extraction | gpt-4o-mini | ~$0.20 |
| Summarization | gpt-4o-mini | ~$0.30 |
| Sentiment | gpt-4o-mini | ~$0.10 |
Key Metrics
- Enrichment accuracy: β₯90% Β· Data quality: β₯95% Β· Throughput: β₯10K records/hr Β· PII recall: β₯99%
DevKit (Data Engineering-Focused)
| Primitive | What It Does |
|---|---|
| 3 agents | Builder (ETL/enrichment/ADF), Reviewer (quality/idempotency/PII), Tuner (batch/parallelism/cost) |
| 3 skills | Deploy (104 lines), Evaluate (105 lines), Tune (103 lines) |
| 4 prompts | /deploy (ETL pipeline), /test (execution), /review (data quality), /evaluate (enrichment accuracy) |
Cost
π° See cost.json for full pricing breakdown with SKUs, notes, and optimization tips.
| Service | Purpose | Dev | Prod | Enterprise |
|---|---|---|---|---|
| Azure OpenAI | GPT-4o-mini for classification + entity extraction | $40 | $250 | $900 |
| Data Factory | Pipeline orchestration and scheduling | $15 | $120 | $400 |
| Cosmos DB | Enriched records, entity graphs, metadata | $5 | $80 | $400 |
| Event Hubs | Real-time streaming ingestion | $12 | $75 | $300 |
| Azure Functions | Event-triggered classification + enrichment | $0 | $25 | $150 |
| Blob Storage | Raw data landing zone (CSV, JSON, Parquet) | $2 | $25 | $80 |
| Key Vault | API keys, connection secrets | $1 | $3 | $10 |
| App Insights | Pipeline latency, classification accuracy | $0 | $25 | $100 |
| Log Analytics | Pipeline run history, error diagnostics | $0 | $15 | $50 |
| Total | $75 | $618 | $2,390 |
π Full docs Β· π frootai.dev/solution-plays/27-ai-data-pipelineΒ
FAI Manifest
| Field | Value |
|---|---|
| Play | 27-ai-data-pipeline |
| Version | 1.0.0 |
| Knowledge | T1-Fine-Tuning-MLOps, T3-Production-Patterns, R2-RAG-Architecture |
| WAF Pillars | security, reliability, cost-optimization, operational-excellence, responsible-ai |
| Groundedness | β₯ 85% |
| Safety | 0 violations max |
Last updated on