Play 27: AI Data Pipeline 📊

ETL/ELT with LLM enrichment: classify, extract, and summarize at scale.

Ingest raw data, clean and validate it, then enrich it with GPT-4o-mini for classification, entity extraction, and summarization. Data Factory orchestrates the pipeline, and data flows through the lake zones (raw → staging → enriched → serving), with data quality checks and PII detection at every stage.
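The zone progression described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the play's actual implementation: the `Record` type, the email-only PII pattern, and the keyword-based `enrich_stub` (standing in for the GPT-4o-mini call) are all hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

# Lake zones in the order named in the overview.
ZONES = ["raw", "staging", "enriched", "serving"]

# Hypothetical PII pattern: email addresses only, for illustration.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


@dataclass
class Record:
    id: str
    text: str
    zone: str = "raw"
    labels: dict = field(default_factory=dict)


def quality_ok(record: Record) -> bool:
    """Minimal quality check: non-empty id and non-blank text."""
    return bool(record.id) and bool(record.text.strip())


def redact_pii(record: Record) -> Record:
    """Mask anything matching the illustrative email pattern."""
    record.text = EMAIL_RE.sub("[REDACTED_EMAIL]", record.text)
    return record


def enrich_stub(record: Record) -> Record:
    """Placeholder for the LLM call; a keyword rule is used instead."""
    is_billing = "invoice" in record.text.lower()
    record.labels["category"] = "billing" if is_billing else "other"
    return record


def promote(record: Record) -> Optional[Record]:
    """Move a record one zone forward, running checks at the boundary."""
    if not quality_ok(record):
        return None  # rejected records never reach the next zone
    record = redact_pii(record)
    next_zone = ZONES[ZONES.index(record.zone) + 1]
    if next_zone == "enriched":
        record = enrich_stub(record)
    record.zone = next_zone
    return record


def run_pipeline(record: Optional[Record]) -> Optional[Record]:
    """Promote a record until it reaches serving or is rejected."""
    while record is not None and record.zone != "serving":
        record = promote(record)
    return record
```

Calling `run_pipeline(Record(id="1", text="Invoice for bob@example.com"))` returns a record in the `serving` zone with the email masked, while a record with blank text is rejected at the first boundary. Running the checks inside `promote` is one way to get quality and PII checks at every stage transition.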

Quick Start

    cd solution-plays/27-ai-data-pipeline
    az deployment group create -g $RG -f infra/main.bicep -p infra/parameters.json
    code .  # Use @builder for ETL/enrichment, @reviewer for data quality audit, @tuner for cost

Architecture

See architecture.md for full data flow, service roles, security architecture, and scaling tables.

Enrichment Types

| Type | Model | Cost/1K Records |
| --- | --- | --- |
| Classification | gpt-4o-mini | ~$0.15 |
| Entity extraction | gpt-4o-mini | ~$0.20 |
| Summarization | gpt-4o-mini | ~$0.30 |
| Sentiment | gpt-4o-mini | ~$0.10 |
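As a rough worked example, the per-1K figures above can be turned into a batch estimate. The sketch below assumes costs add linearly across enrichment types and record counts, which the table does not state; the dictionary keys and the `estimate_cost` helper are hypothetical names.

```python
from typing import Iterable

# Approximate USD cost per 1,000 records, copied from the table above.
COST_PER_1K = {
    "classification": 0.15,
    "entity_extraction": 0.20,
    "summarization": 0.30,
    "sentiment": 0.10,
}


def estimate_cost(num_records: int, enrichments: Iterable[str]) -> float:
    """Estimate enrichment cost, assuming simple linear scaling."""
    per_1k = sum(COST_PER_1K[name] for name in enrichments)
    return round(num_records / 1000 * per_1k, 2)
```

Under these assumptions, running classification and entity extraction over 1,000,000 records comes to roughly $350, and all four enrichments over 10,000 records comes to roughly $7.50.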

Key Metrics

  • Enrichment accuracy: ≥90% · Data quality: ≥95% · Throughput: ≥10K records/hr · PII recall: ≥99%
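One way to apply these targets is as a release gate over measured values. The following is a sketch only: the metric key names, the `failed_metrics` helper, and the idea of failing a run on any miss are assumptions, not something the play documents.

```python
# Minimum targets from the Key Metrics line; key names are hypothetical.
THRESHOLDS = {
    "enrichment_accuracy": 0.90,
    "data_quality": 0.95,
    "throughput_per_hr": 10_000,
    "pii_recall": 0.99,
}


def pii_recall(true_positives: int, false_negatives: int) -> float:
    """Recall = detected PII items / all PII items actually present."""
    total = true_positives + false_negatives
    return true_positives / total if total else 1.0


def failed_metrics(measured: dict) -> list:
    """Return the names of metrics that fall below their target."""
    return [
        name
        for name, minimum in THRESHOLDS.items()
        if measured.get(name, 0) < minimum
    ]
```

For instance, a run that detects 98 of 100 labeled PII items has a recall of 0.98 and would be reported as failing `pii_recall`, even if every other metric meets its target. A missing metric is treated as a failure here, which is a deliberately conservative choice.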

DevKit (Data Engineering-Focused)

| Primitive | What It Does |
| --- | --- |
| 3 agents | Builder (ETL/enrichment/ADF), Reviewer (quality/idempotency/PII), Tuner (batch/parallelism/cost) |
| 3 skills | Deploy (104 lines), Evaluate (105 lines), Tune (103 lines) |
| 4 prompts | /deploy (ETL pipeline), /test (execution), /review (data quality), /evaluate (enrichment accuracy) |

Cost

💰 See cost.json for full pricing breakdown with SKUs, notes, and optimization tips.

| Service | Purpose | Dev | Prod | Enterprise |
| --- | --- | --- | --- | --- |
| Azure OpenAI | GPT-4o-mini for classification + entity extraction | $40 | $250 | $900 |
| Data Factory | Pipeline orchestration and scheduling | $15 | $120 | $400 |
| Cosmos DB | Enriched records, entity graphs, metadata | $5 | $80 | $400 |
| Event Hubs | Real-time streaming ingestion | $12 | $75 | $300 |
| Azure Functions | Event-triggered classification + enrichment | $0 | $25 | $150 |
| Blob Storage | Raw data landing zone (CSV, JSON, Parquet) | $2 | $25 | $80 |
| Key Vault | API keys, connection secrets | $1 | $3 | $10 |
| App Insights | Pipeline latency, classification accuracy | $0 | $25 | $100 |
| Log Analytics | Pipeline run history, error diagnostics | $0 | $15 | $50 |
| Total | | $75 | $618 | $2,390 |
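The totals can be re-derived from the per-service figures, which is a handy check when editing the table. The sketch below copies the numbers from the table; the `COSTS` structure and helper names are hypothetical and do not reflect the schema of cost.json.

```python
# (dev, prod, enterprise) figures in USD, copied from the cost table.
TIERS = ("dev", "prod", "enterprise")
COSTS = {
    "Azure OpenAI": (40, 250, 900),
    "Data Factory": (15, 120, 400),
    "Cosmos DB": (5, 80, 400),
    "Event Hubs": (12, 75, 300),
    "Azure Functions": (0, 25, 150),
    "Blob Storage": (2, 25, 80),
    "Key Vault": (1, 3, 10),
    "App Insights": (0, 25, 100),
    "Log Analytics": (0, 15, 50),
}


def tier_totals(costs: dict) -> dict:
    """Sum each tier's column across all services."""
    return {
        tier: sum(row[i] for row in costs.values())
        for i, tier in enumerate(TIERS)
    }


def top_service(costs: dict, tier: str) -> str:
    """Name the service with the largest figure in the given tier."""
    i = TIERS.index(tier)
    return max(costs, key=lambda name: costs[name][i])
```

`tier_totals(COSTS)` reproduces the Total row (75, 618, and 2,390), and `top_service` returns Azure OpenAI for every tier, so model usage is the largest line item at each tier.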

📖 Full docs · 🌐 frootai.dev/solution-plays/27-ai-data-pipeline

FAI Manifest

| Field | Value |
| --- | --- |
| Play | 27-ai-data-pipeline |
| Version | 1.0.0 |
| Knowledge | T1-Fine-Tuning-MLOps, T3-Production-Patterns, R2-RAG-Architecture |
| WAF Pillars | security, reliability, cost-optimization, operational-excellence, responsible-ai |
| Groundedness | ≥ 85% |
| Safety | 0 violations max |