Skip to Content
Solution PlaysPlay 19: Play 19 β€” Edge AI Phi-4 πŸ“±

Play 19 β€” Edge AI Phi-4 πŸ“±

On-device inference with Phi-4 SLM, ONNX Runtime quantization, and IoT Hub sync.

Run AI locally on edge devices β€” no cloud dependency for inference. Phi-4 is converted to ONNX, quantized to INT4/INT8 for memory-constrained devices, and served via ONNX Runtime. IoT Hub handles model updates and telemetry sync. Hybrid routing sends simple queries to edge (free) and complex ones to cloud (quality).

Quick Start

cd solution-plays/19-edge-ai-phi4 # Download and quantize Phi-4 python scripts/download_model.py --model microsoft/phi-4 python scripts/quantize.py --model models/phi4-onnx/ --output models/phi4-int4/ --bits 4 code . # Use @builder for ONNX/IoT, @reviewer for memory/privacy audit, @tuner for quantization

Architecture

πŸ“ Full architecture details

ComponentPurpose
Phi-4 (ONNX)Small language model for on-device inference
ONNX RuntimeCross-platform inference engine
Azure IoT HubModel updates + telemetry sync to cloud
Hybrid RouterEdge for simple queries, cloud fallback for complex

Device Compatibility

DeviceRAMRecommended QuantInference Speed
Raspberry Pi 58 GBINT4 (AWQ)~5 tok/s
NVIDIA Jetson4 GBINT4 only~10 tok/s (GPU)
Laptop (16GB)16 GBINT8 or FP16~20 tok/s

Key Metrics

  • Inference: <2s on edge Β· Quality: β‰₯85% of cloud Β· Offline: 100% success Β· Memory: <80% device RAM

DevKit (Edge AI-Focused)

PrimitiveWhat It Does
3 agentsBuilder (ONNX/quantization/IoT), Reviewer (memory/privacy/offline), Tuner (quant level/threads/sync)
3 skillsDeploy (115 lines), Evaluate (100 lines), Tune (112 lines)
4 prompts/deploy (ONNX + device), /test (on-device inference), /review (memory/privacy), /evaluate (speed vs cloud)

Note: This is an edge/on-device AI play β€” no cloud inference costs during operation. TuneKit covers quantization selection, ONNX Runtime threads, prompt compression for small context windows, IoT Hub sync frequency, and hybrid routing (70% cost reduction from edge-first).

Cost Estimate

ServiceDev/PoCProductionEnterprise
Azure Container Registry$5/mo$20/mo$50/mo
Azure IoT Hub$0/mo$25/mo$250/mo
Azure OpenAI$15/mo$80/mo$300/mo
Blob Storage$2/mo$8/mo$25/mo
Azure Monitor$0/mo$20/mo$60/mo
Application Insights$0/mo$20/mo$70/mo
Key Vault$1/mo$3/mo$10/mo
Azure DevOps / GitHub Actions$0/mo$15/mo$40/mo
Total$23/mo$191/mo$805/mo

πŸ’° Full cost breakdown

πŸ“– Full docs Β· 🌐 frootai.dev/solution-plays/19-edge-ai-phi4Β 

FAI Manifest

FieldValue
Play19-edge-ai-phi4
Version1.0.0
KnowledgeF2-LLM-Selection, T1-Fine-Tuning-MLOps
WAF Pillarscost-optimization, performance-efficiency, security
Groundednessβ‰₯ 85%
Safety0 violations max
Last updated on