
Play 44: Foundry Local On-Device

On-device AI inference with the Azure AI Foundry Local SDK: hardware-aware model selection, hybrid cloud/local routing, offline caching, complexity-based query classification, and cost optimization through local-first inference.

Architecture

| Component | Technology | Purpose |
| --- | --- | --- |
| Local Inference | Foundry Local SDK | On-device model loading and inference |
| Local Models | Phi-4, Phi-4-mini, Phi-3-mini | SLMs optimized for device hardware |
| Cloud Fallback | Azure OpenAI (GPT-4o) | Complex queries beyond local capability |
| Complexity Router | Python classifier | Route simple→local, complex→cloud |
| Model Cache | Local disk (~2-8 GB) | Cached models for instant offline inference |
| Telemetry | Local JSONL logs | Track local vs cloud usage and costs |
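
The routing idea in the table (simple queries stay local, complex ones go to the cloud) could be sketched roughly as follows. This is a minimal illustration, not the play's actual classifier: the keyword list, the scoring weights, the `threshold` default, and the names `Route`, `complexity_score`, and `route` are all assumptions made for the example. No Foundry Local or Azure OpenAI calls are made; the function only decides where a query would be sent.

```python
from dataclasses import dataclass

# Hypothetical signal phrases; the play's real classifier is not shown in this document.
COMPLEX_HINTS = ("analyze", "compare", "refactor", "step by step", "summarize")


@dataclass(frozen=True)
class Route:
    target: str   # "local" or "cloud"
    score: float  # heuristic complexity in [0, 1]
    reason: str


def complexity_score(query: str) -> float:
    """Crude heuristic: longer queries and reasoning keywords raise the score."""
    text = query.lower()
    length_part = min(len(text.split()) / 100.0, 1.0) * 0.5
    hits = sum(1 for hint in COMPLEX_HINTS if hint in text)
    keyword_part = min(hits / 2.0, 1.0) * 0.5
    return length_part + keyword_part


def route(query: str, online: bool, threshold: float = 0.4) -> Route:
    """Pick a target; fall back to local when the cloud is unreachable."""
    score = complexity_score(query)
    if score < threshold:
        return Route("local", score, "simple query")
    if not online:
        # Graceful degradation: answer locally rather than fail offline.
        return Route("local", score, "complex query, offline: degraded to local")
    return Route("cloud", score, "complex query")


if __name__ == "__main__":
    print(route("What time zone is UTC?", online=True))
    print(route("Compare these two designs and analyze the trade-offs step by step", online=True))
```

A keyword heuristic is only a starting point; a trained classifier or a small local model could replace `complexity_score` without changing the shape of `route`.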

📐 Full architecture details

| Aspect | Play 19 (Edge AI) | Play 44 (Foundry Local) | Play 34 (Edge Deployment) |
| --- | --- | --- | --- |
| Runtime | Custom ONNX container | Foundry Local SDK | IoT Hub + ONNX Runtime |
| Devices | IoT/edge devices | Developer PCs + laptops | IoT fleet (sensors, gateways) |
| Model Source | Custom fine-tuned | Foundry model catalog | Custom ONNX models |
| Management | IoT Hub fleet mgmt | Single-device self-managed | IoT Hub device twin |
| Network | Can be intermittent | Local-first, cloud optional | Cloud sync required |
| Use Case | Industrial/IoT | Developer productivity, privacy | Manufacturing, retail |

DevKit Structure

    44-foundry-local-on-device/
    ├── agent.md                                  # Root orchestrator with handoffs
    ├── .github/
    │   ├── copilot-instructions.md               # Domain knowledge (<150 lines)
    │   ├── agents/
    │   │   ├── builder.agent.md                  # SDK setup + hybrid router
    │   │   ├── reviewer.agent.md                 # Hardware compat + offline
    │   │   └── tuner.agent.md                    # Model selection + cost
    │   ├── prompts/
    │   │   ├── deploy.prompt.md                  # Configure local models
    │   │   ├── test.prompt.md                    # Test local + fallback
    │   │   ├── review.prompt.md                  # Audit hardware + offline
    │   │   └── evaluate.prompt.md                # Compare local vs cloud
    │   ├── skills/
    │   │   ├── deploy-foundry-local-on-device/   # SDK setup + model download + router
    │   │   ├── evaluate-foundry-local-on-device/ # Quality, latency, cost, offline
    │   │   └── tune-foundry-local-on-device/     # Model profiles, router, prompts, cost
    │   └── instructions/
    │       └── foundry-local-on-device-patterns.instructions.md
    ├── config/                                   # TuneKit
    │   ├── openai.json                           # Model profiles, cloud fallback
    │   ├── guardrails.json                       # Offline mode, hardware limits
    │   └── agents.json                           # Routing rules, fallback config
    ├── infra/                                    # Bicep IaC (cloud fallback only)
    │   ├── main.bicep
    │   └── parameters.json
    └── spec/                                     # SpecKit
        └── fai-manifest.json

Quick Start

    # 1. Install SDK and download models
    /deploy

    # 2. Test local inference and offline mode
    /test

    # 3. Audit hardware compatibility
    /review

    # 4. Compare local vs cloud quality and cost
    /evaluate

Key Metrics

| Metric | Target | Description |
| --- | --- | --- |
| Local Accuracy | > 80% | Response correctness for simple queries |
| Quality Parity | > 0.75 | Local quality / cloud quality ratio |
| Local Inference Rate | > 60% | Queries handled locally (free) |
| Offline Success | > 95% | Queries answered without network |
| Routing Accuracy | > 85% | Correct source for query complexity |
| Cost Savings | > 50% | Reduction vs cloud-only inference |
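
Two of these metrics, Local Inference Rate and Cost Savings, can in principle be computed directly from the local JSONL telemetry. The sketch below assumes a hypothetical record schema with three fields (`source`, `cost_usd`, and `cloud_only_cost_usd`, the estimated price of the same query under cloud-only inference); the play's actual log format is not shown here, so treat the field names as placeholders.

```python
import json

# Targets taken from the Key Metrics table.
TARGETS = {"local_rate": 0.60, "cost_savings": 0.50}


def summarize(jsonl_text: str) -> dict:
    """Compute local inference rate and cost savings from JSONL telemetry.

    Assumed (hypothetical) fields per record:
      source              -- "local" or "cloud"
      cost_usd            -- what the query actually cost (0 for local)
      cloud_only_cost_usd -- estimated cost had it gone to the cloud
    """
    records = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
    if not records:
        return {"local_rate": 0.0, "cost_savings": 0.0}
    local = sum(1 for r in records if r["source"] == "local")
    actual = sum(r["cost_usd"] for r in records)
    baseline = sum(r["cloud_only_cost_usd"] for r in records)
    savings = 1.0 - actual / baseline if baseline > 0 else 0.0
    return {"local_rate": local / len(records), "cost_savings": savings}


def meets_targets(summary: dict) -> dict:
    """Compare each metric with its target (strictly greater, as in the table)."""
    return {name: summary[name] > target for name, target in TARGETS.items()}


if __name__ == "__main__":
    sample = "\n".join([
        '{"source": "local", "cost_usd": 0.0, "cloud_only_cost_usd": 0.01}',
        '{"source": "local", "cost_usd": 0.0, "cloud_only_cost_usd": 0.01}',
        '{"source": "local", "cost_usd": 0.0, "cloud_only_cost_usd": 0.01}',
        '{"source": "cloud", "cost_usd": 0.02, "cloud_only_cost_usd": 0.02}',
    ])
    summary = summarize(sample)
    print(summary, meets_targets(summary))
```

The remaining metrics (accuracy, quality parity, routing accuracy) need labeled evaluation data and cannot be derived from usage logs alone.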

Estimated Cost

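| Service | Dev/mo | Prod/mo | Enterprise/mo |
| --- | --- | --- | --- |
| Azure OpenAI | $30 | $200 | $800 |
| Azure IoT Hub | $0 | $25 | $250 |
| Azure Monitor | $0 | $30 | $100 |
| Blob Storage | $2 | $15 | $50 |
| Azure Container Registry | $5 | $20 | $50 |
| Key Vault | $1 | $5 | $15 |
| Azure Functions | $0 | $10 | $120 |
| Total | $38 | $305 | $1,385 |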

Estimates based on Azure retail pricing. Actual costs vary by region, usage, and enterprise agreements.
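
As a quick arithmetic check, the Total row can be re-derived from the line items. The snippet below simply copies the figures from the table above; the `COSTS` dictionary and `tier_totals` helper are names chosen for this illustration.

```python
# Monthly estimates in USD, copied from the table: (Dev, Prod, Enterprise).
COSTS = {
    "Azure OpenAI": (30, 200, 800),
    "Azure IoT Hub": (0, 25, 250),
    "Azure Monitor": (0, 30, 100),
    "Blob Storage": (2, 15, 50),
    "Azure Container Registry": (5, 20, 50),
    "Key Vault": (1, 5, 15),
    "Azure Functions": (0, 10, 120),
}


def tier_totals(costs: dict) -> tuple:
    """Sum each pricing tier column across all services."""
    return tuple(sum(column) for column in zip(*costs.values()))


if __name__ == "__main__":
    dev, prod, enterprise = tier_totals(COSTS)
    print(f"Dev ${dev}, Prod ${prod}, Enterprise ${enterprise:,}")
```

Azure OpenAI is the largest line item in every tier, which is why raising the share of queries handled locally is the main cost lever in this play.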

💰 Full cost breakdown

WAF Alignment

| Pillar | Implementation |
| --- | --- |
| Cost Optimization | Local inference = $0 API cost, target 60%+ local rate |
| Performance Efficiency | Hardware-aware model selection, INT4/INT8/FP16 quantization |
| Reliability | Offline capability, graceful degradation, cloud fallback |
| Security | Data stays on device for local queries, no network exposure |
| Operational Excellence | Telemetry logging, model cache management, auto warmup |
| Responsible AI | Same quality standards for local and cloud responses |
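
To make "hardware-aware model selection" with INT4/INT8/FP16 quantization more concrete, here is one possible sketch. It estimates the weight footprint as parameter count times bytes per weight and picks the first model and precision that fits in the available memory. The parameter counts are approximate published sizes, and the 1.2 headroom factor, the preference order (larger model first, then higher precision), and the function names are assumptions for illustration; the Foundry Local SDK's own selection logic is not described in this document and is not used here.

```python
from typing import Optional, Tuple

# Approximate parameter counts in billions, ordered largest first.
MODELS = [("Phi-4", 14.0), ("Phi-4-mini", 3.8), ("Phi-3-mini", 3.8)]

# Bytes per weight for each precision, ordered highest precision first.
PRECISIONS = [("FP16", 2.0), ("INT8", 1.0), ("INT4", 0.5)]

# Assumed headroom for KV cache and activations; tune per device.
HEADROOM = 1.2


def footprint_gb(params_billion: float, bytes_per_weight: float) -> float:
    """Rough memory estimate: weights only, scaled by the headroom factor."""
    return params_billion * bytes_per_weight * HEADROOM


def select_model(available_gb: float) -> Optional[Tuple[str, str]]:
    """Return the first (model, precision) that fits, or None for cloud-only."""
    for name, params in MODELS:
        for precision, bytes_per_weight in PRECISIONS:
            if footprint_gb(params, bytes_per_weight) <= available_gb:
                return name, precision
    return None


if __name__ == "__main__":
    for memory in (2, 4, 8, 16):
        print(f"{memory} GB ->", select_model(memory))
```

Preferring a larger model at lower precision over a smaller model at higher precision is a judgment call; a device profile could just as reasonably reverse that order when latency matters more than answer quality.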

FAI Manifest

| Field | Value |
| --- | --- |
| Play | 44-foundry-local-on-device |
| Version | 1.0.0 |
| Knowledge | O5-GPU-Infra, F2-LLM-Selection, T3-Production-Patterns, R3-Deterministic-AI, F1-GenAI-Foundations |
| WAF Pillars | security, reliability, performance-efficiency, cost-optimization |
| Groundedness | ≥ 85% |
| Safety | 0 violations max |
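
The Groundedness and Safety rows read as pass/fail thresholds. A minimal sketch of such a gate is shown below; the thresholds come from the table, while the function name and its arguments are hypothetical and do not reflect the actual schema of `fai-manifest.json`.

```python
# Thresholds copied from the FAI Manifest table.
MIN_GROUNDEDNESS = 0.85
MAX_SAFETY_VIOLATIONS = 0


def passes_manifest_gate(groundedness: float, safety_violations: int) -> bool:
    """Return True when an evaluation run satisfies both manifest thresholds."""
    return (
        groundedness >= MIN_GROUNDEDNESS
        and safety_violations <= MAX_SAFETY_VIOLATIONS
    )


if __name__ == "__main__":
    print(passes_manifest_gate(0.90, 0))  # meets both thresholds
    print(passes_manifest_gate(0.95, 1))  # fails on safety violations
```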