Play 42 β Computer Use Agent
Vision-based desktop and web automation β AI agent that controls applications via screenshots and mouse/keyboard actions, replacing brittle RPA with intelligent screen understanding. Runs in a sandboxed VM with action replay, rollback, and full audit trail.
Architecture
| Component | Azure Service | Purpose |
|---|---|---|
| Vision Model | Azure OpenAI (GPT-4o Vision) | Screenshot analysis + action planning |
| Sandbox VM | Azure Virtual Machines | Isolated execution environment |
| Orchestrator | Azure Container Apps | Task queue, step coordination, replay |
| Storage | Azure Blob Storage | Screenshots, action replays, recordings |
| Secrets | Azure Key Vault | API keys, VM credentials |
| Telemetry | Application Insights | Step tracking, cost monitoring |
π Full architecture details
How It Differs from Related Plays
| Aspect | Play 23 (Browser Automation) | Play 42 (Computer Use) | Play 36 (Multimodal) |
|---|---|---|---|
| Scope | Web browsers only | Desktop + web + any GUI application | Image/text analysis |
| Method | DOM selectors, Playwright | Screenshots + vision + accessibility API | Vision API on images |
| Target | Web apps with API/DOM | Legacy apps, no-API systems | Documents, diagrams |
| Actions | Click, navigate, fill forms | Mouse, keyboard, hotkeys, scroll, waits | Read, classify, reason |
| Safety | Browser sandbox | VM sandbox, action whitelist, rollback | Content safety |
| Output | Extracted data, test results | Task completion + full recording | Structured analysis |
DevKit Structure
42-computer-use-agent/
βββ agent.md # Root orchestrator with handoffs
βββ .github/
β βββ copilot-instructions.md # Domain knowledge (<150 lines)
β βββ agents/
β β βββ builder.agent.md # Screenshot loop + action executor
β β βββ reviewer.agent.md # Sandbox safety + credential guards
β β βββ tuner.agent.md # Resolution, timing, cost tuning
β βββ prompts/
β β βββ deploy.prompt.md # Deploy agent + sandbox VM
β β βββ test.prompt.md # Run automation workflows
β β βββ review.prompt.md # Audit safety controls
β β βββ evaluate.prompt.md # Measure task completion
β βββ skills/
β β βββ deploy-computer-use-agent/ # Full deployment with VM + vision
β β βββ evaluate-computer-use-agent/ # Completion, accuracy, safety, cost
β β βββ tune-computer-use-agent/ # Resolution, timing, loops, cost
β βββ instructions/
β βββ computer-use-agent-patterns.instructions.md
βββ config/ # TuneKit
β βββ openai.json # Vision model + detail level
β βββ guardrails.json # Max steps, blocked actions, sandbox
β βββ agents.json # Screenshot config, timing, loops
βββ infra/ # Bicep IaC
β βββ main.bicep
β βββ parameters.json
βββ spec/ # SpecKit
βββ fai-manifest.jsonQuick Start
# 1. Deploy sandbox VM + vision model
/deploy
# 2. Run automation task in sandbox
/test
# 3. Audit safety controls
/review
# 4. Measure task completion rate
/evaluateKey Metrics
| Metric | Target | Description |
|---|---|---|
| Task Completion Rate | > 85% | Tasks fully completed correctly |
| Click Accuracy | > 90% | Clicked correct UI element |
| Step Efficiency | > 70% | Optimal steps / actual steps |
| Safety Compliance | 100% | Blocked actions correctly rejected |
| Loop Detection | > 95% | Stuck loops detected and exited |
| Cost per Task | < $0.50 | Vision API + VM runtime |
Estimated Cost
| Service | Dev/mo | Prod/mo | Enterprise/mo |
|---|---|---|---|
| Azure OpenAI (GPT-4o Vision) | $80 | $600 | $2,000 |
| Azure Container Apps | $25 | $200 | $800 |
| Blob Storage | $3 | $30 | $80 |
| Azure Container Registry | $5 | $20 | $50 |
| Cosmos DB | $5 | $60 | $300 |
| Key Vault | $1 | $5 | $15 |
| Virtual Network | $0 | $35 | $150 |
| Application Insights | $0 | $25 | $80 |
| Total | $119 | $975 | $3,475 |
Estimates based on Azure retail pricing. Actual costs vary by region, usage, and enterprise agreements.
π° Full cost breakdown
WAF Alignment
| Pillar | Implementation |
|---|---|
| Security | VM sandbox isolation, action whitelisting, credential entry blocked, no internet |
| Reliability | VM snapshot before each run, auto-rollback on failure, loop detection |
| Cost Optimization | Low-detail screenshots for navigation, accessibility API when possible, VM auto-deallocate |
| Operational Excellence | Full action replay recording, step-by-step audit trail, 30-day retention |
| Performance Efficiency | Hybrid accessibility+vision approach, adaptive detail level, multi-step planning |
| Responsible AI | Destructive action confirmation, blocked credential entry, sandbox containment |
FAI Manifest
| Field | Value |
|---|---|
| Play | 42-computer-use-agent |
| Version | 1.0.0 |
| Knowledge | O2-Agent-Coding, O3-MCP-Tools-Functions, F2-LLM-Selection, T3-Production-Patterns, R3-Deterministic-AI |
| WAF Pillars | security, reliability, cost-optimization, operational-excellence |
| Groundedness | β₯ 85% |
| Safety | 0 violations max |
Last updated on