Play 42: Computer Use Agent

Vision-based desktop and web automation: an AI agent that controls applications through screenshots and mouse/keyboard actions, replacing brittle RPA with intelligent screen understanding. It runs in a sandboxed VM with action replay, rollback, and a full audit trail.
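The screenshot-driven control loop described above can be sketched as follows. This is a minimal illustration, not the play's actual implementation: `Action`, `RunResult`, `run_task`, and the `capture`/`plan`/`execute` callables are hypothetical names, and the stubs stand in for the sandbox VM and the vision model.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Action:
    kind: str          # hypothetical action names, e.g. "click", "type", "done"
    detail: str = ""


@dataclass
class RunResult:
    completed: bool
    audit_trail: List[Action] = field(default_factory=list)


def run_task(
    capture: Callable[[], bytes],
    plan: Callable[[bytes, List[Action]], Action],
    execute: Callable[[Action], None],
    max_steps: int = 20,
) -> RunResult:
    """Screenshot -> plan -> act loop with a hard step limit."""
    trail: List[Action] = []
    for _ in range(max_steps):
        screenshot = capture()            # observe the current screen
        action = plan(screenshot, trail)  # vision model proposes the next action
        trail.append(action)              # record every planned action for audit
        if action.kind == "done":
            return RunResult(True, trail)
        execute(action)                   # mouse/keyboard action inside the sandbox
    return RunResult(False, trail)        # step budget exhausted


# Stubs standing in for the VM and the vision model (assumptions for the demo).
scripted = iter([Action("click", "File menu"), Action("type", "report.txt"), Action("done")])
executed: List[Action] = []
result = run_task(
    capture=lambda: b"fake-png-bytes",
    plan=lambda shot, history: next(scripted),
    execute=executed.append,
)
print(result.completed, len(result.audit_trail), len(executed))  # True 3 2
```

Passing the capture, planning, and execution steps in as callables keeps the loop testable without a VM, and the returned trail is the kind of record an action replay or audit log could be built from.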

Architecture

| Component | Azure Service | Purpose |
| --- | --- | --- |
| Vision Model | Azure OpenAI (GPT-4o Vision) | Screenshot analysis + action planning |
| Sandbox VM | Azure Virtual Machines | Isolated execution environment |
| Orchestrator | Azure Container Apps | Task queue, step coordination, replay |
| Storage | Azure Blob Storage | Screenshots, action replays, recordings |
| Secrets | Azure Key Vault | API keys, VM credentials |
| Telemetry | Application Insights | Step tracking, cost monitoring |

| Aspect | Play 23 (Browser Automation) | Play 42 (Computer Use) | Play 36 (Multimodal) |
| --- | --- | --- | --- |
| Scope | Web browsers only | Desktop + web + any GUI application | Image/text analysis |
| Method | DOM selectors, Playwright | Screenshots + vision + accessibility API | Vision API on images |
| Target | Web apps with API/DOM | Legacy apps, no-API systems | Documents, diagrams |
| Actions | Click, navigate, fill forms | Mouse, keyboard, hotkeys, scroll, waits | Read, classify, reason |
| Safety | Browser sandbox | VM sandbox, action whitelist, rollback | Content safety |
| Output | Extracted data, test results | Task completion + full recording | Structured analysis |

DevKit Structure

42-computer-use-agent/
├── agent.md                              # Root orchestrator with handoffs
├── .github/
│   ├── copilot-instructions.md           # Domain knowledge (<150 lines)
│   ├── agents/
│   │   ├── builder.agent.md              # Screenshot loop + action executor
│   │   ├── reviewer.agent.md             # Sandbox safety + credential guards
│   │   └── tuner.agent.md                # Resolution, timing, cost tuning
│   ├── prompts/
│   │   ├── deploy.prompt.md              # Deploy agent + sandbox VM
│   │   ├── test.prompt.md                # Run automation workflows
│   │   ├── review.prompt.md              # Audit safety controls
│   │   └── evaluate.prompt.md            # Measure task completion
│   ├── skills/
│   │   ├── deploy-computer-use-agent/    # Full deployment with VM + vision
│   │   ├── evaluate-computer-use-agent/  # Completion, accuracy, safety, cost
│   │   └── tune-computer-use-agent/      # Resolution, timing, loops, cost
│   └── instructions/
│       └── computer-use-agent-patterns.instructions.md
├── config/                               # TuneKit
│   ├── openai.json                       # Vision model + detail level
│   ├── guardrails.json                   # Max steps, blocked actions, sandbox
│   └── agents.json                       # Screenshot config, timing, loops
├── infra/                                # Bicep IaC
│   ├── main.bicep
│   └── parameters.json
└── spec/                                 # SpecKit
    └── fai-manifest.json
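To make the role of `config/guardrails.json` (max steps, blocked actions, sandbox) concrete, here is a rough sketch of how an orchestrator might gate each action against it. The JSON keys and the `check_action` helper are assumptions made for illustration; the real schema is not shown in this document.

```python
import json
from typing import Tuple

# Hypothetical guardrails.json content; every key name here is an assumption.
GUARDRAILS_JSON = """
{
  "max_steps": 30,
  "allowed_actions": ["click", "type", "hotkey", "scroll", "wait"],
  "blocked_actions": ["credential_entry"],
  "require_sandbox": true
}
"""


def check_action(guardrails: dict, action: str, step: int, in_sandbox: bool) -> Tuple[bool, str]:
    """Return (allowed, reason) for one proposed action."""
    if guardrails.get("require_sandbox", True) and not in_sandbox:
        return False, "not running in sandbox"
    if step >= guardrails["max_steps"]:
        return False, "step limit reached"
    if action in guardrails.get("blocked_actions", []):
        return False, f"action '{action}' is blocked"
    if action not in guardrails.get("allowed_actions", []):
        return False, f"action '{action}' is not on the allowlist"
    return True, "ok"


guardrails = json.loads(GUARDRAILS_JSON)
print(check_action(guardrails, "click", 0, True))             # (True, 'ok')
print(check_action(guardrails, "credential_entry", 1, True))  # rejected: blocked
```

Checking the blocklist before the allowlist means a blocked action is reported as such even if it was also allowlisted by mistake.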

Quick Start

# 1. Deploy sandbox VM + vision model
/deploy

# 2. Run automation task in sandbox
/test

# 3. Audit safety controls
/review

# 4. Measure task completion rate
/evaluate

Key Metrics

| Metric | Target | Description |
| --- | --- | --- |
| Task Completion Rate | > 85% | Tasks fully completed correctly |
| Click Accuracy | > 90% | Clicked correct UI element |
| Step Efficiency | > 70% | Optimal steps / actual steps |
| Safety Compliance | 100% | Blocked actions correctly rejected |
| Loop Detection | > 95% | Stuck loops detected and exited |
| Cost per Task | < $0.50 | Vision API + VM runtime |
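As an illustration of how three of these metrics could be computed from run logs, the sketch below uses a hypothetical `TaskRun` record and made-up sample data. Step efficiency follows the table's definition (optimal steps / actual steps), here aggregated over all runs rather than averaged per run, which is one possible choice.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TaskRun:
    completed: bool     # task finished and verified correct
    optimal_steps: int  # fewest steps a reference solution needs
    actual_steps: int   # steps the agent actually took
    cost_usd: float     # vision API + VM runtime for this run


def summarize(runs: List[TaskRun]) -> Dict[str, float]:
    if not runs:
        raise ValueError("need at least one run")
    return {
        "task_completion_rate": sum(r.completed for r in runs) / len(runs),
        "step_efficiency": sum(r.optimal_steps for r in runs) / sum(r.actual_steps for r in runs),
        "cost_per_task": sum(r.cost_usd for r in runs) / len(runs),
    }


def meets_targets(m: Dict[str, float]) -> Dict[str, bool]:
    # Thresholds taken from the Key Metrics table.
    return {
        "task_completion_rate": m["task_completion_rate"] > 0.85,
        "step_efficiency": m["step_efficiency"] > 0.70,
        "cost_per_task": m["cost_per_task"] < 0.50,
    }


# Made-up sample runs, for illustration only.
runs = [
    TaskRun(True, 6, 8, 0.40),
    TaskRun(True, 9, 12, 0.30),
    TaskRun(True, 5, 5, 0.20),
    TaskRun(False, 4, 10, 0.50),
]
metrics = summarize(runs)
print(metrics)
print(meets_targets(metrics))
```

With this sample, three of four tasks complete, so the completion target is missed while the cost target is met.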

Estimated Cost

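| Service | Dev/mo | Prod/mo | Enterprise/mo |
| --- | --- | --- | --- |
| Azure OpenAI (GPT-4o Vision) | $80 | $600 | $2,000 |
| Azure Container Apps | $25 | $200 | $800 |
| Blob Storage | $3 | $30 | $80 |
| Azure Container Registry | $5 | $20 | $50 |
| Cosmos DB | $5 | $60 | $300 |
| Key Vault | $1 | $5 | $15 |
| Virtual Network | $0 | $35 | $150 |
| Application Insights | $0 | $25 | $80 |
| Total | $119 | $975 | $3,475 |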

Estimates based on Azure retail pricing. Actual costs vary by region, usage, and enterprise agreements.
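The totals in the table can be reproduced with a few lines of arithmetic. The figures below are copied from the table; the variable names are just for this sketch.

```python
# Monthly estimates in USD from the table: (dev, prod, enterprise).
monthly_costs = {
    "Azure OpenAI (GPT-4o Vision)": (80, 600, 2000),
    "Azure Container Apps": (25, 200, 800),
    "Blob Storage": (3, 30, 80),
    "Azure Container Registry": (5, 20, 50),
    "Cosmos DB": (5, 60, 300),
    "Key Vault": (1, 5, 15),
    "Virtual Network": (0, 35, 150),
    "Application Insights": (0, 25, 80),
}

# Sum each tier's column across all services.
totals = tuple(sum(column) for column in zip(*monthly_costs.values()))
print(totals)  # (119, 975, 3475)

# The largest line item in each tier is the vision model.
largest = max(monthly_costs, key=lambda name: monthly_costs[name][1])
print(largest)  # Azure OpenAI (GPT-4o Vision)
```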

WAF Alignment

| Pillar | Implementation |
| --- | --- |
| Security | VM sandbox isolation, action whitelisting, credential entry blocked, no internet |
| Reliability | VM snapshot before each run, auto-rollback on failure, loop detection |
| Cost Optimization | Low-detail screenshots for navigation, accessibility API when possible, VM auto-deallocate |
| Operational Excellence | Full action replay recording, step-by-step audit trail, 30-day retention |
| Performance Efficiency | Hybrid accessibility+vision approach, adaptive detail level, multi-step planning |
| Responsible AI | Destructive action confirmation, blocked credential entry, sandbox containment |
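Loop detection, listed under Reliability, is not specified further in this document. One simple way to approximate it is to flag a run as stuck when the same screen and the same action recur several times within a short window. The `LoopDetector` class and its `window` and `max_repeats` parameters are assumptions for this sketch.

```python
import hashlib
from collections import deque


class LoopDetector:
    """Flags a stuck loop when one (screen, action) pair keeps recurring."""

    def __init__(self, window: int = 6, max_repeats: int = 3) -> None:
        self.recent = deque(maxlen=window)  # only the last `window` steps are kept
        self.max_repeats = max_repeats

    def observe(self, screenshot: bytes, action: str) -> bool:
        # Exact hashing is an assumption; it treats any pixel change as a new screen.
        key = (hashlib.sha256(screenshot).hexdigest(), action)
        self.recent.append(key)
        return self.recent.count(key) >= self.max_repeats


detector = LoopDetector(window=6, max_repeats=3)
flags = [detector.observe(b"same-screen", "click:OK") for _ in range(3)]
print(flags)  # [False, False, True]
```

Because an exact hash changes with any pixel difference (a blinking cursor, a clock), a real detector would probably need a more tolerant comparison, such as a perceptual hash or a comparison of accessibility-tree state.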

FAI Manifest

| Field | Value |
| --- | --- |
| Play | 42-computer-use-agent |
| Version | 1.0.0 |
| Knowledge | O2-Agent-Coding, O3-MCP-Tools-Functions, F2-LLM-Selection, T3-Production-Patterns, R3-Deterministic-AI |
| WAF Pillars | security, reliability, cost-optimization, operational-excellence |
| Groundedness | ≥ 85% |
| Safety | 0 violations max |