Play 23: Browser Automation Agent
AI agent that navigates web pages using Playwright with GPT-4o vision.
An autonomous browser agent that takes screenshots, uses GPT-4o vision to understand the page state, decides what to click, type, or navigate to, and executes those actions via Playwright. It handles login flows, form filling, data extraction, and multi-step web workflows.
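The loop described above can be sketched as follows. This is a minimal sketch, not the play's implementation: `Browser`, `Planner`, `Action`, and `runAgent` are hypothetical names. The Playwright and GPT-4o calls are replaced by stubs so the control flow runs on its own. The default step budget of 10 mirrors the steps-per-task target under Key Metrics.

```typescript
// Hypothetical action vocabulary the vision model chooses from on each turn.
type Action =
  | { kind: "click"; target: string }
  | { kind: "type"; target: string; text: string }
  | { kind: "navigate"; url: string }
  | { kind: "done"; result: string };

// Hypothetical wrapper; a real one would drive Playwright (async in practice).
interface Browser {
  screenshot(): string; // e.g. a base64-encoded PNG of the viewport
  execute(action: Action): boolean; // true if the action ran without error
}

// Hypothetical planner; a real one would send the screenshot and goal to GPT-4o.
type Planner = (screenshot: string, goal: string, history: Action[]) => Action;

interface RunResult {
  completed: boolean;
  result?: string;
  steps: number; // number of actions executed
  failedActions: number;
}

// Screenshot -> decide -> execute, repeated until the planner answers "done"
// or the step budget is spent. Verification is implicit: the next screenshot
// shows the planner whether the previous action had the intended effect.
function runAgent(browser: Browser, plan: Planner, goal: string, maxSteps = 10): RunResult {
  const history: Action[] = [];
  let failedActions = 0;
  for (let turn = 0; turn < maxSteps; turn++) {
    const action = plan(browser.screenshot(), goal, history);
    if (action.kind === "done") {
      return { completed: true, result: action.result, steps: history.length, failedActions };
    }
    history.push(action);
    if (!browser.execute(action)) failedActions++;
  }
  return { completed: false, steps: history.length, failedActions };
}
```

Synchronous stubs keep the sketch self-contained. With Playwright and the Azure OpenAI client, `screenshot`, `execute`, and the planner would be `async` and awaited.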
Quick Start
```bash
cd solution-plays/23-browser-automation-agent
npx playwright install  # Install browser binaries
az deployment group create -g $RG -f infra/main.bicep -p infra/parameters.json  # $RG: target resource group
code .  # Use @builder for Playwright/vision, @reviewer for security audit, @tuner for efficiency
```
Architecture
See architecture.md for the full data flow, service roles, security architecture, and scaling tables.
Agent Action Loop
Navigate → Screenshot → GPT-4o Vision → Decide Action → Execute → Verify → Repeat
Key Metrics
- Task completion: ≥85%
- Action success: ≥95%
- Steps per task: <10
- Selector resilience: ≥90%
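The targets above can be computed from logged runs. This is a minimal sketch that assumes each task run is recorded as a hypothetical `TaskRun` record. Three readings are illustrative assumptions: "steps per task" as executed actions, "action success" as successful actions over all actions, and "selector resilience" as the share of element lookups that resolved. The thresholds themselves come from the list.

```typescript
// Hypothetical per-task log record; field names are illustrative.
interface TaskRun {
  completed: boolean;
  actions: number; // actions executed (counted as "steps")
  actionsSucceeded: number;
  selectorLookups: number; // element lookups attempted
  selectorResolved: number; // lookups that found their element
}

interface MetricReport {
  taskCompletion: number;
  actionSuccess: number;
  stepsPerTask: number;
  selectorResilience: number;
  passed: boolean;
}

// Safe division: an empty denominator yields 0 instead of NaN.
const ratio = (num: number, den: number): number => (den === 0 ? 0 : num / den);

function evaluateRuns(runs: TaskRun[]): MetricReport {
  const sum = (f: (r: TaskRun) => number) => runs.reduce((acc, r) => acc + f(r), 0);
  const taskCompletion = ratio(runs.filter((r) => r.completed).length, runs.length);
  const actionSuccess = ratio(sum((r) => r.actionsSucceeded), sum((r) => r.actions));
  const stepsPerTask = ratio(sum((r) => r.actions), runs.length);
  const selectorResilience = ratio(sum((r) => r.selectorResolved), sum((r) => r.selectorLookups));
  // Thresholds from Key Metrics: ≥85%, ≥95%, <10, ≥90%.
  const passed =
    taskCompletion >= 0.85 && actionSuccess >= 0.95 && stepsPerTask < 10 && selectorResilience >= 0.9;
  return { taskCompletion, actionSuccess, stepsPerTask, selectorResilience, passed };
}
```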
DevKit (Browser Automation-Focused)
| Primitive | What It Does |
|---|---|
| 3 agents | Builder (Playwright/DOM/vision), Reviewer (security/domains/credentials), Tuner (screenshots/selectors/cost) |
| 3 skills | Deploy (102 lines), Evaluate (100 lines), Tune (103 lines) |
| 4 prompts | /deploy (Playwright + vision), /test (navigation/forms), /review (security/domains), /evaluate (completion rate) |
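A domain audit like the Reviewer's is easiest when every navigation passes through a single check. Below is a minimal sketch of such a check. The `isNavigationAllowed` helper and the placeholder domains `example.com` and `login.example.org` are hypothetical and are not taken from the play's security architecture.

```typescript
// Placeholder allowlist: these hosts and their subdomains may be visited.
const ALLOWED_DOMAINS = ["example.com", "login.example.org"];

// Returns true only for https URLs whose parsed hostname is on the allowlist.
function isNavigationAllowed(rawUrl: string, allowed: string[] = ALLOWED_DOMAINS): boolean {
  let url: URL;
  try {
    url = new URL(rawUrl);
  } catch {
    return false; // unparseable input is rejected
  }
  // Refuse http:, file:, javascript:, data:, and other schemes.
  if (url.protocol !== "https:") return false;
  const host = url.hostname.toLowerCase();
  // Exact match or a true subdomain (dot boundary), never a bare suffix match.
  return allowed.some((d) => host === d || host.endsWith("." + d));
}
```

The check matches on the parsed `hostname` rather than the raw string, with a dot boundary for subdomains. That is what rejects look-alikes such as `https://example.com.attacker.net` or `https://evil-example.com`.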
Note: This is a browser automation/RPA play. TuneKit covers screenshot frequency strategies, wait strategies (never fixed delays), selector methods (accessible names preferred over CSS selectors), action planning prompts, and cost per automation, not AI model quality metrics.
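The note's preference for accessible names over CSS can be encoded as a fixed fallback order when the planner's element description is turned into a locator. This is a minimal sketch that assumes a hypothetical `TargetHint` shape from the planner. `LocatorSource` mirrors a subset of the locator methods on Playwright's `Page` (`getByRole`, `getByLabel`, `getByText`, `getByTestId`, `locator`) and is generic so it can be stubbed. The exact ordering is one reasonable choice, not the play's documented one.

```typescript
// Hypothetical element description returned by the vision planner.
interface TargetHint {
  role?: string;
  name?: string;
  label?: string;
  text?: string;
  testId?: string;
  css?: string;
}

// Subset of Playwright Page locator methods, generic over the locator type.
interface LocatorSource<L> {
  getByRole(role: string, options?: { name?: string }): L;
  getByLabel(text: string): L;
  getByText(text: string): L;
  getByTestId(testId: string): L;
  locator(selector: string): L;
}

// Use the most resilient selector the hint allows: role + accessible name
// first, then label, visible text, test id, and raw CSS only as a last resort.
function resolveTarget<L>(page: LocatorSource<L>, hint: TargetHint): L {
  if (hint.role && hint.name) return page.getByRole(hint.role, { name: hint.name });
  if (hint.label) return page.getByLabel(hint.label);
  if (hint.text) return page.getByText(hint.text);
  if (hint.testId) return page.getByTestId(hint.testId);
  if (hint.css) return page.locator(hint.css);
  throw new Error("Target hint has no usable selector");
}
```

A real Playwright `Page` should satisfy `LocatorSource<Locator>`. Actions such as `click()` and `fill()` on the returned locator auto-wait for the element to become actionable. That covers the "never fixed delays" rule without any `waitForTimeout` call.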
Cost
See cost.json for the full pricing breakdown with SKUs, notes, and optimization tips.
| Service | Purpose | Dev | Prod | Enterprise |
|---|---|---|---|---|
| Azure OpenAI | GPT-4o Vision for screenshot analysis + action planning | $80 | $450 | $1,500 |
| Container Apps | Headless browser runtime + agent orchestrator | $25 | $180 | $500 |
| Blob Storage | Screenshot storage, session recordings | $3 | $20 | $60 |
| Cosmos DB | Browser session state, task queue, action history | $5 | $45 | $180 |
| Key Vault | Site credentials, authentication tokens | $1 | $3 | $10 |
| App Insights | Action traces, vision API latency | $0 | $25 | $100 |
| Log Analytics | Browser container logs, failure diagnostics | $0 | $15 | $50 |
| Total | | $114 | $738 | $2,400 |
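The Total row is the column sum of the line items. Below is a quick check with the figures copied from the table. The names `lineItems`, `tierTotal`, and `openAiShare` are illustrative.

```typescript
type Tier = "dev" | "prod" | "enterprise";

// Line items copied from the cost table.
const lineItems: Record<string, Record<Tier, number>> = {
  "Azure OpenAI": { dev: 80, prod: 450, enterprise: 1500 },
  "Container Apps": { dev: 25, prod: 180, enterprise: 500 },
  "Blob Storage": { dev: 3, prod: 20, enterprise: 60 },
  "Cosmos DB": { dev: 5, prod: 45, enterprise: 180 },
  "Key Vault": { dev: 1, prod: 3, enterprise: 10 },
  "App Insights": { dev: 0, prod: 25, enterprise: 100 },
  "Log Analytics": { dev: 0, prod: 15, enterprise: 50 },
};

// Sum one column of the table.
function tierTotal(tier: Tier): number {
  return Object.values(lineItems).reduce((sum, item) => sum + item[tier], 0);
}

// Fraction of a tier's total spent on GPT-4o vision and planning calls.
function openAiShare(tier: Tier): number {
  return lineItems["Azure OpenAI"][tier] / tierTotal(tier);
}
```

Azure OpenAI accounts for roughly 61% to 70% of each tier (80/114, 450/738, 1,500/2,400). This is consistent with TuneKit treating screenshot frequency, and therefore the number of vision calls, as a cost lever.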
Full docs · frootai.dev/solution-plays/23-browser-automation-agent
FAI Manifest
| Field | Value |
|---|---|
| Play | 23-browser-automation-agent |
| Version | 1.0.0 |
| Knowledge | O2-AI-Agents, O3-MCP-Tools-Functions, R1-Prompt-Engineering, T3-Production-Patterns |
| WAF Pillars | security, reliability, performance-efficiency, responsible-ai |
| Groundedness | ≥ 85% |
| Safety | 0 violations max |