Play 23: Browser Automation Agent 🌐

AI agent that navigates web pages using Playwright with GPT-4o vision.

An autonomous browser agent that takes screenshots, uses GPT-4o vision to understand page state, decides what to click/type/navigate, and executes actions via Playwright. Handles login flows, form filling, data extraction, and multi-step web workflows.
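
As an illustration of the "decide what to click/type/navigate" step, the sketch below validates a model reply before anything touches the browser. It is not the play's actual code: the JSON shape, the `Action` fields, the `parse_action` helper, and the extra `done` action are all hypothetical, and Python is assumed only for illustration.

```python
import json
from dataclasses import dataclass
from typing import Optional

# Hypothetical action vocabulary. The description names click, type, and
# navigate; "done" is an added assumption so the agent can signal completion.
ALLOWED_KINDS = {"click", "type", "navigate", "done"}


@dataclass(frozen=True)
class Action:
    kind: str
    target: Optional[str] = None  # accessible name of the element (click/type)
    text: Optional[str] = None    # text to enter (type)
    url: Optional[str] = None     # destination (navigate)


def parse_action(raw: str) -> Action:
    """Validate a JSON reply from the vision model before it is executed."""
    data = json.loads(raw)  # malformed JSON raises a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("action must be a JSON object")
    kind = data.get("kind")
    if kind not in ALLOWED_KINDS:
        raise ValueError(f"unsupported action kind: {kind!r}")
    action = Action(kind, data.get("target"), data.get("text"), data.get("url"))
    if kind in ("click", "type") and not action.target:
        raise ValueError(f"{kind} requires a target")
    if kind == "type" and action.text is None:
        raise ValueError("type requires text")
    if kind == "navigate" and not action.url:
        raise ValueError("navigate requires a url")
    return action
```

Rejecting unknown or incomplete actions at this boundary means a misread screenshot results in a retry rather than an unintended click.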

Quick Start

cd solution-plays/23-browser-automation-agent
npx playwright install   # Install browser binaries
az deployment group create -g $RG -f infra/main.bicep -p infra/parameters.json
code .   # Use @builder for Playwright/vision, @reviewer for security audit, @tuner for efficiency

Architecture

See architecture.md for full data flow, service roles, security architecture, and scaling tables.

Agent Action Loop

Navigate → Screenshot → GPT-4o Vision → Decide Action → Execute → Verify → Repeat
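
The loop above can be sketched as a small driver with its four stages injected as callables. This is a minimal sketch under stated assumptions, not the play's implementation: `run_task`, the dict-shaped actions, the `done` signal, and the default step budget of 10 (loosely borrowed from the steps/task target) are hypothetical. In a real agent, `observe` might wrap a Playwright screenshot call and `decide` a GPT-4o vision request.

```python
from typing import Callable


def run_task(
    observe: Callable[[], bytes],            # e.g. take a screenshot
    decide: Callable[[bytes, list], dict],   # e.g. ask the vision model
    execute: Callable[[dict], None],         # e.g. perform the browser action
    verify: Callable[[dict], bool],          # e.g. check the page changed as expected
    max_steps: int = 10,                     # hypothetical step budget
) -> dict:
    """Screenshot -> decide -> execute -> verify, repeated until done or out of budget."""
    history = []
    for _ in range(max_steps):
        screenshot = observe()
        action = decide(screenshot, history)
        if action.get("kind") == "done":
            return {"completed": True, "steps": len(history), "history": history}
        execute(action)
        history.append({"action": action, "ok": verify(action)})
    # Budget exhausted without the model reporting completion.
    return {"completed": False, "steps": len(history), "history": history}
```

Passing the stages in as functions keeps the loop testable with fakes, so the control flow can be checked without a browser or a model endpoint.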

Key Metrics

  • Task completion: ≥85% · Action success: ≥95% · Steps/task: <10 · Selector resilience: ≥90%
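
One way to check a batch of runs against these targets is sketched below. The `TaskRun` record and both helpers are hypothetical, steps/task is assumed to mean the average number of actions per task, and selector resilience is left out because the page does not define how it is measured.

```python
from dataclasses import dataclass


@dataclass
class TaskRun:
    completed: bool      # did the task reach its goal?
    actions_ok: int      # actions that executed and verified successfully
    actions_total: int   # all actions attempted


def summarize(runs):
    """Aggregate per-task records into the three metrics defined above."""
    if not runs:
        raise ValueError("no runs to summarize")
    actions_total = sum(r.actions_total for r in runs)
    if actions_total == 0:
        raise ValueError("no actions recorded")
    return {
        "task_completion": sum(r.completed for r in runs) / len(runs),
        "action_success": sum(r.actions_ok for r in runs) / actions_total,
        "steps_per_task": actions_total / len(runs),  # assumed to be a mean
    }


def meets_targets(metrics):
    """Compare against the thresholds listed under Key Metrics."""
    return (
        metrics["task_completion"] >= 0.85
        and metrics["action_success"] >= 0.95
        and metrics["steps_per_task"] < 10
    )
```

For example, ten tasks of five actions each, with one failed task containing one failed action, give 90% task completion and 98% action success, which passes.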

DevKit (Browser Automation-Focused)

| Primitive | What It Does |
| --- | --- |
| 3 agents | Builder (Playwright/DOM/vision), Reviewer (security/domains/credentials), Tuner (screenshots/selectors/cost) |
| 3 skills | Deploy (102 lines), Evaluate (100 lines), Tune (103 lines) |
| 4 prompts | /deploy (Playwright + vision), /test (navigation/forms), /review (security/domains), /evaluate (completion rate) |

Note: This is a browser automation/RPA play. TuneKit covers screenshot frequency strategies, wait strategies (never fixed delays), selector methods (accessible names over CSS selectors), action planning prompts, and cost per automation. It does not cover AI model quality metrics.
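
The two TuneKit rules on waits and selectors might look like this in practice. This is a hedged sketch, assuming Playwright's Python API: `get_by_label`, `get_by_role`, `fill`, `click`, and `wait_for` are real Playwright locator calls, while the `sign_in` helper and the "Username", "Password", "Sign in", and "Dashboard" names are hypothetical page content.

```python
def sign_in(page, username: str, password: str, timeout_ms: int = 10_000) -> None:
    """Fill a login form using accessible names and a condition-based wait.

    `page` is expected to be a Playwright Page (sync API); the field and
    button names below are made up for illustration.
    """
    # Locate by label and by role + accessible name rather than CSS selectors,
    # so the lookup tends to survive markup and class-name changes.
    page.get_by_label("Username").fill(username)
    page.get_by_label("Password").fill(password)
    page.get_by_role("button", name="Sign in").click()

    # Wait for a condition (the post-login heading is visible) instead of
    # sleeping for a fixed delay; this raises on timeout rather than
    # silently continuing.
    page.get_by_role("heading", name="Dashboard").wait_for(
        state="visible", timeout=timeout_ms
    )
```

With the sync API, `page` would typically come from `sync_playwright()`, a launched Chromium browser, and `browser.new_page()`, followed by `page.goto(...)` to the login URL.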

Cost

💰 See cost.json for full pricing breakdown with SKUs, notes, and optimization tips.

| Service | Purpose | Dev | Prod | Enterprise |
| --- | --- | --- | --- | --- |
| Azure OpenAI | GPT-4o Vision for screenshot analysis + action planning | $80 | $450 | $1,500 |
| Container Apps | Headless browser runtime + agent orchestrator | $25 | $180 | $500 |
| Blob Storage | Screenshot storage, session recordings | $3 | $20 | $60 |
| Cosmos DB | Browser session state, task queue, action history | $5 | $45 | $180 |
| Key Vault | Site credentials, authentication tokens | $1 | $3 | $10 |
| App Insights | Action traces, vision API latency | $0 | $25 | $100 |
| Log Analytics | Browser container logs, failure diagnostics | $0 | $15 | $50 |
| Total | | $114 | $738 | $2,400 |
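
The totals can be reproduced from the per-service figures with a few lines of arithmetic. The snippet below is only a worked check of the table: the `COSTS` mapping copies its numbers, and the helper names are hypothetical. It also shows that Azure OpenAI accounts for roughly 61% to 70% of each tier's total.

```python
TIERS = ("Dev", "Prod", "Enterprise")

# Per-service costs copied from the table above, in dollars per tier.
COSTS = {
    "Azure OpenAI": (80, 450, 1500),
    "Container Apps": (25, 180, 500),
    "Blob Storage": (3, 20, 60),
    "Cosmos DB": (5, 45, 180),
    "Key Vault": (1, 3, 10),
    "App Insights": (0, 25, 100),
    "Log Analytics": (0, 15, 50),
}


def tier_totals(costs):
    """Sum each tier's column across all services."""
    return {
        tier: sum(row[i] for row in costs.values())
        for i, tier in enumerate(TIERS)
    }


def service_share(costs, service):
    """Fraction of each tier's total spent on one service."""
    totals = tier_totals(costs)
    return {
        tier: costs[service][i] / totals[tier]
        for i, tier in enumerate(TIERS)
    }


print(tier_totals(COSTS))  # {'Dev': 114, 'Prod': 738, 'Enterprise': 2400}
```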

📖 Full docs · 🌐 frootai.dev/solution-plays/23-browser-automation-agent

FAI Manifest

| Field | Value |
| --- | --- |
| Play | 23-browser-automation-agent |
| Version | 1.0.0 |
| Knowledge | O2-AI-Agents, O3-MCP-Tools-Functions, R1-Prompt-Engineering, T3-Production-Patterns |
| WAF Pillars | security, reliability, performance-efficiency, responsible-ai |
| Groundedness | ≥ 85% |
| Safety | 0 violations max |