AI Agent Evaluation Benchmarks — UI Grounding, Web Navigation and Task Automation
Static UI grounding benchmark: evaluates an agent's ability to locate interface elements on a screen.
Web navigation benchmark built on complete websites: evaluates an agent on multi-page user journeys.
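To make the UI grounding entry concrete, here is a minimal sketch of how such a benchmark could score localization. It assumes a common convention: each test item has a ground-truth bounding box, the agent predicts a single click point, and a prediction counts as correct when the point falls inside the box. The names `Box` and `grounding_accuracy` are hypothetical and not taken from any specific benchmark.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical ground-truth bounding box in screen pixels."""
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x: float, y: float) -> bool:
        # Edges count as inside, so a click on the border is a hit.
        return self.left <= x <= self.right and self.top <= y <= self.bottom


def grounding_accuracy(predictions, targets):
    """Fraction of predicted (x, y) click points that land inside their target box.

    Assumed scoring rule (point-in-box), not a specific benchmark's official metric.
    """
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        return 0.0
    hits = sum(box.contains(x, y) for (x, y), box in zip(predictions, targets))
    return hits / len(targets)


if __name__ == "__main__":
    targets = [Box(10, 10, 50, 30), Box(100, 200, 160, 240)]
    predictions = [(25, 20), (90, 210)]  # second click lands left of its box
    print(grounding_accuracy(predictions, targets))
```

Running the usage example prints `0.5`: the first click hits its element and the second misses. Multi-page navigation benchmarks typically need a different measure, such as whether the whole journey reached its goal, because a single click location says nothing about the sequence of pages visited.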