AgentArena

AI Agent Evaluation Benchmarks: UI Grounding, Web Navigation and Task Automation

UIArena

Static UI grounding benchmark: evaluates how well agents locate interface elements.
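The page does not say how UIArena scores grounding. A common convention in static grounding benchmarks, assumed here only for illustration, is to count a prediction as correct when the agent's predicted click point falls inside the target element's bounding box. The sketch below computes that point-in-box accuracy. `BBox` and `grounding_accuracy` are hypothetical names, not part of any AgentArena API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BBox:
    """Hypothetical axis-aligned bounding box of a target UI element, in screen pixels."""
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x: float, y: float) -> bool:
        # Inclusive on all edges, so a click exactly on the border counts as a hit.
        return self.left <= x <= self.right and self.top <= y <= self.bottom


def grounding_accuracy(predictions, targets):
    """Assumed metric: fraction of predicted (x, y) points that land inside their target box."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        return 0.0
    hits = sum(box.contains(x, y) for (x, y), box in zip(predictions, targets))
    return hits / len(targets)


if __name__ == "__main__":
    targets = [BBox(10, 10, 50, 30), BBox(100, 200, 180, 240)]
    preds = [(25, 20), (90, 210)]  # first point is inside its box, second is left of it
    print(grounding_accuracy(preds, targets))  # 0.5
```

Scoring by a single click point rather than by box overlap matches how an agent actually acts on a UI (it clicks a coordinate), which is why this convention is widespread. A real benchmark may instead use IoU thresholds or element IDs.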

Complete websites for testing navigation: evaluates agents on multi-page user journeys.
