Web Agents Python Packages

agentlab

AgentLab: An open-source framework for developing, testing, and benchmarking web agents on diverse tasks, designed for scalability and reproducibility.

3K 574 112

uground-demo-test

[ICLR'25 Oral] UGround: Universal GUI Visual Grounding for GUI Agents

1K 312 18

clawbench-eval

Open-source benchmark for browser AI agents on 153 everyday online tasks across 144 live websites. 5-layer recording + DOM-match + LLM judge. Top score 33.3%.

1K 147 9

claw-harness

Open-source benchmark for browser AI agents on 153 everyday online tasks across 144 live websites. 5-layer recording + DOM-match + LLM judge. Top score 33.3%.

1K 147 9

clawbench-harness

Open-source benchmark for browser AI agents on 153 everyday online tasks across 144 live websites. 5-layer recording + DOM-match + LLM judge. Top score 33.3%.

1K 147 9

nail-clawbench

Open-source benchmark for browser AI agents on 153 everyday online tasks across 144 live websites. 5-layer recording + DOM-match + LLM judge. Top score 33.3%.

1K 147 9

openclawbench

Open-source benchmark for browser AI agents on 153 everyday online tasks across 144 live websites. 5-layer recording + DOM-match + LLM judge. Top score 33.3%.

1K 147 9

clawbench-cli

Open-source benchmark for browser AI agents on 153 everyday online tasks across 144 live websites. 5-layer recording + DOM-match + LLM judge. Top score 33.3%.

753 147 9

claw-ai

ClawBench: Can AI Agents Complete Everyday Online Tasks? (alias of claw-bench)

248 147 9

task-harness

ClawBench: Can AI Agents Complete Everyday Online Tasks? (alias of claw-bench)

241 147 9

claw-eval

Open-source benchmark for browser AI agents on 153 everyday online tasks across 144 live websites. 5-layer recording + DOM-match + LLM judge. Top score 33.3%.

227 147 9

claw-agent

Open-source benchmark for browser AI agents on 153 everyday online tasks across 144 live websites. 5-layer recording + DOM-match + LLM judge. Top score 33.3%.

219 147 9

mcq-bench

Open-source benchmark for browser AI agents on 153 everyday online tasks across 144 live websites. 5-layer recording + DOM-match + LLM judge. Top score 33.3%.

213 147 9

harness-hub

Open-source benchmark for browser AI agents on 153 everyday online tasks across 144 live websites. 5-layer recording + DOM-match + LLM judge. Top score 33.3%.

212 147 9

life-bench

Open-source benchmark for browser AI agents on 153 everyday online tasks across 144 live websites. 5-layer recording + DOM-match + LLM judge. Top score 33.3%.

210 147 9

doomarena-taubench

DoomArena is a Framework for Testing AI Agents Against Evolving Security Threats

209 58 6

doomarena

DoomArena is a Framework for Testing AI Agents Against Evolving Security Threats

208 58 6

nail-eval

ClawBench: Can AI Agents Complete Everyday Online Tasks? (alias of claw-bench)

207 147 9

nail-agent

Open-source benchmark for browser AI agents on 153 everyday online tasks across 144 live websites. 5-layer recording + DOM-match + LLM judge. Top score 33.3%.

206 147 9

nail-group

Open-source benchmark for browser AI agents on 153 everyday online tasks across 144 live websites. 5-layer recording + DOM-match + LLM judge. Top score 33.3%.

205 147 9

r2-harness

ClawBench: Can AI Agents Complete Everyday Online Tasks? (alias of claw-bench)

204 147 9

nail-bench

ClawBench: Can AI Agents Complete Everyday Online Tasks? (alias of claw-bench)

204 147 9

r2agent

ClawBench: Can AI Agents Complete Everyday Online Tasks? (alias of claw-bench)

204 147 9

harnessos

Open-source benchmark for browser AI agents on 153 everyday online tasks across 144 live websites. 5-layer recording + DOM-match + LLM judge. Top score 33.3%.

203 147 9
