Browser Agent Python Packages

smolvm-core

Open-source AI sandbox infrastructure for code execution, browser use, and AI agents.

11K 492 32

clawbench-eval

Open-source benchmark for browser AI agents on 153 everyday online tasks across 144 live websites. 5-layer recording + DOM-match + LLM judge. Top score 33.3%.

1K 147 9

claw-harness

Open-source benchmark for browser AI agents on 153 everyday online tasks across 144 live websites. 5-layer recording + DOM-match + LLM judge. Top score 33.3%.

1K 147 9

clawbench-harness

Open-source benchmark for browser AI agents on 153 everyday online tasks across 144 live websites. 5-layer recording + DOM-match + LLM judge. Top score 33.3%.

1K 147 9

nail-clawbench

Open-source benchmark for browser AI agents on 153 everyday online tasks across 144 live websites. 5-layer recording + DOM-match + LLM judge. Top score 33.3%.

1K 147 9

openclawbench

Open-source benchmark for browser AI agents on 153 everyday online tasks across 144 live websites. 5-layer recording + DOM-match + LLM judge. Top score 33.3%.

991 147 9

clawbench-cli

Open-source benchmark for browser AI agents on 153 everyday online tasks across 144 live websites. 5-layer recording + DOM-match + LLM judge. Top score 33.3%.

731 147 9

webqa-agent

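Autonomous web browser agent that audits performance, functionality, and UX for engineers and vibe-coding creators. An autonomous website evaluation and testing agent, with GUI and CLI support for one-click testing and evaluation of performance, functionality, and interaction experience.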

549 205 16

claw-ai

ClawBench: Can AI Agents Complete Everyday Online Tasks? (alias of claw-bench)

236 147 9

task-harness

ClawBench: Can AI Agents Complete Everyday Online Tasks? (alias of claw-bench)

232 147 9

claw-eval

Open-source benchmark for browser AI agents on 153 everyday online tasks across 144 live websites. 5-layer recording + DOM-match + LLM judge. Top score 33.3%.

218 147 9

claw-agent

Open-source benchmark for browser AI agents on 153 everyday online tasks across 144 live websites. 5-layer recording + DOM-match + LLM judge. Top score 33.3%.

211 147 9

mcq-bench

Open-source benchmark for browser AI agents on 153 everyday online tasks across 144 live websites. 5-layer recording + DOM-match + LLM judge. Top score 33.3%.

206 147 9

harness-hub

Open-source benchmark for browser AI agents on 153 everyday online tasks across 144 live websites. 5-layer recording + DOM-match + LLM judge. Top score 33.3%.

204 147 9

life-bench

Open-source benchmark for browser AI agents on 153 everyday online tasks across 144 live websites. 5-layer recording + DOM-match + LLM judge. Top score 33.3%.

203 147 9

nail-eval

ClawBench: Can AI Agents Complete Everyday Online Tasks? (alias of claw-bench)

199 147 9

nail-agent

Open-source benchmark for browser AI agents on 153 everyday online tasks across 144 live websites. 5-layer recording + DOM-match + LLM judge. Top score 33.3%.

198 147 9

nail-group

Open-source benchmark for browser AI agents on 153 everyday online tasks across 144 live websites. 5-layer recording + DOM-match + LLM judge. Top score 33.3%.

197 147 9

harnessos

Open-source benchmark for browser AI agents on 153 everyday online tasks across 144 live websites. 5-layer recording + DOM-match + LLM judge. Top score 33.3%.

196 147 9

r2agent

ClawBench: Can AI Agents Complete Everyday Online Tasks? (alias of claw-bench)

196 147 9

r2-harness

ClawBench: Can AI Agents Complete Everyday Online Tasks? (alias of claw-bench)

195 147 9

nail-bench

ClawBench: Can AI Agents Complete Everyday Online Tasks? (alias of claw-bench)

195 147 9

everyday-bench

ClawBench: Can AI Agents Complete Everyday Online Tasks? (alias of claw-bench)

191 147 9

everyday-agent

ClawBench: Can AI Agents Complete Everyday Online Tasks? (alias of claw-bench)

187 147 9
