PyPI Stats

Search Packages

Find Python packages by name, description, or GitHub topic, or filter by metrics.
reacher-z
clawbench-eval

Open-source benchmark for browser AI agents on 153 everyday online tasks across 144 live websites. 5-layer recording + DOM-match + LLM judge. Top score 33.3%.

1K 147 9
reacher-z
claw-harness

Open-source benchmark for browser AI agents on 153 everyday online tasks across 144 live websites. 5-layer recording + DOM-match + LLM judge. Top score 33.3%.

1K 147 9
reacher-z
clawbench-harness

Open-source benchmark for browser AI agents on 153 everyday online tasks across 144 live websites. 5-layer recording + DOM-match + LLM judge. Top score 33.3%.

1K 147 9
reacher-z
nail-clawbench

Open-source benchmark for browser AI agents on 153 everyday online tasks across 144 live websites. 5-layer recording + DOM-match + LLM judge. Top score 33.3%.

1K 147 9
reacher-z
openclawbench

Open-source benchmark for browser AI agents on 153 everyday online tasks across 144 live websites. 5-layer recording + DOM-match + LLM judge. Top score 33.3%.

991 147 9
reacher-z
clawbench-cli

Open-source benchmark for browser AI agents on 153 everyday online tasks across 144 live websites. 5-layer recording + DOM-match + LLM judge. Top score 33.3%.

731 147 9
reacher-z
claw-ai

ClawBench: Can AI Agents Complete Everyday Online Tasks? (alias of claw-bench)

236 147 9
reacher-z
task-harness

ClawBench: Can AI Agents Complete Everyday Online Tasks? (alias of claw-bench)

232 147 9
reacher-z
claw-eval

Open-source benchmark for browser AI agents on 153 everyday online tasks across 144 live websites. 5-layer recording + DOM-match + LLM judge. Top score 33.3%.

218 147 9
reacher-z
claw-agent

Open-source benchmark for browser AI agents on 153 everyday online tasks across 144 live websites. 5-layer recording + DOM-match + LLM judge. Top score 33.3%.

211 147 9
reacher-z
mcq-bench

Open-source benchmark for browser AI agents on 153 everyday online tasks across 144 live websites. 5-layer recording + DOM-match + LLM judge. Top score 33.3%.

206 147 9
reacher-z
harness-hub

Open-source benchmark for browser AI agents on 153 everyday online tasks across 144 live websites. 5-layer recording + DOM-match + LLM judge. Top score 33.3%.

204 147 9
reacher-z
life-bench

Open-source benchmark for browser AI agents on 153 everyday online tasks across 144 live websites. 5-layer recording + DOM-match + LLM judge. Top score 33.3%.

203 147 9
reacher-z
nail-eval

ClawBench: Can AI Agents Complete Everyday Online Tasks? (alias of claw-bench)

199 147 9
reacher-z
nail-agent

Open-source benchmark for browser AI agents on 153 everyday online tasks across 144 live websites. 5-layer recording + DOM-match + LLM judge. Top score 33.3%.

198 147 9
reacher-z
nail-group

Open-source benchmark for browser AI agents on 153 everyday online tasks across 144 live websites. 5-layer recording + DOM-match + LLM judge. Top score 33.3%.

197 147 9
reacher-z
harnessos

Open-source benchmark for browser AI agents on 153 everyday online tasks across 144 live websites. 5-layer recording + DOM-match + LLM judge. Top score 33.3%.

196 147 9
reacher-z
r2agent

ClawBench: Can AI Agents Complete Everyday Online Tasks? (alias of claw-bench)

196 147 9
reacher-z
r2-harness

ClawBench: Can AI Agents Complete Everyday Online Tasks? (alias of claw-bench)

195 147 9
reacher-z
nail-bench

ClawBench: Can AI Agents Complete Everyday Online Tasks? (alias of claw-bench)

195 147 9
reacher-z
everyday-bench

ClawBench: Can AI Agents Complete Everyday Online Tasks? (alias of claw-bench)

191 147 9
reacher-z
everyday-agent

ClawBench: Can AI Agents Complete Everyday Online Tasks? (alias of claw-bench)

187 147 9
reacher-z
realtask-bench

Open-source benchmark for browser AI agents on 153 everyday online tasks across 144 live websites. 5-layer recording + DOM-match + LLM judge. Top score 33.3%.

187 147 9
reacher-z
vlm-judge

Open-source benchmark for browser AI agents on 153 everyday online tasks across 144 live websites. 5-layer recording + DOM-match + LLM judge. Top score 33.3%.

185 147 9
Data from PyPI, GitHub, ClickHouse, and BigQuery