reads sessions you already have - nothing leaves your machine

Your AI tools, benchmarked on
the work you actually do.

EvalMyAgent reads your Claude Code, Codex, and Gemini CLI session history and turns it into a personal benchmark. When a new model drops, it tells you whether it's worth switching - for your tasks, not SWE-bench's.

view example dashboard ->
~/dev/evalmyagent
$ pipx install evalmyagent
-> installed package evalmyagent in an isolated environment
  apps are now globally available on your machine
  choose analysis: evalmyagent init
  then run: evalmyagent dashboard
  local dashboard: http://127.0.0.1:3847
ready - local rules unless you opt into a CLI analyzer

01

Link your sessions

Point the CLI at your local agent history. It parses Claude Code, Codex, and Gemini CLI logs in seconds - fully on your machine.

02

Build your bench

EvalMyAgent extracts your real task taxonomy - the refactors, debugging sessions, and greenfield builds you actually do - and turns it into a graded personal benchmark.

03

Run any model

A new model ships? Replay your bench against it and get a clear answer: switch, stay, or route by task type.
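
The switch / stay / route answer can be pictured as a comparison of per-task-type scores. The sketch below is only an illustration of that idea, not EvalMyAgent's actual logic: the `recommend` function, the `Verdict` type, the `margin` threshold, and all scores are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Verdict:
    decision: str           # "switch", "stay", or "route"
    routes: dict[str, str]  # task type -> "current" or "candidate"


def recommend(current: dict[str, float], candidate: dict[str, float],
              margin: float = 2.0) -> Verdict:
    """Compare two models' scores (0-100) per task type.

    `margin` is an assumed noise threshold: the candidate must beat the
    current model by more than this to win a task type.
    """
    routes = {}
    for task, cur_score in current.items():
        # A task type the candidate was never scored on stays with the current model.
        cand_score = candidate.get(task, float("-inf"))
        routes[task] = "candidate" if cand_score - cur_score > margin else "current"

    winners = set(routes.values())
    if "candidate" not in winners:
        decision = "stay"      # candidate wins nothing (or there is no data)
    elif "current" not in winners:
        decision = "switch"    # candidate wins every task type
    else:
        decision = "route"     # mixed result: send each task type to its winner
    return Verdict(decision, routes)


# Made-up scores, for illustration only.
current = {"refactoring": 82.0, "debugging": 88.0, "test writing": 75.0}
candidate = {"refactoring": 90.0, "debugging": 84.0, "test writing": 76.0}
verdict = recommend(current, candidate)
print(verdict.decision)  # route
print(verdict.routes)
```

With these numbers the candidate only clears the margin on refactoring, so the verdict is to route refactoring to the new model and keep everything else where it is.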

Your bench, not
somebody else's.

Public leaderboards do not know your codebase or task distribution. EvalMyAgent derives a benchmark from your real sessions so its dashboard reflects the work in front of you.

example task mix - your dashboard uses live data
Refactoring: 32%
Debugging: 24%
Feature build: 19%
Test writing: 12%
API integration: 9%
Infra and config: 4%
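
A task mix like the one above is, at its simplest, a share-of-sessions count per task type. The following is a minimal sketch under the assumption that each session has already been given one task label. The `task_mix` helper and the sample labels are hypothetical and say nothing about how EvalMyAgent classifies sessions.

```python
from collections import Counter


def task_mix(labels: list[str]) -> dict[str, float]:
    """Return each task type's share of sessions as a percentage, largest first."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {task: round(100 * n / total, 1) for task, n in counts.most_common()}


# Hypothetical labels, one per session.
labels = ["refactoring"] * 4 + ["debugging"] * 3 + ["test writing"]
mix = task_mix(labels)
for task, share in mix.items():
    print(f"{task}: {share}%")
```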

See what wins on your work.

score = your bench · Δ = vs public bench · illustrative preview
# | model / harness               | score | Δ  | $/task | best at
2 | Claude Opus 4.1 / Claude Code | 89    | +4 | $0.74  | Deep debugging
3 | GPT-5 / Codex                 | 84    | -2 | $0.22  | API & integration
4 | Gemini 2.5 Pro / Gemini CLI   | 79    | +1 | $0.09  | Boilerplate
5 | GPT-5 mini / Codex            | 71    |    | $0.05  | Quick fixes
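
One way to read the three numeric columns is as simple aggregates over a replayed bench. The sketch below assumes that score is the mean of per-task grades, that Δ is your score minus the model's public-benchmark score, and that $/task is the mean cost per task. These definitions, the `summarize` helper, and the sample numbers are assumptions for illustration, not the tool's documented formulas.

```python
from statistics import mean


def summarize(task_scores: list[float], task_costs_usd: list[float],
              public_score: float) -> dict[str, float]:
    """Aggregate one model's replay results into score, delta, and cost per task."""
    if not task_scores or len(task_scores) != len(task_costs_usd):
        raise ValueError("need at least one task and exactly one cost per task")
    score = mean(task_scores)
    return {
        "score": round(score),
        # Assumed sign convention: positive means better on your bench than in public.
        "delta": round(score - public_score),
        "cost_per_task": round(sum(task_costs_usd) / len(task_costs_usd), 2),
    }


# Made-up replay results for a single model.
row = summarize([80.0, 70.0, 90.0], [0.10, 0.20, 0.30], public_score=77.0)
print(row)
```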

Build your private benchmark locally.

Install the CLI in an isolated environment with pipx, run evalmyagent init to choose between local rules and an installed Codex or Claude CLI as the analyzer, then run evalmyagent dashboard.

$ pipx install evalmyagent

no Python tooling? curl -sSL https://evalmyagent.ai/install.sh | sh