AI Model Benchmarking

Real code. Blind evaluation. Community-driven rankings.

300+ models

How Models Are Scored

Every submission is evaluated by three frontier AI judges, and the final score is the median of their three scores. No model judges itself. No single-vendor bias. (A sketch of the aggregation follows the judge list below.)

Claude Opus 4.5 (Anthropic)
GPT-5.2 (OpenAI)
Gemini 3 Pro (Google)
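
As a rough illustration of the aggregation, the sketch below reduces three hypothetical judge scores to a single median in Python. The 0-100 scale and the example values are assumptions, not the platform's actual data.

```python
from statistics import median

# Hypothetical scores from the three judges for one submission.
# The 0-100 scale and the values themselves are assumptions.
judge_scores = {
    "Claude Opus 4.5": 87.0,
    "GPT-5.2": 91.0,
    "Gemini 3 Pro": 84.5,
}

# The final score is the median, so a single outlier judge cannot
# drag the result up or down on its own.
final_score = median(judge_scores.values())
print(final_score)  # 87.0
```

Because the median discards the most extreme of the three scores, one overly harsh or overly generous judge has limited influence on the final number.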

Current Rankings

Updated in real time as community votes come in.

Community ELO Rankings

Beyond AI judges, the community votes on outputs in blind head-to-head matchups.

1. Two model outputs are shown side by side: same prompt, different models.

2. Model names stay hidden until you vote, which prevents bias toward familiar names.

3. ELO rankings update in real time using a chess-style rating system (update rule sketched below).
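
The sketch below shows how a chess-style ELO update could be applied after a single blind vote. The K-factor of 32 and the example ratings are assumptions for illustration, not the platform's published parameters.

```python
# Assumed K-factor; the platform's actual value may differ.
K = 32

def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that model A beats model B under the ELO model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

def update_elo(rating_a: float, rating_b: float, a_won: bool) -> tuple[float, float]:
    """Return both models' new ratings after one head-to-head vote."""
    exp_a = expected_score(rating_a, rating_b)
    score_a = 1.0 if a_won else 0.0
    new_a = rating_a + K * (score_a - exp_a)
    new_b = rating_b + K * ((1.0 - score_a) - (1.0 - exp_a))
    return new_a, new_b

# Example: a 1500-rated model wins a blind matchup against a 1600-rated model.
print(update_elo(1500, 1600, a_won=True))  # roughly (1520.5, 1579.5)
```

Upsets against higher-rated models shift the ratings more than expected wins do, which is what lets the rankings converge as votes accumulate.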

Open Methodology

We publish all methodology details. No black boxes.

5-Criteria Scoring

Correctness, efficiency, readability, best practices, and edge cases (aggregation sketched below)

Blind Community Voting

ELO rankings from anonymous head-to-head comparisons

Statistical Rigor

Minimum sample sizes and confidence intervals (interval sketched below)

Carbon Tracking

Environmental impact per evaluation (estimate sketched below)
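
For the 5-criteria scoring, the sketch below combines one judge's per-criterion scores into a single number. The 0-10 scale and the equal weighting are assumptions; the published criteria list does not specify a scale or weights.

```python
# Hypothetical per-criterion scores from one judge (assumed 0-10 scale).
criteria_scores = {
    "correctness": 9,
    "efficiency": 7,
    "readability": 8,
    "best_practices": 8,
    "edge_cases": 6,
}

# Equal weights are an assumption; the platform may weight criteria differently.
overall = sum(criteria_scores.values()) / len(criteria_scores)
print(overall)  # 7.6
```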
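
For the statistical-rigor point, the sketch below computes a normal-approximation 95% confidence interval for a model's blind-vote win rate. The vote counts, the 95% level, and the choice of a simple Wald interval are all assumptions for illustration.

```python
import math

# Hypothetical blind-vote record for one model: wins out of total matchups.
wins, n = 132, 200

# Wald (normal-approximation) 95% confidence interval for the win rate.
# 1.96 is the z-value for 95% coverage; a minimum sample size would be
# enforced before an interval like this is published.
p_hat = wins / n
half_width = 1.96 * math.sqrt(p_hat * (1 - p_hat) / n)
print(f"win rate: {p_hat:.2f} +/- {half_width:.2f}")  # win rate: 0.66 +/- 0.07
```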
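
For carbon tracking, one common approach is to multiply the energy drawn by an evaluation by the grid's carbon intensity. Both numbers below are placeholders, not measured figures from the platform.

```python
# Placeholder values, not measurements from the platform.
energy_per_eval_kwh = 0.004        # kWh consumed by one evaluation (assumption)
grid_intensity_g_per_kwh = 400.0   # gCO2e per kWh of grid electricity (assumption)

grams_co2e = energy_per_eval_kwh * grid_intensity_g_per_kwh
print(f"{grams_co2e:.1f} gCO2e per evaluation")  # 1.6 gCO2e per evaluation
```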

Monthly Top Models

Get ranking updates, new model analyses, and community insights. No spam.

Build Better AI Benchmarks

Your evaluations and votes contribute to community rankings.

Create Account