AI Model Benchmarking
Real code. Blind evaluation. Community-driven rankings.
How Models Are Scored
Every submission is evaluated by three frontier AI judges, and the final score is the median of their three scores. No model judges its own output. No single-vendor bias.
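A minimal sketch of that aggregation rule in Python; the function name and score scale are illustrative, not the production pipeline:

```python
from statistics import median

def final_score(judge_scores: list[float]) -> float:
    """Median of the three judge scores, so one outlier judge
    cannot drag the result up or down."""
    assert len(judge_scores) == 3, "every submission gets exactly three judges"
    return median(judge_scores)

# One overly generous judge does not inflate the result.
print(final_score([7.0, 8.0, 9.9]))  # -> 8.0
```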
Community Elo Rankings
Beyond AI judges, the community votes on outputs in blind head-to-head matchups.
Two model outputs shown side-by-side
Same prompt, different models
Model names hidden until you vote
Prevents bias toward familiar names
Elo rankings update in real time
Chess-style rating system (update rule sketched below)
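For readers new to chess-style ratings, here is a minimal sketch of one Elo update per vote. The K-factor of 32 and the 1500 starting rating are common defaults assumed for illustration, not the site's published parameters:

```python
K = 32        # assumed K-factor; the real system's value may differ
START = 1500  # assumed starting rating for a new model

def expected(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

def vote(winner: float, loser: float) -> tuple[float, float]:
    """Apply one blind head-to-head vote; the winner gains what the loser loses."""
    gain = K * (1.0 - expected(winner, loser))
    return winner + gain, loser - gain

# An upset moves ratings more than an expected win.
print(vote(START, 1600))  # underdog wins -> swing of about 20 points
```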
Open Methodology
We publish our full methodology. No black boxes.
5-Criteria Scoring
Correctness, efficiency, readability, best practices, edge cases
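As a sketch of how five criterion scores could roll up into one number, assuming a 0-10 scale and equal weights (the actual weighting is not specified here):

```python
CRITERIA = ("correctness", "efficiency", "readability", "best_practices", "edge_cases")

def criterion_average(scores: dict[str, float]) -> float:
    """Average the five criterion scores into a single submission score.
    Equal weights are an assumption for illustration."""
    missing = set(CRITERIA) - scores.keys()
    if missing:
        raise ValueError(f"missing criteria: {missing}")
    return sum(scores[c] for c in CRITERIA) / len(CRITERIA)

print(criterion_average({
    "correctness": 9, "efficiency": 7, "readability": 8,
    "best_practices": 8, "edge_cases": 6,
}))  # -> 7.6
```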
Blind Community Voting
Elo rankings from anonymous head-to-head comparisons
Statistical Rigor
Minimum sample sizes, confidence intervals
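To make those rigor checks concrete, here is a sketch of a 95% Wilson confidence interval on a model's head-to-head win rate; the 30-vote minimum is an assumed threshold, not the site's published one:

```python
import math

MIN_VOTES = 30  # assumed minimum sample size before a ranking is shown

def wilson_interval(wins: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """95% Wilson confidence interval for a head-to-head win rate."""
    if total < MIN_VOTES:
        raise ValueError(f"need at least {MIN_VOTES} votes, got {total}")
    p = wins / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return center - half, center + half

print(wilson_interval(40, 60))  # roughly (0.54, 0.77)
```

If two models' intervals overlap heavily, the ranking between them is not yet statistically meaningful.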
Carbon Tracking
Environmental impact per evaluation
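A sketch of per-evaluation carbon accounting; both constants below are placeholders for illustration, not measured values:

```python
# Placeholder figures for illustration only; measured values may differ.
WH_PER_EVALUATION = 3.0     # assumed energy draw per judged evaluation, in Wh
GRID_G_CO2_PER_KWH = 400.0  # assumed grid carbon intensity, in gCO2e/kWh

def carbon_grams(n_evaluations: int) -> float:
    """Estimated emissions for a batch of evaluations, in grams of CO2e."""
    kwh = n_evaluations * WH_PER_EVALUATION / 1000.0
    return kwh * GRID_G_CO2_PER_KWH

print(carbon_grams(1000))  # -> 1200.0 g CO2e under these assumptions
```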
Monthly Top Models
Get ranking updates, new model analyses, and community insights. No spam.
Build Better AI Benchmarks
Your evaluations and votes contribute to community rankings.
Create Account
Questions? contact@codelens.ai