CodeLens.AI Blog

Insights, case studies, and analysis of AI model performance on real-world coding challenges

Model Comparison • October 15, 2025 • 10 min read

10 AI Models, One WebSocket Task: What Code Volume Actually Tells Us

We tested 10 AI models on the same refactoring task. Output size varied about 15x (888 to 13,666 tokens), but quality did not track volume in any simple way. Haiku 4.5 wrote the most code and scored the highest, challenging the "less is more" narrative. Here's what code volume actually reveals about AI performance.


Carbon Analysis • October 13, 2025 • 8 min read

Your AI Code Assistant's Hidden Carbon Cost: 90x Difference Between Models

We tracked carbon emissions across 173 AI code generations from 28 real developer tasks (5 days post-launch). The surprising finding: within the same category of work, carbon footprints differed by 90x. Gemini averages 0.65 g CO2 per query, while GPT-5 averages 2.31 g (about 3.6x higher). Here's why grid renewables matter more than model efficiency.


More case studies and insights coming soon!
