
Early Data (5 Days Post-Launch) · October 13, 2025 · 8 min read

Your AI Code Assistant's Hidden Carbon Cost: 90x Difference Between Models

TL;DR

  • Tracked carbon emissions across 173 AI code generations (28 real developer tasks, 5 days post-launch)
  • Surprising finding: Same category of work = 90x different carbon footprint
  • Google Gemini (75%+ renewables): 0.65g CO2/query
  • OpenAI GPT-5 (AWS US East, coal-heavy grid): 2.31g CO2/query (3.6x worse!)
  • Key insight: Grid carbon intensity matters more than model efficiency

The 90x Carbon Gap

We launched CodeLens.AI five days ago with carbon tracking built in. After analyzing 28 real developer tasks (173 AI executions, with each task run across 7-8 models), one finding stands out:

  • Lowest carbon: 0.07g CO2 (Gemini 2.5 Pro, simple JavaScript bug fix, 392 output tokens)
  • Highest carbon: 6.64g CO2 (Grok 4, complex Python debugging, 11,910 output tokens)
  • 90x difference: same category of work (bug fixes), wildly different footprints

But here's what surprised us even more: when we looked at averages across all tasks, the gap between providers was still massive—and it had less to do with model efficiency than you'd think.

Why Google Beats Everyone (It's Not the Model)

We expected GPT-5 to be the most carbon-intensive (it's the most verbose). What we didn't expect: grid carbon intensity would matter more than model efficiency.

| Model | Avg Carbon | vs. Gemini | Grid Source | Sample Size |
| --- | --- | --- | --- | --- |
| 🥇 Gemini 2.5 Pro | 0.65g CO2 | 1.0x (baseline) | GCP (75%+ renewable) | 34 |
| Claude Opus 4.1 | 0.89g CO2 | 1.4x | GCP (carbon-neutral) | 29 |
| Grok 4 | 1.06g CO2 | 1.6x | Unknown (US avg) | 34 |
| Claude Sonnet 4.5 | 1.10g CO2 | 1.7x | GCP (carbon-neutral) | 31 |
| OpenAI o3 | 1.20g CO2 | 1.8x | AWS US East | 34 |
| GPT-5 | 2.31g CO2 | 3.6x | AWS US East | 32 |
| GLM 4.6 (Zhipu AI) | 3.09g CO2 | 4.8x | China (coal grid) | 5 |

The takeaway: Google's 75%+ renewable energy infrastructure gives Gemini a structural advantage that has nothing to do with the model itself. AWS US East (where OpenAI runs) relies heavily on coal, and it shows: GPT-5 produces 3.6x more CO2 per query than Gemini on average.
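To make the grid effect concrete, here is a back-of-the-envelope sketch in Python. The per-query energy figure and the two grid intensities are illustrative assumptions, not our measured data; the point is simply that holding energy fixed and changing only the grid already produces a multi-x gap.

```python
# Same per-query energy, two grids: only the carbon intensity changes.
WH_PER_QUERY = 3.0  # assumed energy per query, for illustration only

GRID_INTENSITY_G_PER_KWH = {
    "mostly-renewable grid": 100,  # illustrative: a heavily renewable region
    "coal-heavy grid": 600,        # illustrative: a fossil-dominated region
}

for grid, g_per_kwh in GRID_INTENSITY_G_PER_KWH.items():
    co2_g = (WH_PER_QUERY / 1000) * g_per_kwh  # Wh -> kWh, then gCO2/kWh
    print(f"{grid}: {co2_g:.2f} g CO2/query")
# 0.30 g vs 1.80 g: a 6x gap before the model itself changes at all.
```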

Output Length Matters More Than You Think

The second-biggest factor? How much the model outputs. We bucketed our 173 executions by output length and found a clear pattern:

| Output Length | Executions | Avg Carbon | Range | vs. Smallest |
| --- | --- | --- | --- | --- |
| 0-1K tokens | 24 | 0.39g CO2 | 0.07-1.51g | 1.0x (baseline) |
| 1K-3K tokens | 68 | 0.54g CO2 | 0.26-1.06g | 1.4x |
| 3K-5K tokens | 54 | 1.15g CO2 | 0.48-5.37g | 2.9x |
| 5K-10K tokens | 38 | 2.07g CO2 | 0.77-5.08g | 5.3x |
| 10K+ tokens | 15 | 4.05g CO2 | 1.94-6.64g | 10.4x |

10x carbon increase from shortest to longest outputs. This explains why GPT-5 ranks worst on average—it outputs nearly 8K tokens per query, compared to Gemini's 2.9K tokens.
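If you want to check the multipliers yourself, this snippet recomputes the "vs. Smallest" column from the bucket averages in the table above.

```python
# Recompute the "vs. Smallest" multipliers from the average carbon per bucket.
avg_carbon_g = {       # grams CO2 per query, from the table above
    "0-1K":   0.39,
    "1K-3K":  0.54,
    "3K-5K":  1.15,
    "5K-10K": 2.07,
    "10K+":   4.05,
}

baseline = avg_carbon_g["0-1K"]
for bucket, grams in avg_carbon_g.items():
    print(f"{bucket:>6} tokens: {grams:.2f} g ({grams / baseline:.1f}x)")
```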

Cost vs Carbon: The Tradeoff Nobody Talks About

Here's where it gets interesting: the greenest model is also the cheapest.

| Model | Avg Cost | Avg Carbon | Cost Premium vs Gemini | Carbon Premium vs Gemini |
| --- | --- | --- | --- | --- |
| 🥇 Gemini 2.5 Pro | $0.031 | 0.65g | baseline | baseline |
| Grok 4 | $0.050 | 1.06g | 1.6x ($0.019 more) | 1.6x |
| Claude Sonnet 4.5 | $0.072 | 1.10g | 2.3x ($0.041 more) | 1.7x |
| Claude Opus 4.1 | $0.269 | 0.89g | 8.7x ($0.238 more) | 1.4x |
| OpenAI o3 | $0.031 | 1.20g | 1.0x (same as Gemini) | 1.8x |
| GPT-5 | $0.081 | 2.31g | 2.6x ($0.050 more) | 3.6x |

GPT-5 costs 2.6x more per query than Gemini and produces 3.6x more CO2. There's no tradeoff here—Gemini wins on both dimensions.

Real-World Impact: Should You Care?

Individual queries have tiny footprints (under 3 grams for most models). But at scale, the differences become meaningful:

Annual Carbon Projections (100 queries/day)

  • 🥇 Gemini 2.5 Pro: 23.6 kg CO2/year (≈ 5 smartphone charges)
  • Claude Sonnet 4.5: 40.1 kg CO2/year (≈ 9 smartphone charges)
  • 🔥 GPT-5: 84.2 kg CO2/year (≈ 18 smartphone charges)
For context: A team of 100 developers using GPT-5 (100 queries/day each) = 8.4 tons CO2/year. Switching to Gemini would cut that to 2.4 tons, a 72% reduction.
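These projections are straightforward arithmetic from the per-query averages; here is a small script that reproduces them (small differences from the figures above come from rounding the averages).

```python
# Annual projections from the per-query averages reported earlier.
AVG_CO2_G_PER_QUERY = {
    "Gemini 2.5 Pro": 0.65,
    "Claude Sonnet 4.5": 1.10,
    "GPT-5": 2.31,
}
QUERIES_PER_DAY = 100
DAYS_PER_YEAR = 365

for model, grams in AVG_CO2_G_PER_QUERY.items():
    kg_per_year = grams * QUERIES_PER_DAY * DAYS_PER_YEAR / 1000
    print(f"{model}: {kg_per_year:.1f} kg CO2/year")

# Team scenario: 100 developers, each at 100 queries/day.
team_tons = 100 * AVG_CO2_G_PER_QUERY["GPT-5"] * QUERIES_PER_DAY * DAYS_PER_YEAR / 1_000_000
print(f"Team of 100 on GPT-5: {team_tons:.1f} t CO2/year")  # ~8.4 t
```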

What Developers Are Actually Using AI For

Our 28 evaluations came from 12 real developers. Here's what they submitted:

| Task Type | Language | Count | Avg Carbon | Range |
| --- | --- | --- | --- | --- |
| Security Analysis | JavaScript | 4 | 1.06g CO2 | 0.22-4.16g |
| Bug Fixing | JavaScript | 3 | 0.46g CO2 | 0.07-1.87g |
| Security Analysis | Python | 3 | 0.93g CO2 | 0.37-2.29g |
| Optimization | Python | 2 | 2.27g CO2 | 0.61-5.25g |
| Optimization | TypeScript | 2 | 0.80g CO2 | 0.42-2.06g |
| Feature Implementation | Webix | 2 | 1.05g CO2 | 0.31-2.87g |

Insight: Simple JavaScript bug fixes averaged 0.46g CO2, while Python optimization tasks averaged 2.27g—nearly 5x higher. Task complexity matters as much as model choice.

Methodology: How We Calculate Carbon

We use Epoch AI's 2025 research as our baseline for energy consumption, then multiply by provider-specific carbon intensities. Our calculations are directional (±30-50% accuracy), not precise measurements.
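In code terms, the estimate looks roughly like the sketch below. The energy baseline and grid intensities here are placeholder values, not the actual figures from our methodology, and scaling linearly with output tokens is a simplification.

```python
# Simplified sketch of the carbon estimate: an energy baseline per 1K output
# tokens, multiplied by the provider's grid carbon intensity. All constants
# are placeholders, not the values used in the real calculation.
BASELINE_WH_PER_1K_OUTPUT_TOKENS = 0.5   # assumed energy baseline

GRID_G_CO2_PER_KWH = {                   # assumed provider grid intensities
    "gcp": 120,
    "aws_us_east": 400,
    "us_average": 370,
}

def estimate_co2_grams(output_tokens: int, provider: str) -> float:
    """Scale the energy baseline by output length, then apply grid intensity."""
    energy_kwh = (output_tokens / 1000) * BASELINE_WH_PER_1K_OUTPUT_TOKENS / 1000
    return energy_kwh * GRID_G_CO2_PER_KWH[provider]

print(estimate_co2_grams(2_900, "gcp"))          # terse model on a cleaner grid
print(estimate_co2_grams(8_000, "aws_us_east"))  # verbose model on a dirtier grid
```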

Read full methodology →

Limitations (We're Being Honest Here)

Big Caveats

  • Tiny dataset: Only 28 evaluations from 12 developers (5 days post-launch, growing daily)
  • GPT-4o baseline for all models: Real model efficiency varies (we don't have per-model energy data)
  • Unknown data center locations: Using provider averages (actual locations may differ)
  • o3 reasoning tokens: We track visible output only, so hidden reasoning means we likely underestimate its true carbon
  • ±30-50% accuracy: Directional insights, not precise measurements

We're not claiming to be carbon accounting experts. We're sharing what we've learned from real developer tasks in the hope that imperfect data beats no data.

See the Carbon Footprint of Your Code Task

Submit your code challenge and get instant carbon tracking across all 8 models. It's free, takes 3 minutes to process, and you'll see exactly which AI produces the least CO2 for your specific use case.

About CodeLens.AI: We're building the world's most accurate benchmark of AI model performance on real developer tasks. Community-driven, transparent methodology, and carbon tracking built in. Launched October 8, 2025.