Claude Haiku 4.5 Wrote 62% More Code But Scored 16% Lower Than Sonnet 4.5
Testing Anthropic's newly released Claude Haiku 4.5 revealed a paradox: it produced 13,666 tokens, the most of all 8 models tested, but scored only 74.4/100, while Sonnet 4.5 wrote 8,425 tokens and scored 89.0/100. This is what over-engineering looks like in AI-generated code.
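
The headline percentages follow directly from the token counts and scores quoted above. Here is a minimal Python sketch of that arithmetic. The variable names and the `pct_change` helper are illustrative only, not part of any benchmark tooling.

```python
# Figures quoted in the article text.
haiku_tokens, sonnet_tokens = 13_666, 8_425
haiku_score, sonnet_score = 74.4, 89.0


def pct_change(value: float, baseline: float) -> float:
    """Relative change of `value` versus `baseline`, in percent."""
    return (value - baseline) / baseline * 100


# How much more output Haiku 4.5 produced, relative to Sonnet 4.5.
more_tokens = pct_change(haiku_tokens, sonnet_tokens)

# How much lower Haiku 4.5 scored, relative to Sonnet 4.5's score.
lower_score = -pct_change(haiku_score, sonnet_score)

# Absolute gap in points on the 100-point scale.
point_gap = sonnet_score - haiku_score

print(f"Haiku wrote {more_tokens:.1f}% more tokens")    # 62.2%
print(f"Haiku scored {lower_score:.1f}% lower")         # 16.4%
print(f"Absolute gap: {point_gap:.1f} points")          # 14.6 points
```

Note that the 16% in the headline is a relative difference; in absolute terms the gap is 14.6 points out of 100.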