Gabriel Orlanski

Tag: slop-code-bench

All the articles with the tag "slop-code-bench".

GPT-5.4 Writes Clean Code That Fails More Tests

Published: 6 Mar, 2026 at 06:00 AM

GPT-5.4 writes the cleanest code of any GPT model we've tested, yet it fails more tests than 5.3. Erosion and duplication drop, but pass rate and core both regress. We dig into why.

Opus 4.6 and GPT-5.3 Codex Score Higher, but the Code Is Still a Mess

Published: 11 Feb, 2026 at 06:00 AM

Anthropic's Opus 4.6 copy-pastes. OpenAI's GPT-5.3 Codex over-abstracts. Both miss edge cases at the same rate. New SCBench results with a guide for when to trust each model.

Coding Agents Are Lazy Patchers

Published: 12 Jan, 2026 at 07:40 AM

AI coding agents become lazy patchers under iterative changes, copying code instead of refactoring it. The result is massive, unmaintainable god functions, which explains the gap between benchmark scores and real-world experience.