Tag: slop-code-bench
All the articles with the tag "slop-code-bench".
-
GPT-5.4 Writes Clean Code That Fails More Tests
Published: at 06:00 AM
GPT-5.4 writes the cleanest code of any GPT model we've tested, yet it fails more tests than 5.3. Erosion drops, duplication drops, but pass rate and core both regress. We dig into why.
-
Opus 4.6 and GPT-5.3 Codex Score Higher, but the Code Is Still a Mess.
Published: at 06:00 AM
Anthropic's Opus 4.6 copy-pastes. OpenAI's GPT-5.3 Codex over-abstracts. Both miss edge cases at the same rate. New SCBench results with a guide for when to trust each model.
-
Coding Agents Are Lazy Patchers
Published: at 07:40 AM
AI coding agents become lazy patchers under iterative changes, copying code instead of refactoring. This creates massive, unmaintainable god functions, which explains the gap between benchmark scores and real-world experience.