Tag: slop-code-bench
All the articles with the tag "slop-code-bench".
-
Opus 4.6 and GPT-5.3 Codex Score Higher, but the Code Is Still a Mess.
Published: at 06:00 AM
Anthropic's Opus 4.6 copy-pastes. OpenAI's GPT-5.3 Codex over-abstracts. Both miss edge cases at the same rate. New SCBench results with a guide for when to trust each model.
-
Coding Agents Are Lazy Patchers
Published: at 07:40 AM
AI coding agents become lazy patchers under iterative changes, copying code instead of refactoring. This creates massive, unmaintainable god functions, which explains the gap between benchmark scores and real-world experience.