Posts
All the articles I've posted.
-
Coding Agents Are Lazy Patchers
Published at 07:40 AM
AI coding agents become lazy patchers under iterative changes, copying code instead of refactoring. This creates massive god functions that are unmaintainable, which explains the gap between benchmark scores and real-world experience.
-
SlopCodeBench: Measuring Code Erosion Under Iterative Specification Refinement
Published at 09:22 AM
SlopCodeBench evaluates AI coding agents under iterative specification updates. Unlike single-shot benchmarks, SCBench reveals the verbosity and structural erosion that make agent-written code unmaintainable over time.
-
Reward Models Enable Scalable Code Verification by Trading Accuracy for Throughput
Outcome Reward Models for code verification let one trade accuracy for throughput in the generate-then-rank paradigm. This can be improved further with a generate-prune-then-rank approach, where a weaker verifier prunes candidate solutions before ranking, avoiding wasted verification work on solutions that are clearly incorrect. We show that this hybrid approach can be 11.65 times faster than running the whole test suite while being only 8.33% less accurate.
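As a rough illustration of the generate-prune-then-rank idea summarized above, here is a minimal Python sketch. All names here (generate, cheap_verify, score) are hypothetical placeholders standing in for a sampler, a weak verifier, and an outcome reward model, not an API from the post itself.

```python
from typing import Callable, List

def generate_prune_then_rank(
    problem: str,
    generate: Callable[[str, int], List[str]],  # samples n candidate solutions
    cheap_verify: Callable[[str, str], bool],   # fast, weaker check (e.g. a few smoke tests)
    score: Callable[[str, str], float],         # outcome reward model score
    n_samples: int = 16,
) -> str:
    """Return the highest-scoring candidate that survives cheap pruning."""
    candidates = generate(problem, n_samples)
    # Prune: drop candidates the weak verifier rejects, so the expensive
    # reward model never spends compute on obviously wrong solutions.
    survivors = [c for c in candidates if cheap_verify(problem, c)]
    if not survivors:
        # Fall back to the full pool if the pruner rejects everything.
        survivors = candidates
    # Rank: keep the survivor the reward model scores highest.
    return max(survivors, key=lambda c: score(problem, c))
```

The design choice is the usual cascade trade-off: the weak verifier sets a floor on quality cheaply, so the expensive ranker only runs over a small surviving pool.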