Tag: benchmark
All the articles with the tag "benchmark".
SlopCodeBench: Measuring Code Erosion Under Iterative Specification Refinement
G. Orlanski, D. Roy, A. Yun, C. Shin, A. Gu, A. Ge, D. Adila, A. Albarghouthi, and F. Sala
SlopCodeBench (SCBench) evaluates AI coding agents under iterative specification updates. Unlike single-shot benchmarks, SCBench reveals the verbosity and structural erosion that make agent-written code unmaintainable over time.