Gabriel Orlanski

Tag: benchmark

All the articles with the tag "benchmark".

SlopCodeBench: Measuring Code Erosion Under Iterative Specification Refinement

G. Orlanski, D. Roy, A. Yun, C. Shin, A. Gu, A. Ge, D. Adila, A. Albarghouthi, and F. Sala

2025 Technical Report. Code available on GitHub; paper available on arXiv.

SlopCodeBench (SCBench) evaluates AI coding agents as their task specifications are updated iteratively. Unlike single-shot benchmarks, SCBench reveals the verbosity and structural erosion that make agent-written code unmaintainable over time.