Tag: benchmark
All the articles with the tag "benchmark".
- SlopCodeBench: Measuring Code Erosion Under Iterative Specification Refinement
Published: at 09:22 AM
A new benchmark for agentic coding built around iteratively evolving specifications. It focuses on how code quality changes across iterations.