Posts
All the articles I've posted.
-
SlopCodeBench: Measuring Code Erosion Under Iterative Specification Refinement
Published: at 09:22 AM [code]
A new benchmark for agentic coding built around iteratively evolving specifications. It measures how code quality changes across iterations.
-
Reward Models Enable Scalable Code Verification by Trading Accuracy for Throughput
Outcome Reward Models for code verification allow one to trade accuracy for speed in the generate-then-rank paradigm. This can be further improved through a generate-prune-then-rank approach where a weaker verifier prunes solutions prior to ranking, thus saving work on incorrect tokens. We show that this hybrid approach can be 11.65 times faster than running the whole test suite while only being 8.33% less accurate.
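The generate-prune-then-rank idea can be sketched roughly as follows. This is only an illustration under stated assumptions, not the implementation from the post: the function name `generate_prune_then_rank`, the syntax-check verifier `compiles`, and the length-based `toy_score` are hypothetical stand-ins for a real weak verifier and a real outcome reward model.

```python
from typing import Callable, List, Sequence


def generate_prune_then_rank(
    candidates: Sequence[str],
    weak_verifier: Callable[[str], bool],
    reward_model: Callable[[str], float],
) -> List[str]:
    """Prune candidates with a cheap check, then rank the survivors."""
    # Prune: drop anything the cheap verifier rejects, so the more
    # expensive scorer never runs on those candidates.
    survivors = [c for c in candidates if weak_verifier(c)]
    # Rank: score only the survivors, best first.
    return sorted(survivors, key=reward_model, reverse=True)


def compiles(src: str) -> bool:
    """Hypothetical weak verifier: accept only code that parses."""
    try:
        compile(src, "<candidate>", "exec")
        return True
    except SyntaxError:
        return False


scored: List[str] = []  # records which candidates reached the scorer


def toy_score(src: str) -> float:
    """Hypothetical stand-in for a reward model: prefer shorter code."""
    scored.append(src)
    return -float(len(src))


candidates = [
    "def f(x): return x + 1",
    "def f(x) return x",  # syntax error, pruned before ranking
    "def f(x):\n    y = x + 1\n    return y",
]

ranked = generate_prune_then_rank(candidates, compiles, toy_score)
print(ranked[0])
print(f"scored {len(scored)} of {len(candidates)} candidates")
```

In this toy run the malformed candidate never reaches the scorer, which is where the saved work comes from. How much accuracy is given up depends on how often the weak verifier wrongly rejects a correct solution.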