I am currently a second-year Ph.D. student and CGRS Fellow at the University of Wisconsin-Madison, advised by Aws Albarghouthi and Fred Sala.
News
- Co-organized the third Deep Learning for Code Workshop at ICLR 2025.
- I will be working on agents at Replit in summer 2024.
- I will be interning at Magic AI from May to August 2023.
- I will be interning at X, The Moonshot Company from May to December 2022.
- Co-organizing the Deep Learning for Code Workshop at ICLR 2022.
Featured
- Coding Agents Are Lazy Patchers
  AI coding agents become lazy patchers under iterative changes, copying code instead of refactoring. This creates massive god functions that are unmaintainable, explaining the gap between benchmark scores and real-world experience.
- SlopCodeBench: Measuring Code Erosion Under Iterative Specification Refinement
  SlopCodeBench evaluates AI coding agents under iterative specification updates. Unlike single-shot benchmarks, SCBench reveals verbosity and structural erosion that make agent-written code unmaintainable over time.
Publications
- Reward Models Enable Scalable Code Verification by Trading Accuracy for Throughput
  G. Orlanski, N. Roberts, A. Albarghouthi, and F. Sala
- Measuring The Impact Of Programming Language Distribution
  G. Orlanski, K. Xiao, X. Garcia, J. Hui, J. Howland, J. Malmaud, J. Austin, R. Singh, and M. Catasta
- Reading StackOverflow Encourages Cheating: Adding Question Text Improves Extractive Code Generation
  G. Orlanski and A. Gittens
Recent Posts
- Reward Models Enable Scalable Code Verification by Trading Accuracy for Throughput
  Outcome Reward Models for code verification allow one to trade accuracy for speed in the generate-then-rank paradigm. This can be improved further with a generate-prune-then-rank approach, where a weaker verifier prunes candidate solutions before ranking, avoiding wasted work on incorrect solutions. We show that this hybrid approach can be 11.65 times faster than running the whole test suite while being only 8.33% less accurate. A minimal sketch of this pipeline is shown below.
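The following is a minimal, hypothetical Python sketch of the generate-prune-then-rank loop described above; the names `generate_candidates`, `quick_verifier`, and `reward_model_score` are placeholders introduced here for illustration, not the paper's actual API.

```python
from typing import Callable, List

def generate_prune_then_rank(
    problem: str,
    generate_candidates: Callable[[str, int], List[str]],  # placeholder sampler
    quick_verifier: Callable[[str, str], bool],             # cheap, noisy check (e.g. a few tests)
    reward_model_score: Callable[[str, str], float],        # learned outcome reward model
    num_candidates: int = 32,
) -> str:
    """Return the highest-ranked candidate that survives cheap pruning."""
    # 1. Generate: sample many candidate solutions for the problem.
    candidates = generate_candidates(problem, num_candidates)

    # 2. Prune: drop candidates the weak verifier already rejects,
    #    so the reward model never spends compute scoring them.
    survivors = [c for c in candidates if quick_verifier(problem, c)]
    if not survivors:  # fall back if the pruner rejects everything
        survivors = candidates

    # 3. Rank: score survivors with the reward model and return the best one.
    return max(survivors, key=lambda c: reward_model_score(problem, c))
```

The speed/accuracy trade-off comes from how aggressive the pruning step is: a stricter cheap verifier saves more reward-model calls but risks discarding correct solutions before ranking.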