Conference Paper

ECCO: Can We Improve Model-Generated Code Efficiency Without Sacrificing Functional Correctness?

... Iterative refinement with execution feedback Existing LM-based code editing approaches often leverage iterative refinement with execution feedback (Huang et al., 2024; Peng et al., 2024; Xia & Zhang, 2024; Waghjale et al., 2024), which relies on the availability of test inputs. However, the code to be edited may not always be well-maintained. ...
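The refinement loop described above can be sketched in a few lines. This is a minimal illustration, not any cited system's implementation; `model` and the candidate programs are hypothetical stand-ins, and candidates are modeled as callables so that test inputs can be executed directly.

```python
# Sketch of iterative refinement with execution feedback: run the candidate
# on test inputs, feed the failing cases back to the model, and repeat until
# the tests pass or the round budget runs out.

def run_tests(code_fn, tests):
    """Return the (input, expected, got) triples the candidate gets wrong."""
    return [(x, want, code_fn(x)) for x, want in tests if code_fn(x) != want]

def refine(model, code_fn, tests, max_rounds=3):
    """Iteratively repair `code_fn` using execution feedback from `tests`."""
    for _ in range(max_rounds):
        failures = run_tests(code_fn, tests)   # execution feedback
        if not failures:
            return code_fn                     # all tests pass
        code_fn = model(code_fn, failures)     # ask the LM for a fix
    return code_fn
```

As the excerpt notes, the whole loop presupposes that test inputs exist for the code being edited.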
Preprint
Code editing is a foundational task in software development, where its effectiveness depends on whether it introduces desired code property changes without changing the original code's intended functionality. Existing approaches often formulate code editing as an implicit end-to-end task, overlooking the fact that code-editing procedures inherently consist of discrete and explicit steps; as a result, they suffer from suboptimal performance and a lack of robustness and generalization. We introduce EditLord, a code editing framework that makes the code transformation steps explicit. Our key insight is to employ a language model (LM) as an inductive learner that extracts code editing rules from training code pairs as concise meta-rule sets. These rule sets are then instantiated for each training sample, either to augment it for fine-tuning or to assist prompting-based and iterative code editing. EditLord outperforms the state of the art by an average of 22.7% in editing performance and 58.1% in robustness, while achieving 20.2% higher functional correctness, across critical software engineering and security applications, LM families, and editing modes.
... In the past year, there have been several efforts to build LMs that can automate algorithmic coding [5,19,31], resolve GitHub issues [43,44], and handle domain-specific coding [17,46]. While these works focus on producing correct and functional code, subsequent works have explored LMs' ability to produce solutions with better algorithmic and asymptotic efficiency [21,40]. KernelBench focuses on wall-clock efficiency. ...
Preprint
Efficient GPU kernels are crucial for building performant machine learning architectures, but writing them is a time-consuming challenge that requires significant expertise; we therefore explore using language models (LMs) to automate kernel generation. We introduce KernelBench, an open-source framework for evaluating LMs' ability to write fast and correct kernels on a suite of 250 carefully selected PyTorch ML workloads. KernelBench represents a real-world engineering environment, and progress on the benchmark translates directly into faster practical kernels. We introduce a new evaluation metric, fast_p, which measures the percentage of generated kernels that are functionally correct and offer a speedup greater than an adjustable threshold p over the baseline. Our experiments across various state-of-the-art models and test-time methods show that frontier reasoning models perform best out of the box but still fall short overall, matching the PyTorch baseline in less than 20% of cases. While we show that results can improve by leveraging execution and profiling feedback during iterative refinement, KernelBench remains a challenging benchmark, and its difficulty increases as the speedup threshold p is raised.
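The fast_p metric described in the abstract is simple enough to sketch directly: it is the fraction of generated kernels that are both functionally correct and faster than the baseline by more than the threshold p. The function and data layout below are illustrative, not KernelBench's actual API.

```python
# fast_p sketch: a kernel counts only if it is correct AND its speedup
# (baseline_time / kernel_time) exceeds the threshold p.

def fast_p(results, p):
    """results: list of (correct: bool, speedup: float), one per kernel."""
    if not results:
        return 0.0
    hits = sum(1 for correct, speedup in results if correct and speedup > p)
    return hits / len(results)

# Example: of four kernels, only the correct ones beating the baseline count.
scores = [(True, 1.8), (True, 0.9), (False, 3.0), (True, 1.2)]
print(fast_p(scores, 1.0))  # 0.5 — two of four qualify
```

Note how raising p monotonically shrinks the score, which is why the benchmark gets harder as the threshold increases: `fast_p(scores, 1.5)` keeps only the 1.8x kernel.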
Conference Paper
In this paper, we present a novel, robust, scalable, and open-source online code execution system called Judge0. It features a modern modular architecture that can be deployed over an arbitrary number of computers and operating systems. We study its design, comment on the various challenges that arise in building such systems, compare it with other available online code execution and online judge systems, and finally describe several scenarios in which it can be used to build a wide range of applications, from competitive programming platforms and educational and recruitment platforms to online code editors. Though first presented here, Judge0 has been in active use since October 2017 and has become a crucial part of several production systems.
Article
Programming is a powerful and ubiquitous problem-solving tool. Systems that can assist programmers or even generate programs themselves could make programming more productive and accessible. Recent transformer-based neural network models show impressive code generation abilities yet still perform poorly on more complex tasks requiring problem-solving skills, such as competitive programming problems. Here, we introduce AlphaCode, a system for code generation that achieved an average ranking in the top 54.3% in simulated evaluations on recent programming competitions on the Codeforces platform. AlphaCode solves problems by generating millions of diverse programs using specially trained transformer-based networks and then filtering and clustering those programs to a maximum of just 10 submissions. This result marks the first time an artificial intelligence system has performed competitively in programming competitions.
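The filter-and-cluster step the abstract describes can be sketched as follows: discard candidates that fail the problem's example tests, then group the survivors by their outputs on extra inputs so that each behavioral cluster contributes one submission. This is a rough illustration in plain Python; none of AlphaCode's internals are assumed, and candidates are modeled as callables.

```python
from collections import defaultdict

def select_submissions(candidates, example_tests, probe_inputs, k=10):
    """Reduce many candidate programs to at most k submissions."""
    # 1. Filter: keep only programs that pass all example tests.
    survivors = [c for c in candidates
                 if all(c(x) == want for x, want in example_tests)]
    # 2. Cluster: programs with identical outputs on probe inputs
    #    are treated as behaviorally equivalent.
    clusters = defaultdict(list)
    for c in survivors:
        clusters[tuple(c(x) for x in probe_inputs)].append(c)
    # 3. Submit one representative from each of the largest clusters.
    ranked = sorted(clusters.values(), key=len, reverse=True)
    return [group[0] for group in ranked[:k]]
```

The clustering step is what makes a budget of "just 10 submissions" viable: submitting one program per behavior class spends attempts on distinct hypotheses rather than near-duplicates.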
Seung-Yeop Baik, Mingi Jeon, Joonghyuk Hahn, Jungin Kim, Yo-Sub Han, and Sang-Ki Ko. 2024. Codecomplex: A time-complexity dataset for bilingual source codes. arXiv preprint arXiv:2401.08719.
Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel Ziegler, Jeffrey Wu, Clemens Winter, Chris Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, and Dario Amodei. 2020. Language models are few-shot learners. In Advances in Neural Information Processing Systems, volume 33, pages 1877-1901. Curran Associates, Inc.
Rudy Bunel, Alban Desmaison, M Pawan Kumar, Philip HS Torr, and Pushmeet Kohli. 2016. Learning to superoptimize programs. arXiv preprint arXiv:1611.01787.
Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde de Oliveira Pinto, Jared Kaplan, Harri Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, et al. 2021. Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374.
Chris Cummins, Volker Seeker, Dejan Grubisic, Mostafa Elhoushi, Youwei Liang, Baptiste Roziere, Jonas Gehring, Fabian Gloeckle, Kim Hazelwood, Gabriel Synnaeve, et al. 2023a. Large language models for compiler optimization. arXiv preprint arXiv:2309.07062.
Chris Cummins, Volker Seeker, Dejan Grubisic, Mostafa Elhoushi, Youwei Liang, Baptiste Roziere, Jonas Gehring, Fabian Gloeckle, Kim Hazelwood, Gabriel Synnaeve, et al. 2023b. Large language models for compiler optimization. arXiv preprint arXiv:2309.07062.
Shuzheng Gao, Cuiyun Gao, Wenchao Gu, and Michael Lyu. 2024. Search-based llms for code optimization. arXiv preprint arXiv:2408.12159.
Leonidas Gee, Milan Gritta, Gerasimos Lampouras, and Ignacio Iacobacci. 2024. Code-optimise: Selfgenerated preference data for correctness and efficiency. arXiv preprint arXiv:2406.12502.
Daya Guo, Qihao Zhu, Dejian Yang, Zhenda Xie, Kai Dong, Wentao Zhang, Guanting Chen, Xiao Bi, Y Wu, YK Li, et al. 2024. Deepseek-coder: When the large language model meets programming-the rise of code intelligence. arXiv preprint arXiv:2401.14196.
Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. 2022. LoRA: Low-rank adaptation of large language models. In International Conference on Learning Representations.
Dong Huang, Jie M Zhang, Yuhao Qing, and Heming Cui. 2024. Effibench: Benchmarking the efficiency of automatically generated code. arXiv preprint arXiv:2402.02037.
Mingi Jeon, Seung-Yeop Baik, Joonghyuk Hahn, Yo-Sub Han, and Sang-Ki Ko. 2023. Deep learning-based source code complexity prediction.
Mohammad Abdullah Matin Khan, M Saiful Bari, Xuan Long Do, Weishi Wang, Md Rizwan Parvez, and Shafiq Joty. 2023. xcodeeval: A large scale multilingual multitask benchmark for code understanding, generation, translation and retrieval. arXiv preprint arXiv:2303.03004.
Jiawei Liu, Songrun Xie, Junhao Wang, Yuxiang Wei, Yifeng Ding, and Lingming Zhang. 2024. Evaluating language models for efficient code generation. arXiv preprint arXiv:2408.06450.
Anton Lozhkov, Raymond Li, Loubna Ben Allal, Federico Cassano, Joel Lamy-Poirier, Nouamane Tazi, Ao Tang, Dmytro Pykhtar, Jiawei Liu, Yuxiang Wei, et al. 2024. Starcoder 2 and the stack v2: The next generation. arXiv preprint arXiv:2402.19173.
Ziyang Luo, Can Xu, Pu Zhao, Qingfeng Sun, Xiubo Geng, Wenxiang Hu, Chongyang Tao, Jing Ma, Qingwei Lin, and Daxin Jiang. 2023. Wizardcoder: Empowering code large language models with evolinstruct. arXiv preprint arXiv:2306.08568.
Aman Madaan, Niket Tandon, Prakhar Gupta, Skyler Hallinan, Luyu Gao, Sarah Wiegreffe, Uri Alon, Nouha Dziri, Shrimai Prabhumoye, Yiming Yang, et al. 2024. Self-refine: Iterative refinement with self-feedback. Advances in Neural Information Processing Systems, 36.
Kaushik Moudgalya, Ankit Ramakrishnan, Vamsikrishna Chemudupati, and Xing Han Lu. 2023. Tasty: A transformer based approach to space and time complexity. arXiv preprint arXiv:2305.05379.
Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, et al. 2022. Training language models to follow instructions with human feedback. Advances in neural information processing systems, 35:27730-27744.
Ruchir Puri, David S Kung, Geert Janssen, Wei Zhang, Giacomo Domeniconi, Vladimir Zolotov, Julian Dolby, Jie Chen, Mihir Choudhury, Lindsey Decker, et al. 2021. Codenet: A large-scale ai for code dataset for learning a diversity of coding tasks. arXiv preprint arXiv:2105.12655.
Tal Ridnik, Dedy Kredo, and Itamar Friedman. 2024. Code generation with alphacodium: From prompt engineering to flow engineering. arXiv preprint arXiv:2401.08500.
Baptiste Roziere, Jonas Gehring, Fabian Gloeckle, Sten Sootla, Itai Gat, Xiaoqing Ellen Tan, Yossi Adi, Jingyu Liu, Tal Remez, Jérémy Rapin, et al. 2023. Code llama: Open foundation models for code. arXiv preprint arXiv:2308.12950.
Hui Shi and Yang Zhang. 2020. Deep symbolic superoptimization without human knowledge. In International Conference on Learning Representations.
Noah Shinn, Federico Cassano, Ashwin Gopinath, Karthik Narasimhan, and Shunyu Yao. 2024. Reflexion: Language agents with verbal reinforcement learning. Advances in Neural Information Processing Systems, 36.
Alex Shypula, Pengcheng Yin, Jeremy Lacomis, Claire Le Goues, Edward Schwartz, and Graham Neubig. 2021. Learning to superoptimize real-world programs. arXiv preprint arXiv:2109.13498.
Alexander G Shypula, Aman Madaan, Yimeng Zeng, Uri Alon, Jacob R. Gardner, Yiming Yang, Milad Hashemi, Graham Neubig, Parthasarathy Ranganathan, Osbert Bastani, and Amir Yazdanbakhsh. 2024. Learning performance-improving code edits. In The Twelfth International Conference on Learning Representations.
Jagriti Sikka, Kushal Satya, Yaman Kumar, Shagun Uppal, Rajiv Ratn Shah, and Roger Zimmermann. 2020. Learning based methods for code runtime complexity prediction. In Advances in Information Retrieval: 42nd European Conference on IR Research, ECIR 2020, Lisbon, Portugal, April 14-17, 2020, Proceedings, Part I 42, pages 313-325. Springer.
Manav Singhal, Tushar Aggarwal, Abhijeet Awasthi, Nagarajan Natarajan, and Aditya Kanade. 2024. Nofuneval: Funny how code lms falter on requirements beyond functional correctness. arXiv preprint arXiv:2401.15963.
Gemma Team, Thomas Mesnard, Cassidy Hardin, Robert Dadashi, Surya Bhupatiraju, Shreya Pathak, Laurent Sifre, Morgane Rivière, Mihir Sanjay Kale, Juliette Love, et al. 2024. Gemma: Open models based on gemini research and technology. arXiv preprint arXiv:2403.08295.
Jason Wei, Maarten Bosma, Vincent Y Zhao, Kelvin Guu, Adams Wei Yu, Brian Lester, Nan Du, Andrew M Dai, and Quoc V Le. 2022. Finetuned language models are zero-shot learners. In International Conference on Learning Representations.
Tong Ye, Tengfei Ma, Lingfei Wu, Xuhong Zhang, Shouling Ji, and Wenhai Wang. 2024. Iterative or innovative? a problem-oriented perspective for code optimization. arXiv preprint arXiv:2406.11935.