Conference Paper (PDF available)

Robust and Scalable Online Code Execution System

Abstract and Figures

In this paper, we present a novel, robust, scalable, and open-source online code execution system called Judge0. It features a modern modular architecture that can be deployed over an arbitrary number of computers and operating systems. We study its design, comment on the various challenges that arise in building such systems, compare it with other available online code execution systems and online judge systems, and finally discuss several scenarios in which it can be used to build a wide range of applications, ranging from competitive programming, educational, and recruitment platforms to online code editors. Though first presented now, Judge0 has been in active use since October 2017 and has become a crucial part of several production systems.
... The compilation or execution of a program must be stopped after a certain time limit to conserve processor and memory resources and to ensure the smooth operation of the system. Non-functional requirements relate not to use cases but to system characteristics; in the context of code execution they are the following [2], [3]: ...
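The time-limit requirement above can be sketched in a few lines. This is a minimal illustration only: the command, the limit value, and the verdict string are assumptions, and a production judge such as Judge0 enforces limits inside an isolated sandbox rather than via a plain subprocess call.

```python
import subprocess

def run_with_limits(cmd, stdin_data, time_limit=2.0):
    """Run a program with a wall-clock time limit (illustrative only).

    Real judges also cap memory, process count, and output size,
    and run the program inside a sandbox.
    """
    try:
        proc = subprocess.run(
            cmd, input=stdin_data, capture_output=True,
            text=True, timeout=time_limit,
        )
        return ("OK", proc.stdout)
    except subprocess.TimeoutExpired:
        # The child is killed once the limit elapses.
        return ("TIME_LIMIT_EXCEEDED", "")

status, out = run_with_limits(["echo", "hello"], "")
```

A submission that loops forever simply surfaces as a `TIME_LIMIT_EXCEEDED` result instead of blocking the worker.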
... When compiling source code and executing the resulting program, numerous errors are possible; among their causes [2], [3] are the impossibility of compiling the source code, infinite compilation time, exceptions, and infinite loops. Source code with a syntax error cannot be compiled, so the process ends with a report of the cause of the error. ...
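The error cases listed above map naturally onto the standard judge verdicts. The following sketch shows one plausible mapping; the verdict labels are the conventional ones, and a given judge may use different or finer-grained categories.

```python
def classify(compile_ok, timed_out, exit_code, output, expected):
    """Map the outcome of a compile-and-run cycle to a judge verdict.

    The precedence matters: a compile failure masks everything else,
    and a timeout masks the (possibly killed) program's exit code.
    """
    if not compile_ok:
        return "Compilation Error"   # e.g. syntax error in the source
    if timed_out:
        return "Time Limit Exceeded" # e.g. infinite loop
    if exit_code != 0:
        return "Runtime Error"       # e.g. uncaught exception
    if output.strip() == expected.strip():
        return "Accepted"
    return "Wrong Answer"
```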
... For Python solutions, we utilized the existing source code provided by the TACO project for testing Python code. In the case of Java and C++ solutions, we use Judge0 [20], a robust and scalable online code execution system. As an open-source project with a readily available Docker image, Judge0 has become a crucial part of various production systems requiring online code execution capabilities. ...
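As a hedged sketch of how such systems call Judge0: a submission is created by POSTing JSON to the `/submissions` endpoint of a Judge0 instance. The base URL below and the `language_id` value (71 for Python 3 on Judge0 CE at the time of writing) are assumptions that should be verified against the target instance, e.g. via its `GET /languages` endpoint.

```python
import json

# Base URL of a Judge0 instance; replace with your own deployment (assumption).
JUDGE0_URL = "https://ce.judge0.com"

def build_submission(source_code, language_id, stdin=""):
    """Build the JSON body for POST {JUDGE0_URL}/submissions.

    language_id values are instance-specific; confirm them against
    GET /languages before relying on them.
    """
    return {
        "source_code": source_code,
        "language_id": language_id,
        "stdin": stdin,
    }

payload = build_submission("print(input())", 71, "hello")
# One would then send it, e.g.:
#   requests.post(f"{JUDGE0_URL}/submissions?wait=true", json=payload)
body = json.dumps(payload)
```

The response contains the run's stdout, stderr, status, and resource usage, which the caller compares against expected output.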
Article
Full-text available
The widespread use of virtual assistants (e.g., GPT4 and Gemini) by students in their academic assignments raises concerns about academic integrity. Consequently, various machine-generated text (MGT) detection methods, developed from metric-based and model-based approaches, have been proposed and shown to be highly effective. Model-based MGT methods often encounter difficulties when dealing with source code due to disparities in semantics compared to natural languages. Meanwhile, the efficacy of metric-based MGT methods on source code has not been investigated. Moreover, the challenge of identifying machine-generated code (MGC) has received less attention, and existing solutions demonstrate low accuracy and high false positive rates across diverse human-written code. In this paper, we take into account both semantic features extracted from Large Language Models (LLMs) and the applicability of metrics (e.g., Log-Likelihood, Rank, and Log-Rank) for source code analysis. Concretely, we propose MageCode, a novel method for identifying machine-generated code. MageCode utilizes the pre-trained model CodeT5+ to extract semantic features from source code inputs and incorporates metric-based techniques to enhance accuracy. To assess the proposed method, we introduce a new dataset comprising more than 45,000 code solutions generated by LLMs for programming problems. The solutions, which were obtained from three advanced LLMs (GPT4, Gemini, and Code-bison-32k), were written in Python, Java, and C++. The evaluation of MageCode on this dataset demonstrates superior performance compared to existing baselines, achieving up to 98.46% accuracy while maintaining a false positive rate of less than 1%.
... accessed on 1 June 2023) to allow users to type text, and Judge0 (https://judge0.com/, accessed on 1 June 2023) [33], an open-source online code execution system, to allow users to code; the Monaco text editor and diff viewer (https://microsoft.github.io/monaco-editor/, accessed on 1 June 2023) to type and display code; and Plotly (https://github.com/plotly/react-plotly.js/, accessed on 1 June 2023) to display visualizations. ...
Article
Full-text available
Despite overwhelming evidence to the contrary, educational practices continue to be predominantly centered around outcome-oriented approaches. These practices are now thoroughly disrupted by the recent accessibility of online resources and chatbots. Among the most affected subjects are writing and computer programming. As educators transform their teaching practices to account for this disruption, it is important to note that writing and computer programming play a critical role in the development of logical and computational thinking. For instance, what and how we write shapes our thinking and sets us on the path of self-directed learning. Likewise, computer programming plays a similar role in the development of computational thinking. While most educators understand that “process” and “outcome” are both crucial and inseparable, providing constructive feedback on a learner’s formative process is challenging in most educational settings. To address this long-standing issue in education, this work presents Process Visualizations, a new set of interactive data visualizations that summarize the inherent and taught capabilities of a learner’s writing or programming process. These visualizations provide insightful, empowering, and personalized process-oriented feedback to learners and help to improve cognitive and metacognitive skills. Likewise, they assist educators in enhancing their effectiveness in the process-aware teaching of writing or computer programming. The toolbox for generating the visualizations, named Process Feedback, is ready to be tested by educators and learners and is publicly available as a website.
... Based on the collected UVa dataset, Skiena and Revilla [162] wrote the book "Programming Challenges: The Programming Contest Training Manual" to help students in programming contests. Judge0 [41] is an open-source, scalable, and powerful online code execution tool that can be used in a wide range of programming-related applications such as programming competitions, e-learning platforms, recruitment, and online code editors and IDEs. A partial list of OJ systems is given in Table 1. ...
Preprint
Full-text available
The automated code evaluation system (AES) is mainly designed to reliably assess user-submitted code. The code is compiled and then tested in a unified environment with predefined input and output test cases. Due to their extensive range of applications and the accumulation of valuable resources, AESs are becoming increasingly popular. Research on the application of AES and their real-world resource exploration for diverse coding tasks is still lacking. In this study, we conducted a comprehensive survey on AESs and their resources. This survey explores the application areas of AESs, available resources, and resource utilization for coding tasks. AESs are categorized into programming contests, programming learning and education, recruitment, online compilers, and additional modules, depending on their application. We explore the available datasets and other resources of these systems for research, analysis, and coding tasks. The success of machine learning models for inference procedures depends primarily on the purity of the data, where the accumulated real-life data (e.g., codes and submission logs) from AESs can be a valuable treasure. Moreover, we provide an overview of machine learning-driven coding tasks, such as bug detection, code review, comprehension, refactoring, search, representation, and repair. These tasks are performed using real-life datasets. In addition, we briefly discuss the Aizu Online Judge platform as a real example of an AES from the perspectives of system design (hardware and software), operation (competition and education), and research. This is due to the scalability of the AOJ platform (programming education, competitions, and practice), open internal features (hardware and software), attention from the research community, open source data (e.g., solution codes and submission documents), and transparency. We also analyze the overall performance of this system and the perceived challenges over the years.
... Tailwind and JavaScript were used for building the website. The site includes CKEditor for allowing users to type text, and Judge0 API [29], an open-source online code execution system, for allowing users to code, Monaco text editor and diff viewer for typing and displaying code, and Plotly for displaying visualizations. The web application is hosted in Cloudflare. ...
Preprint
Full-text available
The landscape of educational practices for teaching and learning languages has been predominantly centered around outcome-driven approaches. The recent accessibility of large language models has thoroughly disrupted these approaches. As we transform our language teaching and learning practices to account for this disruption, it is important to note that language learning plays a pivotal role in developing human intelligence. Writing and computer programming are two essential skills integral to our education systems. What and how we write shapes our thinking and sets us on the path of self-directed learning. While most educators understand that `process' and `product' are both important and inseparable, in most educational settings, providing constructive feedback on a learner's formative process is challenging. For instance, it is straightforward in computer programming to assess whether a learner-submitted code runs. However, evaluating the learner's creative process and providing meaningful feedback on the process can be challenging. To address this long-standing issue in education (and learning), this work presents a new set of visualization tools to summarize the inherent and taught capabilities of a learner's writing or programming process. These interactive Process Visualizations (PVs) provide insightful, empowering, and personalized process-oriented feedback to the learners. The toolbox is ready to be tested by educators and learners and is publicly available at www.processfeedback.org. Focusing on providing feedback on a learner's process--from self, peers, and educators--will facilitate learners' ability to acquire higher-order skills such as self-directed learning and metacognition.
Preprint
Full-text available
Although large language models (LLMs) have been largely successful in generating functionally correct programs, conditioning models to produce efficient solutions while ensuring correctness remains a challenge. Further, unreliability in benchmarking code efficiency is a hurdle across varying hardware specifications for popular interpreted languages such as Python. In this paper, we present ECCO, a reproducible benchmark for evaluating program efficiency via two paradigms: natural language (NL) based code generation and history-based code editing. On ECCO, we adapt and thoroughly investigate the three most promising existing LLM-based approaches: in-context learning, iterative refinement with execution or NL feedback, and fine-tuning conditioned on execution and editing history. While most methods degrade functional correctness and moderately increase program efficiency, we find that adding execution information often helps maintain functional correctness, and NL feedback enhances more on efficiency. We release our benchmark to support future work on LLM-based generation of efficient code.
Article
Code Verse features a revolutionary online code editor that aims to transform the coding experience by providing a seamless interface across different programming languages. This innovative platform supports developers working with languages such as JavaScript, Python, Java, and PHP, facilitating a versatile programming environment that adapts to the diverse needs of users. Code Verse's user-friendly interface removes the traditional barriers associated with language-specific editors and allows developers to seamlessly switch between languages within the same platform.
Article
Full-text available
Automated Programming Assessment Systems (APAS) are used for overcoming problems associated with manually managed programming assignments, such as providing objective and efficient assessment in large classes and timely, helpful feedback. In this paper we survey the literature and software in this field and identify the set of necessary features that make an APAS comprehensive – such that it can support all key stages in the assessment process. Put differently, a comprehensive APAS is generic enough to meet the demands of "any" computer science course. Despite the vast number of publications, the choice of software turns out to be very limited. We contribute by developing Edgar, a comprehensive open-source APAS which, to the best of our knowledge, exceeds any other similar free and/or open-source tool. Edgar is the result of three years of development and usage in, for the time being, eight courses dealing with various programming languages and paradigms (C, Java, SQL, etc.). Edgar supports various text-based programming languages and multi-correct multiple-choice questions, provides rich exam logging and monitoring infrastructure to prevent potential fraudulent behaviour, and enables subsequent data analysis and visualization of students' scores, exams, question quality, etc. It can be deployed on all major operating systems and is written in a modular fashion so that it can be adjusted and scaled to a custom fit. We comment on the architecture and present data from real-world use-cases to support these claims. Edgar is in active use today (1000+ students per semester) and it is being constantly developed with new features.
Article
Full-text available
An increasing number of countries have recently included programming education in their curricula. Similarly, utilizing programming concepts in gameplay has become popular in the videogame industry. Although many games have been developed for learning to program, their variety and their correspondence to national curricula remain an uncharted territory. Consequently, this paper has three objectives. Firstly, an investigation on the guidelines on programming education in K‐12 in seven countries was performed by collecting curricula and other relevant data official from governmental and non‐profit educational websites. Secondly, a review of existing acquirable games that utilize programming topics in their gameplay was conducted by searching popular game stores. Lastly, we compared the curricula and made suggestions as to which age group the identified games would be suitable. The results of this study can be useful to educators and curriculum designers who wish to gamify programming education.
Article
Full-text available
In this paper, the security issues in online judge (OJ) systems will be discussed. The pros and cons of different sandbox approaches in open-source OJs will be analyzed. After that, we'll explore other possibilities for building a more suitable sandbox for online judge purposes.
Article
Online judges are systems designed for the reliable evaluation of algorithm source code submitted by users, which is then compiled and tested in a homogeneous environment. Online judges are becoming popular in various applications. Thus, we would like to review the state of the art for these systems. We classify them according to their principal objectives into systems supporting organization of competitive programming contests, enhancing education and recruitment processes, facilitating the solving of data mining challenges, online compilers, and development platforms integrated as components of other custom systems. Moreover, we introduce a formal definition of an online judge system and summarize the common evaluation methodology supported by such systems. Finally, we briefly discuss the Optil.io platform as an example of an online judge system, which has been proposed for the solving of complex optimization problems. We also analyze the results of a competition conducted using this platform. The competition proved that online judge systems, strengthened by crowdsourcing concepts, can be successfully applied to accurately and efficiently solve complex industrial- and science-driven challenges.
Conference Paper
A fork bomb attack is a denial-of-service attack. An attacker generates many processes rapidly, exhausting the resources of the target computer system. There are several previous works that detect and remove the processes causing fork bomb attacks. However, operating systems using these methods risk terminating inappropriate processes that are not fork bomb processes. In this paper, we propose a new method named process resource quarantine. With the proposed method, the operating system does not terminate the detected fork bomb processes. Instead of terminating them, it imposes resource limitations on the detected processes and inspects them periodically. We implemented the proposed method in the Linux kernel and ran several evaluation experiments. The results show that the proposed method is effective in mitigating fork bomb attacks.
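The resource-limitation idea above can be illustrated from user space with `setrlimit(2)`: capping `RLIMIT_NPROC` bounds how many processes the submission's user may own, which blunts a fork bomb without terminating unrelated processes. The limit values below are illustrative assumptions, and this sketch is not the paper's in-kernel quarantine mechanism.

```python
import resource
import subprocess

def quarantine_limits(max_procs=64, cpu_seconds=2):
    """Pre-exec hook (runs in the forked child, before exec):
    cap process count and CPU time for the untrusted program.

    Limits are clamped to the current hard limit so the call
    cannot fail by asking for more than the kernel allows.
    """
    def clamp(limit, want):
        _soft, hard = resource.getrlimit(limit)
        cap = want if hard == resource.RLIM_INFINITY else min(want, hard)
        resource.setrlimit(limit, (cap, hard))

    clamp(resource.RLIMIT_NPROC, max_procs)  # fork() fails past this count
    clamp(resource.RLIMIT_CPU, cpu_seconds)  # SIGXCPU past this CPU time

# The limits apply only to the child, not to the judge process itself.
result = subprocess.run(["true"], preexec_fn=quarantine_limits)
```

Once `RLIMIT_NPROC` is hit, further `fork()` calls in the child simply fail with `EAGAIN`, so the bomb starves instead of the host.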
Article
We give an update on CMS, the free and open source grading system used in IOI 2012, 2013 and 2014. In particular, we focus on the new features and development practices; on what we learned by running dozens of contests with CMS; on the community of users and developers that has started to grow around it.
Article
Programming contests with automatic evaluation of submitted solutions usually employ a sandbox. Its job is to run the solution in a controlled environment, while enforcing security and resource limits. We present a new construction of a sandbox, based on recently added container features of Linux kernel. Unlike previous sandboxes, it has no measurable overhead and is able to handle multi-threaded programs.
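To make the "resource limits" part concrete: after forking the submission, a sandbox typically collects the child's resource usage with `wait4(2)` and compares it against the configured limits. The sketch below shows only that measurement step; the namespace and cgroup isolation that the sandbox described above adds is omitted.

```python
import os

def run_and_measure(argv):
    """Fork/exec a program and report its exit code, CPU time,
    and peak memory, the way a judge sandbox checks limits."""
    pid = os.fork()
    if pid == 0:
        # Child: replace this process image with the submission.
        os.execvp(argv[0], argv)
    # Parent: wait4 returns the child's rusage alongside its status.
    _, status, usage = os.wait4(pid, 0)
    return {
        "exit_code": os.waitstatus_to_exitcode(status),
        "cpu_seconds": usage.ru_utime + usage.ru_stime,
        "max_rss_kb": usage.ru_maxrss,  # kilobytes on Linux
    }

info = run_and_measure(["true"])
```

A verdict engine would then flag `cpu_seconds` or `max_rss_kb` values exceeding the task's limits.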
Article
We present Contest Management System (CMS), the free and open source grading system that will be used in IOI 2012. CMS has been designed and developed from scratch, with the aim of providing a grading system that naturally adapts to the needs of an IOI-like competition, including the team selection processes. Particular care has been taken to make CMS secure, robust, developed for the community, extensible, easily adaptable and usable.
Article
Docker promises the ability to package applications and their dependencies into lightweight containers that move easily between different distros, start up quickly and are isolated from each other.