Conference Paper

Refactoring: Improving the Design of Existing Code

Author: Martin Fowler

Abstract

Almost every expert in object-oriented development stresses the importance of iterative development. As you proceed with iterative development, you need to add function to the existing code base. If you are really lucky, that code base is structured just right to support the new function while still preserving its design integrity. Of course, most of the time we are not lucky: the code does not quite fit what we want to do. You could just add the function on top of the code base, but soon this leads to applying patch upon patch, making your system more complex than it needs to be. This complexity leads to bugs and cripples your productivity.


... • Refactoring large, complex codebases [5]; • Automating repetitive tasks such as boilerplate code generation for CRUD operations [6][7][8] or DTO (Data Transfer Object) classes [9]; • Implementing multi-step workflows like creating database models, updating service layers, and generating API endpoints; • Conducting automated code reviews to ensure adherence to coding standards and best practices; • Detecting and fixing code smells, such as duplicated code or overly complex methods [5]; • Generating comprehensive unit and integration tests to ensure high code coverage [10]; • Replacing deprecated APIs with modern alternatives; • Addressing security vulnerabilities by identifying and fixing common issues like SQL injection [11,12] or hardcoded secrets. ...
... • Refactoring Large Codebases: The framework enhances maintainability by breaking down large, complex methods into smaller, modular functions, improving readability and efficiency. • Fixing Code Smells: The system identifies issues such as duplicated code, deep nesting, and long methods [5] and suggests structured refactoring improvements. ...
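The snippets above repeatedly name Extract Method as the fix for long methods and duplicated code. As a hypothetical minimal sketch (class and method names are illustrative, not taken from any of the cited papers), the refactoring splits one tangled method into small, intention-revealing functions:

```java
// Hypothetical sketch of the Extract Method refactoring: the long render()
// method that mixed heading, totalling, and formatting is decomposed into
// small, named helper methods, each with a single responsibility.
class Invoice {
    private final String customer;
    private final double[] amounts;

    Invoice(String customer, double[] amounts) {
        this.customer = customer;
        this.amounts = amounts;
    }

    // After refactoring: render() only composes the extracted pieces.
    String render() {
        return header() + body();
    }

    private String header() {
        return "Invoice for " + customer + "\n";
    }

    private String body() {
        return "Total: " + total() + "\n";
    }

    private double total() {
        double sum = 0;
        for (double a : amounts) sum += a;
        return sum;
    }
}
```

The external behavior (the rendered string) is unchanged; only the internal structure improves, which is the defining property of a refactoring.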
Article
Full-text available
Recent AI-assisted coding tools, such as GitHub Copilot and Cursor, have enhanced developer productivity through real-time snippet suggestions. However, these tools primarily assist with isolated coding tasks and lack a structured approach to automating complex, multi-step software development workflows. This paper introduces a workflow-centric AI framework for end-to-end automation, from requirements gathering to code generation, validation, and integration, while maintaining developer oversight. Key innovations include automatic context discovery, which selects relevant codebase elements to improve LLM accuracy; a structured execution pipeline using Prompt Pipeline Language (PPL) for iterative code refinement; self-healing mechanisms that generate tests, detect errors, trigger rollbacks, and regenerate faulty code; and AI-assisted code merging, which preserves manual modifications while integrating AI-generated updates. These capabilities enable efficient automation of repetitive tasks, enforcement of coding standards, and streamlined development workflows. This approach lays the groundwork for AI-driven development that remains adaptable as LLM models advance, progressively reducing the need for human intervention while ensuring code reliability.
... Software refactoring is widely employed to improve software quality, especially its readability, maintainability, and reusability (Fowler 1999;Mens and Tourwé 2004;Baqais and Alshayeb 2020). To facilitate software refactoring, a large number of approaches/tools have been proposed to identify refactoring opportunities (Tourwé and Mens 2003;Chatzigeorgiou 2009, 2011), to recommend refactoring solutions (Mkaouer et al. 2017;Alizadeh and Kessentini 2018;Alizadeh et al. 2020), and to automatically conduct specified refactorings (Ge et al. 2012;Foster et al. 2012;Alizadeh et al. 2019). ...
... refactoring), it is likely that the output could be semantically nonequivalent to the input, which results in changes in the software system's external behaviors. However, according to the definition of software refactoring (Fowler 1999), software refactoring should never change external behaviors. In this case, the suggested "refactorings" are unsafe. ...
... To gain a deeper understanding of the strengths and weaknesses of LLM-based identification of refactoring opportunities, we manually analyzed the reasons behind the refactorings in the subject dataset and classified them into 23 refactoring subcategories (reasons). For example, code duplication, one of the most common code smells, refers to the duplicated code snippets within a software project (Fowler 1999;Yamashita and Moonen 2013). The extract method refactoring is an effective and commonly employed strategy to mitigate this issue (Silva et al. 2016;. ...
Article
Full-text available
Software refactoring is an essential activity for improving the readability, maintainability, and reusability of software projects. To this end, a large number of automated or semi-automated approaches/tools have been proposed to locate poorly designed code, recommend refactoring solutions, and conduct specified refactorings. However, even equipped with such tools, it remains challenging for developers to decide where and what kind of refactorings should be applied. Recent advances in deep learning techniques, especially in large language models (LLMs), make it potentially feasible to automatically refactor source code with LLMs. However, it remains unclear how well LLMs perform compared to human experts in conducting refactorings automatically and accurately. To fill this gap, in this paper, we conduct an empirical study to investigate the potential of LLMs in automated software refactoring, focusing on the identification of refactoring opportunities and the recommendation of refactoring solutions. We first construct a high-quality refactoring dataset comprising 180 real-world refactorings from 20 projects, and conduct the empirical study on the dataset. With the to-be-refactored Java documents as input, ChatGPT and Gemini identified only 28 and 7 respectively out of the 180 refactoring opportunities. The evaluation results suggested that the performance of LLMs in identifying refactoring opportunities is generally low and remains an open problem. However, explaining the expected refactoring subcategories and narrowing the search space in the prompts substantially increased the success rate of ChatGPT from 15.6 to 86.7%. Concerning the recommendation of refactoring solutions, ChatGPT recommended 176 refactoring solutions for the 180 refactorings, and 63.6% of the recommended solutions were comparable to (or even better than) those constructed by human experts. However, 13 out of the 176 solutions suggested by ChatGPT and 9 out of the 137 solutions suggested by Gemini were unsafe in that they either changed the functionality of the source code or introduced syntax errors, which indicates the risk of LLM-based refactoring.
... Transformations that require name binding preservation are common across many refactorings, such as those from Fowler's catalog [7]. Yet, manually refactoring code is time-consuming and error-prone. ...
... Yet, manually refactoring code is time-consuming and error-prone. Consequently, many modern Integrated Development Environments (IDEs) provide automated refactorings such as Rename, Inline/Extract Method, and Pull Up/Push Down [7], which attempt to automatically requalify references to maintain the program's binding structure. ...
... We derived test cases from Java files used to validate the implementation of refactorings in the JetBrains IntelliJ IDE. (2) ChocoPy [20] is a statically-typed dialect of Python. We used an existing Statix specification and ChocoPy files for our evaluation. ...
Preprint
Full-text available
Modern Integrated Development Environments (IDEs) offer automated refactorings to aid programmers in developing and maintaining software. However, implementing sound automated refactorings is challenging, as refactorings may inadvertently introduce name-binding errors or cause references to resolve to incorrect declarations. To address these issues, previous work by Schäfer et al. proposed replacing concrete references with locked references to separate binding preservation from transformation. Locked references vacuously resolve to a specific declaration, and after transformation must be replaced with concrete references that also resolve to that declaration. Synthesizing these references requires a faithful inverse of the name lookup functions of the underlying language. Manually implementing such inverse lookup functions is challenging due to the complex name-binding features in modern programming languages. Instead, we propose to automatically derive this function from type system specifications written in the Statix meta-DSL. To guide the synthesis of qualified references we use scope graphs, which represent the binding structure of a program, to infer their names and discover their syntactic structure. We evaluate our approach by synthesizing concrete references for locked references in 2528 Java, 196 ChocoPy, and 49 Featherweight Generic Java test programs. Our approach yields a principled language-parametric method for synthesizing references.
... These preconditions guarantee that the resulting program compiles, maintains or improves its quality, and preserves the original program's behavior. For example, in the Extract Class refactoring [1], the newly introduced class must not have a name that conflicts with an existing class to avoid compilation errors. In practice, developers can refactor code manually, which is error-prone and time-consuming, or use automated tools that support various refactorings, such as VSCode [19], IntelliJ [20], Eclipse [21], and NetBeans [22]. ...
... This issue was reported in Eclipse's bug tracker. Kerievsky encountered it while developing a refactoring mechanic to introduce the Factory Method pattern for his book [25]. He argued that "there should be no warnings as the transformation is harmless and correct." ...
... If the program failed to compile, we classified the response as incorrect. If it compiled successfully, we manually verified whether the transformation adhered to the expected refactoring mechanics [1]. For example, in the Pull Up Method refactoring, we ensured that the method was correctly removed from the subclass and properly declared in the parent class. ...
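The snippet above checks Pull Up Method mechanics: the method must be removed from the subclass and declared in the parent class. A hypothetical minimal sketch (class names are illustrative) of the end state of that refactoring:

```java
// Hypothetical sketch of the Pull Up Method refactoring: annualCost() was
// previously duplicated in both subclasses; after the refactoring it is
// declared once in the parent class and removed from the subclasses.
class Employee {
    protected double base;
    Employee(double base) { this.base = base; }

    // Pulled up: the single shared implementation now lives here.
    double annualCost() { return base * 12; }
}

class Engineer extends Employee {
    Engineer(double base) { super(base); }
}

class Salesperson extends Employee {
    Salesperson(double base) { super(base); }
}
```

Both subclasses still answer annualCost() identically to before, so external behavior is preserved while the duplication is gone.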
Preprint
Full-text available
Popular IDEs frequently contain bugs in their refactoring implementations. Ensuring that a transformation preserves a program's behavior is a complex task. Traditional detection methods rely on predefined preconditions for each refactoring type, limiting their scalability and adaptability to new transformations. These methods often require extensive static and dynamic analyses, which are computationally expensive, time-consuming, and may still fail to detect certain refactoring bugs. This study evaluates the effectiveness of Small Language Models (SLMs) in detecting two types of refactoring bugs in Java and Python: (i) transformations that introduce errors or behavioral changes (Type I) and (ii) transformations unnecessarily blocked by IDEs despite being valid (Type II). We assess whether Llama 3.2 3B, Mistral 7B, Gemma 2 9B, DeepSeek-R1 14B, Phi-4 14B, o1-mini, and o3-mini-high can accurately detect 100 refactoring bugs reported in widely used Java and Python IDEs, such as Eclipse and NetBeans. The study covers 16 refactoring types and employs zero-shot prompting on consumer-grade hardware to evaluate the models' ability to reason about refactoring correctness without explicit prior training. The proprietary o3-mini-high model achieved the highest detection rate, identifying 84.3% of Type I bugs. The open-source Phi-4 14B performed comparably well, demonstrating strong effectiveness across both bug types. However, o3-mini-high struggled with Type II bugs, correctly identifying and applying valid but blocked transformations in only 40% of cases. The findings highlight the potential of SLMs for efficiently detecting refactoring bugs, particularly in verifying behavioral changes. Additionally, SLMs offer a more adaptable solution capable of generalizing across different refactoring types and programming languages, addressing key limitations of traditional approaches.
... Low maintainability increases the overall development cost and harms other software quality attributes, such as security, performance, and reliability [3][4]. In light of these considerations, the importance of training software engineers to build maintainable software has been underscored by both industry and academia [5][6][7][8]. ...
... A typical example of a maintainability issue is the Long Method [6], which occurs when a function does too many things. Understanding such functions requires analyzing their segments to discern intent before interpreting the function as a whole (Fig. 1). ...
... A 40-line function with repetitive, easily understandable instructions may be maintainable, while a 10-line function with complex logic might require refactoring. Even a single line of code may warrant extraction into a separate function if its intent is unclear [5][6]. Although function length is an easily measurable metric, comparing it against a threshold is insufficient to determine whether a function exhibits the Long Method issue. ...
Article
Software engineers are tasked with writing functionally correct code of high quality. Maintainability is a crucial code quality attribute that determines the ease of analyzing, modifying, reusing, and testing a software component. This quality attribute significantly affects the software's lifetime cost, contributing to developer productivity and other quality attributes. Consequently, academia and industry emphasize the need to train software engineers to build maintainable software code. Unfortunately, code maintainability is an ill-defined domain and is challenging to teach and learn. This problem is aggravated by a rising number of software engineering students and a lack of capable instructors. Existing instructors rely on scalable one-sizefits-all teaching methods that are ineffective. Advances in elearning technologies can alleviate these issues. Our primary contribution is the design of a novel assessment item type, the maintainability challenge. It integrates into the standard intelligent tutoring system (ITS) architecture to develop skills for analyzing and refactoring high-level code maintainability issues. Our secondary contributions include the code maintainability knowledge component model and an implementation of an ITS that supports the maintainability challenge for the C# programming language. We designed, developed, and evaluated the ITS over two years of working with undergraduate students using a mixed-method approach anchored in design science. The empirical evaluations culminated with a field study with 59 undergraduate students. We report on the evaluation results that showcase the utility of our contributions. Our contributions support software engineering instructors in developing the code maintainability skills of their students at scale.
... Refactoring, as an essential approach, improves the quality of the software by improving its structure, readability, extensibility, and reusability, without altering its function and performance (Mens and Tourwé 2004). Martin Fowler, a renowned software engineering expert, systematically summarized 22 types of code smells and discussed refactoring strategies for each (Fowler and Beck 1997). Given that manual identification of code smells is time-consuming and tedious, numerous automatic and semi-automatic detection methods for various code smells have been proposed and applied. ...
... In the realm of software engineering, these code smells are typically regarded as defects that can adversely affect software quality. The concept of code smell, along with 22 different types, was introduced and defined by Fowler and Beck (1997). Despite extensive research, including recent reviews by Rasool and Arshad (2015), Gupta et al. (2017), and Tian et al. (2023), no single method or tool can currently detect the 22 types of code smell. ...
... 2. Long Method Long Method refers to a method entity that appears too large and complex with too many lines of code, parameters, and statements. However, it is different from the Long Parameter List, which is recognized as another type of code smell that burdens a method with an overabundance of parameters in research studies (Bafandeh Mayvan et al. 2020;Tian et al. 2023), adhering to Fowler's definition and classification of code smell (Fowler and Beck 1997). Our research also aligns with Fowler's perspective. ...
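The distinction drawn above between Long Method and Long Parameter List can be illustrated with the usual fix for the latter, Introduce Parameter Object. This is a hypothetical minimal sketch (the DateRange/Report names are illustrative, not from the cited study):

```java
// Hypothetical sketch of Introduce Parameter Object, the common fix for the
// Long Parameter List smell: four related parameters that always travel
// together are bundled into one small value object.
class DateRange {
    final int startDay, startMonth, endDay, endMonth;
    DateRange(int startDay, int startMonth, int endDay, int endMonth) {
        this.startDay = startDay; this.startMonth = startMonth;
        this.endDay = endDay; this.endMonth = endMonth;
    }
}

class Report {
    // Before: generate(int startDay, int startMonth, int endDay, int endMonth, String title)
    // After: the related parameters arrive as a single object.
    String generate(DateRange range, String title) {
        return title + ": " + range.startDay + "/" + range.startMonth
                + " - " + range.endDay + "/" + range.endMonth;
    }
}
```

Note that the method body may still be short: the smell lay in the parameter list, not the method length, which is exactly why the two smells are classified separately.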
Article
Full-text available
Code smell detection is a task aimed at identifying sub-optimal programming structures within code entities that may indicate problems requiring attention. It plays a crucial role in improving software quality. Numerous automatic or semi-automatic methods for code smell detection have been proposed. However, these methods are constrained by the manual setting of detection rules and thresholds, leading to subjective determinations, or they require large-scale labeled datasets for model training. In addition, they exhibit poor detection performance across different projects. Related studies have revealed the existence of co-occurrences among different types of code smells. Therefore, we propose a smart code smell detection method based on code smell co-occurrences, termed BMCo-O. The key insight is that code smell co-occurrences can assist in improving code smell detection. We introduce and utilize code smell co-occurrence impact factor set, a code smell pre-filter mechanism, and a possibility mechanism, which enable BMCo-O to demonstrate outstanding detection performance. To reduce manual intervention, we propose an adaptive detection mechanism that automatically adjusts parameters to detect different types of code smell in various software projects. As an initial attempt, we applied the proposed method to seven classical high-criticality code smells: Message Chain, Feature Envy, Spaghetti Code, Large Class, Complex Class, Refused Bequest, and Long Method. The evaluation results on benchmarks composed of open source software projects demonstrated that BMCo-O significantly outperforms the well-known and widely used methods in detecting these seven classical code smells, especially in F1, with improvements of 137%, 155%, 23%, 195%, 364%, 552% and 35%, respectively. To further verify its effectiveness in actual detection across different software projects, we also implemented a prototype of a new code smell detector using BMCo-O.
... Students do not automatically learn to write well-structured code as they gain more programming knowledge [6]. Although there are established guidelines and standards for professional programmers to write well-structured code, such as Clean Code [7] and Refactoring [8], these may not be suitable for novice programmers. Similarly, professional code analyzers like PMD [9] and CheckStyle [10] can flag code structure violations, but these tools are not designed for beginner programmers. ...
... Stegeman et al. [21], [73] developed a rubric to assess the quality of student code and to provide students with feedback. The rubric is based on a model of code quality derived from literature, including Clean Code [7], Refactoring [8] and Code Complete [84], and instructor's input [73]. Based on that model, the authors designed the rubric in iterations (using 'educational design research'), in which trial assessments were conducted, and findings were discussed with instructors. ...
... In the second iteration, they put more emphasis on teaching code quality throughout the software development process. Clean Code [85] and Refactoring [8] principles were introduced, with tutorials on static analysis and refactoring using Eclipse and PMD. Then, students were asked to develop high-quality software. ...
Preprint
Full-text available
Teaching the software engineers of the future to write high-quality code with good style and structure is important. This systematic literature review identifies existing instructional approaches, their objectives, and the strategies used for measuring their effectiveness. Building on an existing mapping study of code quality in education, we identified 53 papers on code structure instruction. We classified these studies into three categories: (1) studies focused on developing or evaluating automated tools and their usage (e.g., code analyzers, tutors, and refactoring tools), (2) studies discussing other instructional materials, such as learning resources (e.g., refactoring lessons and activities), rubrics, and catalogs of violations, and (3) studies discussing how to integrate code structure into the curriculum through a holistic approach to course design to support code quality. While most approaches use analyzers that point students to problems in their code, incorporating these tools into classrooms is not straightforward. Combined with further research on code structure instruction in the classroom, we call for more studies on effectiveness. Over 40% of instructional studies had no evaluation. Many studies show promise for their interventions by demonstrating improvement in student performance (e.g., reduced violations in student code when using the intervention compared with code that was written without access to the intervention). These interventions warrant further investigation on learning, to see how students apply their knowledge after the instructional supports are removed.
... Code smell (Fowler, 1999) is the term used to describe discernible traits or recurring patterns in software source code, indicating probable deficiencies or regions warranting enhancement. Code smells differ from conventional software bugs or errors in that they do not manifest as malfunctions; instead, they function as indications of latent design or implementation problems with the potential to precipitate issues in subsequent phases of development and maintenance. ...
... Fowler introduced the concept of code smells to describe potential issues within software source code that require refactoring (Fowler, 1999). Software developers have identified a wide range of code smells which can be classified into categories such as implementation, design and architecture depending on their granularity level and the overall impact on the source code (Fowler, 1999;Suryanarayana et al., 2014). ...
... Fowler introduced the concept of code smells to describe potential issues within software source code that require refactoring (Fowler, 1999). Software developers have identified a wide range of code smells which can be classified into categories such as implementation, design and architecture depending on their granularity level and the overall impact on the source code (Fowler, 1999;Suryanarayana et al., 2014). Implementation smells emerge at a detailed level including long method, long parameter list, complex conditional, among others (Fowler, 1999). ...
Preprint
Full-text available
A smell in software source code denotes an indication of suboptimal design and implementation decisions, potentially hindering code understanding and, in turn, raising the likelihood of the code being prone to changes and faults. Identifying these code issues at an early stage in the software development process can mitigate these problems and enhance the overall quality of the software. Current research primarily focuses on the utilization of deep learning-based models to investigate the contextual information concealed within source code instructions to detect code smells, with limited attention given to the importance of structural and design-related features. This paper proposes a novel approach to code smell detection, constructing a deep learning architecture that places importance on the fusion of structural features and statistical semantics derived from pre-trained models for programming languages. We further provide a thorough analysis of how different source code embedding models affect the detection performance with respect to different code smell types. Using four widely-used code smells from well-designed datasets, our empirical study shows that incorporating design-related features significantly improves detection accuracy, outperforming state-of-the-art methods on the MLCQ dataset with improvements ranging from 5.98% to 28.26%, depending on the type of code smell.
... However, fixing maintainability issues can be challenging, because a specialist review is expensive and slow, while automated tools are imprecise and require human interpretation [3]. Code smells are a common type of maintainability issue, with examples including methods that become complex and take on too many responsibilities, code that is no longer used, or instances where the same code snippets are repeated twice or more [4,5,6]. ...
... We identify SonarQube rules shared by the three most-used languages on GitHub (JavaScript, Python, and Java) [29], resulting in a total of 79 rules. We run SonarQube on each of the 10 selected projects and collect all violations of these 79 rules, focusing only on the main source. (In this paper, rule violation and issue are used interchangeably.) Table I presents information on projects and the number of issues evaluated in the study. ...
... Only 4 methods (44.4%) were fixed by all strategies, while at least one LLM successfully fixed 7 methods (77.8%). All successful solutions generated by the LLMs use the Consolidate Conditional Expression refactoring [4]. Compilation errors and test failures occur when LLMs hallucinate. ...
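The snippet above reports that every successful LLM fix used Consolidate Conditional Expression. A hypothetical minimal sketch of that refactoring, loosely adapted from its canonical catalog example (the DisabilityCheck class name and the flat payout value are illustrative):

```java
// Hypothetical sketch of Consolidate Conditional Expression: several
// sequential checks that all return the same result are merged into a
// single, well-named guard condition.
class DisabilityCheck {
    // Before:
    //   if (seniority < 2) return 0;
    //   if (monthsDisabled > 12) return 0;
    //   if (isPartTime) return 0;
    //   ...compute amount...
    double disabilityAmount(int seniority, int monthsDisabled, boolean isPartTime) {
        if (isNotEligible(seniority, monthsDisabled, isPartTime)) return 0;
        return 100.0; // placeholder for the real benefit computation
    }

    // The consolidated condition gives the eligibility rule a single name.
    private boolean isNotEligible(int seniority, int monthsDisabled, boolean isPartTime) {
        return seniority < 2 || monthsDisabled > 12 || isPartTime;
    }
}
```

Because the three guards returned the same value, short-circuit `||` evaluation makes the merged condition behaviorally identical to the original sequence.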
Preprint
Full-text available
Large Language Models (LLMs) have gained attention for addressing coding problems, but their effectiveness in fixing code maintainability remains unclear. This study evaluates LLMs capability to resolve 127 maintainability issues from 10 GitHub repositories. We use zero-shot prompting for Copilot Chat and Llama 3.1, and few-shot prompting with Llama only. The LLM-generated solutions are assessed for compilation errors, test failures, and new maintainability problems. Llama with few-shot prompting successfully fixed 44.9% of the methods, while Copilot Chat and Llama zero-shot fixed 32.29% and 30%, respectively. However, most solutions introduced errors or new maintainability issues. We also conducted a human study with 45 participants to evaluate the readability of 51 LLM-generated solutions. The human study showed that 68.63% of participants observed improved readability. Overall, while LLMs show potential for fixing maintainability issues, their introduction of errors highlights their current limitations.
... • Refused Bequest (RB) happens when a subclass inherits data and methods from a super class but does not need them. Push Down Method and Replace Inheritance with Delegation are considered refactoring solutions for this smell [23]. • Middle Man (MM) occurs when the main objective of a class is delegation through calling another class's methods, hence increasing the code's complexity. ...
... • Middle Man (MM) occurs when the main objective of a class is delegation through calling another class's methods, hence increasing the code's complexity. Remove Middle Man, Inline Method, and Replace Delegation with Inheritance are refactoring solutions for Middle Man [23]. • Inappropriate Intimacy (II) happens when two classes know too much about each other, thus increasing the coupling between classes. ...
... • Inappropriate Intimacy (II) happens when two classes know too much about each other, thus increasing the coupling between classes. Merging the classes or extracting a super class for the original classes are considered refactoring solutions [23]. • Speculative Generality (SG) happens when unused code exists and developers leave that code as they might use it in the future. ...
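Of the smells listed above, Middle Man has perhaps the most mechanical fix, Remove Middle Man. A hypothetical minimal sketch (the Person/Department names are illustrative, not from the cited study):

```java
// Hypothetical sketch of Remove Middle Man: Person used to forward
// getManager() calls to Department without adding anything; after the
// refactoring, clients get the Department and talk to it directly.
class Department {
    private final String manager;
    Department(String manager) { this.manager = manager; }
    String getManager() { return manager; }
}

class Person {
    private final Department department;
    Person(Department department) { this.department = department; }

    // Before: String getManager() { return department.getManager(); }  // pure delegation
    // After: the delegate itself is exposed, removing the middle man.
    Department getDepartment() { return department; }
}
```

Clients change from `person.getManager()` to `person.getDepartment().getManager()`, trading one extra call for a class that no longer exists purely to delegate.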
Article
Full-text available
Code smells are design flaws that reduce software quality and maintainability. Machine learning classification models have been used for detecting different code smells. However, such studies targeted some code smells in depth, while leaving other smells under-explored, even so such smells have significant impact on the source code quality. Recent surveys have highlighted a group of code smells that are rarely studied by researchers. Furthermore, some machine learning classification models were evaluated, on a subset of the source code features while ignoring significant features during classification. This paper proposes a novel approach, called Smell-ML, for detecting five rarely studied code smells, namely: Middle Man (MM), Class Data Should Be Private (CDSBP), Inappropriate Intimacy (II), Refused Bequest (RB), and Speculative Generality (SG). The approach’s novelty stems from improving both data preparation and classification phases. During data preparation, Smell-ML relies on data balancing and an extended source code feature list to improve accuracy. For the classification phase, different classifiers were assessed, including traditional classifiers, ensemble classifiers, and multi-level classifiers. We evaluated Smell-ML on a dataset composed of 13 open source Java projects with 125 versions per project. The results show Smell-ML’s detection F1-score values surpassing those of previous studies with significant improvements across various code smells. The F1-score measure of 11 machine learning classifiers improved after using the extended features list. Data balancing and multi-level classification notably boosted accuracy.
... Prior studies highlight the effectiveness of web-based dyslexia support tools in fostering engagement and learning outcomes (Alkhadrawi, 2020). The application's architecture follows a modular approach, which is recommended for scalable and maintainable systems (Fowler, 2018). This design allows each module to be updated or improved independently, increasing adaptability and long-term usability (Martin, 2019). ...
... The TTS engine handles text efficiently and provides clear pronunciation so that users can follow spoken language [16][17][18][19]. 3.1.1. Text-to-Image Module: This function allows users to enter a word, which the system converts to the corresponding image. ...
Article
Full-text available
Dyslexia is a widespread, lifelong neurobiological condition that impacts an individual's ability to read, write, and spell. People with dyslexia often face difficulties in traditional educational settings, as conventional teaching methods do not align with their unique learning styles. To bridge this gap, we have developed an innovative web-based application designed to provide comprehensive reading, comprehension, and accessibility support. Our platform incorporates multiple assistive features tailored to the needs of dyslexic individuals. One of the core functionalities is a text-to-speech feature that reads aloud user-provided text, helping users process written content more effectively. Additionally, our PDF-to-speech converter enhances document accessibility by converting digital text into speech, making it easier for individuals to engage with educational or professional materials. For visual learners, we offer an image generation function that transforms text into meaningful visuals, aiding comprehension through pictorial representation. The application also includes a symbol recognition tool, which identifies and interprets common symbols such as school zones, road signs, and warning labels, ensuring better understanding of critical visual information. To further enhance readability, our dyslexia-friendly text converter simplifies complex words and replaces them with more accessible alternatives, making reading a smoother experience. Leveraging natural language processing (NLP), optical character recognition (OCR), and responsive web design, this platform is dedicated to empowering individuals with dyslexia by fostering improved reading, comprehension, and overall accessibility.
... Once these bad smells have been detected, it is equally important to consider how they can be effectively addressed. This leads us to the concept of refactoring, defined as "the process of changing a software system in such a way that it does not alter the external behavior of the code yet improves its internal structure" [14]. Most existing model repair approaches focus on three main types: (1) Structural repair: Addresses inconsistencies in the model's structure, such as incorrect relationships or missing elements, (2) Behavioral repair: Fixes issues related to the model's behavior, ensuring it aligns with the expected functional requirements, and (3) Semantic repair: Corrects deviations from the intended meaning or purpose of the model elements, aligning them with domain-specific knowledge [8,30]. ...
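The behavior-preserving refactoring defined above can be illustrated with a minimal, hypothetical sketch (the function and variable names below follow Fowler's classic "Extract Method" example but are not taken from the cited work):

```python
# Hypothetical "Extract Method" refactoring: internal structure changes,
# external behavior does not.

# Before: one function mixes the summing loop with the formatting.
def invoice_total_before(orders):
    total = 0
    for amount in orders:
        total += amount
    return f"Amount owed: {total}"

# After: the summing logic is extracted into a well-named helper.
def outstanding(orders):
    return sum(orders)

def invoice_total_after(orders):
    return f"Amount owed: {outstanding(orders)}"

# The refactoring is behavior-preserving: both versions agree.
assert invoice_total_before([10, 20, 5]) == invoice_total_after([10, 20, 5])
```

Structural and behavioral model repair, by contrast, deliberately change what the artifact expresses; the assertion above is exactly what distinguishes refactoring from those repair types.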
... The correctness of the refactoring is evaluated through a survey. ...
Article
Full-text available
Goal-oriented requirements engineering (GORE) facilitates effective communication and collaboration between stakeholders. Using goal models, GORE provides a structured approach to elicit, analyze, and manage requirements from the perspective of stakeholders’ goals and intentions. However, goal models are prone to several poor practices, called bad smells, which can obstruct effective communication between stakeholders. As a result, there might be misinterpretations and inconsistencies in the requirements. Goal models are particularly prone to linguistic bad smells, encompassing unclear or ambiguous goal statements, conflicting or contradictory requirements, and occurrences of misspellings. It is therefore imperative that linguistic bad smells are identified and addressed in goal models to ensure their quality and accuracy. In this paper, we build upon our previous research by enhancing the catalog of 17 goal-oriented requirements language (GRL) linguistic bad smells. We refine the detection techniques using a combination of NLP-based and LLM-based techniques. These enhancements significantly improved the tool’s detection capabilities compared to our previous work. Furthermore, we offer automated refactoring solutions for 9 of these bad smells through GPT prompts. The remaining four identified bad smells are left to the user’s discretion for refactoring, due to their subjective nature. The detection and refactoring processes are implemented in a tool, tailored to the Textual GRL (TGRL) language. We evaluated the bad smells refactoring approach and tool by administering a questionnaire to 13 participants, who assessed the correctness of the refactoring of 71 linguistic bad smells found in four (4) TGRL models. Participants perceived the refactored sentences as highly correct across the different types of linguistic bad smells.
... Code smells can cause various problems in software, including difficulty in understanding the code, which can hinder team collaboration, slow down the development process, and increase the risk of errors [1]. Code smells can take the form of inefficient patterns, unnecessary code duplication, or complex and hard-to-understand code structures. ...
... This leads to various problems, including difficulty in understanding the code, which can hinder team collaboration, slow down the development process, and increase the risk of errors. It is therefore important to detect and address code smells early in the software development lifecycle [1]. One applicable approach is the use of software metrics on the Abstract Syntax Tree (AST) structure of the program code. ...
Article
Full-text available
A code smell is a design weakness or bad practice in program code that harms software development projects. It can lead to reduced code quality, increased project complexity, obstacles to code maintenance, and a higher risk of errors. This study aims to develop a code smell detection application for the Python programming language. The method used applies software metrics to the Abstract Syntax Tree (AST) structure. The system converts Python programs into an AST, implements code smell detection logic based on software metrics, and is tested using black-box testing. The results show that the system can detect code smells such as long method, lazy class, feature envy, and code complexity. Black-box testing confirmed that the system's functionality works correctly and as expected.
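The AST-plus-metrics approach described above can be sketched in a few lines of Python using the standard `ast` module. This is a hypothetical illustration, not the cited system: the threshold value and the line-count metric are illustrative stand-ins for the paper's software metrics.

```python
import ast

# Illustrative threshold for a "long method" smell (not from the paper).
LONG_METHOD_THRESHOLD = 5

def find_long_methods(source: str):
    """Flag functions whose line span exceeds a simple LOC metric."""
    tree = ast.parse(source)
    smells = []
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            # Approximate method length from the node's line-number span.
            length = node.end_lineno - node.lineno + 1
            if length > LONG_METHOD_THRESHOLD:
                smells.append(node.name)
    return smells

code = """
def short(x):
    return x + 1

def long(x):
    a = x + 1
    b = a * 2
    c = b - 3
    d = c // 4
    e = d ** 2
    return e
"""
print(find_long_methods(code))  # only 'long' exceeds the threshold
```

Detectors for lazy class, feature envy, and complexity follow the same pattern: walk the AST, compute a metric per node, and compare against a threshold.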
... Typical cases include type, function, or property removal [52]. In addition, API refactoring, a term introduced by Fowler and Beck, is a technique that improves the design of a code base by implementing a series of code transformations while preserving the observable program behaviour [29]. Despite the behaviour-preserving property, refactorings are not necessarily backwards compatible. ...
Article
Full-text available
API use has become prevalent in current times and its purposeful management is of foremost importance to avoid undesired effects on client code. A plethora of studies focusing on the isolated investigation of different types of harmful API uses (e.g., API misuse and security vulnerabilities) have been conducted before. However, a comprehensive overview of possible harmful API uses is required to help both library and client developers on the management of implemented and used APIs. Moreover, repairing such harmful uses remains a significant challenge in software development, yet recent studies indicate its widespread prevalence despite efforts to develop automatic repair techniques. This paper presents the first systematic review of 35 peer-reviewed studies on harmful API uses and their corresponding (semi-)automatic repair techniques. We categorise common types of harmful API uses in terms of the origin and root cause of events triggering the undesired use and the type of harm incurred on the client. We further analyse their repair approaches, assessing their strengths and weaknesses. Additionally, we investigate the evaluation processes and metrics employed in the outlined repair techniques. Our study contributes to advancing the state-of-the-art in harmful API repair research, by addressing open research problems and paving the way to improve and develop new repair techniques and tool capabilities.
... Related to our work is the book of Fowler on domain-specific languages [15]. Fowler defines refactoring as changing the internals of source code without changing its usage while preserving its behavior. ...
Conference Paper
Full-text available
Software maintenance requires a significant amount of time and effort. One of the tasks involved in this process is to replace outdated frameworks with newer, state-of-the-art alternatives. For such a replacement, static and dynamic techniques can be utilized. Using a static technique, such as metaprogramming, the source code can be inspected in a whitebox manner to gain insight in its functionality. By means of a dynamic technique, such as Active Model Learning (AML), the behavior of the code can be extracted in a blackbox manner. Since both techniques have their advantages and disadvantages, we propose a novel combination of static and dynamic analysis techniques that complements each other. Model checking is used to check the equivalence of the models obtained by the different techniques. The approach has been applied at Philips to upgrade a legacy framework written in C++ to describe Finite State Machines (FSMs). This framework has been replaced by a more modern tool called Dezyne. By means of our new approach, we were able to semi-automatically replace 14 FSMs with high confidence in the preserved behavior.
... The necessity and benefits of testing in the software development process are acknowledged by both practitioners and researchers. Testing techniques often drive development efforts (such as in Test-Driven Development (TDD) [7]), they are a prerequisite for the refactoring process [18], and a critical part of DevOps [15]. However, testing is not often employed within the research process and is even less often properly reported. ...
Preprint
Full-text available
The paper has been accepted for publication in Computer Science journal: http://journals.agh.edu.pl/csci Software engineering (SE) research often involves creating software, either as a primary research output (e.g. in design science research) or as a supporting tool for the traditional research process. Ensuring software quality is essential, as it influences both the research process and the credibility of findings. Integrating software testing methods into SE research can streamline efforts by addressing the goals of both research and development processes simultaneously. This paper highlights the advantages of incorporating software testing in SE research, particularly for research evaluation. Through qualitative analysis of software artifacts and insights from two PhD projects, we present ten lessons learned. These experiences demonstrate that, when effectively integrated, software testing offers significant benefits for both the research process and its results.
... Code smells, initially conceptualized by Kent Beck and popularized by Martin Fowler [21], have evolved beyond traditional software development concerns to encompass machine learning-specific challenges. ...
Preprint
Full-text available
Machine learning (ML) codebases face unprecedented challenges in maintaining code quality and sustainability as their complexity grows exponentially. While traditional code smell detection tools exist, they fail to address ML-specific issues that can significantly impact model performance, reproducibility, and maintainability. This paper introduces MLScent, a novel static analysis tool that leverages sophisticated Abstract Syntax Tree (AST) analysis to detect anti-patterns and code smells specific to ML projects. MLScent implements 76 distinct detectors across major ML frameworks including TensorFlow (13 detectors), PyTorch (12 detectors), Scikit-learn (9 detectors), and Hugging Face (10 detectors), along with data science libraries like Pandas and NumPy (8 detectors each). The tool's architecture also integrates general ML smell detection (16 detectors), and specialized analysis for data preprocessing and model training workflows. Our evaluation demonstrates MLScent's effectiveness through both quantitative classification metrics and qualitative assessment via user studies feedback with ML practitioners. Results show high accuracy in identifying framework-specific anti-patterns, data handling issues, and general ML code smells across real-world projects.
... Refactoring is the process of improving the design of existing code without changing its behavior [11]. Developers perform refactorings frequently [26], thus attaining benefits, such as reduced code duplication as well as improved readability, maintainability and testability [17,34]. ...
Preprint
Commits often involve refactorings -- behavior-preserving code modifications aiming at software design improvements. Refactoring operations pose a challenge to code reviewers, as distinguishing them from behavior-altering changes is often not a trivial task. Accordingly, research on automated refactoring detection tools has flourished over the past two decades, however, the majority of suggested tools is limited to Java projects. In this work, we present RefactoringMiner++, a refactoring detection tool based on the current state of the art: RefactoringMiner 3. While the latter focuses exclusively on Java, our tool is -- to the best of our knowledge -- the first publicly available refactoring detection tool for C++ projects. RefactoringMiner's thorough evaluation provides confidence in our tool's performance. In addition, we test RefactoringMiner++ on a small seeded dataset and demonstrate the tool's capability in a short demo involving both refactorings and behavior-altering changes. A screencast demonstrating our tool can be found at https://cloud.tugraz.at/index.php/s/oCzmjfFSaBxNZoe.
... It can yield significant speedups. As such, our studies and attribute suggestions are relevant for many codes beyond SPH; notably, they provide a prototype of how future C/C++ annotations can change the character of performance optimisation, in which more technical code refactorings [6] are deployed into the translation chain, freeing human time for analysis and high-level tasks. This is in the spirit of C++ annotations. ...
Preprint
The C++ programming language provides classes and structs as fundamental modeling entities. Consequently, C++ code tends to favour array-of-structs (AoS) for encoding data sequences, even though structure-of-arrays (SoA) yields better performance for some calculations. We propose a C++ language extension based on attributes that allows developers to guide the compiler in selecting memory arrangements, i.e. to select the optimal choice between AoS and SoA dynamically depending on both the execution context and algorithm step. The compiler can then automatically convert data into the preferred format prior to the calculations and convert results back afterward. The compiler handles all the complexity of determining which data to convert and how to manage data transformations. Our implementation realises the compiler-extension for the new annotations in Clang and demonstrates their effectiveness through a smoothed particle hydrodynamics (SPH) code, which we evaluate on an Intel CPU, an ARM CPU, and a Grace-Hopper GPU. While the separation of concerns between data structure and operators is elegant and provides performance improvements, the new annotations do not eliminate the need for performance engineering. Instead, they challenge conventional performance wisdom and necessitate rethinking approaches how to write efficient implementations.
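The AoS-versus-SoA distinction at the heart of the abstract above can be shown with a small, hypothetical Python sketch (the particle fields `x` and `m` are made up for illustration; the performance effect the paper targets arises in compiled languages, where the SoA layout stores each field contiguously):

```python
# Array-of-structs (AoS): one record per particle. Natural to model,
# but a given field is scattered across memory.
particles_aos = [{"x": 1.0, "m": 2.0}, {"x": 3.0, "m": 4.0}]

def momentum_aos(ps):
    return [p["x"] * p["m"] for p in ps]

# Struct-of-arrays (SoA): each field stored as one contiguous sequence,
# which favours streaming access and vectorization.
particles_soa = {"x": [1.0, 3.0], "m": [2.0, 4.0]}

def momentum_soa(ps):
    return [x * m for x, m in zip(ps["x"], ps["m"])]

# Both layouts encode the same data and yield the same result.
assert momentum_aos(particles_aos) == momentum_soa(particles_soa)
```

The proposed C++ attributes automate exactly this kind of layout conversion at compile time, so that the source code can keep the AoS modeling style while the generated code uses whichever layout suits each algorithm step.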
... Additionally, the developers desire to spend less time on debugging code, and instead want to invest more time on preventative measures like 'PR/Code Reviews', 'Documentation', and 'Mentoring/Onboarding' that could potentially alleviate pains fixing issues due to poor documentation or code. This indicates a potential desire for improving code quality and reducing long-term maintenance costs and focusing and reducing technical debt [7]. Similarly, the preference to spend more time on learning new technologies, and engaging in technical presentations suggests that developers consider continuous learning and knowledge-sharing an essential part of their workweek. ...
Preprint
Full-text available
Software developers balance a variety of different tasks in a workweek, yet the allocation of time often differs from what they consider ideal. Identifying and addressing these deviations is crucial for organizations aiming to enhance the productivity and well-being of the developers. In this paper, we present the findings from a survey of 484 software developers at Microsoft, which aims to identify the key differences between how developers would like to allocate their time during an ideal workweek versus their actual workweek. Our analysis reveals significant deviations between a developer's ideal workweek and their actual workweek, with a clear correlation: as the gap between these two workweeks widens, we observe a decline in both productivity and satisfaction. By examining these deviations in specific activities, we assess their direct impact on the developers' satisfaction and productivity. Additionally, given the growing adoption of AI tools in software engineering, both in the industry and academia, we identify specific tasks and areas that could be strong candidates for automation. In this paper, we make three key contributions: 1) We quantify the impact of workweek deviations on developer productivity and satisfaction 2) We identify individual tasks that disproportionately affect satisfaction and productivity 3) We provide actual data-driven insights to guide future AI automation efforts in software engineering, aligning them with the developers' requirements and ideal workflows for maximizing their productivity and satisfaction.
... The methodological decisions were supported by studies and academic literature that highlight the need for modularity and the segmentation of responsibilities in complex systems (Barde, 2023), further validating the use of coding standards and tools that positively impact software quality and scalability (Fowler, 2018). This foundation ensures that every technical choice made responded to objective criteria rather than chance, maximizing development potential and mitigating risks associated with coupling and complexity. ...
Article
Full-text available
To improve cohesion and reduce the coupling of components in complex web systems, a modular architecture was designed and developed in Laravel to overcome the limitations of the traditional approach based on the MVC pattern. Design principles were adopted that included the creation of specialized components, such as actions for business logic, queries separated into dedicated classes, and the use of objects for data transfer and validation. A modular directory structure was implemented that facilitated code maintenance and scalability, relying on the PSR-4 standard for class autoloading. The methodology included the adoption of coding standards, such as PSR-12, and the use of static analysis tools and version control to guarantee the quality and consistency of the project. In terms of results, a more flexible and understandable architecture was achieved, with cleaner code that is easier to modify. In conclusion, adopting a modular architecture in Laravel contributed significantly to improving the code quality and scalability of the developed systems. Continuing to implement good coding practices is recommended, as well as conducting periodic training to maintain consistency in development and to facilitate the integration of new functionality.
... Horizontal scaling methods, which distribute tasks among several servers or virtual machines, have been shown to effectively reduce resource bottlenecks in cloud settings. As cloud-based and distributed systems become more critical, the development of new caching solutions, query tuning methods, and real-time performance monitoring tools has occurred (Fowler, 2018). ...
Article
Full-text available
System performance bottlenecks are a critical issue in the design, operation, and scalability of software and hardware systems. These bottlenecks arise when a specific component or process limits the overall performance of a system, thereby affecting user experience, efficiency, and operational costs. Identifying, diagnosing, and mitigating performance bottlenecks are essential tasks for systems administrators, developers, and IT professionals. This paper examines the various types of system performance bottlenecks, strategies for identifying them, common causes, and the latest approaches for mitigation. Through a detailed review of current practices, tools, and case studies, this paper aims to provide a comprehensive framework for effectively managing system performance bottlenecks in diverse environments.
... Code smells are recurring patterns in source code that indicate potential design flaws or suboptimal coding practices in the field of software development [1]. They are manifestations of deeper structural issues that, if left unaddressed, can jeopardize a software system's maintainability, extensibility, and overall quality. ...
Article
Full-text available
Detecting code smells early in the development process is crucial for ensuring system longevity and performance. This paper proposes a semisupervised learning approach to detect code smells using both widely used labeled and manually collected unlabeled data. The approach addresses the limited availability of labeled data and improves generalizability. The effectiveness of the proposed method is demonstrated through a clustering-based approach for class-level and a label propagation-based approach for method-level code smell detection. The data collection process involves gathering a diverse set of code samples from peers, ensuring a diverse range of software projects and coding styles. A comparative analysis is performed against traditional supervised algorithms to evaluate the performance of the semisupervised approach. Compared with traditional supervised learning approaches and past studies, the recommended semisupervised learning method resulted in accuracy increases of up to 8.4% for class-level and method-level code smells. The proposed approach has the potential to enhance code quality, maintainability, and software development processes, overcoming the challenges posed by the scarcity of labeled data.
... The various classifications have prompted discussion regarding potential correlations between smells and their impact on various parts of the analyzed code. According to the literature, code smells do not occur in the code in isolation [10,11] and are accompanied by other code smells [12][13][14][15][16]. Inter-smell relations occur when two or more code smells are connected or dependent on one another [17]. To identify such relations among code smells, Pietrzak and Walter [18] initially investigated 830 classes in the Apache Tomcat 5.5.4 project and discovered seven distinct smell relations (Plain support, Rejection, Aggregate support, Inclusion, Mutual support, Transitive support, and Common refactoring), called inter-smell relations. ...
Article
Full-text available
Code smells have an adverse impact on the quality of source code. Martin Fowler initially identified a set of 22 code smells. Since the term "code smell", there have been multiple attempts to understand them through their detection and to discover relationships between them using correlation and other approaches. The literature demonstrates multiple studies in which code smells have been found to exhibit relationships with other code smells. Nevertheless, the temporary field is one of the 22 code smells that has not been analysed to determine its relationship with other code smells. It is important to consider temporary field, as it has a detrimental impact on the maintainability of the source code. The study has conducted a review of the 7 smell relations identified by Pietrzak and Walter and proposed 3 new smell relations. It has evaluated these smell relations between temporary field and 17 design smells in 10 popular open-source Java applications that are widely cited in the literature and publicly accessible. The study has also done a correlation analysis of temporary field with 17 design smells. All code smells in the study were detected using an open-source tool called "TFfinder". The study reveals 18 significant smell relations between temporary field and design smells. Utilization of smell relations can facilitate an in-depth comprehension of code smells and aid in the prioritization of code smells for refactoring purposes. In addition, it can assist a developer in identifying classes that need more maintenance effort and impact the maintainability of the code.
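The temporary field smell analysed above can be made concrete with a short, hypothetical Python sketch (the `Order` class and its fields are invented for illustration): an instance field that is only meaningful during a single method call, and its refactoring into a plain local variable.

```python
# Smelly version: _discount is an instance field, but it only ever has a
# meaningful value inside total(); the rest of the time it lingers as
# confusing state on the object (the "temporary field" smell).
class OrderSmelly:
    def __init__(self, amount):
        self.amount = amount
        self._discount = None  # temporary field

    def total(self, rate):
        self._discount = self.amount * rate
        return self.amount - self._discount

# Refactored version: the value becomes a local variable, so the class
# carries no transient state and is easier to maintain.
class Order:
    def __init__(self, amount):
        self.amount = amount

    def total(self, rate):
        discount = self.amount * rate  # plain local, no lingering field
        return self.amount - discount

# The refactoring is behavior-preserving.
assert OrderSmelly(100).total(0.1) == Order(100).total(0.1) == 90.0
```

Because the smelly field often co-occurs with other design smells (as the study's smell relations show), removing it can simplify several related refactorings at once.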
... The foundational studies on data clumps and code smells have significantly shaped the understanding of recurring issues in software engineering. The concept of data clumps, introduced by Fowler [7], is defined as recurring groups of variables that often appear together, indicating a potential code smell. However, this definition lacks precise criteria for identification, leaving its interpretation ambiguous. ...
Article
Full-text available
This paper explores a modular pipeline architecture that integrates ChatGPT, a Large Language Model (LLM), to automate the detection and refactoring of data clumps—a prevalent type of code smell that complicates software maintainability. Data clumps refer to clusters of code that are often repeated and should ideally be refactored to improve code quality. The pipeline leverages ChatGPT’s capabilities to understand context and generate structured outputs, making it suitable for addressing complex software refactoring tasks. Through systematic experimentation, our study not only addresses the research questions outlined but also demonstrates that the pipeline can accurately identify data clumps, particularly excelling in cases that require semantic understanding—where localized clumps are embedded within larger codebases. While the solution significantly enhances the refactoring workflow, facilitating the management of distributed clumps across multiple files, it also presents challenges such as occasional compiler errors and high computational costs. Feedback from developers underscores the usefulness of LLMs in software development but also highlights the essential role of human oversight to correct inaccuracies. These findings demonstrate the pipeline’s potential to enhance software maintainability, offering a scalable and efficient solution for addressing code smells in real-world projects, and contributing to the broader goal of enhancing software maintainability in large-scale projects.
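A data clump of the kind the pipeline above detects, and its classic refactoring ("Introduce Parameter Object"), can be sketched as follows. This is a hypothetical example; the address fields and function names are invented and not taken from the cited study.

```python
from dataclasses import dataclass

# Before: the same group of parameters (street, city, zip_code) travels
# together through every signature -- a data clump.
def format_label_clumpy(name, street, city, zip_code):
    return f"{name}, {street}, {city} {zip_code}"

# After: the clump is replaced by one cohesive object.
@dataclass
class Address:
    street: str
    city: str
    zip_code: str

def format_label(name, address: Address):
    return f"{name}, {address.street}, {address.city} {address.zip_code}"

home = Address("1 Main St", "Springfield", "12345")
assert format_label("Ana", home) == format_label_clumpy(
    "Ana", "1 Main St", "Springfield", "12345"
)
```

The hard part in practice, and the reason the paper leans on an LLM, is that the clump's occurrences are often distributed across many files, so the refactoring must update every call site consistently.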
... Code smells are indicators of potential problems in software design and implementation [1]. While not bugs themselves, these smells often signal design weaknesses that can impede development and increase the risk of future failures. ...
Preprint
Full-text available
The growth of Python adoption across diverse domains has led to increasingly complex codebases, presenting challenges in maintaining code quality. While numerous tools attempt to address these challenges, they often fall short in providing comprehensive analysis capabilities or fail to consider Python-specific contexts. PyExamine addresses these critical limitations through an approach to code smell detection that operates across multiple levels of analysis. PyExamine's architecture enables detailed examination of code quality through three distinct but interconnected layers: architectural patterns, structural relationships, and code-level implementations. This approach allows for the detection and analysis of 49 distinct metrics, providing developers with an understanding of their codebase's health. The metrics span all levels of code organization, from high-level architectural concerns to granular implementation details. Through evaluation on 7 diverse projects, PyExamine achieved detection accuracy rates of 91.4% for code-level smells, 89.3% for structural smells, and 80.6% for architectural smells. These results were further validated through extensive user feedback and expert evaluations, confirming PyExamine's capability to identify potential issues across all levels of code organization with high recall. In addition, we used PyExamine to analyze the prevalence of different types of smells across 183 diverse Python projects ranging from small utilities to large-scale enterprise applications. PyExamine's distinctive combination of comprehensive analysis, Python-specific detection, and high customizability makes it a valuable asset for both individual developers and large teams seeking to enhance their code quality practices.
... This increased effort leads to higher maintenance costs over the software's lifecycle. When this happens, the portion of the code of low-quality is called Technical Debt (TD) [1], [2]. Low-quality code can also increase the risk of vulnerabilities being introduced and make identifying and fixing these flaws significantly more challenging. ...
Preprint
Full-text available
Multi-task learning is a paradigm that leverages information from related tasks to improve the performance of machine learning. Self-Admitted Technical Debt (SATD) are comments in the code that indicate not-quite-right code introduced for short-term needs, i.e., technical debt (TD). Previous research has provided evidence of a possible relationship between SATD and the existence of vulnerabilities in the code. In this work, we investigate if multi-task learning could leverage the information shared between SATD and vulnerabilities to improve the automatic detection of these issues. To this aim, we implemented VulSATD, a deep learner that detects vulnerable and SATD code based on CodeBERT, a pre-trained transformers model. We evaluated VulSATD on MADE-WIC, a fused dataset of functions annotated for TD (through SATD) and vulnerability. We compared the results using single and multi-task approaches, obtaining no significant differences even after employing a weighted loss. Our findings indicate the need for further investigation into the relationship between these two aspects of low-quality code. Specifically, it is possible that only a subset of technical debt is directly associated with security concerns. Therefore, the relationship between different types of technical debt and software vulnerabilities deserves future exploration and a deeper understanding.
... Code smells may indicate deeper issues within software systems (Fernandes et al., 2017;Oizumi et al., 2016;Fowler, 2018). They represent a pattern or structure in the code that suggests the presence of design flaws or poor coding practices. ...
Article
Full-text available
Regression testing is a selective retesting of a system or component to verify that modifications have not induced unintended effects and that the system or component maintains compliance with the specified requirements. However, it can be time-consuming and resource-intensive, especially for large systems. Regression testing selection techniques can help address this issue by selecting a subset of test cases to run. The Change Based technique selects a subset of the existing test cases and executes modified classes. Besides effectively reducing the test suite, this technique may reduce the capability of revealing faults. From this perspective, code smells are known to identify poor design and software quality issues. Some works have explored the association between smells and faults with some promising results. Inspired by these results, we propose combining code change and smell to select regression tests and present eight techniques. Additionally, we developed the Regression Testing Selection Tool (RTST) to automate the selection process using these techniques. We empirically evaluated the approach in Defects4J projects by comparing the techniques’ effectiveness with the Change Based and Class Firewall as a baseline. The results show that the Change and Smell Intersection Based technique achieves the highest reduction rate in the test suite size but with less class coverage. On the other hand, Change and Smell Firewall technique achieves the lowest test suite size reduction with the highest fault detection effectiveness test cases, suggesting the combination of smells and changed classes can potentially find more bugs. The Smell Based technique provides a comparable class coverage to the code change and smell approach. Our findings indicate opportunities for improving the efficiency and effectiveness of regression testing and highlight that software quality should be a concern throughout the software evolution.
... In total, 150 variants are generated for the five seeds in Table VI. We set the JDK version in Table V to 22.0.1, and the corresponding refactoring definition is obtained from [49]. We manually tested the input program variants generated by our tool in the latest version of ECLIPSE (2024-09) and INTELLIJ IDEA (2024.2.4). ...
Preprint
Full-text available
Refactoring is the process of restructuring existing code without changing its external behavior while improving its internal structure. Refactoring engines are integral components of modern Integrated Development Environments (IDEs) and can automate or semi-automate this process to enhance code readability, reduce complexity, and improve the maintainability of software products. Similar to traditional software systems such as compilers, refactoring engines may also contain bugs that can lead to unexpected behaviors. In this paper, we propose a novel approach called RETESTER, a LLM-based framework for automated refactoring engine testing. Specifically, by using input program structure templates extracted from historical bug reports and input program characteristics that are error-prone, we design chain-of-thought (CoT) prompts to perform refactoring-preserving transformations. The generated variants are then tested on the latest version of refactoring engines using differential testing. We evaluate RETESTER on two most popular modern refactoring engines (i.e., ECLIPSE, and INTELLIJ IDEA). It successfully revealed 18 new bugs in the latest version of those refactoring engines. By the time we submit our paper, seven of them were confirmed by their developers, and three were fixed.
... Code smells, a concept popularized by Martin Fowler in his seminal book "Refactoring: Improving the Design of Existing Code" [10], refer to any characteristic in the source code that suggests deeper issues, even though such characteristics do not prevent the program from functioning. These smells signal the need for refactoring to improve maintainability, readability, and quality. ...
Preprint
Full-text available
Practitioners use Infrastructure as Code (IaC) scripts to efficiently configure IT infrastructures through machine-readable definition files. However, during the development of these scripts, some code patterns or deployment choices may lead to sustainability issues such as inefficient resource utilization or redundant provisioning. We call these patterns sustainability smells. These inefficiencies pose significant environmental and financial challenges, given the growing scale of cloud computing. This research focuses on Terraform, a widely adopted IaC tool. Our study involves defining seven sustainability smells and validating them through a survey with 19 IaC practitioners. We utilized a dataset of 28,327 Terraform scripts from 395 open-source repositories. We performed a detailed qualitative analysis of a randomly sampled 1,860 Terraform scripts from the original dataset to identify code patterns that correspond to the sustainability smells and used the other 26,467 Terraform scripts to study the prevalence of the defined sustainability smells. Our results indicate varying prevalence rates of these smells across the dataset. The most prevalent smell is Monolithic Infrastructure, which appears in 9.67% of the scripts. Additionally, our findings highlight the complexity of conducting root cause analysis for sustainability issues, as these smells often arise from a confluence of script structures, configuration choices, and deployment contexts.
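A smell like Monolithic Infrastructure lends itself to a rough automated check. The sketch below is a hypothetical Python heuristic: the regex, the threshold, and the sample script are assumptions for illustration, not the paper's detector or definition.

```python
# Rough heuristic for a "Monolithic Infrastructure"-style smell: a
# single Terraform script declaring many resources. Regex and
# threshold are illustrative assumptions.
import re

RESOURCE_BLOCK = re.compile(r'^\s*resource\s+"[^"]+"\s+"[^"]+"\s*\{', re.M)

def is_monolithic(script_text, max_resources=10):
    """Flag a script that declares more than max_resources resources."""
    return len(RESOURCE_BLOCK.findall(script_text)) > max_resources

# A synthetic script with 12 resource blocks trips the heuristic.
snippet = "\n".join(
    f'resource "aws_instance" "vm{i}" {{}}' for i in range(12)
)
```

A real detector would parse HCL rather than pattern-match text, but the heuristic conveys how such smells can be scanned for at repository scale.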
... • Localization: We examine how effectively Passerine pinpoints the correct file(s) to modify by measuring the filesystem distance between the files modified by Passerine and the actual location of the ground truth bug fix. • Trajectory Smells: We analyze the presence of potentially suboptimal or unusual patterns within the trajectories, which we refer to as "trajectory smells," inspired by code smells [24]. We consider four specific "smells": (i) NO_TEST_SMELL: trajectory does not include any test execution commands (i.e., agent never confirmed the bug nor tested the patch); (ii) NO_OP_CAT_SMELL: trajectory contains instances where a file is re-read without any intervening modifications to that file (i.e., agent made an unnecessary read); (iii) CONSECUTIVE_SEARCH: trajectory contains at least three code search commands in a sequence (i.e., agent is repeatedly searching); (iv) CONSECUTIVE_EDITS: trajectory contains at least 3 edits to the same file in a sequence (i.e., agent is repeatedly editing the same file); ...
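One of the trajectory smells listed above, CONSECUTIVE_SEARCH, reduces to a simple scan over the agent's action sequence. The trajectory encoding below (action/argument pairs) is an assumption for illustration, not the paper's format.

```python
# Sketch of the CONSECUTIVE_SEARCH trajectory smell: fires when the
# agent issues three or more code search commands in a row.

def has_consecutive_search(trajectory, threshold=3):
    """trajectory: list of (action, argument) pairs, invented encoding."""
    run = 0
    for action, _arg in trajectory:
        run = run + 1 if action == "search" else 0
        if run >= threshold:
            return True
    return False

trajectory = [
    ("search", "parse_config"),
    ("search", "load_config"),
    ("search", "config loader"),   # third search in a row -> smell
    ("edit", "config.py"),
]
```

The other smells (NO_TEST_SMELL, NO_OP_CAT_SMELL, CONSECUTIVE_EDITS) are analogous single-pass checks over the same action stream.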
Preprint
Full-text available
Agent-based program repair offers to automatically resolve complex bugs end-to-end by combining the planning, tool use, and code generation abilities of modern LLMs. Recent work has explored the use of agent-based repair approaches on the popular open-source SWE-Bench, a collection of bugs from highly-rated GitHub Python projects. In addition, various agentic approaches such as SWE-Agent have been proposed to solve bugs in this benchmark. This paper explores the viability of using an agentic approach to address bugs in an enterprise context. To investigate this, we curate an evaluation set of 178 bugs drawn from Google's issue tracking system. This dataset spans both human-reported (78) and machine-reported bugs (100). To establish a repair performance baseline on this benchmark, we implement Passerine, an agent similar in spirit to SWE-Agent that can work within Google's development environment. We show that with 20 trajectory samples and Gemini 1.5 Pro, Passerine can produce a patch that passes bug tests (i.e., plausible) for 73% of machine-reported and 25.6% of human-reported bugs in our evaluation set. After manual examination, we found that 43% of machine-reported bugs and 17.9% of human-reported bugs have at least one patch that is semantically equivalent to the ground-truth patch. These results establish a baseline on an industrially relevant benchmark, which, as we show, contains bugs drawn from a different distribution -- in terms of language diversity, size, and spread of changes, etc. -- compared to those in the popular SWE-Bench dataset.
... Our research can help developers better understand common code smells and their frequencies in Java and Python languages [3]. We also studied the modules that may easily cause code smells over time, which can assist developers in refactoring and maintaining resources. ...
... • Use smell information [134] to support MDP as suggested by previous works [198,290,369,291,369]. ...
Preprint
Full-text available
Context. Developing secure and reliable software remains a key challenge in software engineering (SE). The ever-evolving technological landscape offers both opportunities and threats, creating a dynamic space where chaos and order compete. Secure software engineering (SSE) must continuously address vulnerabilities that endanger software systems and carry broader socio-economic risks, such as compromising critical national infrastructure and causing significant financial losses. Researchers and practitioners have explored methodologies like Static Application Security Testing Tools (SASTTs) and artificial intelligence (AI) approaches, including machine learning (ML) and large language models (LLMs), to detect and mitigate these vulnerabilities. Each method has unique strengths and limitations. Aim. This thesis seeks to bring order to the chaos in SSE by addressing domain-specific differences that impact AI accuracy. Methodology. The research employs a mix of empirical strategies, such as evaluating effort-aware metrics, analyzing SASTTs, conducting method-level analysis, and leveraging evidence-based techniques like systematic dataset reviews. These approaches help characterize vulnerability prediction datasets. Results. Key findings include limitations in static analysis tools for identifying vulnerabilities, gaps in SASTT coverage of vulnerability types, weak relationships among vulnerability severity scores, improved defect prediction accuracy using just-in-time modeling, and threats posed by untouched methods. Conclusions. This thesis highlights the complexity of SSE and the importance of contextual knowledge in improving AI-driven vulnerability and defect prediction. The comprehensive analysis advances effective prediction models, benefiting both researchers and practitioners.
Chapter
Full-text available
Article
Full-text available
Advanced cloud computing paradigms have significantly revolutionized the enterprise application landscape, providing organizations with the tools and flexibility to innovate and scale rapidly. By leveraging technologies such as serverless computing, containerization, edge computing, and multi-cloud strategies, enterprises can build applications that are not only scalable and agile but also cost-efficient and secure. These paradigms eliminate the need for extensive infrastructure management, allowing organizations to focus on core business objectives and accelerate time-to-market for new applications. Serverless computing, for example, offers a highly efficient, event-driven approach where applications scale automatically based on demand, reducing costs associated with idle resources. Containerization, driven by platforms like Docker and Kubernetes, enables application portability and modularity, facilitating seamless deployments across diverse environments. Multi-cloud strategies empower enterprises to harness the strengths of various cloud providers, avoiding vendor lock-in while optimizing costs and performance. Similarly, edge computing processes data closer to its source, reducing latency and bandwidth usage, which is crucial for real-time analytics and IoT applications. This paper explores the impact of these advanced paradigms on enterprise scalability, agility, and operational efficiency, supported by benchmarking results and real-world case studies from industries such as finance, healthcare, and retail. Challenges such as interoperability, compliance, and integration complexities are discussed, along with emerging trends like AI-driven cloud automation and sustainability-focused infrastructure. By adopting these paradigms, organizations can future-proof their IT ecosystems, enabling them to adapt to evolving market demands and maintain a competitive edge.
Article
With the increasing reliance on machine learning (ML) across diverse disciplines, ML code has been subject to a number of issues that impact its quality, such as lack of documentation, algorithmic biases, overfitting, lack of reproducibility, inadequate data preprocessing, and potential for data leakage, all of which can significantly affect the performance and reliability of ML models. Data leakage can affect the quality of ML models when sensitive information from the test set inadvertently influences the training process, leading to inflated performance metrics that do not generalize well to new, unseen data. Data leakage can occur either at the dataset level ( i.e ., during dataset construction) or at the code level. Existing studies introduced methods to detect code-level data leakage using manual and code analysis approaches. However, automated tools with advanced ML techniques are increasingly recognized as essential for efficiently identifying quality issues in large and complex codebases, enhancing the overall effectiveness of code review processes. In this article, we aim to explore ML-based approaches for limited annotated datasets to detect code-level data leakage in ML code. We proposed three approaches, namely, transfer learning, active learning, and low-shot prompting. Additionally, we introduced an automated approach to handle class-imbalance issues in the code data. Our results show that active learning outperformed the other approaches with an F-2 score of 0.72 and reduced the number of annotated samples needed from 1,523 to 698. We conclude that existing ML-based approaches can effectively mitigate the challenges associated with limited data availability.
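The code-level leakage pattern the article targets can be shown in a few lines. This toy Python example is illustrative only (the data and names are invented): computing normalization statistics over the full dataset lets test-set values leak into training.

```python
# Toy illustration of code-level data leakage: fitting preprocessing
# statistics on the full dataset (including the test split) leaks
# test-set information into training.

def mean(xs):
    return sum(xs) / len(xs)

data = [1.0, 2.0, 3.0, 100.0]    # the last point belongs to the test split
train, test = data[:3], data[3:]

# Leaky: center computed over train + test (the extreme test value
# shifts the statistic the model will be trained with).
leaky_center = mean(data)

# Correct: center computed over the training split only.
safe_center = mean(train)
```

Detectors of the kind the article studies look for exactly this ordering bug, e.g. a scaler fitted before the train/test split rather than after it.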
Chapter
During analysis, we constructed a conceptual model and a behavioral model for the proposed system. In the design step, we use the class structure defined in the conceptual model to design a system that behaves in the manner specified by the behavioral model. The main UML tool that we employ here is the sequence diagram. In a sequence diagram, the designer details how the behavior specified in the model will be realized. This process requires the system’s actions to be broken down into specific tasks, and the responsibility for these tasks to be assigned to the various players (classes and objects) in the system. In the course of assigning these responsibilities, we determine the public methods of each class and also describe the function performed by each method. Since the stage after design is implementation, which is coding, testing, and debugging, it is imperative that we have a full understanding of how the required functionality will be realized through code.
Article
The extract local variable refactoring is frequently employed to replace one or more occurrences of a complex expression with simple accesses to a newly introduced variable. To facilitate the refactoring, most IDEs can automate extract local variable refactorings once the to-be-extracted expressions are selected by developers. However, refactoring tools usually replace all expressions that are lexically identical to the selected one without a comprehensive analysis of the safety of the refactoring. Such automatically conducted refactorings may lead to serious software defects. Besides that, existing refactoring tools rely heavily on software developers to spot to-be-extracted expressions, although it is often challenging for inexperienced developers and maintainers to make the selection. To this end, in this paper, we propose an automated approach, called ValExtractor+ , for recommending extract local variable refactoring opportunities and for automatically and safely conducting the refactorings. ValExtractor+ is composed of two parts, i.e., solutionAdvisor and opportunityAdvisor. Given a to-be-extracted expression, solutionAdvisor leverages lightweight static source code analysis to validate potential side effects of the expression, and to identify expressions that could be extracted together with the selected expression as a single variable without changing the semantics of the program or introducing any new exceptions. The static code analysis significantly improves the safety of automated extraction of local variables. To free programmers from manually selecting to-be-extracted expressions, opportunityAdvisor leverages solutionAdvisor to automatically retrieve all expressions that could be extracted safely as well as their refactoring solutions. It then leverages a learning-based classifier to predict which of the retrieved expressions should be extracted.
Evaluations on open-source applications suggest that solutionAdvisor successfully avoided all defects (more than two hundred) caused by extract local variable refactorings conducted by Eclipse (243 defects) or IntelliJ IDEA (263 defects). Additionally, opportunityAdvisor effectively recommended expressions for extraction, achieving 307 true positives (TP) and 21,121 true negatives (TN). Four pull requests from our work (PR IDs: 66, 333, 439, and 360) were successfully merged into the Eclipse community repository, showcasing the practical impact and robustness of our approach as recognized by the wider developer community.
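The safety problem that motivates solutionAdvisor's side-effect analysis can be shown in miniature. The Python sketch below is hypothetical (the paper operates on Java source): two lexically identical expressions with side effects are not interchangeable, so naively extracting both into one variable changes behavior.

```python
# Why naive extract-local-variable is unsafe: next(it) appears twice
# and is lexically identical, but each call consumes an element, so
# replacing both occurrences with one variable alters the result.

def before(values):
    it = iter(values)
    return next(it) + next(it)      # consumes two elements

def after_naive_extract(values):
    it = iter(values)
    v = next(it)                    # "extracting" both occurrences...
    return v + v                    # ...consumes only one element
```

A side-effect-aware tool would refuse to merge the two occurrences here, which is exactly the class of defect the evaluation counts against Eclipse and IntelliJ IDEA.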
Article
Inheritance (waris) refers to the rules for transferring the property of a deceased person to their recipients or heirs. Inheritance frequently becomes a source of disputes in many places, particularly in Indonesia. It can trigger conflict within a family when the estate of the parents, or of whoever bequeathed the property, is divided. Such conflicts stem in part from a lack of information or technology concerning inheritance division. This study offers a solution to the problem above in the form of a web-based information system that can be used online. The system was designed and developed using the waterfall method, a method commonly used in software system design, also known as the classic life cycle or waterfall model. Data were collected through a literature study, and the programming language used is Python. The result of this research is an inheritance-division information system that includes an inheritance-calculation feature. This information system can make it easier for the public to divide an estate among heirs.
Article
Full-text available
The insurance industry is characterized by complex workflows, regulatory compliance requirements, and frequent policy updates, all of which necessitate robust and scalable software systems. Automated testing is a vital component in ensuring that these systems continue to function as expected, even as they evolve over time. However, maintaining test scripts in long-term automation projects presents unique challenges, particularly when dealing with dynamic insurance platforms. This paper outlines best practices for maintaining test scripts in long-term automation projects within the insurance domain. Through practical examples and case studies, we explore strategies that ensure test reliability, minimize technical debt, and maintain relevance in the face of evolving insurance policies and regulatory requirements. Topics include the implementation of modular test scripts, proper version control mechanisms, and the adoption of industry-specific tools and frameworks. The goal is to provide a roadmap that allows insurance companies to sustain their automated testing efforts, thereby ensuring system quality and regulatory compliance over time.
Article
Full-text available
Behavior-Driven Development (BDD) is an evolution of Test-Driven Development (TDD) that addresses the communication gap between developers, QA engineers, and business stakeholders by utilizing natural language to define requirements and tests. Such a mechanism paves the way for seamless integration of software functionality with business objectives, particularly in industries like insurance, where regulatory compliance and precise business-logic handling must go hand in hand. The insurance sector's complexity, owing to policy handling, regulatory frameworks, and integration with legacy systems, makes it a prime candidate for leveraging BDD. In this paper we examine how BDD is influencing the insurance domain, where it can be used in automation testing (including non-UI-level tests) and continuous integration, and can even assist in fulfilling regulatory requirements. This paper provides an extensive review of how BDD tools such as Cucumber, SpecFlow, and JBehave have been integrated into insurance-domain testing to improve collaboration, testing accuracy, and compliance.
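The core BDD idea of binding natural-language steps to code can be sketched without a real framework. The step registry and insurance scenario below are invented stand-ins for tools like Cucumber, SpecFlow, or JBehave, not their actual APIs.

```python
# Minimal stand-in for a BDD step registry: natural-language phrases
# are mapped to step functions that share a scenario context.

STEPS = {}

def step(phrase):
    """Register a step function under its natural-language phrase."""
    def register(fn):
        STEPS[phrase] = fn
        return fn
    return register

@step("the policyholder has an active policy")
def given_active_policy(ctx):
    ctx["policy"] = {"active": True, "claims": 0}

@step("they file a claim")
def when_claim_filed(ctx):
    if ctx["policy"]["active"]:
        ctx["policy"]["claims"] += 1

@step("the claim count is 1")
def then_claim_recorded(ctx):
    assert ctx["policy"]["claims"] == 1

def run_scenario(phrases):
    """Execute the registered step for each phrase, in order."""
    ctx = {}
    for phrase in phrases:
        STEPS[phrase](ctx)
    return ctx

ctx = run_scenario([
    "the policyholder has an active policy",
    "they file a claim",
    "the claim count is 1",
])
```

Real BDD tools add Gherkin parsing, parameterized phrases, and reporting on top of this same phrase-to-function binding.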