Chetan Arora’s research while affiliated with Monash University (Australia) and other places

Publications (53)


On the Impact of Requirements Smells in Prompts: The Case of Automated Traceability
  • Preprint

January 2025 · Alexander Korn · [...] · Chetan Arora

Large language models (LLMs) are increasingly used to generate software artifacts, such as source code, tests, and trace links. Requirements play a central role in shaping the input prompts that guide LLMs, as they are often used as part of the prompts to synthesize the artifacts. However, the impact of requirements formulation on LLM performance remains unclear. In this paper, we investigate the role of requirements smells (indicators of potential issues such as ambiguity and inconsistency) when they appear in prompts for LLMs. We conducted experiments with two LLMs, focusing on automated trace link generation between requirements and code. Our results are mixed: while requirements smells had a small but significant effect when predicting whether a requirement was implemented in a piece of code (i.e., whether a trace link exists), no significant effect was observed when tracing requirements to the associated lines of code. These findings suggest that requirements smells can affect LLM performance in certain SE tasks but may not uniformly impact all tasks. We highlight the need for further research to understand these nuances and propose future work toward developing guidelines for mitigating the negative effects of requirements smells in AI-driven SE processes.
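To make the experimental setup concrete, below is a minimal sketch of prompting an LLM for trace-link prediction, assuming the OpenAI Python client (openai >= 1.0). The model name, prompt wording, and the smelly/clean requirement pair are illustrative assumptions, not the paper's actual materials.

```python
# Minimal sketch of a trace-link prediction prompt, assuming the OpenAI
# Python client (openai >= 1.0). Model name, prompt wording, and the example
# requirements are illustrative; the paper does not prescribe them.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def predict_trace_link(requirement: str, code_snippet: str) -> str:
    """Ask the model whether the requirement is implemented by the code."""
    prompt = (
        f"Requirement:\n{requirement}\n\n"
        f"Code:\n{code_snippet}\n\n"
        "Does the code implement the requirement? Answer YES or NO."
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; the paper compares two LLMs
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return response.choices[0].message.content.strip()

# A requirement with an ambiguity smell ("appropriate") vs. a cleaned-up
# variant, to probe whether the smell changes the trace-link prediction:
smelly = "The system shall respond within an appropriate amount of time."
clean = "The system shall respond to queries within 2 seconds."
```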




[Figures 4 and 5: line plot visualisations of the outcomes against temperature and top p]
Optimizing Large Language Model Hyperparameters for Code Generation
  • Preprint
  • File available

August 2024 · 27 Reads

Large Language Models (LLMs), such as GPT models, are increasingly used in software engineering for various tasks, such as code generation, requirements management, and debugging. While automating these tasks has garnered significant attention, the impact of varying hyperparameters on code generation outcomes has not yet been studied systematically. This study aims to assess LLMs' code generation performance by exhaustively exploring the impact of various hyperparameters. Hyperparameters for LLMs are adjustable settings that affect the model's behaviour and performance. Specifically, we investigated how changes to the hyperparameters temperature, top probability (top_p), frequency penalty, and presence penalty affect code generation outcomes. We systematically adjusted all hyperparameters together, exploring every combination reachable by small increments to each hyperparameter. This exhaustive approach was applied to 13 Python code generation tasks, yielding one of four outcomes for each hyperparameter combination: no output from the LLM, non-executable code, code that fails unit tests, or correct and functional code. We analysed these outcomes for a total of 14,742 generated Python code segments, focusing on correctness, to determine how the hyperparameters influence the LLM to arrive at each outcome. Using correlation coefficient and regression tree analyses, we ascertained which hyperparameters influence which aspects of the LLM's behaviour. Our results indicate that optimal performance is achieved with a temperature below 0.5, top probability below 0.75, frequency penalty above -1 and below 1.5, and presence penalty above -1. We make our dataset and results available to facilitate replication.
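As a concrete illustration of such a sweep, here is a minimal sketch assuming the OpenAI Python client. The grid values, model name, task_prompt, and the run_unit_tests stub are illustrative assumptions; the paper's exact increments and tasks differ.

```python
# Minimal sketch of an exhaustive hyperparameter sweep with outcome
# classification, assuming the OpenAI Python client. Grid values, model
# name, task, and test stub are illustrative, not the paper's exact setup.
import itertools
from openai import OpenAI

client = OpenAI()

task_prompt = "Write a Python function add(a, b) that returns a + b."  # placeholder task

temperatures = [0.0, 0.5, 1.0, 1.5, 2.0]
top_ps = [0.25, 0.5, 0.75, 1.0]
frequency_penalties = [-2.0, -1.0, 0.0, 1.0, 2.0]
presence_penalties = [-2.0, -1.0, 0.0, 1.0, 2.0]

def run_unit_tests(code: str) -> bool:
    """Placeholder test runner: a real run would execute the task's unit tests."""
    namespace: dict = {}
    try:
        exec(code, namespace)
        return namespace["add"](2, 3) == 5
    except Exception:
        return False

def classify_outcome(code: str | None) -> str:
    """Map a generated snippet onto the paper's four outcome classes."""
    if not code:
        return "no output"
    try:
        compile(code, "<generated>", "exec")
    except SyntaxError:
        return "non-executable code"
    return "correct and functional" if run_unit_tests(code) else "fails unit tests"

for t, p, fp, pp in itertools.product(
    temperatures, top_ps, frequency_penalties, presence_penalties
):
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # placeholder; the abstract only says "GPT models"
        messages=[{"role": "user", "content": task_prompt}],
        temperature=t,
        top_p=p,
        frequency_penalty=fp,
        presence_penalty=pp,
    )
    print(t, p, fp, pp, classify_outcome(response.choices[0].message.content))
```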


Managing Human-Centric Software Defects: Insights from GitHub and Practitioners' Perspectives

August 2024 · 9 Reads

Context: Human-centric defects (HCDs) are nuanced and subjective defects that often arise from end-user perceptions or differences, such as their genders, ages, cultures, languages, disabilities, socioeconomic status, and educational backgrounds. Development teams have a limited understanding of these issues, which leads to such defects being neglected. Defect reporting tools do not adequately support the capture and fixing of HCDs. Objective: This research aims to understand the current defect reporting process and the tools used for managing defects. Our study aims to capture process flaws, create a preliminary defect categorisation, and identify practices for a defect-reporting tool that can improve the reporting and fixing of HCDs in software engineering. Method: We first manually classified 1,100 open-source issues from the GitHub defect reporting tool to identify human-centric defects and to understand the categories of such reported defects. We then interviewed software engineering practitioners to elicit feedback on our findings from the GitHub defects analysis and to gauge their knowledge and experience of the defect-reporting process and tools for managing human-centric defects. Results: We identified 176 HCDs from 1,100 open-source issues across six domains: IT-Healthcare, IT-Web, IT-Spatial, IT-Manufacturing, IT-Finance, and IT-Gaming. Additionally, we interviewed 15 software practitioners to identify shortcomings in the current defect reporting process and to determine practices for addressing these weaknesses. Conclusion: HCDs present in open-source repositories are fairly technical and, owing to a lack of awareness and inadequate defect reports, they present a major challenge to software practitioners. However, the management of HCDs can be enhanced by implementing the practices for an ideal defect-reporting tool developed as part of this study.
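For readers who want to assemble a similar issue sample for manual labelling, here is a minimal sketch using the public GitHub REST API via the requests library. The repository name and CSV columns are illustrative assumptions, not the paper's actual sample or tooling.

```python
# Minimal sketch of pulling open-source issues for manual HCD labelling,
# assuming the public GitHub REST API via `requests`. The repository and
# labelling columns are illustrative, not the paper's actual dataset.
import csv
import requests

def fetch_issues(owner: str, repo: str, pages: int = 3) -> list[dict]:
    """Fetch up to pages * 100 issues (newest first), excluding pull requests."""
    issues = []
    for page in range(1, pages + 1):
        resp = requests.get(
            f"https://api.github.com/repos/{owner}/{repo}/issues",
            params={"state": "all", "per_page": 100, "page": page},
            headers={"Accept": "application/vnd.github+json"},
            timeout=30,
        )
        resp.raise_for_status()
        # The issues endpoint also returns PRs; drop them.
        issues.extend(item for item in resp.json() if "pull_request" not in item)
    return issues

# Export to CSV with an empty column for the manual HCD category.
with open("issues_to_label.csv", "w", newline="") as fh:
    writer = csv.writer(fh)
    writer.writerow(["url", "title", "hcd_category"])
    for issue in fetch_issues("signalapp", "Signal-Android"):  # example repo
        writer.writerow([issue["html_url"], issue["title"], ""])
```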



[Figures/tables: silhouette scores per cluster for k=10; number of identified AI-based apps per category; number/percentage of fairness reviews per category; frequency of fairness concern (FC) clusters per category]
Fairness Concerns in App Reviews: A Study on AI-based Mobile Apps

July 2024 · 363 Reads · 1 Citation

ACM Transactions on Software Engineering and Methodology

Fairness is one of the socio-technical concerns that must be addressed in software systems. Considering the popularity of mobile software applications (apps) among a wide range of individuals worldwide, mobile apps with unfair behaviors and outcomes can affect a significant proportion of the global population, potentially more than any other type of software system. Users express a wide range of socio-technical concerns in mobile app reviews. This research aims to investigate fairness concerns raised in mobile app reviews. Our research focuses on AI-based mobile app reviews as the chance of unfair behaviors and outcomes in AI-based mobile apps may be higher than in non-AI-based apps. To this end, we first manually constructed a ground-truth dataset, including 1,132 fairness and 1,473 non-fairness reviews. Leveraging the ground-truth dataset, we developed and evaluated a set of machine learning and deep learning models that distinguish fairness reviews from non-fairness reviews. Our experiments show that our best-performing model can detect fairness reviews with a precision of 94%. We then applied the best-performing model on approximately 9.5M reviews collected from 108 AI-based apps and identified around 92K fairness reviews. Next, applying the K-means clustering technique to the 92K fairness reviews, followed by manual analysis, led to the identification of six distinct types of fairness concerns (e.g., ‘receiving different quality of features and services in different platforms and devices’ and ‘lack of transparency and fairness in dealing with user-generated content’). Finally, the manual analysis of 2,248 app owners’ responses to the fairness reviews identified six root causes (e.g., ‘copyright issues’) that app owners report to justify fairness concerns.
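A minimal sketch of the classify-then-cluster pipeline the abstract describes, assuming scikit-learn with a TF-IDF plus logistic regression baseline. The paper's best-performing model (94% precision) and its review corpus are not reproduced here, so the example reviews and model choice are illustrative.

```python
# Minimal sketch: train a fairness-review classifier, apply it to a corpus,
# then cluster the predicted fairness reviews with K-means (as in the paper).
# The tiny review lists stand in for the real ground-truth dataset.
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Stand-ins for the 1,132 fairness and 1,473 non-fairness ground-truth
# reviews (1 = fairness concern, 0 = not).
reviews = ["iphone users get better filters than android", "great app, love it"]
labels = [1, 0]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression(max_iter=1000))
clf.fit(reviews, labels)

# Apply the classifier to the full review corpus, then cluster only the
# reviews predicted to raise fairness concerns.
corpus = ["subtitles only work in english", "ads are fine", "banned with no explanation"]
predicted = clf.predict(corpus)
fairness_reviews = [r for r, y in zip(corpus, predicted) if y == 1]

if len(fairness_reviews) >= 2:
    X = TfidfVectorizer().fit_transform(fairness_reviews)
    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)  # paper found 6 concern types
    print(list(zip(fairness_reviews, km.labels_)))
```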





Citations (25)


... Organisational support: One significant challenge in the incorporation of personas within RE arises from the organisational culture itself [40]. The study further reveals a tendency among software practitioners to prioritise functional requirements over personas. ...

Reference:

Lessons Learned from Persona Usage in Requirements Engineering Practice
Who uses personas in requirements engineering: The practitioners’ perspective
  • Citing Article
  • November 2024

Information and Software Technology

... In the rapidly evolving field of AI, prompt engineering has become a key area of interest, focusing on developing and optimizing prompts for large language models (LLMs). Developers are increasingly relying on LLMs for a variety of software engineering (SE) tasks, such as generating code from requirements [1], deriving test cases from requirements [2], [3], or tracing requirements to code [4]. These tasks often hinge on the quality of the requirements used to prompt the LLM, yet the impact of the specific formulation of requirements on LLM performance is still poorly understood. ...

Generating Test Scenarios from NL Requirements Using Retrieval-Augmented LLMs: An Industrial Study
  • Citing Conference Paper
  • June 2024

... In addition to quantitative analysis, we will gather qualitative insights to identify emerging problems in generated artifacts. Expert opinions will complement performance results to ensure LLM outputs are measured and explained, following the approach of Ferrari et al. [16]. ...

Model Generation with LLMs: From Requirements to UML Sequence Diagrams

... Our study aims to progress this conversation in the specific context of SE and the perception of software engineers, by systematically exploring potential societal biases in text and visual outputs of LLMs across multiple facets of gender, race/ethnicity, culture or religion, age, body type, and geographic locations. LLMs, widely used in SE for tasks such as recruitment [35], [36], requirements engineering [37], code generation [38], [39], and testing [40], pose significant risks of reinforcing societal biases. Given the diversity challenges already prevalent in SE [7], [16], [17], it becomes crucial to assess these tools' fairness and inclusivity in domain-specific contexts. ...

Advancing Requirements Engineering Through Generative AI: Assessing the Role of LLMs
  • Citing Chapter
  • June 2024

... With the rapid advancement of Large Language Models (LLMs) like ChatGPT, leveraging LLMs for persona development has become increasingly significant across various domains such as software engineering [Arora et al. 2023], UI design [Atlas 2023], and LLM-based personas [Zhang et al. 2024]. However, the synergy between prompt engineering and proto-persona strategies remains underexplored. ...

Auto-Generated Personas: Enhancing User-centered Design Practices among University Students
  • Citing Conference Paper
  • May 2024

... Huang et al. [4] contribute by emphasizing adaptive user experiences through generative AI. Their review offers insights into how dynamic personalization can be achieved using generative AI but lacks an extensive exploration of multimodal systems. ...

Unlocking Adaptive User Experience with Generative AI
  • Citing Conference Paper
  • January 2024

... The other direction (user models as input of ML pipelines to produce AI components that are better tailored to the user profiles) is also a promising approach to increase the awareness and impact of user models. We have not yet seen an explicit use of user models in the ML field, which is surprising given that a significant portion of machine learning tasks leverages user data [10] and a significant body of MDE work already tries to address the needs of machine learning [51]. We believe there is an opportunity to leverage MDE to enhance ML models and algorithms with user models, by providing easy-to-use pipelines to train ML components with user data, for example, for classification purposes. ...

Model driven engineering for machine learning components: A systematic literature review
  • Citing Article
  • February 2024

Information and Software Technology

... Within the context of this paper, the analysis is focused on ChatGPT's CI feature, but a similar approach could be taken with OpenAI's API, or with custom GPTs. Also, even though the use case is on transforming tool notes into a tool review report, the approach presented in this paper could be applied to other cases as well. ...

Generative Artificial Intelligence for Software Engineering -A Research Agenda

... Prior research [26,54] used LLMs to provide feedback to participants from different perspectives, enhancing their decision making. We note that there are two areas of research focus regarding LLM-powered personas: one focuses on using LLMs to generate personas [29,48,85,104], whereas the other focuses on how personas act as conversational agents (CAs) between the LLMs and the user [41,43,62,72,82]. The latter use case is how we use our LLM-powered personas. ...

PersonaGen: A Tool for Generating Personas from User Feedback
  • Citing Conference Paper
  • September 2023

... Studies state that proper management of emotions can improve the effectiveness of RE activities (E8, E11, E12, E16, E19, E25, E30, E35, E38, E42, E44, E45, and E41). Capturing and monitoring stakeholders' emotions is essential to support better adaptation to changing requirements [15]. El-Migid et al. [1] state that monitoring emotions can help teams adjust more effectively to demands and changes during project development. ...

Multi-Modal Emotion Recognition for Enhanced Requirements Engineering: A Novel Approach
  • Citing Conference Paper
  • September 2023