Mehrdad Sabetzadeh’s research while affiliated with University of Ottawa and other places


Publications (141)


Test Input Validation for Vision-based DL Systems: An Active Learning Approach
  • Preprint
  • January 2025 · 14 Reads

Delaram Ghobari · Mohammad Hossein Amini · [...] · Mehrdad Sabetzadeh

Testing deep learning (DL) systems requires extensive and diverse, yet valid, test inputs. While synthetic test input generation methods, such as metamorphic testing, are widely used for DL testing, they risk introducing invalid inputs that do not accurately reflect real-world scenarios. Invalid test inputs can lead to misleading results. Hence, there is a need for automated validation of test inputs to ensure effective assessment of DL systems. In this paper, we propose a test input validation approach for vision-based DL systems. Our approach uses active learning to balance the trade-off between accuracy and the manual effort required for test input validation. Further, by employing multiple image-comparison metrics, it achieves better results in classifying valid and invalid test inputs compared to methods that rely on single metrics. We evaluate our approach using an industrial and a public-domain dataset. Our evaluation shows that our multi-metric, active learning-based approach produces several optimal accuracy-effort trade-offs, including those deemed practical and desirable by our industry partner. Furthermore, provided with the same level of manual effort, our approach is significantly more accurate than two state-of-the-art test input validation methods, achieving an average accuracy of 97%. Specifically, the use of multiple metrics, rather than a single metric, results in an average improvement of at least 5.4% in overall accuracy compared to the state-of-the-art baselines. Incorporating an active learning loop for test input validation yields an additional 7.5% improvement in average accuracy, bringing the overall average improvement of our approach to at least 12.9% compared to the baselines.
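The core idea of the approach — an uncertainty-sampling loop over a feature vector built from several image-comparison metrics — can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: the metric set, the logistic-regression classifier, and the helper names (`image_metrics`, `active_learning_validation`) are all assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def image_metrics(original, transformed):
    """Feature vector of simple image-comparison metrics (illustrative set)."""
    diff = original.astype(float) - transformed.astype(float)
    mse = float(np.mean(diff ** 2))                 # mean squared error
    mae = float(np.mean(np.abs(diff)))              # mean absolute error
    corr = float(np.corrcoef(original.ravel(), transformed.ravel())[0, 1])
    return np.array([mse, mae, corr])

def active_learning_validation(features, oracle_labels, seed_idx, budget=20):
    """Uncertainty sampling: spend the manual-labeling budget only on the
    test inputs the current classifier is least sure about."""
    labeled = list(seed_idx)                        # initially labeled inputs
    clf = LogisticRegression()
    for _ in range(budget):
        clf.fit(features[labeled], oracle_labels[labeled])
        proba = clf.predict_proba(features)[:, 1]
        certainty = np.abs(proba - 0.5)             # 0 = most uncertain
        certainty[labeled] = np.inf                 # never re-query an input
        labeled.append(int(np.argmin(certainty)))   # "manual" label via oracle
    clf.fit(features[labeled], oracle_labels[labeled])
    return clf.predict(features)                    # valid/invalid verdicts
```

Combining several metrics into one feature vector is what lets the classifier separate valid from invalid inputs better than any single-metric threshold could.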





Developing a Llama-Based Chatbot for CI/CD Question Answering: A Case Study at Ericsson

August 2024 · 17 Reads

This paper presents our experience developing a Llama-based chatbot for question answering about continuous integration and continuous delivery (CI/CD) at Ericsson, a multinational telecommunications company. Our chatbot is designed to handle the specificities of CI/CD documents at Ericsson, employing a retrieval-augmented generation (RAG) model to enhance accuracy and relevance. Our empirical evaluation of the chatbot on industrial CI/CD-related questions indicates that an ensemble retriever, combining BM25 and embedding retrievers, yields the best performance. When evaluated against a ground truth of 72 CI/CD questions and answers at Ericsson, our most accurate chatbot configuration provides fully correct answers for 61.11% of the questions, partially correct answers for 26.39%, and incorrect answers for 12.50%. Through an error analysis of the partially correct and incorrect answers, we discuss the underlying causes of inaccuracies and provide insights for further refinement. We also reflect on lessons learned and suggest future directions for further improving our chatbot's accuracy.
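A common way to combine a lexical (BM25) retriever with a dense embedding retriever is reciprocal rank fusion (RRF). The sketch below illustrates that general technique, not necessarily the ensemble scheme used in the paper; the document identifiers and rankings are invented.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked result lists: a document scores 1/(k + rank + 1)
    in each list it appears in, so agreement across retrievers is rewarded."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical top hits from each retriever over the same CI/CD corpus
bm25_hits = ["doc3", "doc1", "doc7"]          # lexical (keyword) retriever
embedding_hits = ["doc1", "doc9", "doc3"]     # dense (semantic) retriever
fused = reciprocal_rank_fusion([bm25_hits, embedding_hits])
```

Documents ranked highly by both retrievers rise to the top of the fused list, which the RAG pipeline would then pass to the LLM as context.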


A Lean Simulation Framework for Stress Testing IoT Cloud Systems

July 2024 · 35 Reads

IEEE Transactions on Software Engineering

The Internet of Things (IoT) connects a plethora of smart devices globally across various applications like smart cities, autonomous vehicles, and health monitoring. Simulation plays a key role in the testing of IoT systems, noting that field testing of a complete IoT product may be infeasible or prohibitively expensive. This paper addresses a specific yet important need in simulation-based testing for IoT: stress testing of cloud systems that are increasingly employed in IoT applications. Existing stress-testing solutions for IoT demand significant computational resources, making them costly and ill-suited for this purpose. We propose a lean simulation framework designed for IoT cloud stress testing. The framework enables efficient simulation of a large array of IoT and edge devices that communicate with the cloud. To facilitate simulation construction for practitioners, we develop a domain-specific language (DSL), named IoTECS, for generating simulators from model-based specifications. We provide the syntax and semantics of IoTECS and implement IoTECS using Xtext and Xtend. We assess simulators generated from IoTECS specifications for stress testing two real-world systems: a cloud-based IoT monitoring system developed by our industry partner and an IoT-connected vehicle system. Our empirical results indicate that simulators created using IoTECS: (1) achieve best performance when configured with Docker containerization; (2) effectively assess the service capacity of our case-study systems; and (3) outperform industrial stress-testing baseline tools, JMeter and Locust, by a factor of 3.5 in terms of the number of IoT and edge devices they can simulate using identical hardware resources. To gain initial insights about the usefulness of IoTECS in practice, we interviewed two engineers from our industry partner who have firsthand experience with IoTECS. Feedback from these interviews suggests that IoTECS is effective in stress testing IoT cloud systems, saving significant time and effort.
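IoTECS itself generates simulators from DSL specifications. As a rough illustration of the underlying idea — simulating thousands of lightweight devices on a single machine so the cloud endpoint, not the test harness, is the bottleneck — here is a coroutine-based sketch. All names and numbers are invented; this is not IoTECS.

```python
import asyncio

async def device(device_id, cloud_inbox, n_messages):
    """One simulated IoT/edge device pushing telemetry to the cloud under test."""
    for seq in range(n_messages):
        await cloud_inbox.put((device_id, seq))
        await asyncio.sleep(0)    # yield control; a real device would pace itself

async def run_stress_test(n_devices, n_messages):
    """Spawn many device coroutines against one shared queue, which stands in
    for the cloud endpoint whose service capacity is being probed."""
    inbox = asyncio.Queue()
    await asyncio.gather(*(device(i, inbox, n_messages)
                           for i in range(n_devices)))
    return inbox.qsize()          # messages the "cloud" would have to absorb

received = asyncio.run(run_stress_test(n_devices=1000, n_messages=5))
```

Because each simulated device is a coroutine rather than a process or thread, a single host can drive a device population large enough to saturate the system under test.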





Enhancing Automata Learning with Statistical Machine Learning: A Network Security Case Study

May 2024 · 8 Reads

Intrusion detection systems are crucial for network security. Verification of these systems is complicated by various factors, including the heterogeneity of network platforms and the continuously changing landscape of cyber threats. In this paper, we use automata learning to derive state machines from network-traffic data with the objective of supporting behavioural verification of intrusion detection systems. The most innovative aspect of our work is addressing the inability to directly apply existing automata learning techniques to network-traffic data due to the numeric nature of such data. Specifically, we use interpretable machine learning (ML) to partition numeric ranges into intervals that strongly correlate with a system's decisions regarding intrusion detection. These intervals are subsequently used to abstract numeric ranges before automata learning. We apply our ML-enhanced automata learning approach to a commercial network intrusion detection system developed by our industry partner, RabbitRun Technologies. Our approach results in an average 67.5% reduction in the number of states and transitions of the learned state machines, while achieving an average 28% improvement in accuracy compared to using expertise-based numeric data abstraction. Furthermore, the resulting state machines help practitioners in verifying system-level security requirements and exploring previously unknown system behaviours through model checking and temporal query checking. We make our implementation and experimental data available online.
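The interval-abstraction step can be illustrated with a shallow decision tree: its split thresholds partition a numeric feature into intervals that correlate with the system's decisions, and each interval index then serves as an alphabet symbol for the automata learner. This sketch assumes scikit-learn and invents the data and helper names; it is not the authors' tooling.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def learn_intervals(values, decisions, max_depth=2):
    """Fit a shallow, interpretable tree on one numeric feature and read off
    its split points; these delimit the abstraction intervals."""
    tree = DecisionTreeClassifier(max_depth=max_depth, random_state=0)
    tree.fit(values.reshape(-1, 1), decisions)
    # Leaf nodes carry the sentinel threshold -2; keep only real splits.
    return sorted(t for t in tree.tree_.threshold if t != -2)

def abstract(value, thresholds):
    """Map a numeric value to its interval index, i.e. the discrete symbol
    fed to the automata learner instead of the raw number."""
    return sum(value > t for t in thresholds)
```

Replacing raw numeric observations with these interval symbols is what makes off-the-shelf automata learning applicable to network-traffic data.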


Citations (74)


... These aspects include the observation that some research areas are not as well covered as others, e.g., licensing issues related to models or reverse engineering; they also believe we have a lack of guidelines (e.g., for prompt engineering) and that we may lack expertise in ML niche areas. The last point may be (at least partially) related to a recent editorial, in which the editorial board of a large SE journal frames which aspects of research in the intersection of ML and SE are suitable for a SE venue [93]. ...

Reference:

Perspective of Software Engineering Researchers on Machine Learning Practices Regarding Research, Review, and Education
Scoping Software Engineering for AI: The TSE Perspective
  • Citing Article
  • November 2024

IEEE Transactions on Software Engineering

... In addition to enhancing completeness, LLMs have also shown promise in ensuring regulatory compliance. For instance, the study by Hassani et al. (2024) illustrates how Data Processing Agreements (DPAs) can be evaluated for compliance with the General Data Protection Regulation (GDPR), a legal framework aimed at ensuring data privacy in the European Union, using advanced techniques powered by Large Language Models. Their work highlights how automating compliance checks can streamline the validation process, reducing manual efforts and improving accuracy in legal and regulatory adherence. ...

Rethinking Legal Compliance Automation: Opportunities with Large Language Models
  • Citing Conference Paper
  • June 2024

... The Horizontal Pod Autoscaler (HPA) and Vertical Pod Autoscaler (VPA) are container elastic scaling strategies. In the Horizontal Pod Autoscaler, the two most common optimization objectives are to reduce the rate of SLA violations and improve CPU utilization (Karol Santos Nunes et al., 2024; Daradkeh & Agarwal, 2023). According to the trigger conditions and execution methods, there are two common ways to implement the container elastic scaling strategy: (1) adjusting thresholds based on the load prediction model; and (2) using a reinforcement learning agent to make scaling decisions and iteratively learn the best scaling strategy (Khaleq & Ra, 2021; Zhong et al., 2022; Huo et al., 2022; Ahmad et al., 2022). ...

Self-adaptive, Requirements-driven Autoscaling of Microservices
  • Citing Conference Paper
  • June 2024

... Effective compliance checking in SRS documents plays a crucial role in mitigating many associated risks. Luitel et al. (2024) demonstrate how BERT, a Large Language Model, is employed to detect and address incomplete requirements by predicting missing terminology, thereby improving the overall completeness of SRS. In addition to enhancing completeness, LLMs have also shown promise in ensuring regulatory compliance. ...

Improving requirements completeness: automated assistance through large language models
  • Citing Article
  • March 2024

Requirements Engineering

... The identification of such diverse failing test inputs can support the detection of underlying failure causes and conditions leading to failures (Ben Abdessalem et al. 2018;Jodat et al. 2024). Existing research on software testing views the identification of diverse failures as an important testing objective (Aghababaeyan et al. 2023;Feldt et al. 2016). ...

Test Generation Strategies for Building Failure Models and Explaining Spurious Failures
  • Citing Article
  • December 2023

ACM Transactions on Software Engineering and Methodology

... The language understanding capability helps deal with stakeholders or manage and process a large amount of text [4], [12], [38], [45], [47]. • Modeling: ChatGPT is applied for requirements modeling, showcasing its use in semi-structured formats that bridge the gap between natural language and formal model representation [39]-[41]. • Specification: ChatGPT's capacity to formalize and standardize language has been leveraged for generating clear, structured specifications, with multiple studies highlighting its use in creating specifications from the requirements [4], [28], [42]-[44]. ...

On the Use of GPT-4 for Creating Goal Models: An Exploratory Study

... Methods: To answer this question, we selected five benchmark methods for comparative experiments, including MultiCNN (Li et al. [19]), CNN (Li et al. [11]), NBM (Huang et al. [16]), BiLSTM (Yu et al. [17]), and BERT (Kenton et al. [52]). ...

Measuring Improvement of F1-Scores in Detection of Self-Admitted Technical Debt
  • Citing Conference Paper
  • May 2023

... A satisfiability checking tool [30] and many runtime monitoring tools support (different fragments of) MFOTL [23], including MonPoly [13,17-19], VeriMon [9,10,59] and DejaVu [35]. Lima et al. [48] recently introduced Explanator2, an MTL monitor that outputs explanations. ...

Early Verification of Legal Compliance via Bounded Satisfiability Checking

Lecture Notes in Computer Science

... Overall, based on their working paradigm and real-world applications, existing language models can be divided into two distinct groups: those relying on fine-tuning and those requiring prompt engineering. Extensive research has been conducted to evaluate and compare the performance of language models in the first category across a wide range of requirements engineering applications, namely requirements extraction [39], classification [11], duplicate requirement identification [40], requirements reuse [41,42], NER tagging [43], question answering [44] and sentiment analysis of software reviews [45]. Conversely, language models in the second category, such as ChatGPT and Gemini, are being extensively utilized for text re-phrasing [46,47], data generation, healthcare [48-50], finance [51-53], education [54] and opinion mining [55]; research [56-59] has been performed to explore their potential across software engineering tasks. ...

AI-based Question Answering Assistance for Analyzing Natural-language Requirements
  • Citing Conference Paper
  • May 2023

... MELA assumes that S accepts time-series data as input and generates time-series data as output. Examples of such systems include cyber-physical systems (CPS) and network systems [17,18]. Our approach treats S as a black box and does not make any assumptions about its internals. ...

Learning Non-robustness using Simulation-based Testing: a Network Traffic-shaping Case Study
  • Citing Conference Paper
  • April 2023