Article

Tailoring Requirements Engineering for Responsible AI

Authors:
W. Maalej, Y. D. Pham, L. Chazette

Abstract

Recently reported issues concerning the acceptance of artificial intelligence (AI) solutions after deployment stress the importance of requirements engineering (RE) for designing and delivering responsible AI systems. In this article, we argue that RE should not only be carefully conducted but also tailored for responsible AI.


... Similarly, researchers [31,32,36,42] also conducted semi-structured interviews for requirements elicitation. Additionally, there are studies [47-49,52] concentrated on requirements elicitation that prioritize stakeholders' perspectives and needs. ...
... Other methods identified for requirements elicitation across the reviewed literature encompass surveys, scenario-based elicitation techniques [36,37,53-56], questionnaires [34,45], think-aloud protocols [57], focus groups [32,49,59,60], and controlled experiments, showcasing a diverse range of strategies for gathering and understanding project requirements. ...
... [46] Investigate terminologies' effectiveness in AI projects for operationalizing responsible AI and enhancing requirements related to environmental impacts. [49] Develop new methods for specifying the variety and completeness of data collection, and improve ways to define context for data selection. [105] Explore ethical implications of AI outcomes, suggesting progress in technology applications without clear ethical guidelines. ...
Article
Full-text available
Artificial intelligence (AI) permeates all fields of life, which resulted in new challenges in requirements engineering for artificial intelligence (RE4AI), e.g., the difficulty in specifying and validating requirements for AI or considering new quality requirements due to emerging ethical implications. It is currently unclear if existing RE methods are sufficient or if new ones are needed to address these challenges. Therefore, our goal is to provide a comprehensive overview of RE4AI to researchers and practitioners. What has been achieved so far, i.e., what practices are available, and what research gaps and challenges still need to be addressed? To achieve this, we conducted a systematic mapping study combining query string search and extensive snowballing. The extracted data was aggregated, and results were synthesized using thematic analysis. Our selection process led to the inclusion of 126 primary studies. Existing RE4AI research focuses mainly on requirements analysis and elicitation, with most practices applied in these areas. Furthermore, we identified requirements specification, explainability, and the gap between machine learning engineers and end-users as the most prevalent challenges, along with a few others. Additionally, we proposed seven potential research directions to address these challenges. Practitioners can use our results to identify and select suitable RE methods for working on their AI-based systems, while researchers can build on the identified gaps and research directions to push the field forward.
... Trade-off analysis techniques used in requirements engineering [1], [20], [28] may be applicable for determining how each ethics aspect affects an AI system under construction and for addressing the interplay between ethics aspects [25], [30]. Each applicable ethics aspect is listed, followed by listing possible ML models and data types that may be suitable for implementing an AI system. ...
... This ties in with the dimension of justification suggested in [10], where the justification provides a context-specific rationale for the drivers behind giving more weight to one aspect over another [30]. For an AI-based authentication system using biometrics, the consideration involves two ML models (HMM and DNN) and two data types (speech only, and speech in conjunction with face). ...
... The requirements engineering approach (Sec. II-C) encompasses proactive design considerations where ethics aspects, models, and data types are examined, and the associated tensions and solutions are explored [30]. Similarly, a Value Sensitive Design (VSD) approach exemplifies a proactive approach to applying high-level ethics principles in practice [10]. ...
Conference Paper
Full-text available
While the operationalisation of high-level AI ethics principles into practical AI/ML systems has made progress, there is still a theory-practice gap in managing tensions between the underlying AI ethics aspects. We cover five approaches for addressing the tensions via trade-offs, ranging from rudimentary to complex. The approaches differ in the types of considered context, scope, methods for measuring contexts, and degree of justification. None of the approaches is likely to be appropriate for all organisations, systems, or applications. To address this, we propose a framework which consists of: (i) proactive identification of tensions, (ii) prioritisation and weighting of ethics aspects, (iii) justification and documentation of trade-off decisions. The proposed framework aims to facilitate the implementation of well-rounded AI/ML systems that are appropriate for potential regulatory requirements.
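The prioritisation, weighting, and documentation steps of such a framework can be illustrated in code. The following is a hypothetical sketch, not an implementation from the paper: the aspect names, weights, scores, and the biometric-authentication candidates (drawn from the citing text above) are all illustrative assumptions.

```python
# Illustrative sketch: weighting ethics aspects across candidate designs
# and recording a justification for the trade-off decision.
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str     # e.g. an ML model / data-type combination
    scores: dict  # ethics aspect -> assessed score in [0, 1]


def choose(candidates, weights):
    """Pick the candidate with the highest weighted ethics score and
    return it with a justification record for documentation."""
    def total(c):
        return sum(weights[a] * c.scores.get(a, 0.0) for a in weights)

    best = max(candidates, key=total)
    justification = {
        "chosen": best.name,
        "weights": dict(weights),
        "scores": {c.name: round(total(c), 3) for c in candidates},
    }
    return best, justification


# Hypothetical example: privacy weighted above accuracy for a
# biometric authentication system with two candidate designs.
weights = {"privacy": 0.6, "accuracy": 0.4}
candidates = [
    Candidate("HMM + speech only", {"privacy": 0.8, "accuracy": 0.6}),
    Candidate("DNN + speech and face", {"privacy": 0.4, "accuracy": 0.9}),
]
best, record = choose(candidates, weights)
# With these assumed weights, the privacy-preserving design wins; the
# `record` dict is the kind of trade-off documentation step (iii) asks for.
```

Changing the weights (e.g. prioritising accuracy for a safety-critical context) flips the decision, which is why step (iii), documenting the justification alongside the weights, matters.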
... For example, Langer et al. [20] highlight the role of explainability in building trustworthy systems and propose techniques to audit explanations from four crucial perspectives: technical, psychological, legal, and ethical. Two studies on the development of ethical AI systems report explainability as an important requirement when developing responsible AI [14,15]. ...
... Likewise, Nuseibeh and Easterbrook suggest interviews as an elicitation technique [29]. Maalej et al. also recommend conducting interviews and workshops with stakeholders for responsible AI [15]. ...
... Using a simple template similar to user stories helps represent the individual explainability requirements in a structured way [6]. Similarly, Inayat et al. [30] suggest user stories for agile RE, and Maalej et al. [15] for responsible AI. ...
Conference Paper
[Context and Motivation] Many recent studies highlight explainability as an important requirement that supports building transparent, trustworthy, and responsible AI systems. As a result, there is an increasing number of solutions that researchers have developed to assist in the definition of explainability requirements. [Question] We conducted a literature study to analyze what kinds of candidate solutions are proposed for defining the explainability requirements of AI systems. The focus of this literature review is especially on the field of requirements engineering (RE). [Results] The proposed solutions for defining explainability requirements, such as approaches, frameworks, and models, are comprehensive. They can be used not only for RE activities but also for testing and evaluating the explainability of AI systems. In addition to the comprehensive solutions, we identified 30 practices that support the development of explainable AI systems. The literature study also revealed that most of the proposed solutions have not been evaluated in real projects, and there is a need for empirical studies. [Contribution] For researchers, the study provides an overview of the candidate solutions and describes research gaps. For practitioners, the paper summarizes potential practices that can help them define and evaluate the explainability requirements of AI systems.
... Responsible AI is the concept of developing and deploying AI systems that are ethical, transparent, fair, and trustworthy. It involves considering the potential impact of AI on individuals and society as a whole and taking steps to ensure that AI systems are developed and used responsibly [21]-[23]. AI systems should be transparent and explainable so that users and stakeholders can make informed decisions about their use. ...
... Recent work by Maalej, Pham, and Chazette [23] discusses the challenges of applying requirements engineering (RE) in machine learning (ML) projects based on several recent studies. These studies highlighted the difficulties practitioners encounter in ML projects, including an insufficient understanding of customer needs and unrealistic expectations. ...
Preprint
Full-text available
Integrating ethical practices into the development process for artificial intelligence (AI) is essential to ensure safe, fair, and responsible operation. AI ethics involves applying ethical principles to the entire life cycle of AI systems, which helps mitigate potential risks and harms associated with AI, such as algorithmic biases. To achieve this goal, responsible design patterns (RDPs) are critical for Machine Learning (ML) pipelines to guarantee ethical and fair outcomes. In this paper, we propose a comprehensive framework incorporating RDPs into ML pipelines to mitigate risks and ensure the ethical development of AI systems. Our framework comprises new responsible AI design patterns for ML pipelines, identified through a survey of AI ethics and data management experts and validated through real-world scenarios with expert feedback. The framework guides AI developers, data scientists, and policy-makers to implement ethical practices in AI development and deploy responsible AI systems in production.
... Notably, new quality aspects must be considered when defining requirements for MLSSs, such as explainability, fairness, and legal aspects of data. Similar work has been done by Maalej et al. (2023) and Horkoff (2019). Horkoff (2019) focused on non-functional requirements and explained the difficulty of defining measurable success criteria for quality attributes. ...
Article
Full-text available
Context An increasing demand is observed in various domains to employ Machine Learning (ML) for solving complex problems. ML models are implemented as software components and deployed in Machine Learning Software Systems (MLSSs). Problem There is a strong need for ensuring the serving quality of MLSSs. False or poor decisions of such systems can lead to malfunction of other systems, significant financial losses, or even threats to human life. The quality assurance of MLSSs is considered a challenging task and currently is a hot research topic. Objective This paper aims to investigate the characteristics of real quality issues in MLSSs from the viewpoint of practitioners. This empirical study aims to identify a catalog of quality issues in MLSSs. Method We conduct a set of interviews with practitioners/experts, to gather insights about their experience and practices when dealing with quality issues. We validate the identified quality issues via a survey with ML practitioners. Results Based on the content of 37 interviews, we identified 18 recurring quality issues and 24 strategies to mitigate them. For each identified issue, we describe the causes and consequences according to the practitioners’ experience. Conclusion We believe the catalog of issues developed in this study will allow the community to develop efficient quality assurance tools for ML models and MLSSs. A replication package of our study is available on our public GitHub repository.
... A high number of AI solutions fail or never make it to production due to missing or poor RE processes. Therefore, Maalej et al. discussed six aspects that need careful consideration and tailoring to the AI context: acceptable levels of quality requirements, data- and user-centered prototyping, expanding RE to focus on data, embedding responsible AI terminology into the engineering workflows, trade-off analysis for responsible AI, and requirements as a foundation for quality and testing of AI [21]. Gjorgjevikj et al. discussed the challenges in applying conventional RE practices to ML systems and proposed best practices and adjustments to RE concepts [22]. ...
Article
Full-text available
Driving automation systems, including autonomous driving and advanced driver assistance, are an important safety-critical domain. Such systems often incorporate perception systems that use machine learning to analyze the vehicle environment. We explore new or differing topics and challenges experienced by practitioners in this domain, which relate to requirements engineering (RE), quality, and systems and software engineering. We have conducted a semi-structured interview study with 19 participants across five companies and performed thematic analysis of the transcriptions. Practitioners have difficulty specifying upfront requirements and often rely on scenarios and operational design domains (ODDs) as RE artifacts. RE challenges relate to ODD detection and ODD exit detection, realistic scenarios, edge case specification, breaking down requirements, traceability, creating specifications for data and annotations, and quantifying quality requirements. Practitioners consider performance, reliability, robustness, user comfort, and—most importantly—safety as important quality attributes. Quality is assessed using statistical analysis of key metrics, and quality assurance is complicated by the addition of ML, simulation realism, and evolving standards. Systems are developed using a mix of methods, but these methods may not be sufficient for the needs of ML. Data quality methods must be a part of development methods. ML also requires a data-intensive verification and validation process, introducing data, analysis, and simulation challenges. Our findings contribute to understanding RE, safety engineering, and development methodologies for perception systems. This understanding and the collected challenges can drive future research for driving automation and other ML systems.
Conference Paper
Full-text available
Deep learning algorithms promise to improve clinician workflows and patient outcomes. However, these gains have yet to be fully demonstrated in real-world clinical settings. In this paper, we describe a human-centered study of a deep learning system used in clinics for the detection of diabetic eye disease. From interviews and observation across eleven clinics in Thailand, we characterize current eye-screening workflows, user expectations for an AI-assisted screening process, and post-deployment experiences. Our findings indicate that several socio-environmental factors impact model performance, nursing workflows, and the patient experience. We draw on these findings to reflect on the value of conducting human-centered evaluative research alongside prospective evaluations of model accuracy.
Article
Full-text available
Adding an ability for a system to learn inherently adds non-determinism into the system. Given the rising popularity of incorporating machine learning into systems, we wondered how the addition alters software development practices. We performed a mixture of qualitative and quantitative studies with 14 interviewees and 342 survey respondents from 26 countries across four continents to elicit significant differences between the development of machine learning systems and the development of non-machine-learning systems. Our study uncovers significant differences in various aspects of software engineering (e.g., requirements, design, testing, and process) and work features (e.g., skill variety, problem solving and task identity). Based on our findings, we highlight future research directions and provide recommendations for practitioners.
Article
Full-text available
Energy consumption has been widely studied in the computer architecture field for decades. While the adoption of energy as a metric in machine learning is emerging, the majority of research is still primarily focused on obtaining high levels of accuracy without any computational constraint. We believe that one reason for this lack of interest is unfamiliarity with approaches to evaluating energy consumption. To address this challenge, we present a review of the different approaches to estimating energy consumption in general and in machine learning applications in particular. Our goal is to provide useful guidelines that give the machine learning community the fundamental knowledge to use and build specific energy estimation methods for machine learning algorithms. We also present the latest software tools that provide energy estimates, together with two use cases that enhance the study of energy consumption in machine learning.
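The simplest class of estimation approach multiplies an assumed average power draw by measured runtime. The sketch below illustrates that idea only; the 65 W figure and the function name are hypothetical and not taken from the article, and real tools instead sample hardware counters (e.g. Intel RAPL):

```python
# Minimal power-model energy estimate: energy (J) = avg power (W) x time (s).
# Assumed values for illustration; accurate tools read hardware energy counters.
import time


def estimate_energy_joules(func, avg_power_watts):
    """Run `func`, time it, and return (result, estimated energy in joules)."""
    start = time.perf_counter()
    result = func()
    elapsed = time.perf_counter() - start
    return result, avg_power_watts * elapsed


# Example: estimate energy for a toy compute loop at an assumed 65 W draw
# (a typical desktop CPU TDP, used here purely as a placeholder).
result, joules = estimate_energy_joules(
    lambda: sum(i * i for i in range(10**5)), 65.0
)
```

Such a power model ignores dynamic voltage/frequency scaling and per-component variation, which is precisely why the article surveys more refined estimation methods and dedicated software tools.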
Article
Andrew Ng has serious street cred in artificial intelligence. He pioneered the use of graphics processing units (GPUs) to train deep learning models in the late 2000s with his students at Stanford University, cofounded Google Brain in 2011, and then served for three years as chief scientist for Baidu, where he helped build the Chinese tech giant's AI group. So when he says he has identified the next big shift in artificial intelligence, people listen. And that's what he told IEEE Spectrum in an exclusive Q&A. Ng's current efforts are focused on his company, Landing AI, which built a platform called LandingLens to help manufacturers improve visual inspection with computer vision. He has also become something of an evangelist for what he calls the data-centric AI movement, which he says can yield “small data” solutions to big issues in AI, including model efficiency, accuracy, and bias.
Conference Paper
Although algorithmic auditing has emerged as a key strategy to expose systematic biases embedded in software platforms, we struggle to understand the real-world impact of these audits, as scholarship on the impact of algorithmic audits on increasing algorithmic fairness and transparency in commercial systems is nascent. To analyze the impact of publicly naming and disclosing performance results of biased AI systems, we investigate the commercial impact of Gender Shades, the first algorithmic audit of gender and skin type performance disparities in commercial facial analysis models. This paper 1) outlines the audit design and structured disclosure procedure used in the Gender Shades study, 2) presents new performance metrics from targeted companies IBM, Microsoft and Megvii (Face++) on the Pilot Parliaments Benchmark (PPB) as of August 2018, 3) provides performance results on PPB by non-target companies Amazon and Kairos, and 4) explores differences in company responses as shared through corporate communications that contextualize differences in performance on PPB. Within 7 months of the original audit, we find that all three targets released new API versions. All targets reduced accuracy disparities between males and females and darker and lighter-skinned subgroups, with the most significant update occurring for the darker-skinned female subgroup, which underwent a 17.7%-30.4% reduction in error between audit periods. Minimizing these disparities led to a 5.72% to 8.3% reduction in overall error on the Pilot Parliaments Benchmark (PPB) for target corporation APIs. The overall performance of non-targets Amazon and Kairos lags significantly behind that of the targets, with error rates of 8.66% and 6.60% overall, and error rates of 31.37% and 22.50% for the darker female subgroup, respectively.
Conference Paper
A large part of Requirements Engineering is concerned with involving system users, capturing their needs, and getting their feedback. As users are becoming more and more demanding, markets and technologies are evolving fast, and systems are getting more and more individual, a broad and systematic user involvement in Requirements Engineering is becoming more important than ever. This paper presents the idea of pushing user involvement in Requirements Engineering to its extreme by systematically delegating the responsibility for developing the requirements and deciding about future releases to the crowd of users. We summarize the pros and cons of this vision and its main challenges, and sketch promising solution concepts that have been proposed and used in E-Participation and E-Democracy. We discussed our vision with ten experts from the fields of Requirements Engineering, politics, psychology, and market research, who were partly supportive, partly skeptical.
Hello World: How to Be Human in the Age of the Machine
  • H. Fry