Article

Deliberating with AI: Improving Decision-Making for the Future through Participatory AI Design and Stakeholder Deliberation


Abstract

Research exploring how to support decision-making has often used machine learning to automate or assist human decisions. We take an alternative approach for improving decision-making, using machine learning to help stakeholders surface ways to improve and make fairer decision-making processes. We created "Deliberating with AI", a web tool that enables people to create and evaluate ML models in order to examine strengths and shortcomings of past decision-making and deliberate on how to improve future decisions. We apply this tool to a context of people selection, having stakeholders—decision makers (faculty) and decision subjects (students)—use the tool to improve graduate school admission decisions. Through our case study, we demonstrate how the stakeholders used the web tool to create ML models that they used as boundary objects to deliberate over organizational decision-making practices. We share insights from our study to inform future research on stakeholder-centered participatory AI design and technology for organizational decision-making.
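The abstract above describes the tool only at a high level. As a rough illustration of its core loop, training an interpretable model on historical decisions and surfacing its learned weights for stakeholders to debate, the following sketch uses synthetic admission data, invented feature names, and a plain logistic regression; none of this is the authors' actual implementation.

```python
# Minimal sketch (not the authors' tool): fit an interpretable model on
# hypothetical historical admission decisions and surface its learned
# weights as a starting point for stakeholder deliberation.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 500
# Hypothetical applicant features; a real deployment would use institutional data.
data = pd.DataFrame({
    "gpa": rng.normal(3.4, 0.4, n).clip(2.0, 4.0),
    "test_score": rng.normal(310, 10, n).clip(260, 340),
    "research_experience_years": rng.poisson(1.2, n),
    "recommendation_strength": rng.integers(1, 6, n),  # 1-5 rubric score
})
# Synthetic past decisions standing in for historical admission outcomes.
logit = (2.0 * (data["gpa"] - 3.4) + 0.05 * (data["test_score"] - 310)
         + 0.4 * data["research_experience_years"]
         + 0.3 * (data["recommendation_strength"] - 3))
data["admitted"] = (logit + rng.normal(0, 1, n) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    data.drop(columns="admitted"), data["admitted"], random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# The fitted weights act as a "boundary object": stakeholders can see which
# features past decisions implicitly rewarded and debate whether they should.
for feature, weight in zip(X_train.columns, model.coef_[0]):
    print(f"{feature:>28s}: {weight:+.2f}")
print(f"held-out accuracy: {model.score(X_test, y_test):.2f}")
```

In the paper's framing, outputs like these learned weights are what faculty and students would gather around to deliberate over past and future admission practices.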


... Participatory AI, also known as inclusive and equitable AI or co-creative AI, is a field that has emerged in recent years [29]. It involves a model that allows the participation of various stakeholders in the design, development, and decision-making process of AI systems [29]. Unlike traditional approaches in which AI systems are primarily developed and controlled by a small group of experts, participatory AI seeks to democratize technology by actively engaging individuals and communities who may be affected by or have valuable insights into its applications [30]. ...
... Participatory AI systems involve various participatory processes, including co-design, public consultation, citizen science initiatives, and ongoing collaboration among researchers, developers, and the community. Through these processes, participants collectively state the problem, identify data sources, create algorithmic models, establish evaluation metrics, and shape the AI system's deployment and governance framework [29]. ...
Article
Full-text available
Algorithmic technologies are widely applied in organizational decision-making today, which can improve resource allocation and decision-making coordination to facilitate the accuracy and efficiency of the decision-making process within and across organizations. However, algorithmic controls also introduce and amplify organizational inequalities—workers who are female, people of color, or members of marginalized populations, as well as workers with low skills, low levels of education, or low technology literacy, can be disadvantaged and discriminated against due to the lack of transparency, explainability, objectivity, and accountability in these algorithms. Through a systematic literature review, this study comprehensively compares three different types of controls in organizations: technical controls, bureaucratic controls, and algorithmic controls, which led to our understanding of the advantages and disadvantages associated with algorithmic controls. The literature on the organizational inequality related to the employment of algorithmic controls is then discussed and summarized. Finally, we explore the potential of trustworthy algorithmic controls and participatory development of algorithms to mitigate organizational inequalities associated with algorithmic controls. Our findings raise awareness of the potential corporate inequalities associated with algorithmic controls in organizations and endorse the development of future generations of hiring and employment algorithms through trustworthy and participatory approaches.
... Hence, balancing diverse fairness needs and seeking consensus among stakeholders is crucial for implementing fair AI successfully. Current research primarily focuses on seeking consensus on AI model building or predictive outcomes through participatory design [40,62] or co-design [45,51] for AI fairness democratization. For example, in the context of goods division, Lee et al. [38] reached distribution outcome consensus by allowing individual adjustments first, then group discussions for collective decision-making on the allocations. ...
... This leads to the question of what to do with personal preferences once we have elicited them, in order to mitigate fairness issues. One approach might be to incorporate some simple voting for metrics and features, similar to the weighting that we have used, with the "winning" metric and features being the ones implemented [62]. We might also consider optimizing all metrics chosen by stakeholders. ...
Preprint
Full-text available
Numerous fairness metrics have been proposed and employed by artificial intelligence (AI) experts to quantitatively measure bias and define fairness in AI models. Recognizing the need to accommodate stakeholders' diverse fairness understandings, efforts are underway to solicit their input. However, conveying AI fairness metrics to stakeholders without AI expertise, capturing their personal preferences, and seeking a collective consensus remain challenging and underexplored. To bridge this gap, we propose a new framework, EARN Fairness, which facilitates collective metric decisions among stakeholders without requiring AI expertise. The framework features an adaptable interactive system and a stakeholder-centered EARN Fairness process to Explain fairness metrics, Ask stakeholders' personal metric preferences, Review metrics collectively, and Negotiate a consensus on metric selection. To gather empirical results, we applied the framework to a credit rating scenario and conducted a user study involving 18 decision subjects without AI knowledge. We identify their personal metric preferences and their acceptable level of unfairness in individual sessions. Subsequently, we uncovered how they reached metric consensus in team sessions. Our work shows that the EARN Fairness framework enables stakeholders to express personal preferences and reach consensus, providing practical guidance for implementing human-centered AI fairness in high-risk contexts. Through this approach, we aim to harmonize fairness expectations of diverse stakeholders, fostering more equitable and inclusive AI fairness.
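To give a concrete sense of the kind of fairness metrics such a process asks stakeholders to review, the sketch below computes two common group metrics, demographic parity difference and equal opportunity difference, on synthetic credit data; the metric choice and data are assumptions for illustration, not the EARN Fairness system itself.

```python
# Sketch of two group fairness metrics that stakeholders might review and
# vote on; the metrics and data are illustrative, not the EARN system's.
import numpy as np

rng = np.random.default_rng(1)
group = rng.integers(0, 2, 1000)            # 0 / 1: two demographic groups
y_true = rng.integers(0, 2, 1000)           # actual repayment outcome
# A hypothetical credit model that is slightly harsher on group 1.
y_pred = ((y_true + rng.normal(0, 0.6, 1000) - 0.15 * group) > 0.5).astype(int)

def demographic_parity_diff(y_pred, group):
    """Difference in positive-prediction rates between the two groups."""
    return y_pred[group == 0].mean() - y_pred[group == 1].mean()

def equal_opportunity_diff(y_true, y_pred, group):
    """Difference in true positive rates between the two groups."""
    tpr = lambda g: y_pred[(group == g) & (y_true == 1)].mean()
    return tpr(0) - tpr(1)

print(f"demographic parity difference: {demographic_parity_diff(y_pred, group):+.3f}")
print(f"equal opportunity difference:  {equal_opportunity_diff(y_true, y_pred, group):+.3f}")
```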
... Future FAccT research can build upon these further understandings to guide policy making and consensus building. For example, in addition to conducting surveys with individual participants, future work can explore how group discussions and deliberations shape communities' collective understanding of AI impacts (e.g., [28,32,44,45,84]) ...
Preprint
Full-text available
In recent years, there has been a growing recognition of the need to incorporate lay-people's input into the governance and acceptability assessment of AI usage. However, how and why people judge different AI use cases to be acceptable or unacceptable remains under-explored. In this work, we investigate the attitudes and reasons that influence people's judgments about AI's development via a survey administered to demographically diverse participants (N=197). We focus on ten distinct professional (e.g., Lawyer AI) and personal (e.g., Digital Medical Advice AI) AI use cases to understand how characteristics of the use cases and the participants' demographics affect acceptability. We explore the relationships between participants' judgments and their rationales such as reasoning approaches (cost-benefit reasoning vs. rule-based). Our empirical findings reveal a number of factors that influence acceptance, such as generally more negative acceptance of, and higher disagreement about, professional use cases than personal ones; the significant influence of demographic factors such as gender, employment, and education, as well as AI literacy level; and reasoning patterns such as rule-based reasoning being used more when a use case is judged unacceptable. Based on these findings, we discuss the key implications for soliciting acceptability judgments and reasoning about AI use cases to collaboratively build consensus. Finally, we shed light on how future FAccT researchers and practitioners can incorporate diverse perspectives from lay people to develop AI that better aligns with public expectations and needs.
... Furthermore, AI can be integrated into digital platforms to enhance various processes, such as moderating discussions, analysing and summarising proposals through Natural Language Processing (NLP) (Hadfi et al. 2021), and creating visual representations of arguments and counterarguments made during online deliberations (Zhang et al. 2023). NLP can also detect logical fallacies (Sourati et al. 2023), which is particularly beneficial for political debates and deliberative practices by enabling participants to focus on more robust arguments and identify the weaknesses in others' reasoning. ...
Preprint
Full-text available
This chapter explores the influence of Artificial Intelligence (AI) on digital democracy, focusing on four main areas: citizenship, participation, representation, and the public sphere. It traces the evolution from electronic to virtual and network democracy, underscoring how each stage has broadened democratic engagement through technology. Focusing on digital citizenship, the chapter examines how AI can improve online engagement and promote ethical behaviour while posing privacy risks and fostering identity stereotyping. Regarding political participation, it highlights AI's dual role in mobilising civic actions and spreading misinformation. Regarding representation, AI's involvement in electoral processes can enhance voter registration, e-voting, and the efficiency of result tabulation but raises concerns regarding privacy and public trust. Also, AI's predictive capabilities shift the dynamics of political competition, posing ethical questions about manipulation and the legitimacy of democracy. Finally, the chapter examines how integrating AI and digital technologies can facilitate democratic political advocacy and personalised communication. However, this also comes with higher risks of misinformation and targeted propaganda.
... PD and Co-C approaches to incorporating fairness in AI applications must negotiate the multiple notions of fairness held by diverse stakeholders [26][27][28][29]. In this way, machine learning algorithms can be used in deliberative PD processes as a form of S.L. Star's "boundary object" through which participants can negotiate shared beliefs and values with other stakeholders, as well as "the complexity of their differences within the problem space" [30]. A specific area of focus for PD/Co-C for AI is an explicitly value-led approach that aligns responsible AI design with social good. ...
Article
Full-text available
Participatory design (PD) and co-creation (Co-C) approaches to building Artificial Intelligence (AI) systems have become increasingly popular exercises for ensuring greater social inclusion and fairness in technological transformation by accounting for the experiences of vulnerable or disadvantaged social groups; however, such design work is challenging in practice, partly because of the inaccessible domain of technical expertise inherent to AI design. This paper evaluates a methodological approach to make addressing AI bias more accessible by incorporating a training component on AI bias in a Co-C process with vulnerable and marginalized participant groups. This was applied by socio-technical researchers involved in creating an AI bias mitigation developer toolkit. This paper’s analysis emphasizes that critical reflection on how to use training in Co-C appropriately and how such training should be designed and implemented is necessary to ensure training allows for a genuinely more inclusive approach to AI systems design when those most at risk of being adversely affected by AI technologies are often not the intended end-users of said technologies. This is acutely relevant as Co-C exercises are increasingly used to demonstrate regulatory compliance and ethical practice by powerful institutions and actors developing AI systems, particularly in the ethical and regulatory environment coalescing around the European Union’s recent AI Act.
... However, popular copilots place humans in the primary decision-making role and typically do not alter the dynamics of human-to-human interaction. Most prior research on human-AI collaboration to date (e.g., [16,17,23,24,28,33,34,41,44,47]) has similarly focused on dyadic interaction and quantitatively measurable tasks such as collaborative gaming [47], object identification [46] or decision making [33]. More recently, a new class of AI agent systems [2][3][4]36] has emerged, coordinating multiple AI agents to complete software development or similar tasks. ...
Preprint
Full-text available
We explore the potential for productive team-based collaboration between humans and Artificial Intelligence (AI) by presenting and conducting initial tests with a general framework that enables multiple human and AI agents to work together as peers. ChatCollab's novel architecture allows agents - human or AI - to join collaborations in any role, autonomously engage in tasks and communication within Slack, and remain agnostic to whether their collaborators are human or AI. Using software engineering as a case study, we find that our AI agents successfully identify their roles and responsibilities, coordinate with other agents, and await requested inputs or deliverables before proceeding. In relation to three prior multi-agent AI systems for software development, we find ChatCollab AI agents produce comparable or better software in an interactive game development task. We also propose an automated method for analyzing collaboration dynamics that effectively identifies behavioral characteristics of agents with distinct roles, allowing us to quantitatively compare collaboration dynamics in a range of experimental conditions. For example, in comparing ChatCollab AI agents, we find that an AI CEO agent generally provides suggestions 2-4 times more often than an AI product manager or AI developer, suggesting agents within ChatCollab can meaningfully adopt differentiated collaborative roles. Our code and data can be found at: https://github.com/ChatCollab.
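The role-agnostic collaboration idea can be gestured at with a toy sketch in which agents, whether backed by a human or an LLM, post to a shared channel under a named role. The classes, the stubbed respond method, and the turn-taking loop below are hypothetical simplifications; ChatCollab's actual Slack-based implementation is in the linked repository.

```python
# Toy illustration of role-agnostic peers on a shared channel; not ChatCollab's code.
from dataclasses import dataclass, field

@dataclass
class Agent:
    name: str
    role: str                      # e.g., "CEO", "product manager", "developer"
    is_human: bool = False

    def respond(self, channel_history):
        # Stub: a real agent would call an LLM here (or prompt a human for input).
        last = channel_history[-1] if channel_history else "(channel is empty)"
        return f"[{self.role}] {self.name} acknowledges: {last!r}"

@dataclass
class Channel:
    history: list = field(default_factory=list)

    def post(self, message):
        self.history.append(message)

team = [Agent("Ava", "CEO"), Agent("Ben", "developer"),
        Agent("Cleo", "product manager", is_human=True)]
channel = Channel()
channel.post("Kickoff: build a small interactive game prototype.")

# One round of turn-taking; collaborators cannot tell human from AI by interface.
for agent in team:
    channel.post(agent.respond(channel.history))

print("\n".join(channel.history))
```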
... Participatory Social-AI Research In order to better align development and deployment of societally-beneficial Social-AI, we advocate that future directions include embracing Participatory AI frameworks (Bondi et al., 2021;Birhane et al., 2022;Zhang et al., 2023a), consciously involving a diverse range of stakeholders, to ensure Social-AI researchers prioritize concerns, ethical frameworks, risks, and needs raised by stakeholders. When humans interact with computers, virtual agents, and robots, they impose social norms and expectations on these agents (Nass and Moon, 2000); it is important to directly assess and center human social and functional expectations from Social-AI agents when creating and deploying them (Takayama et al., 2008;Dennler et al., 2022;Olatunji et al., 2024). ...
... Further, research in the field of computer-supported collaborative work provides approaches for the analysis of collaborative settings [30], group composition [21], and group interactions [81]. Recent research on public AI [95] has emphasized the importance of including stakeholders by introducing deliberation early in the design process [39,92,94]. Further, recent work in XAI has begun to consider how explanations for group interactions could be approached, describing that "many-to-one" interactions (multiple people interacting with an explanation) will likely differ enormously from "one-to-one" interactions due to "complexities in group dynamics, cognitive bias amplification, trust issues within the group, and group-centric evaluation" [62]. ...
Preprint
Full-text available
XAI research often focuses on settings where people learn about and assess algorithmic systems individually. However, as more public AI systems are deployed, it becomes essential for XAI to facilitate collective understanding and deliberation. We conducted a task-based interview study involving 8 focus groups and 12 individual interviews to explore how explanations can support AI novices in understanding and forming opinions about AI systems. Participants received a collection of explanations organized into four information categories to solve tasks and decide about a system's deployment. These explanations improved or calibrated participants' self-reported understanding and decision confidence and facilitated group discussions. Participants valued both technical and contextual information and the self-directed and modular explanation structure. Our contributions include an explanation approach that facilitates both individual and collaborative interaction and explanation design recommendations, including active and controllable exploration, different levels of information detail and breadth, and adaptations to the needs of decision subjects.
... Furthermore, beliefs about AI's inner workings, such as objectivity or transparency, shape how decision-makers value and engage in reflexivity. Crucially, a shared understanding of reflexivity is necessary, moving beyond superficial interpretations to encompass a critical examination of assumptions, continuous learning, and adaptation (Zhang et al. 2023). By fostering a culture of reflexivity, organizations can empower decision-makers to navigate the complexities of AI integration in a responsible and beneficial manner. ...
Article
Full-text available
In the emerging literature on artificial intelligence (AI) and leadership, there is increasing recognition of the role played by advanced technologies in decision-making. AI is viewed as the next frontier for improving decision-making processes and, as a result, enhancing human decision-making in general. However, the existing literature lacks studies on how AI, operating as a “warrior” or innovator in business, can in turn enhance leadership reflexivity and thereby improve decision-making outcomes. This study aims to address this gap by drawing on the reflexivity perspective and existing research on AI and leadership to examine how the concepts of warrior AI and leadership reflexivity can be integrated to improve decision-making. To achieve this, the study used a systematic literature review to identify and map articles using specified inclusion and exclusion criteria. Selected articles were included for in-depth analysis to address the issue under investigation. The study explored the potential benefits of blending advanced AI with reflective leadership strategies, offering insights into how organizations can optimize their decision-making processes through this innovative approach. A comprehensive literature review was thus the foundation for our investigation into how warrior AI may enhance human decision-making, especially under high-stress conditions, by providing real-time data analysis capabilities, pattern recognition skills, and predictive simulations. Our work emphasizes how leadership reflexivity plays a critical role in assessing AI-driven recommendations to ensure the ethical soundness and contextual appropriateness of the decisions being taken. Based on our findings, we suggest that integrating AI capabilities with reflective leadership practices can lead to more effective and adaptable decision-making frameworks, particularly when swift yet well-informed action is necessary. This study adds to the existing body of knowledge by illustrating, with the aid of a flow diagram, that the integration of warrior AI into the reflective process can potentially amplify the benefits of AI, offering data-driven insights for leaders to reflect upon and thereby reinforcing the decision-making process with a more rigorous, ethical, and nuanced approach in alignment with organizational objectives and societal values. It is recommended that leadership actively engage in discussions regarding ethical AI use, ensuring alignment with organizational values and ethics. Ultimately, this study contributes valuable insights to discussions around AI and leadership by underscoring the significance of maintaining a balanced relationship between machine efficiency and human wisdom.
... The challenge in process mining involves enhancing output clarity and usability for non-expert stakeholders, especially within complex industrial settings [18]. Transforming complex data into understandable insights remains a well-known issue in data science, highlighting the importance of 'analytics translators': professionals skilled at interpreting data to identify business challenges and convert needs into data-driven inquiries [8], [24]. ...
Preprint
Full-text available
Anomalies in complex industrial processes are often obscured by high variability and complexity of event data, which hinders their identification and interpretation using process mining. To address this problem, we introduce WISE (Weighted Insights for Evaluating Efficiency), a novel method for analyzing business process metrics through the integration of domain knowledge, process mining, and machine learning. The methodology involves defining business goals and establishing Process Norms with weighted constraints at the activity level, incorporating input from domain experts and process analysts. Individual process instances are scored based on these constraints, and the scores are normalized to identify features impacting process goals. Evaluation using the BPIC 2019 dataset and real industrial contexts demonstrates that WISE enhances automation in business process analysis and effectively detects deviations from desired process flows. While LLMs support the analysis, the inclusion of domain experts ensures the accuracy and relevance of the findings.
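The scoring step described above, weighted activity-level constraints applied to individual process instances and then normalized, might look roughly like the sketch below; the constraints, weights, and traces are invented for illustration and are not the WISE implementation.

```python
# Sketch of weighted, activity-level constraint scoring for process instances;
# the constraints, weights, and traces below are illustrative only.
constraints = {
    # activity: (constraint on duration in hours, weight set with domain experts)
    "create_purchase_order": (lambda d: d <= 24, 3.0),
    "approve_invoice":       (lambda d: d <= 72, 2.0),
    "goods_receipt":         (lambda d: d <= 48, 1.0),
}

traces = {   # case id -> observed activity durations (hours)
    "case_001": {"create_purchase_order": 10, "approve_invoice": 60, "goods_receipt": 30},
    "case_002": {"create_purchase_order": 40, "approve_invoice": 100, "goods_receipt": 20},
    "case_003": {"create_purchase_order": 12, "approve_invoice": 200, "goods_receipt": 90},
}

def score(trace):
    """Weighted share of satisfied constraints for one process instance."""
    total = sum(w for _, w in constraints.values())
    met = sum(w for act, (check, w) in constraints.items()
              if act in trace and check(trace[act]))
    return met / total   # normalized to [0, 1]

for case, trace in traces.items():
    print(f"{case}: conformance score {score(trace):.2f}")
```

Low-scoring cases would then be the ones flagged for closer inspection against the defined process goals.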
... Participatory approaches. Driven by calls from civil society organizations, academia, and others, there is growing emphasis on participatory AI governance to make AI/ML design more inclusive and equitable [99,100]. However, there is limited empirical literature on involving stakeholders in refining AI performance. ...
Preprint
Full-text available
This paper examines the governance of multimodal large language models (MM-LLMs) through individual and collective deliberation, focusing on analyses of politically sensitive videos. We conducted a two-step study: first, interviews with 10 journalists established a baseline understanding of expert video interpretation; second, 114 individuals from the general public engaged in deliberation using Inclusive.AI, a platform that facilitates democratic decision-making through decentralized autonomous organization (DAO) mechanisms. Our findings show that while experts emphasized emotion and narrative, the general public prioritized factual clarity, objectivity of the situation, and emotional neutrality. Additionally, we explored the impact of different governance mechanisms (quadratic vs. weighted ranking voting, and equal vs. 20-80 power distributions) on users' decisions about how AI should behave. Specifically, quadratic voting enhanced perceptions of liberal democracy and political equality, and participants who were more optimistic about AI perceived the voting process to have a higher level of participatory democracy. Our results suggest the potential of applying DAO mechanisms to help democratize AI governance.
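Since the study contrasts quadratic with weighted ranking voting, a brief sketch of the quadratic rule may help: each participant spends a fixed budget of voice credits, and the effective votes an option receives from a participant grow as the square root of the credits spent. The budget, options, and ballots below are hypothetical.

```python
# Sketch of quadratic voting aggregation; budgets and ballots are hypothetical.
import math

BUDGET = 100  # voice credits per participant
ballots = {   # participant -> credits allocated to each AI-behavior option
    "p1": {"flag_content": 64, "add_context": 36},
    "p2": {"add_context": 100},
    "p3": {"flag_content": 25, "remove_content": 75},
}

totals = {}
for voter, allocation in ballots.items():
    assert sum(allocation.values()) <= BUDGET, f"{voter} exceeds the credit budget"
    for option, credits in allocation.items():
        # Quadratic rule: effective votes grow as the square root of credits,
        # so expressing a very strong preference gets progressively costlier.
        totals[option] = totals.get(option, 0.0) + math.sqrt(credits)

for option, votes in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{option}: {votes:.1f} effective votes")
```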
... Computer scientists suggest switching to more sophisticated model architectures to reduce multiplicity (e.g., ensemble methods as a way to reduce variance) [7,22,76]. Lay stakeholders will not necessarily like those approaches, as research shows that different types of stakeholders (like decision makers and decision subjects) can have different perceptions of what matters for fairness in ML [57,84,102]. Critical scholars argue that considering stakeholder preferences is important [10,87] and this belief underlies frameworks like participatory ML [31] and value-sensitive design [94]. ...
Preprint
Full-text available
Machine learning (ML) is increasingly used in high-stakes settings, yet multiplicity -- the existence of multiple good models -- means that some predictions are essentially arbitrary. ML researchers and philosophers posit that multiplicity poses a fairness risk, but no studies have investigated whether stakeholders agree. In this work, we conduct a survey to see how the presence of multiplicity impacts lay stakeholders' -- i.e., decision subjects' -- perceptions of ML fairness, and which approaches to address multiplicity they prefer. We investigate how these perceptions are modulated by task characteristics (e.g., stakes and uncertainty). Survey respondents think that multiplicity lowers distributional, but not procedural, fairness, even though existing work suggests the opposite. Participants are strongly against resolving multiplicity by using a single good model (effectively ignoring multiplicity) or by randomizing over possible outcomes. Our results indicate that model developers should be intentional about dealing with multiplicity in order to maintain fairness.
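The phenomenon at issue, several near-equally accurate models disagreeing on individual cases, is easy to reproduce. The sketch below trains a handful of models differing only in random seed and reports the share of test cases on which they disagree, a simple proxy for multiplicity; the data and model family are illustrative assumptions.

```python
# Sketch of predictive multiplicity: near-equally accurate models that
# disagree on individual predictions. Data and model choices are illustrative.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, flip_y=0.1, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = [RandomForestClassifier(n_estimators=50, random_state=seed).fit(X_tr, y_tr)
          for seed in range(5)]
preds = np.array([m.predict(X_te) for m in models])
accs = [m.score(X_te, y_te) for m in models]

# "Ambiguity": share of test cases on which the near-equivalent models disagree.
ambiguity = (preds.min(axis=0) != preds.max(axis=0)).mean()
print("accuracies:", [f"{a:.3f}" for a in accs])
print(f"cases with conflicting predictions: {ambiguity:.1%}")
```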
... Creating responsible tools or methods that enable stakeholders who are not trained in ML to make automated decisions with ML is a particularly difficult activity, even while participatory techniques attempt to create AI/ML designs that encourage greater inclusivity and egalitarianism (Zhang et al., 2023). ...
Chapter
Management is an art of getting things done through and with the people in formally organized groups. It is an art of creating an environment in which people can perform as individuals and co-operate towards the attainment of group goals. Management Study HQ describes Management as a set of principles relating to the functions of planning, directing and controlling, and the application of these principles in harnessing physical, financial, human and informational resources efficiently and effectively to achieve organizational goals. Good management is the backbone of all successful organizations. To assist business and non-business organizations in their quest for excellence, growth and contribution to the economy and society, the Management Book Series covers research knowledge from various management sectors of business through peer-reviewed chapters. This book series helps company leaders and key decision-makers to have a clear, impartial, and data-driven perspective of how factors will impact the economy moving forward and to know what they should be doing in response.
... Acknowledging the value-based tensions across the many stakeholders of AI tools [21], many of these approaches aim to incorporate stakeholder-specific values and knowledge into the design of algorithms, for example, through expanding participation along the AI design process. Prior work has proposed a myriad of participatory approaches [12] (e.g., value-sensitive algorithm design [59], deliberationbased participatory algorithm design [58]) to solicit and operationalize stakeholders' desires and values into the design of algorithmic systems. Such work has often also surfaced the challenges of actually incorporating multiple, conflicting values into an algorithm. ...
Preprint
Full-text available
As public sector agencies rapidly introduce new AI tools in high-stakes domains like social services, it becomes critical to understand how decisions to adopt these tools are made in practice. We borrow from the anthropological practice to ``study up'' those in positions of power, and reorient our study of public sector AI around those who have the power and responsibility to make decisions about the role that AI tools will play in their agency. Through semi-structured interviews and design activities with 16 agency decision-makers, we examine how decisions about AI design and adoption are influenced by their interactions with and assumptions about other actors within these agencies (e.g., frontline workers and agency leaders), as well as those above (legal systems and contracted companies), and below (impacted communities). By centering these networks of power relations, our findings shed light on how infrastructural, legal, and social factors create barriers and disincentives to the involvement of a broader range of stakeholders in decisions about AI design and adoption. Agency decision-makers desired more practical support for stakeholder involvement around public sector AI to help overcome the knowledge and power differentials they perceived between them and other stakeholders (e.g., frontline workers and impacted community members). Building on these findings, we discuss implications for future research and policy around actualizing participatory AI approaches in public sector contexts.
... Adding VRRSability ontology and epistemology to that mix of GeoAI and LLM technology will likely advance the transparency of decision evaluation trade-offs within geodesign decision problem assessment for ULWS within participatory decision settings. For example, Participatory AI can be used to help professional facilitators improve the way stakeholder participatory and decision-making processes are structured and performed [83]. The number of LLM AI agents that can support VRRSability is increasing exponentially [84]. ...
Article
Full-text available
Improving geo-information decision evaluation is an important part of geospatial decision support research, particularly when considering vulnerability, risk, resilience, and sustainability (V-R-R-S) of urban land–water systems (ULWSs). Previous research enumerated a collection of V-R-R-S conceptual component commonalities and differences, resulting in a synthesis concept called VRRSability. As a single concept, VRRSability enhances our understanding of the relationships within and among V-R-R-S. This paper reports research that extends and deepens the VRRSability synthesis by elucidating relationships among the V-R-R-S concepts, and organizes them into a VRRSability conceptual framework meant to guide operationalization within decision support systems. The core relationship within the VRRSability framework is 'functional performance', which couples land and water concerns within complex ULWS. Using functional performance, we elucidate other significant conceptual relationships, e.g., scale, scenarios and social knowledge, among others. A narrative about the functional performance of green stormwater infrastructure as part of a ULWS offers a practical application of the conceptual framework. VRRSability decision evaluation trade-offs among land and water emerge through the narrative, particularly how land cover influences water flow, which in turn influences water quality. The discussion includes trade-offs along risk–resilience and vulnerability–sustainability dimensions as key aspects of functional performance. Conclusions include knowledge contributions about a VRRSability conceptual framework and the next steps for operationalization within decision support systems using artificial intelligence.
... For example, Holstein et al. [32] employed historical data through "Replay Enactments" to conduct feature prototyping, simulating experiences for teachers on technical systems. Zhang et al. [88] also used historical data with focus groups to elicit stakeholder feedback about future organizational decision-making practices. Subramonyam et al. [73] used data probes where designer-engineer teams considered enduser data to surface use cases and outcomes of AI. ...
Preprint
Full-text available
AI technologies continue to advance from digital assistants to assisted decision-making. However, designing AI remains a challenge given its unknown outcomes and uses. One way to expand AI design is by centering stakeholders in the design process. We conduct co-design sessions with gig workers to explore the design of gig worker-centered tools as informed by their driving patterns, decisions, and personal contexts. Using workers' own data as well as city-level data, we create probes -- interactive data visuals -- that participants explore to surface the well-being and positionalities that shape their work strategies. We describe participant insights and corresponding AI design considerations surfaced from data probes about: 1) workers' well-being trade-offs and positionality constraints, 2) factors that impact well-being beyond those in the data probes, and 3) instances of unfair algorithmic management. We discuss the implications for designing data probes and using them to elevate worker-centered AI design as well as for worker advocacy.
Preprint
Full-text available
CONTEXT: Within agricultural systems design, there is increasing focus on transformative change, moving beyond productivity as the sole aim and towards systems that are beneficial for both society and nature. With this trend, it has become common in agricultural systems design approaches to include non-economically driven objectives, such as animal welfare and ecological health. However, this selection often fails to move beyond anthropocentric needs and values. Research has indicated the importance of critically analyzing which actors, relations and power dynamics determine how agricultural systems design plays out. Simultaneously, there has been a trend towards participatory design approaches that involve relevant actors to enhance the transformative potential of systems. Notably, non-humans, which are foundational to agricultural systems, are largely overlooked as actors to be involved in a design process.

OBJECTIVE: In this perspective, we embark on considering the non-human dimension of agricultural systems design with the goal of contributing to a just and inclusive transformation of agricultural systems. To do so, we analyze the state-of-affairs in how agricultural systems research is considering non-humans in design processes.

METHODS: We consult several approaches such as actor-network theory, assemblages, transition studies, indigenous literature, de-colonial literature, and feminist posthumanism. We explore insights into the engagement of non-humans in design, notably regarding the role of (1) agency; (2) temporality; and (3) deliberation of non-humans.

RESULTS AND CONCLUSION: Based on this discussion, we provide practical steps forward to include non-humans in five design phases: problem definition; system analysis; design requirements; measurements; and selecting design solutions.

SIGNIFICANCE: By opening this dialogue between agricultural systems design and research on non-human actors, our work aims to enhance the transformative potential of agricultural systems research and design by considering non-human actors beyond anthropocentric perspectives.
Chapter
Protecting AI in web applications is necessary. The domain combines a broad technical scope with good prospects and immense difficulties. This chapter covers the landscape of security issues that arise as advancing generative AI techniques are integrated into web development frameworks. The initial section addresses security in web development and the subtleties of generative AI-based methods, and the chapter offers 13 ways to approach them. Among the threats are those that introduce security issues related to generative AI deployments, which illustrate why it is vital for defenders and infrastructure owners to implement mitigation measures proactively. The chapter covers the security and privacy of data and lessons for securing systems and preventing vulnerabilities. It explores attacks, model poisoning, bias issues, defence mechanisms, and long-term mitigation strategies. It also promotes transparency, explainability, and compliance with applicable laws while structuring a development methodology and deployment and operations practices. The text outlines how to respond to and recover from incidents, providing response frameworks for everyone involved in managing security breaches. Finally, it addresses trends, possible threats, and lessons learned from real-world case studies. To address these research needs, the chapter sheds light on the security considerations associated with AI for web development and offers recommendations that can help researchers, practitioners, and policymakers enhance the security posture of generative AI advancements used in building web applications.
Article
Full-text available
This study aims to enhance marketing strategies in higher education institutions by applying data mining techniques, specifically K-means clustering. The research focuses on Mindanao State University-Lanao del Norte Agricultural College (MSU-LNAC), a tertiary institution in Northern Mindanao, Philippines, with the objective of increasing enrollment. The study utilizes the K-means algorithm to group attributes into different clusters. The clustering analysis provides valuable insights into the characteristics and preferences of the surveyed student population. Based on the findings, recommendations are presented to guide targeted marketing efforts, such as geographic targeting, collaborations with senior high schools, financial assistance programs, and the development of marketing campaigns that emphasize the institution's strengths and advantages. By implementing these recommendations, MSU-LNAC can enhance its recruitment and marketing strategies to attract and retain students effectively.
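As a concrete illustration of the clustering step, the sketch below runs K-means on a few hypothetical, standardized survey attributes and prints per-cluster profiles; the attributes and the choice of k are assumptions, not the study's actual variables.

```python
# Sketch of K-means on hypothetical student-survey attributes; the features
# and number of clusters are illustrative, not the study's actual setup.
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
students = pd.DataFrame({
    "distance_to_campus_km": rng.gamma(2.0, 15.0, 300),
    "household_income_php_thousands": rng.gamma(3.0, 60.0, 300),
    "senior_high_gpa": rng.normal(88, 5, 300).clip(75, 99),
})

X = StandardScaler().fit_transform(students)        # put attributes on one scale
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
students["cluster"] = kmeans.labels_

# Per-cluster profiles that targeted marketing recommendations could build on.
print(students.groupby("cluster").mean().round(1))
```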
Article
We study a multi-agent reinforcement learning (MARL) problem where the agents interact over a given network. The goal of the agents is to cooperatively maximize the average of their entropy-regularized long-term rewards. To overcome the curse of dimensionality and to reduce communication, we propose a Localized Policy Iteration (LPI) algorithm that provably learns a near-globally-optimal policy using only local information. In particular, we show that, despite restricting each agent's attention to only its κ-hop neighborhood, the agents are able to learn a policy with an optimality gap that decays polynomially in κ. In addition, we show the finite-sample convergence of LPI to the global optimal policy, which explicitly captures the trade-off between optimality and computational complexity in choosing κ. Numerical simulations demonstrate the effectiveness of LPI. This extended abstract is an abridged version of [12].
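The structural idea behind the algorithm, each agent conditioning only on its κ-hop neighborhood of the interaction network, can be illustrated briefly; the graph and κ below are arbitrary, and the sketch does not implement LPI itself.

```python
# Sketch of the kappa-hop locality idea behind LPI: each agent only observes
# and coordinates with agents within kappa hops. Not the LPI algorithm itself.
import networkx as nx

G = nx.cycle_graph(10)          # ten agents interacting over a ring network
kappa = 2

neighborhoods = {
    agent: sorted(nx.ego_graph(G, agent, radius=kappa).nodes())
    for agent in G.nodes()
}
for agent, local_view in neighborhoods.items():
    print(f"agent {agent}: local policy conditions on agents {local_view}")
```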
Article
Full-text available
With widespread use of machine learning methods in numerous domains involving humans, several studies have raised questions about the potential for unfairness towards certain individuals or groups. A number of recent works have proposed methods to measure and eliminate unfairness from machine learning models. However, most of this work has focused on only one dimension of fair decision making: distributive fairness, i.e., the fairness of the decision outcomes. In this work, we leverage the rich literature on organizational justice and focus on another dimension of fair decision making: procedural fairness, i.e., the fairness of the decision making process. We propose measures for procedural fairness that consider the input features used in the decision process, and evaluate the moral judgments of humans regarding the use of these features. We operationalize these measures on two real world datasets using human surveys on the Amazon Mechanical Turk (AMT) platform, demonstrating that our measures capture important properties of procedurally fair decision making. We provide fast submodular mechanisms to optimize the tradeoff between procedural fairness and prediction accuracy. On our datasets, we observe empirically that procedural fairness may be achieved with little cost to outcome fairness, but that some loss of accuracy is unavoidable.
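One way to make such a measure concrete: if surveyed people rate whether each input feature is morally acceptable to use, a simple process-fairness score for a model is the share of respondents who approve of every feature it actually uses. The features, votes, and this particular unweighted definition are illustrative assumptions rather than the paper's exact operationalization.

```python
# Sketch of a feature-based procedural fairness score: the share of surveyed
# people who consider every feature used by the model fair to use.
# Features and votes are hypothetical; the paper's measures may differ.
approvals = {   # feature -> set of respondents who deem its use acceptable
    "prior_offenses": {"r1", "r2", "r3", "r4", "r5"},
    "age":            {"r1", "r2", "r3"},
    "zip_code":       {"r1"},
}
all_respondents = {"r1", "r2", "r3", "r4", "r5"}

def procedural_fairness(features_used):
    """Fraction of respondents who approve of *all* features the model uses."""
    approving = set(all_respondents)
    for f in features_used:
        approving &= approvals.get(f, set())
    return len(approving) / len(all_respondents)

print(procedural_fairness({"prior_offenses"}))              # 1.0
print(procedural_fairness({"prior_offenses", "age"}))       # 0.6
print(procedural_fairness({"prior_offenses", "zip_code"}))  # 0.2
```

Adding or dropping a feature then changes both this score and predictive accuracy, which is the trade-off the paper optimizes.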
Article
Full-text available
Various tools and practices have been developed to support practitioners in identifying, assessing, and mitigating fairness-related harms caused by AI systems. However, prior research has highlighted gaps between the intended design of these tools and practices and their use within particular contexts, including gaps caused by the role that organizational factors play in shaping fairness work. In this paper, we investigate these gaps for one such practice: disaggregated evaluations of AI systems, intended to uncover performance disparities between demographic groups. By conducting semi-structured interviews and structured workshops with thirty-three AI practitioners from ten teams at three technology companies, we identify practitioners' processes, challenges, and needs for support when designing disaggregated evaluations. We find that practitioners face challenges when choosing performance metrics, identifying the most relevant direct stakeholders and demographic groups on which to focus, and collecting datasets with which to conduct disaggregated evaluations. More generally, we identify impacts on fairness work stemming from a lack of engagement with direct stakeholders or domain experts, business imperatives that prioritize customers over marginalized groups, and the drive to deploy AI systems at scale.
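The practice being studied, reporting a performance metric separately for each demographic group rather than as one aggregate number, can be sketched in a few lines; the groups, metric, and data below are illustrative.

```python
# Sketch of a disaggregated evaluation: one metric reported per demographic
# group instead of a single aggregate. Groups and data are illustrative.
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
df = pd.DataFrame({
    "group": rng.choice(["A", "B", "C"], size=1500, p=[0.6, 0.3, 0.1]),
    "y_true": rng.integers(0, 2, 1500),
})
# Hypothetical system that is less accurate on the smallest group.
error_rate = df["group"].map({"A": 0.10, "B": 0.15, "C": 0.30})
flip = rng.random(1500) < error_rate.to_numpy()
df["y_pred"] = np.where(flip, 1 - df["y_true"], df["y_true"])

report = (df.assign(correct=df["y_true"] == df["y_pred"])
            .groupby("group")
            .agg(n=("correct", "size"), accuracy=("correct", "mean")))
print(report.round(3))   # performance disparities that an aggregate score hides
```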
Conference Paper
Full-text available
Recent work in fair machine learning has proposed dozens of technical definitions of algorithmic fairness and methods for enforcing these definitions. However, we still lack an understanding of how to develop machine learning systems with fairness criteria that reflect relevant stakeholders' nuanced viewpoints in real-world contexts. To address this gap, we propose a framework for eliciting stakeholders' subjective fairness notions. Combining a user interface that allows stakeholders to examine the data and the algorithm's predictions with an interview protocol to probe stakeholders' thoughts while they are interacting with the interface, we can identify stakeholders' fairness beliefs and principles. We conduct a user study to evaluate our framework in the setting of a child maltreatment predictive system. Our evaluations show that the framework allows stakeholders to comprehensively convey their fairness viewpoints. We also discuss how our results can inform the design of predictive systems.
Article
Full-text available
Vaccines, when available, will likely become our best tool to control the COVID-19 pandemic. Even in the most optimistic scenarios, vaccine shortages will likely occur. Using an age-stratified mathematical model paired with optimization algorithms, we determined optimal vaccine allocation for four different metrics (deaths, symptomatic infections, and maximum non-ICU and ICU hospitalizations) under many scenarios. We find that a vaccine with effectiveness ≥50% would be enough to substantially mitigate the ongoing pandemic, provided that a high percentage of the population is optimally vaccinated. When minimizing deaths, we find that for low vaccine effectiveness, irrespective of vaccination coverage, it is optimal to allocate vaccine to high-risk (older) age groups first. In contrast, for higher vaccine effectiveness, there is a switch to allocate vaccine to high-transmission (younger) age groups first for high vaccination coverage. While there are other societal and ethical considerations, this work can provide an evidence-based rationale for vaccine prioritization.
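As a toy illustration of the allocation question only (not the paper's age-stratified transmission model), the sketch below greedily assigns a limited supply to the group with the highest assumed deaths averted per dose; every parameter is invented.

```python
# Toy greedy allocation of scarce vaccine doses by assumed deaths averted per
# dose. Parameters are invented; the paper uses an age-stratified dynamic model.
groups = {        # group -> (population, assumed infection fatality risk)
    "20-49": (4_000_000, 0.0005),
    "50-64": (2_000_000, 0.005),
    "65+":   (1_000_000, 0.05),
}
VE = 0.6                   # assumed vaccine effectiveness against death
SUPPLY = 1_500_000         # available doses (one dose per person, for simplicity)
ATTACK_RATE = 0.3          # assumed share of unvaccinated people infected

allocation = {g: 0 for g in groups}
remaining = SUPPLY
# Greedy: fill the group with the largest deaths averted per dose first
# (here proportional to the fatality risk, since VE and attack rate are shared).
for g, (pop, ifr) in sorted(groups.items(), key=lambda kv: -kv[1][1]):
    doses = min(remaining, pop)
    allocation[g] = doses
    remaining -= doses

for g, doses in allocation.items():
    pop, ifr = groups[g]
    averted = doses * ATTACK_RATE * ifr * VE
    print(f"{g:>6s}: {doses:,} doses, ~{averted:,.0f} deaths averted")
```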
Article
Full-text available
Manual count of mitotic figures, which is determined in the tumor region with the highest mitotic activity, is a key parameter of most tumor grading schemes. It can, however, be strongly dependent on area selection due to the uneven distribution of mitotic figures in the tumor section. We aimed to assess how significantly area selection could impact the mitotic count, which is known to have high inter-rater disagreement. On a data set of 32 whole slide images of H&E-stained canine cutaneous mast cell tumor, fully annotated for mitotic figures, we asked eight veterinary pathologists (five board-certified, three in training) to select a field of interest for the mitotic count. To assess the potential impact on the mitotic count, we compared the mitotic count of the selected regions to the overall distribution on the slide. Additionally, we evaluated three deep learning-based methods for the assessment of the highest mitotic density: in one approach, the model would directly try to predict the mitotic count for the presented image patches as a regression task. The second method aims at deriving a segmentation mask for mitotic figures, which is then used to obtain a mitotic density. Finally, we evaluated a two-stage object-detection pipeline based on state-of-the-art architectures to identify individual mitotic figures. We found that the predictions by all models were, on average, better than those of the experts. The two-stage object detector performed best and outperformed most of the human pathologists on the majority of tumor cases. The correlation between the predicted and the ground truth mitotic count was also best for this approach (0.963–0.979). Further, we found considerable differences in position selection between pathologists, which could partially explain the high variance that has been reported for the manual mitotic count. To achieve better inter-rater agreement, we propose to use computer-based area selection to support the pathologist in the manual mitotic count.
Research
Full-text available
Based on the findings of a 2015 year-long research project supported by Hobsons, this publication explores current strategies for creating a more diverse graduate student population. It outlines the current state of graduate admissions at U.S. institutions, offers promising practices for graduate institutions seeking to implement holistic admissions processes, and provides an overview of existing resources for institutions.
Article
Full-text available
The increased reliance on algorithmic decision-making in socially impactful processes has intensified the calls for algorithms that are unbiased and procedurally fair. Identifying fair predictors is an essential step in the construction of equitable algorithms, but the lack of ground-truth in fair predictor selection makes this a challenging task. In our study, we recruit 90 crowdworkers to judge the inclusion of various predictors for recidivism. We divide participants across three conditions with varying group composition. Our results show that participants were able to make informed decisions on predictor selection. We find that agreement with the majority vote is higher when participants are part of a more diverse group. The presented workflow, which provides a scalable and practical approach to reach a diverse audience, allows researchers to capture participants' perceptions of fairness in private while simultaneously allowing for structured participant discussion.
Article
Full-text available
As algorithms increasingly take managerial and governance roles, it is ever more important to build them to be perceived as fair and adopted by people. With this goal, we propose a procedural justice framework in algorithmic decision-making drawing from procedural justice theory, which lays out elements that promote a sense of fairness among users. As a case study, we built an interface that leveraged two key elements of the framework (transparency and outcome control) and evaluated it in the context of goods division. Our interface explained the algorithm's allocative fairness properties (standards clarity) and outcomes through an input-output matrix (outcome explanation), then allowed people to interactively adjust the algorithmic allocations as a group (outcome control). The findings from our within-subjects laboratory study suggest that standards clarity alone did not increase perceived fairness; outcome explanation had mixed effects, increasing or decreasing perceived fairness and reducing algorithmic accountability; and outcome control universally improved perceived fairness by allowing people to realize the inherent limitations of decisions and redistribute the goods to better fit their contexts, and by bringing human elements into final decision-making.
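The outcome-explanation and outcome-control elements can be pictured with a small sketch: an input-output view of stated valuations and allocations, a manual group adjustment, and a check of whether anyone envies another bundle. The valuations, the adjustment, and the envy check are illustrative assumptions, not the study's allocation algorithm.

```python
# Sketch of an input-output view for goods division: stated valuations, a
# proposed allocation, a manual group adjustment, and an envy check.
# All numbers are illustrative; this is not the study's allocation algorithm.
import numpy as np

goods = ["desk", "monitor", "chair"]
people = ["Ana", "Bo"]
valuations = np.array([[6, 3, 1],      # Ana's value for each good
                       [2, 5, 3]])     # Bo's value for each good

def report(allocation, label):
    """Print each person's value for their own bundle and whether they envy another."""
    print(label)
    for i, person in enumerate(people):
        own = valuations[i] @ allocation[i]
        others = [valuations[i] @ allocation[j] for j in range(len(people)) if j != i]
        print(f"  {person}: own bundle worth {own}, envies others: {max(others) > own}")

# Algorithmic proposal: Ana gets the desk, Bo gets monitor and chair.
proposed = np.array([[1, 0, 0],
                     [0, 1, 1]])
report(proposed, "proposed allocation")

# Outcome control: the group decides Ana should also take the chair.
adjusted = np.array([[1, 0, 1],
                     [0, 1, 0]])
report(adjusted, "group-adjusted allocation")
```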
Article
Full-text available
A key challenge in developing and deploying Machine Learning (ML) systems is understanding their performance across a wide range of inputs. To address this challenge, we created the What-If Tool, an open-source application that allows practitioners to probe, visualize, and analyze ML systems, with minimal coding. The What-If Tool lets practitioners test performance in hypothetical situations, analyze the importance of different data features, and visualize model behavior across multiple models and subsets of input data. It also lets practitioners measure systems according to multiple ML fairness metrics. We describe the design of the tool, and report on real-life usage at different organizations.
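The hypothetical probing the tool supports, changing one feature of an example and comparing the model's prediction, can be illustrated without the tool itself; the model, data, and perturbed feature below are assumptions, and the actual What-If Tool exposes this kind of exploration interactively with minimal coding.

```python
# Sketch of counterfactual "what if" probing, done manually for illustration;
# the real tool offers this interactively without writing analysis code.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, n_features=5, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)

example = X[0].copy()
baseline = model.predict_proba([example])[0, 1]

# What if feature 2 had been one standard deviation higher for this example?
perturbed = example.copy()
perturbed[2] += X[:, 2].std()
counterfactual = model.predict_proba([perturbed])[0, 1]

print(f"original prediction:        {baseline:.3f}")
print(f"after perturbing f2 (+1 sd): {counterfactual:.3f}")
```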
Conference Paper
Full-text available
Breastfeeding is not only a public health issue, but also a matter of economic and social justice. This paper presents an iteration of a participatory design process to create spaces for re-imagining products, services, systems, and policies that support breastfeeding in the United States. Our work contributes to a growing literature around making hackathons more inclusive and accessible, designing participatory processes that center marginalized voices, and incorporating systems- and relationship-based approaches to problem solving. By presenting an honest assessment of the successes and shortcomings of the first iteration of a hackathon, we explain how we re-structured the second "Make the Breast Pump Not Suck" hackathon in service of equity and systems design. Key to our re-imagining of conventional innovation structures is a focus on experience design, where joy and play serve as key strategies to help people and institutions build relationships across lines of difference. We conclude with a discussion of design principles applicable not only to designers of events, but to social movement researchers and HCI scholars trying to address oppression through the design of technologies and socio-technical systems.
Conference Paper
Full-text available
Algorithmic decision-making systems are increasingly being adopted by government public service agencies. Researchers, policy experts, and civil rights groups have all voiced concerns that such systems are being deployed without adequate consideration of potential harms, disparate impacts, and public accountability practices. Yet little is known about the concerns of those most likely to be affected by these systems. We report on workshops conducted to learn about the concerns of affected communities in the context of child welfare services. The workshops involved 83 study participants including families involved in the child welfare system, employees of child welfare agencies, and service providers. Our findings indicate that general distrust in the existing system contributes significantly to low comfort in algorithmic decision-making. We identify strategies for improving comfort through greater transparency and improved communication strategies. We discuss the implications of our study for accountable algorithm design for child welfare applications.
Chapter
Evaluation research examines whether interventions to change the world work, and if so, how and why. Qualitative inquiries serve diverse evaluation purposes. Purpose is the controlling force in determining evaluation use. Decisions about design, data collection, analysis, and reporting all flow from evaluation purpose. Therefore, the first step in an evaluation is clarifying study purpose, intended uses, and intended users. This leads to consideration of how to match qualitative methods to different evaluation purposes and inquiries. Framing qualitative evaluation questions for impact affects fieldwork approaches, units of analysis, and purposeful sample selection.
Article
Crowd workers are human and thus sometimes make mistakes. In order to ensure the highest quality output, requesters often issue redundant jobs with gold test questions and sophisticated aggregation mechanisms based on expectation maximization (EM). While these methods yield accurate results in many cases, they fail on extremely difficult problems with local minima, such as situations where the majority of workers get the answer wrong. Indeed, this has caused some researchers to conclude that on some tasks crowdsourcing can never achieve high accuracies, no matter how many workers are involved. This paper presents a new quality-control workflow, called MicroTalk, that requires some workers to Justify their reasoning and asks others to Reconsider their decisions after reading counter-arguments from workers with opposing views. Experiments on a challenging NLP annotation task with workers from Amazon Mechanical Turk show that (1) argumentation improves the accuracy of individual workers by 20%, (2) restricting consideration to workers with complex explanations improves accuracy even more, and (3) our complete MicroTalk aggregation workflow produces much higher accuracy than simpler voting approaches for a range of budgets.
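For context, the sketch below illustrates the baseline aggregation idea that such workflows build on, weighting each worker's vote by their accuracy on gold test questions; it is an illustrative toy, not MicroTalk's argumentation-based workflow or an EM implementation, and all worker and question identifiers are hypothetical.

```python
from collections import defaultdict

def worker_accuracy(worker_answers, gold):
    """Estimate each worker's reliability from gold (known-answer) questions."""
    acc = {}
    for worker, answers in worker_answers.items():
        scored = [(q, a) for q, a in answers.items() if q in gold]
        correct = sum(1 for q, a in scored if a == gold[q])
        acc[worker] = correct / len(scored) if scored else 0.5  # 0.5 = uninformative prior
    return acc

def weighted_vote(worker_answers, gold):
    """Aggregate labels on non-gold questions, weighting votes by estimated accuracy."""
    acc = worker_accuracy(worker_answers, gold)
    scores = defaultdict(lambda: defaultdict(float))
    for worker, answers in worker_answers.items():
        for q, a in answers.items():
            if q not in gold:
                scores[q][a] += acc[worker]
    return {q: max(votes, key=votes.get) for q, votes in scores.items()}

# Toy usage: three workers, one gold question ("g1"), one task question ("t1").
answers = {
    "w1": {"g1": "yes", "t1": "yes"},
    "w2": {"g1": "no",  "t1": "no"},
    "w3": {"g1": "yes", "t1": "yes"},
}
print(weighted_vote(answers, gold={"g1": "yes"}))  # {'t1': 'yes'}
```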
Book
Best practices for addressing the bias and inequality that may result from the automated collection, analysis, and distribution of large datasets. Human-centered data science is a new interdisciplinary field that draws from human-computer interaction, social science, statistics, and computational techniques. This book, written by founders of the field, introduces best practices for addressing the bias and inequality that may result from the automated collection, analysis, and distribution of very large datasets. It offers a brief and accessible overview of many common statistical and algorithmic data science techniques, explains human-centered approaches to data science problems, and presents practical guidelines and real-world case studies to help readers apply these methods. The authors explain how data scientists' choices are involved at every stage of the data science workflow—and show how a human-centered approach can enhance each one, by making the process more transparent, asking questions, and considering the social context of the data. They describe how tools from social science might be incorporated into data science practices, discuss different types of collaboration, and consider data storytelling through visualization. The book shows that data science practitioners can build rigorous and ethical algorithms and design projects that use cutting-edge computational tools and address social concerns.
Article
In this paper, we developed BreastScreening-AI within two scenarios for the classification of multimodal breast images: (1) Clinician-Only and (2) Clinician-AI. The novelty lies in the introduction of a deep learning method into a real clinical workflow for medical imaging diagnosis. Across the two scenarios we address three high-level goals concerning how clinicians: i) accept and interact with these systems, revealing whether explanations and additional functionalities are required; ii) are receptive to the introduction of AI-assisted systems, given the benefit of mitigating clinical error; and iii) are affected by the AI assistance. We conduct an extensive evaluation comprising the following experimental stages: (a) selection of patients with different severities, and (b) qualitative and quantitative analysis of the chosen patients under the two scenarios. We address the high-level goals through a real-world case study of 45 clinicians from nine institutions. Comparing diagnostic performance, we observe the superiority of the Clinician-AI scenario, with a decrease of 27% in false positives and 4% in false negatives. Through this experimental study, we conclude that the proposed design techniques positively affected the expectations and perceived satisfaction of 91% of clinicians, while decreasing the time-to-diagnose by 3 minutes per patient.
Article
Algorithms have permeated throughout civil government and society, where they are being used to make high-stakes decisions about human lives. In this paper, we first develop a cohesive framework of algorithmic decision-making adapted for the public sector (ADMAPS) that reflects the complex socio-technical interactions between human discretion, bureaucratic processes, and algorithmic decision-making by synthesizing disparate bodies of work in the fields of Human-Computer Interaction (HCI), Science and Technology Studies (STS), and Public Administration (PA). We then applied the ADMAPS framework to conduct a qualitative analysis of an in-depth, eight-month ethnographic case study of algorithms in daily use within a child-welfare agency that serves approximately 900 families and 1300 children in the mid-western United States. Overall, we found that there is a need to focus on strength-based algorithmic outcomes centered in social ecological frameworks. In addition, algorithmic systems need to support existing bureaucratic processes and augment human discretion, rather than replace it. Finally, collective buy-in in algorithmic systems requires trust in the target outcomes at both the practitioner and bureaucratic levels. As a result of our study, we propose guidelines for the design of high-stakes algorithmic decision-making tools in the child-welfare system, and more generally, in the public sector. We empirically validate the theoretically derived ADMAPS framework to demonstrate how it can be useful for systematically making pragmatic decisions about the design of algorithms for the public sector.
Article
The introduction of machine learning (ML) in organizations comes with the claim that algorithms will produce insights superior to those of experts by discovering the "truth" from data. Such a claim gives rise to a tension between the need to produce knowledge independent of domain experts and the need to remain relevant to the domain the system serves. This two-year ethnographic study focuses on how developers managed this tension when building an ML system to support the process of hiring job candidates at a large international organization. Despite the initial goal of getting domain experts "out of the loop," we found that developers and experts arrived at a new hybrid practice that relied on a combination of ML and domain expertise. We explain this outcome as resulting from a process of mutual learning in which deep engagement with the technology triggered actors to reflect on how they produced knowledge. These reflections prompted the developers to iterate between excluding domain expertise from the ML system and including it. Contrary to common views that imply an opposition between ML and domain expertise, our study foregrounds their interdependence and, as such, shows the dialectic nature of developing ML. We discuss the theoretical implications of these findings for the literature on information technologies and knowledge work, information system development and implementation, and human–ML hybrids.
Article
With the widespread use of artificial intelligence (AI) systems and applications in our everyday lives, accounting for fairness has gained significant importance in the design and engineering of such systems. AI systems can be used in many sensitive environments to make important and life-changing decisions; thus, it is crucial to ensure that these decisions do not reflect discriminatory behavior toward certain groups or populations. More recently, work in traditional machine learning and deep learning has begun to address such challenges in different subdomains. With the commercialization of these systems, researchers are becoming more aware of the biases that these applications can contain and are attempting to address them. In this survey, we investigate different real-world applications that have exhibited biases in various ways, and we list different sources of bias that can affect AI applications. We then create a taxonomy of the fairness definitions that machine learning researchers have proposed to avoid existing bias in AI systems. In addition, we examine different domains and subdomains in AI, showing what researchers have observed with regard to unfair outcomes in state-of-the-art methods and how they have tried to address them. Many directions and solutions remain for mitigating the problem of bias in AI systems. We hope this survey motivates researchers to tackle these issues in the near future by building on existing work in their respective fields.
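As a concrete illustration of the kind of fairness definitions such a taxonomy covers, the sketch below computes two common group metrics, the demographic parity difference and an equal-opportunity (true-positive-rate) gap, on a small hypothetical set of screening decisions; it is not drawn from the survey itself.

```python
import numpy as np

def demographic_parity_difference(y_pred, group):
    """Difference in positive-prediction rates between two groups (coded 0/1)."""
    return y_pred[group == 1].mean() - y_pred[group == 0].mean()

def equal_opportunity_difference(y_true, y_pred, group):
    """Difference in true-positive rates between the two groups."""
    tpr = lambda g: y_pred[(group == g) & (y_true == 1)].mean()
    return tpr(1) - tpr(0)

# Hypothetical screening decisions for eight applicants in two groups.
y_true = np.array([1, 1, 0, 1, 1, 0, 1, 0])   # 1 = truly qualified
y_pred = np.array([1, 1, 0, 0, 1, 0, 0, 0])   # 1 = selected by the model
group  = np.array([1, 1, 1, 1, 0, 0, 0, 0])   # group membership

print(demographic_parity_difference(y_pred, group))         # 0.25
print(equal_opportunity_difference(y_true, y_pred, group))  # ~0.17
```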
Article
In this research, we take an HCI perspective on the opportunities provided by AI techniques in medical imaging, focusing on workflow efficiency and quality and on preventing errors and variability in breast cancer diagnosis. Starting from a holistic understanding of the clinical context, we developed BreastScreening to support multimodality and to integrate AI techniques (using a deep neural network for automatic and reliable classification) into the medical diagnosis workflow. The system was assessed across a substantial number of clinical settings and radiologists. Here we present: i) user study findings from 45 physicians across nine clinical institutions; ii) a list of design recommendations for visualization to support breast screening radiomics; iii) evaluation results of a proof-of-concept BreastScreening prototype under two conditions, Current (without AI assistance) and AI-Assisted; and iv) evidence of the impact of a multimodality, AI-assisted strategy on the diagnosis and severity classification of lesions. Together, these allow us to draw conclusions about the behaviour of clinicians when an AI module is present in a diagnostic system, behaviour that directly affects the clinicians' workflow and is thoroughly addressed herein. Our results show a high level of acceptance of AI techniques by radiologists and point to a significant reduction in cognitive workload and an improvement in diagnosis execution.
Article
Field experiments using fictitious applications have become an increasingly important method for assessing hiring discrimination. Most field experiments of hiring, however, only observe whether the applicant receives an invitation to interview, called the “callback.” How adequate is our understanding of discrimination in the hiring process based on an assessment of discrimination in callbacks, when the ultimate subject of interest is discrimination in job offers? To address this question, we examine evidence from all available field experimental studies of racial or ethnic discrimination in hiring that go to the job offer outcome. Our sample includes 12 studies encompassing more than 13,000 job applications. We find considerable additional discrimination in hiring after the callback: majority applicants in our sample receive 53% more callbacks than comparable minority applicants, but majority applicants receive 145% more job offers than comparable minority applicants. The additional discrimination from interview to job offer is weakly correlated (r = 0.21) with the level of discrimination earlier in the hiring process. We discuss the implications of our results for theories of discrimination, including statistical discrimination.
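To see how the callback and offer figures relate, the following back-of-the-envelope decomposition assumes, purely for illustration, that discrimination compounds multiplicatively across the callback and offer stages; it is not the paper's own calculation.

```latex
% Majority-to-minority ratios reported in the abstract:
%   callbacks: 1.53, job offers: 2.45.
% Under a multiplicative-stage assumption, the residual gap after the callback is
\[
\frac{\text{offer ratio}}{\text{callback ratio}} = \frac{2.45}{1.53} \approx 1.60,
\]
% i.e. roughly 60% more offers per callback for majority applicants.
```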
Article
Ensuring effective public understanding of algorithmic decisions that are powered by machine learning techniques has become an urgent task with the increasing deployment of AI systems into our society. In this work, we present a concrete step toward this goal by redesigning confusion matrices for binary classification to support non-experts in understanding the performance of machine learning models. Through interviews (n=7) and a survey (n=102), we mapped out two major sets of challenges lay people have in understanding standard confusion matrices: the general terminologies and the matrix design. We further identified three sub-challenges regarding the matrix design, namely confusion about the direction of reading the data, the layered relations, and the quantities involved. We then conducted an online experiment with 483 participants to evaluate how effectively a series of alternative representations address each of those challenges in the context of an algorithm for making recidivism predictions. We developed three levels of questions to evaluate users' objective understanding. We assessed our alternatives in terms of accuracy in answering those questions, completion time, and subjective understanding. Our results suggest that (1) only by contextualizing terminologies can we significantly improve users' understanding and (2) flow charts, which help point out the direction of reading the data, were most useful in improving objective understanding. Our findings set the stage for developing more intuitive and generally understandable representations of the performance of machine learning models.
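As a small illustration of the contextualization idea, the sketch below renders a binary confusion matrix with plain-language row and column labels for a hypothetical recidivism-prediction model, rather than the bare TP/FP/FN/TN terminology; it uses scikit-learn and pandas and is not the paper's redesigned representation.

```python
import pandas as pd
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # 1 = person actually reoffended (hypothetical data)
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]   # 1 = model predicted they would reoffend

# Order rows/columns so that the "reoffended" class comes first.
cm = confusion_matrix(y_true, y_pred, labels=[1, 0])
table = pd.DataFrame(
    cm,
    index=["actually reoffended", "did not reoffend"],
    columns=["predicted to reoffend", "predicted not to reoffend"],
)
print(table)  # contextualized terms in place of TP/FP/FN/TN
```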
Article
Participatory Design (PD) is envisioned as an approach to democratizing innovation in the design process by shifting the power dynamics between researcher and participant. Recent scholarship in HCI and design has analyzed the ways collaborative design engagements, such as PD situated in the design workshop can amplify voices and empower underserved populations. Yet, we argue that PD as instantiated in the design workshop is very much an affluent and privileged activity that often neglects the challenges associated with envisioning equitable design solutions among underserved populations. Based on two series of community-based PD workshops with underserved populations in the U.S., we highlight key areas of tension and considerations for a more equitable PD approach: historical context of the research environment, community access, perceptions of materials and activities, and unintentional harm in collecting full accounts of personal narratives. By reflecting on these tensions as a call-to-action, we hope to deconstruct the privilege of the PD workshop within HCI and re-center the focus of design on individuals who are historically underserved.
Article
Expert disagreement is pervasive in clinical decision making and collective adjudication is a useful approach for resolving divergent assessments. Prior work shows that expert disagreement can arise due to diverse factors including expert background, the quality and presentation of data, and guideline clarity. In this work, we study how these factors predict initial discrepancies in the context of medical time series analysis, examining why certain disagreements persist after adjudication, and how adjudication impacts clinical decisions. Results from a case study with 36 experts and 4,543 adjudicated cases in a sleep stage classification task show that these factors contribute to both initial disagreement and resolvability, each in their own unique way. We provide evidence suggesting that structured adjudication can lead to significant revisions in treatment-relevant clinical parameters. Our work demonstrates how structured adjudication can support consensus and facilitate a deep understanding of expert disagreement in medical data analysis.
Article
Fairness is an increasingly important concern as machine learning models are used to support decision making in high-stakes applications such as mortgage lending, hiring, and prison sentencing. This article introduces a new open-source Python toolkit for algorithmic fairness, AI Fairness 360 (AIF360), released under an Apache v2.0 license (https://github.com/ibm/aif360). The main objectives of this toolkit are to help facilitate the transition of fairness research algorithms into industrial settings and to provide a common framework for fairness researchers to share and evaluate algorithms. The package includes a comprehensive set of fairness metrics for datasets and models, explanations for these metrics, and algorithms to mitigate bias in datasets and models. It also includes an interactive Web experience that provides a gentle introduction to the concepts and capabilities for line-of-business users, and it supports researchers and developers in extending the toolkit with new algorithms and improvements and in using it for performance benchmarking. A built-in testing infrastructure maintains code quality.
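A minimal usage sketch, assuming the AIF360 API as publicly documented (the bundled AdultDataset additionally expects the UCI Adult data files to be downloaded per the toolkit's instructions): compute disparate impact before and after the Reweighing pre-processing algorithm. The choice of 'sex' as the protected attribute and its 0/1 coding follow the toolkit's convention for this dataset and should be treated as an assumption.

```python
# Sketch under the assumptions stated above; the AdultDataset loader requires
# the UCI Adult data files to be present where the toolkit expects them.
from aif360.datasets import AdultDataset
from aif360.metrics import BinaryLabelDatasetMetric
from aif360.algorithms.preprocessing import Reweighing

privileged   = [{'sex': 1}]
unprivileged = [{'sex': 0}]

data = AdultDataset()
metric = BinaryLabelDatasetMetric(data,
                                  unprivileged_groups=unprivileged,
                                  privileged_groups=privileged)
print("Disparate impact before:", metric.disparate_impact())

rw = Reweighing(unprivileged_groups=unprivileged, privileged_groups=privileged)
data_rw = rw.fit_transform(data)   # reweights instances to balance group outcomes
metric_rw = BinaryLabelDatasetMetric(data_rw,
                                     unprivileged_groups=unprivileged,
                                     privileged_groups=privileged)
print("Disparate impact after: ", metric_rw.disparate_impact())
```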
Article
Involving stakeholders throughout the creation of new educational technologies can help ensure their usefulness and usability in real-world contexts. However, given the complexity of learning analytics (LA) systems, it can be challenging to meaningfully involve non-technical stakeholders throughout their design and development. This article reports on the iterative co-design, development, and classroom evaluation of Konscia, a wearable, real-time awareness tool for teachers working in AI-enhanced K-12 classrooms. In the process, we argue that the co-design of LA systems requires new kinds of prototyping methods. We introduce one of our own prototyping methods, REs, to address unique challenges of co-prototyping LA tools. This work presents the first end-to-end demonstration of how non-technical stakeholders can participate throughout the whole design process for a complex LA system—from early generative phases to the selection and tuning of analytics to evaluation in real-world contexts. We conclude by providing methodological recommendations for future LA co-design efforts.
Conference Paper
Fairness for machine learning has recently received considerable attention. Various mathematical formulations of fairness have been proposed, and it has been shown that it is impossible to satisfy all of them simultaneously. The literature so far has dealt with these impossibility results by quantifying the tradeoffs between different formulations of fairness. Our work takes a different perspective on this issue. Rather than requiring all notions of fairness to (partially) hold at the same time, we ask which of them is most appropriate for the societal domain in which the decision-making model is to be deployed. We take a descriptive approach and set out to identify the notion of fairness that best captures lay people's perception of fairness. We run adaptive experiments designed to pinpoint, through a small number of tests, the notion of fairness most compatible with each participant's choices. Perhaps surprisingly, we find that the simplest mathematical definition of fairness, namely demographic parity, most closely matches people's idea of fairness in two distinct application scenarios. This conclusion remains intact even when we explicitly tell the participants about the alternative, more complicated definitions of fairness and reduce the cognitive burden of evaluating those notions for them. Our findings have important implications for the Fair ML literature and the discourse on formalizing algorithmic fairness.
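For reference, the notion the participants favored can be stated formally in standard notation (not the paper's own formalization): a classifier satisfies demographic parity with respect to a protected attribute when its positive-prediction rate is the same for every group.

```latex
% Demographic parity (statistical parity) for a classifier \hat{Y}
% and protected attribute A, in standard notation:
\[
P(\hat{Y} = 1 \mid A = a) \;=\; P(\hat{Y} = 1 \mid A = a')
\qquad \text{for all groups } a, a'.
\]
```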
Conference Paper
Increasingly, algorithms are used to make important decisions across society. However, these algorithms are usually poorly understood, which can reduce transparency and evoke negative emotions. In this research, we seek to learn design principles for explanation interfaces that communicate how decision-making algorithms work, in order to help organizations explain their decisions to stakeholders, or to support users' "right to explanation". We conducted an online experiment where 199 participants used different explanation interfaces to understand an algorithm for making university admissions decisions. We measured users' objective and self-reported understanding of the algorithm. Our results show that both interactive explanations and "white-box" explanations (i.e. that show the inner workings of an algorithm) can improve users' comprehension. Although the interactive approach is more effective at improving comprehension, it comes with a trade-off of taking more time. Surprisingly, we also find that users' trust in algorithmic decisions is not affected by the explanation interface or their level of comprehension of the algorithm.