Conference Paper

Interface Design Optimization as a Multi-Armed Bandit Problem

Authors:
  • Playpower Labs

Abstract

"Multi-armed bandits" offer a new paradigm for the AI-assisted design of user interfaces. To help designers understand the potential, we present the results of two experimental comparisons between bandit algorithms and random assignment. Our studies are intended to show designers how bandit algorithms are able to rapidly explore an experimental design space and automatically select the optimal design configuration. Our present focus is on the optimization of a game design space. The results of our experiments show that bandits can make data-driven design more efficient and accessible to interface designers, but that human participation is essential to ensure that AI systems optimize for the right metric. Based on our results, we introduce several design lessons that help keep human design judgment in the loop. We also consider the future of human-technology teamwork in AI-assisted design and scientific inquiry. Finally, as bandits deploy fewer low-performing conditions than typical experiments, we discuss ethical implications for bandits in large-scale experiments in education.


... Adaptive randomization is an effective strategy for assigning more students to the currently most effective condition while retaining the ability to test the other conditions. We use a Multi-Armed Bandit (MAB) algorithm that uses machine learning to increase the number of students assigned to the currently most effective condition (or arm) [1], [7]. MABs are commonly used to act on data rapidly in areas such as marketing, maximizing benefit to users while balancing exploration vs. exploitation [1], [3]. For this study, we used Thompson Sampling (TS), a probability matching algorithm in which the probability of assignment is proportional to the probability that the arm is optimal [1]. ...
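The probability-matching rule described in the excerpt above can be sketched in a few lines of Python. This is a minimal illustration of Thompson Sampling with Beta posteriors over binary outcomes; the arm count, reward rates, and function names are illustrative assumptions, not the cited implementation:

```python
import random

def thompson_assign(successes, failures, rng=random):
    """Pick the arm whose sampled success probability is highest.

    successes[i] and failures[i] hold the Beta(1+s, 1+f) posterior
    counts for arm i; the chance an arm is chosen is therefore
    proportional to the probability that it is optimal.
    """
    samples = [rng.betavariate(1 + s, 1 + f)
               for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=samples.__getitem__)

def run_bandit(true_rates, n_students, seed=0):
    """Simulate assigning n_students to arms with the given
    (hypothetical) success rates, updating posteriors as we go."""
    rng = random.Random(seed)
    k = len(true_rates)
    succ, fail = [0] * k, [0] * k
    counts = [0] * k
    for _ in range(n_students):
        arm = thompson_assign(succ, fail, rng)
        counts[arm] += 1
        if rng.random() < true_rates[arm]:
            succ[arm] += 1
        else:
            fail[arm] += 1
    return counts

# With one clearly better arm, TS routes most students to it.
print(run_bandit([0.3, 0.6], n_students=500))
```

Because assignment probabilities shift as evidence accumulates, most participants end up in the stronger arm while the weaker arm still receives some exploratory traffic.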
Preprint
Full-text available
Adaptive experiments can increase the chance that current students obtain better outcomes from a field experiment of an instructional intervention. In such experiments, the probability of assigning students to conditions changes while more data is being collected, so students can be assigned to interventions that are likely to perform better. Digital educational environments lower the barrier to conducting such adaptive experiments, but they are rarely applied in education. One reason might be that researchers have access to few real-world case studies that illustrate the advantages and disadvantages of these experiments in a specific context. We evaluate the effect of homework email reminders on students by conducting an adaptive experiment using the Thompson Sampling algorithm and compare it to a traditional uniform random experiment. We present this as a case study on how to conduct such experiments, and we raise a range of open questions about the conditions under which adaptive randomized experiments may be more or less useful.
... In this sense, interactive reinforcement learning relies on small, user-specific data sets, which contrasts with the large, crowdsourced data sets used in creative applications in semantic editing [25,62,107]. Lastly, interactive approaches to reinforcement learning focus on exploring agent actions based on human feedback on actions, which contrasts with the focus on optimising one parametric state based on user feedback over states, as used in Bayesian Optimisation [13,67] or multi-armed bandits [68]. ...
... Interactive reinforcement learning has recently been applied in HCI [84], with promising applications in exploratory search [10,44] and adaptive environments [40,80]. Integrating user feedback in reinforcement learning algorithms is computationally feasible [94], helps agents learn better [57], can make data-driven design more accessible [68], and holds potential for rich human-computer collaboration [95]. Applications in Human-Robot Interaction have informed how humans may give feedback to learning agents [98], and showed potential for enabling human-robot co-creativity [36]. ...
... We implemented Sarsa, which is a standard algorithm for learning how to act in many different environment states, i.e., for each given parameter configuration [97]. It differs from multi-armed bandits, which learn how to act in one unique environment state [68]. Importantly, as evoked in Section 1, Sarsa was designed to learn one optimal behaviour in relation to the goal of a task. ...
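The contrast drawn in the excerpt above, between Sarsa learning over many states and a bandit's single state, can be sketched as a minimal tabular Sarsa on a toy four-state chain. The environment, hyperparameters, and names here are illustrative assumptions, not taken from the cited work:

```python
import random

def sarsa(n_states, n_actions, step, episodes=300, alpha=0.1,
          gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Sarsa: learns a value Q(s, a) for every environment
    state, whereas a bandit keeps just one value per action."""
    rng = random.Random(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]

    def policy(s):
        # Epsilon-greedy with random tie-breaking among best actions.
        if rng.random() < epsilon:
            return rng.randrange(n_actions)
        best = max(Q[s])
        return rng.choice([a for a, v in enumerate(Q[s]) if v == best])

    for _ in range(episodes):
        s, a = 0, policy(0)
        done = False
        while not done:
            s2, r, done = step(s, a)
            a2 = policy(s2)
            # Sarsa update: bootstrap from the action actually taken next.
            target = r if done else r + gamma * Q[s2][a2]
            Q[s][a] += alpha * (target - Q[s][a])
            s, a = s2, a2
    return Q

# Toy chain with 4 states: action 1 moves right and pays 1 at the end;
# action 0 quits the episode with no reward.
def step(s, a):
    if a == 0:
        return s, 0.0, True      # quit: episode ends, no reward
    if s == 3:
        return s, 1.0, True      # reached the goal
    return s + 1, 0.0, False     # move right

Q = sarsa(n_states=4, n_actions=2, step=step)
```

A bandit in this setting would maintain only two action values; Sarsa instead learns a separate pair of values for each of the four states, which is what lets it represent behaviour that depends on the current parameter configuration.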
Article
Software tools for generating digital sound often present users with high-dimensional, parametric interfaces that may not facilitate exploration of diverse sound designs. In this article, we propose to investigate artificial agents using deep reinforcement learning to explore parameter spaces in partnership with users for sound design. We describe a series of user-centred studies to probe the creative benefits of these agents and to adapt their design to exploration. Preliminary studies observing users' exploration strategies with parametric interfaces and testing different agent exploration behaviours led to the design of a fully-functioning prototype, called Co-Explorer, that we evaluated in a workshop with professional sound designers. We found that the Co-Explorer enables a novel creative workflow centred on human–machine partnership, which has been positively received by practitioners. We also highlight varied user exploration behaviours throughout partnering with our system. Finally, we frame design guidelines for enabling such a co-exploration workflow in creative digital applications.
... We implemented Sarsa, which is a standard algorithm to learn how to act in many different environment states [Sutton and Barto, 2011]. It differs from multi-armed bandits, which learn how to act in one unique environment state [Lomas et al., 2016]. Formally, the environment is constituted by the parameters of some sound synthesis engine, and the agent iteratively acts on them. ...
... Interactive reinforcement learning has recently been applied in HCI [Ruotsalo et al., 2014], with promising applications in exploratory search [Glowacka et al., 2013; Athukorala et al., 2016b] and adaptive environments [Frenoy et al., 2016; Rajaonarivo et al., 2017]. Integrating user feedback in reinforcement learning algorithms is computationally feasible [Stumpf et al., 2007], helps agents learn better [Knox and Stone, 2009], can make data-driven design more accessible [Lomas et al., 2016], and holds potential for rich human-computer collaboration [Stumpf et al., 2009]. Applications in Human-Robot Interaction have informed how humans may give feedback to learning agents [Thomaz and Breazeal, 2008], and showed potential for enabling human-robot co-creativity [Fitzgerald et al., 2017]. ...
... We implemented Sarsa, which is a standard algorithm to learn how to act in many different environment states, i.e., for each given parameter configuration [Sutton and Barto, 2011]. It differs from multi-armed bandits, which learn how to act in one unique environment state [Lomas et al., 2016]. Importantly, as evoked in Section 5.1, Sarsa was designed to learn an optimal behaviour in relation to the goal of a task. ...
Thesis
Music is a cultural and creative practice that enables humans to express a variety of feelings and intentions through sound. Machine learning opens many prospects for designing human expression in interactive music systems. Yet, as a Computer Science discipline, machine learning remains mostly studied from an engineering sciences perspective, which often excludes humans and musical interaction from the loop of the created systems. In this dissertation, I argue in favour of designing with machine learning for interactive music systems. I claim that machine learning must be first and foremost situated in human contexts to be researched and applied to the design of interactive music systems. I present four interdisciplinary studies that support this claim, using human-centred methods and model prototypes to design and apply machine learning to four situated musical tasks: motion-sound mapping, sonic exploration, synthesis exploration, and collective musical interaction. Through these studies, I show that model prototyping helps envision designs of machine learning with human users before engaging in model engineering. I also show that the final human-centred machine learning systems not only help humans create static musical artifacts, but support dynamic processes of expression between humans and machines. I call these processes co-expression: musical interaction between humans—who may have an expressive and creative impetus regardless of their expertise—and machines—whose learning abilities may be perceived as expressive by humans. In addition to these studies, I present five applications of the created model prototypes to the design of interactive music systems, which I publicly demonstrated in workshops, exhibitions, installations, and performances. Using a reflexive approach, I argue that the musical contributions enabled by such design practice with machine learning may ultimately complement the scientific contributions of human-centred machine learning. I claim that music research can thus be led through dispositif design, that is, through the technical realization of aesthetically-functioning artifacts that challenge cultural norms on computer science and music.
... For online games, the goal is usually to maximize user engagement, or how long users spend playing the game. Bayesian optimization [11] and multi-armed bandits [15] have been used to tune features of the game, including font size and how users enter input, to maximize user engagement. For general user interfaces, Krzysztof et. ...
... The findings of our optimization experiments support those by Lomas et al. [15]. Their paper emphasized optimizing for the right metric, and while our optimization technique returned layouts with better task performance, the layouts did not show much improvement in terms of aesthetics. ...
... These limitations with our optimization algorithm can be fixed easily by humans, as shown in Figures 6 and 7. Both human judgement and the optimization algorithm can be misleading, so ideally humans and AI would collaborate in the design process [15]. The following illustrates how this hybrid workflow might work with our system. ...
Preprint
Automating parts of the user interface (UI) design process has been a longstanding challenge. We present an automated technique for optimizing the layouts of mobile UIs. Our method uses gradient descent on a neural network model of task performance with respect to the model's inputs to make layout modifications that result in improved predicted error rates and task completion times. We start by extending prior work on neural network based performance prediction to 2-dimensional mobile UIs with an expanded interaction space. We then apply our method to two UIs, including one that the model had not been trained on, to discover layout alternatives with significantly improved predicted performance. Finally, we confirm these predictions experimentally, showing improvements up to 9.2 percent in the optimized layouts. This demonstrates the algorithm's efficacy in improving the task performance of a layout, and its ability to generalize and improve layouts of new interfaces.
... Yet, by 2006 Woodbury and Burrow wrote "Whither Design Space" to lament the lack of research on design spaces; they argued that the concept had exceptional value in computational design and artificial intelligence [73]. Since then, there have been a number of different efforts to systematically consider design spaces in design, both in computer-aided settings [8], [39], [40] and non-computational settings [12], [41], [42], [37]. Recent work at Aarhus University makes an explicit call for more design space thinking [26]. ...
... Further, measures of novelty, quality and relevance of the generated ideas should be added to the assessment of creativity as a complement to the count of the number of ideas. Finally, by collecting data during product engagement, it may be possible to use behavioral data and artificial intelligence to systematically optimize a design space [40]. ...
Article
The potential space of game designs is astronomically large. This paper shows how game design theories can be translated into a simple, tangible card deck that can assist in the exploration of new game designs within a broader "design space." By translating elements of game design theory into a physical card deck, we enable users to randomly sample a design space in order to synthesize new game design variations for a new play platform ("Lumies"). In a series of iterative design and testing rounds with various user groups, the deck has been optimized to merge relevant game theory elements into a concise card deck with limited categories and clear descriptions. In a small, controlled experiment involving groups of design students, we compare the effects of brainstorming with the card deck or the "Directed Brainstorming" method. We show that the deck does not increase ideation speed but is preferred by participants. We further show that our target audience, children, were able to use the card deck to develop dozens of new game ideas. We conclude that design space cards are a promising way to help adults and children to generate new game ideas by making it easier to explore the game design space.
... alternative interventions to help students learn) are assigned to participants, with the goal of giving higher reward arms to as many participants as possible. For example, recent work has shown bandit algorithms can speed up use of data to help participants in education (Xu et al. 2016;Clement et al., 2015;Williams et al., 2016;Segal et al., 2018), in healthcare (Tewari & Murphy, 2017;Rabbi et al., 2015;Aguilera et al., 2020), and in product design (Li et al., 2010;Chapelle & Li, 2011;Lomas et al., 2016). Yet, these examples are only a tiny fraction of the tens of thousands of experiments where bandit algorithms could be useful, by directing more participants to more effective arms. ...
... In education (the application area of our deployment in Section 3), bandit algorithms have been applied to sequencing educational content like courses (Xu et al., 2016) and lessons (Clement et al., 2014; Clement et al., 2015), as well as problem selection (Segal et al., 2018). There have been a few applications of bandit algorithms to adaptive experiments in educational game design (Lomas et al., 2016), evaluating crowdsourced explanations (Williams et al., 2016), and instructional messages, but they have primarily focused on optimizing learning outcomes rather than questions of how best to analyze the data from the experiments. ...
Preprint
Full-text available
Multi-armed bandit algorithms have been argued for decades as useful for adaptively randomized experiments. In such experiments, an algorithm varies which arms (e.g. alternative interventions to help students learn) are assigned to participants, with the goal of assigning higher-reward arms to as many participants as possible. We applied the bandit algorithm Thompson Sampling (TS) to run adaptive experiments in three university classes. Instructors saw great value in trying to rapidly use data to give their students in the experiments better arms (e.g. better explanations of a concept). Our deployment, however, illustrated a major barrier for scientists and practitioners to use such adaptive experiments: a lack of quantifiable insight into how much statistical analysis of specific real-world experiments is impacted (Pallmann et al., 2018; FDA, 2019), compared to traditional uniform random assignment. We therefore use our case study of the ubiquitous two-arm binary reward setting to empirically investigate the impact of using Thompson Sampling instead of uniform random assignment. In this setting, using common statistical hypothesis tests, we show that collecting data with TS can as much as double the False Positive Rate (FPR; incorrectly reporting differences when none exist) and the False Negative Rate (FNR; failing to report differences when they exist)...
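The inflation risk described in the abstract above can be explored with a small simulation: collect two-arm binary data under Thompson Sampling or uniform random assignment when the arms are truly identical, then apply a standard two-proportion z-test. This is a hedged sketch, not the authors' analysis code; the sample sizes, test choice, and function names are assumptions:

```python
import math
import random

def ts_arm(s, f, rng):
    """Thompson Sampling draw for two arms with Beta(1+s, 1+f) posteriors."""
    a = rng.betavariate(1 + s[0], 1 + f[0])
    b = rng.betavariate(1 + s[1], 1 + f[1])
    return 0 if a > b else 1

def experiment(n, p, adaptive, rng):
    """Run one two-arm experiment where BOTH arms share success rate p."""
    s, f = [0, 0], [0, 0]
    for _ in range(n):
        arm = ts_arm(s, f, rng) if adaptive else rng.randrange(2)
        (s if rng.random() < p else f)[arm] += 1
    return s, f

def rejects(s, f, alpha=0.05):
    """Two-proportion z-test; True means 'arms differ' is reported."""
    n0, n1 = s[0] + f[0], s[1] + f[1]
    if n0 == 0 or n1 == 0:
        return False
    p0, p1 = s[0] / n0, s[1] / n1
    pool = (s[0] + s[1]) / (n0 + n1)
    se = math.sqrt(pool * (1 - pool) * (1 / n0 + 1 / n1))
    if se == 0:
        return False
    z = abs(p0 - p1) / se
    pval = 2 * (1 - 0.5 * (1 + math.erf(z / math.sqrt(2))))
    return pval < alpha

def fpr(adaptive, sims=500, n=100, p=0.5, seed=0):
    """False positive rate over many simulated null experiments."""
    rng = random.Random(seed)
    hits = sum(rejects(*experiment(n, p, adaptive, rng)) for _ in range(sims))
    return hits / sims
```

Comparing `fpr(True)` against `fpr(False)` illustrates the abstract's point: the z-test's nominal 5% error rate is calibrated for uniform assignment, and adaptively collected data can distort it.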
... Additionally, Bayesian optimization is used for problems with continuous parameters and an infinite number of potential options. In the context of AUIs, such approaches offer a new paradigm for designing UIs in collaboration with Artificial Intelligence and user data [18], [10]. However, they have so far proven successful mainly in simple adaptation problems, such as recommendations and the calibration of interface parameters. ...
Preprint
Full-text available
Adapting the user interface (UI) of software systems to meet the needs and preferences of users is a complex task. The main challenge is to provide the appropriate adaptations at the appropriate time to offer value to end-users. Recent advances in Machine Learning (ML) techniques may provide effective means to support the adaptation process. In this paper, we instantiate a reference framework for Intelligent User Interface Adaptation by using Reinforcement Learning (RL) as the ML component to adapt user interfaces and ultimately improve the overall User Experience (UX). By using RL, the system is able to learn from past adaptations to improve its decision-making capabilities. Moreover, assessing the success of such adaptations remains a challenge. To overcome this issue, we propose to use predictive Human-Computer Interaction (HCI) models to evaluate the outcome of each action (i.e., adaptations) performed by the RL agent. In addition, we present an implementation of the instantiated framework, which is an extension of OpenAI Gym, that serves as a toolkit for developing and comparing RL algorithms. This Gym environment is highly configurable and extensible to other UI adaptation contexts. The evaluation results show that our RL-based framework can successfully train RL agents able to learn how to adapt UIs in a specific context to maximize the user engagement by using an HCI model as rewards predictor.
... We focus on a second proposed usage of MAB algorithms in education: assigning students to a particular version of a technology. For example, non-contextual MAB algorithms have been used to choose among crowdsourced explanations [26] and to explore an extremely large range of interface designs [18]. Some of this work has also considered the implications of collecting experimental data via MAB algorithms on measurement and inference [17,19], showing systematic biases that can impair the drawing of conclusions about the conditions. ...
Preprint
Full-text available
Digital educational technologies offer the potential to customize students' experiences and learn what works for which students, enhancing the technology as more students interact with it. We consider whether and when attempting to discover how to personalize has a cost, such as if the adaptation to personal information can delay the adoption of policies that benefit all students. We explore these issues in the context of using multi-armed bandit (MAB) algorithms to learn a policy for what version of an educational technology to present to each student, varying the relation between student characteristics and outcomes and also whether the algorithm is aware of these characteristics. Through simulations, we demonstrate that the inclusion of student characteristics for personalization can be beneficial when those characteristics are needed to learn the optimal action. In other scenarios, this inclusion decreases performance of the bandit algorithm. Moreover, including unneeded student characteristics can systematically disadvantage students with less common values for these characteristics. Our simulations do however suggest that real-time personalization will be helpful in particular real-world scenarios, and we illustrate this through case studies using existing experimental results in ASSISTments. Overall, our simulations show that adaptive personalization in educational technologies can be a double-edged sword: real-time adaptation improves student experiences in some contexts, but the slower adaptation and potentially discriminatory results mean that a more personalized model is not always beneficial.
... This framework proposes the use of Machine Learning algorithms in order to provide valuable UI adaptations. Several approaches have been proven successful in simple adaptation problems, such as recommendations and the calibration of interface parameters [5], [6]. However, Reinforcement Learning (RL) is more appropriate as regards learning policies for sequences of adaptations in which rewards are not immediately achievable. ...
Conference Paper
Full-text available
Background: Adapting the User Interface (UI) of software systems to user requirements and the context of use is challenging. The main difficulty consists of suggesting the right adaptation at the right time in the right place in order to make it valuable for end-users. We believe that recent progress in Machine Learning techniques provides useful ways in which to support adaptation more effectively. In particular, Reinforcement learning (RL) can be used to personalise interfaces for each context of use in order to improve the user experience (UX). However, determining the reward of each adaptation alternative is a challenge in RL for UI adaptation. Recent research has explored the use of reward models to address this challenge, but there is currently no empirical evidence on this type of model. Objective: In this paper, we propose a confirmatory study design that aims to investigate the effectiveness of two different approaches for the generation of reward models in the context of UI adaptation using RL: (1) by employing a reward model derived exclusively from predictive Human-Computer Interaction (HCI) models (HCI), and (2) by employing predictive HCI models augmented by Human Feedback (HCI&HF). Method: The controlled experiment will use an AB/BA crossover design with two treatments: HCI and HCI&HF. We shall determine how the manipulation of these two treatments will affect the UX when interacting with adaptive user interfaces (AUI). The UX will be measured in terms of user engagement and user satisfaction, which will be operationalized by means of predictive HCI models and the Questionnaire for User Interaction Satisfaction (QUIS), respectively. By comparing the performance of two reward models in terms of their ability to adapt to user preferences with the purpose of improving the UX (i.e. increasing user engagement, improving user satisfaction), our study contributes to the understanding of how reward modeling can facilitate UI adaptation using RL.
... For example, Fitts's and Hick's laws can be used to adapt the location and size of elements to minimize selection time [7]. Other practical examples and limitations, including the size of the state-action space, are discussed in [8][9][10][11][12][13]. ...
... Many practitioners have noted that the ATE itself is not a quantity of interest in several applications, e.g., when optimizing tail performance, and have begun to develop approaches using quantile metrics (Howard and Ramdas, 2019; Lux, 2018). Multi-armed bandits have been used to handle multiple treatments in online settings, with a focus on sequential decision-making and exposing more users to successful variants to increase reward (Liu et al., 2014; Issa Mattos et al., 2019; Birkett, 2019; Amadio, 2020; Lomas et al., 2016). Thompson sampling (Scott, 2010; Scott, 2015; Dimakopoulou et al., 2021) as well as contextual bandits (Li et al., 2010; Agarwal et al., 2016) have all been used in industry. ...
Preprint
Full-text available
The rise of internet-based services and products in the late 1990's brought about an unprecedented opportunity for online businesses to engage in large scale data-driven decision making. Over the past two decades, organizations such as Airbnb, Alibaba, Amazon, Baidu, Booking.com, Alphabet's Google, LinkedIn, Lyft, Meta's Facebook, Microsoft, Netflix, Twitter, Uber, and Yandex have invested tremendous resources in online controlled experiments (OCEs) to assess the impact of innovation on their customers and businesses. Running OCEs at scale has presented a host of challenges requiring solutions from many domains. In this paper we review challenges that require new statistical methodologies to address them. In particular, we discuss the practice and culture of online experimentation, as well as its statistics literature, placing the current methodologies within their relevant statistical lineages and providing illustrative examples of OCE applications. Our goal is to raise academic statisticians' awareness of these new research opportunities to increase collaboration between academia and the online industry.
... Bayes' theorem updates the expectation, given a new observation and prior data. Related work leverages this approach for AUIs [42,44,54]. Bayesian optimization is a sample-efficient global optimization method that finds optimal solutions in multi-dimensional spaces by probing a black-box function [73]. ...
Preprint
Full-text available
The goal of Adaptive UIs is to automatically change an interface so that the UI better supports users in their tasks. A core challenge is to infer user intent from user input and choose adaptations accordingly. Designing effective online UI adaptations is challenging because it relies on tediously hand-crafted rules or carefully collected, high-quality user data. To overcome these challenges, we formulate UI adaptation as a multi-agent reinforcement learning problem. In our formulation, a user agent learns to interact with a UI to complete a task. Simultaneously, an interface agent learns UI adaptations to maximize the user agent's performance. The interface agent is agnostic to the goal. It learns the task structure from the behavior of the user agent and, based on that, can support the user agent in completing its task. We show that our approach leads to a significant reduction in the number of actions needed on a photo editing task in silico. Furthermore, our user studies demonstrate the generalization capabilities of our interface agent from a simulated user agent to real users.
... Similarly, adaptive experimentation is used to assign participants to the most effective current condition while keeping the ability to test the other conditions [12]. Using adaptive experimentation in education can help explore various conditions but also direct more students to more useful ones in a randomized experiment [6,11,12]. ...
Preprint
Full-text available
Conducting randomized experiments in education settings raises the question of how we can use machine learning techniques to improve educational interventions. Using Multi-Armed Bandit (MAB) algorithms like Thompson Sampling (TS) in adaptive experiments can increase students' chances of obtaining better outcomes by increasing the probability of assignment to the most optimal condition (arm), even before an intervention completes. This is an advantage over traditional A/B testing, which may allocate an equal number of students to both optimal and non-optimal conditions. The problem is the exploration-exploitation trade-off. Even though adaptive policies aim to collect enough information to allocate more students to better arms reliably, past work shows that this may not be enough exploration to draw reliable conclusions about whether arms differ. Hence, it is of interest to provide additional uniform random (UR) exploration throughout the experiment. This paper presents a real-world adaptive experiment on how students engage with instructors' weekly email reminders to build their time management habits. Our metric of interest is open email rates, which track the arms represented by different subject lines. These are delivered following different allocation algorithms: UR, TS, and what we identified as TS†, which combines both TS and UR rewards to update its priors. We highlight problems with these adaptive algorithms, such as possible exploitation of an arm when there is no significant difference, and address their causes and consequences. Future directions include studying situations where the early choice of the optimal arm is not ideal and how adaptive algorithms can address them.
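As a rough illustration of the TS† idea described in the abstract above (a Thompson Sampling allocator whose Beta posteriors are updated with rewards from both TS-assigned and uniform-random participants), one might write the following sketch. The class and method names are hypothetical, not the authors' implementation:

```python
import random

class ThompsonDagger:
    """Sketch of a TS allocator that also ingests rewards observed
    under a parallel uniform-random allocation, so its posteriors
    see both data streams. Illustrative assumption, not the cited code."""

    def __init__(self, n_arms, seed=0):
        self.rng = random.Random(seed)
        self.succ = [0] * n_arms
        self.fail = [0] * n_arms

    def assign(self):
        # Probability matching: sample each arm's Beta posterior and
        # assign the participant to the arm with the highest draw.
        samples = [self.rng.betavariate(1 + s, 1 + f)
                   for s, f in zip(self.succ, self.fail)]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, arm, reward):
        # Called for TS-assigned participants AND for uniform-random
        # participants alike, which is the distinguishing feature here.
        if reward:
            self.succ[arm] += 1
        else:
            self.fail[arm] += 1
```

Feeding the allocator the uniform-random arm's outcomes keeps its posteriors grounded in unbiased exploration data while still concentrating new assignments on the currently best-looking arm.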
... Measures of resonance can play a valuable role in the AI optimization of human experiences; i.e., learning to attune to humans through the maximization of resonance. If resonance can be adequately measured and treated as a metric or objective function, then it might be optimized algorithmically (Lomas et al., 2016). For instance, if interpersonal resonance during a videoconference session could be measured, it could be optimized through the iterative testing of different interventions. ...
Article
Full-text available
Resonance, a powerful and pervasive phenomenon, appears to play a major role in human interactions. This article investigates the relationship between the physical mechanism of resonance and the human experience of resonance, and considers possibilities for enhancing the experience of resonance within human–robot interactions. We first introduce resonance as a widespread cultural and scientific metaphor. Then, we review the nature of “sympathetic resonance” as a physical mechanism. Following this introduction, the remainder of the article is organized in two parts. In part one, we review the role of resonance (including synchronization and rhythmic entrainment) in human cognition and social interactions. Then, in part two, we review resonance-related phenomena in robotics and artificial intelligence (AI). These two reviews serve as ground for the introduction of a design strategy and combinatorial design space for shaping resonant interactions with robots and AI. We conclude by posing hypotheses and research questions for future empirical studies and discuss a range of ethical and aesthetic issues associated with resonance in human–robot interactions.
... This process has been extensively applied to HCI design tasks, for example in MenuOptimizer [3], where the designer is assisted during the task of combinatorial optimization of menus, and DesignScape [46], where layout suggestions for position, scale, and alignment of elements are interactively suggested to the designer. Other design tools that have a human-in-the-loop aspect include Sketchplore [52], where real-time design optimization is integrated into a sketching tool; Forte [9], in which designers can directly iterate on fabrication shape design through topology optimization; the system of Kapoor et al. [32], where the behavior of classification systems can be iteratively refined by designers to support more intuitive behavior; and the work of Lomas et al. [41], where the arrangement of game elements is iteratively adjusted for increased user performance. Overall, these tools all feature the central aspect of human interaction where the human actively participates during the optimization process to generate better designs. ...
Preprint
Full-text available
Designers reportedly struggle with design optimization tasks where they are asked to find a combination of design parameters that maximizes a given set of objectives. In HCI, design optimization problems are often exceedingly complex, involving multiple objectives and expensive empirical evaluations. Model-based computational design algorithms assist designers by generating design examples during design; however, they assume a model of the interaction domain. Black-box methods for assistance, on the other hand, can work with any design problem. However, virtually all empirical studies of this human-in-the-loop approach have been carried out by either researchers or end-users. It remains an open question whether such methods can help designers in realistic tasks. In this paper, we study Bayesian optimization as an algorithmic method to guide the design optimization process. It operates by proposing to a designer which design candidate to try next, given previous observations. We report observations from a comparative study with 40 novice designers who were tasked to optimize a complex 3D touch interaction technique. The optimizer helped designers explore larger proportions of the design space and arrive at a better solution; however, they reported lower agency and expressiveness. Designers guided by an optimizer reported lower mental effort but also felt less creative and less in charge of the progress. We conclude that human-in-the-loop optimization can support novice designers in cases where agency is not critical.
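The propose-observe loop such an optimizer runs can be sketched with a toy Gaussian-process surrogate and an upper-confidence-bound acquisition (assuming NumPy). The one-dimensional design parameter and the quality function below are illustrative stand-ins, not the paper's 3D touch-technique task:

```python
import numpy as np

def rbf(a, b, ls=0.2):
    """Squared-exponential kernel between 1-D point sets a and b."""
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / ls) ** 2)

def gp_posterior(x_obs, y_obs, x_cand, noise=1e-6):
    """GP posterior mean and std at candidate points, given observations."""
    K = rbf(x_obs, x_obs) + noise * np.eye(len(x_obs))
    Ks = rbf(x_obs, x_cand)
    sol = np.linalg.solve(K, Ks)
    mu = sol.T @ y_obs
    var = np.clip(np.diag(rbf(x_cand, x_cand)) -
                  np.einsum('ij,ij->j', Ks, sol), 1e-12, None)
    return mu, np.sqrt(var)

def design_quality(x):
    # Hypothetical stand-in for an expensive empirical evaluation.
    return -(x - 0.7) ** 2

candidates = np.linspace(0.0, 1.0, 101)
x_obs = np.array([0.1, 0.9])                     # two seed evaluations
y_obs = design_quality(x_obs)
for _ in range(10):                              # propose-observe loop
    mu, sd = gp_posterior(x_obs, y_obs, candidates)
    nxt = candidates[np.argmax(mu + 2.0 * sd)]   # UCB acquisition
    x_obs = np.append(x_obs, nxt)
    y_obs = np.append(y_obs, design_quality(nxt))
best = x_obs[np.argmax(y_obs)]
print(float(best))  # best evaluated design; should lie near the optimum at 0.7
```

In the study, the "evaluation" step is a human trial of the proposed design, which is exactly why the number of loop iterations must stay small.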
... If the arms being compared differ in how beneficial they are to participants, then one might wish to direct more participants to more effective arms. Adaptive algorithms such as those for solving multi-armed bandit problems offer one potential way to do so (Lomas et al., 2016;Williams et al., 2018). These algorithms vary the probability that a participant will be assigned to an arm based on the effectiveness of the arms for previous participants. ...
Preprint
Full-text available
Multi-armed bandit algorithms like Thompson Sampling can be used to conduct adaptive experiments, in which maximizing reward means that data is used to progressively assign more participants to more effective arms. Such assignment strategies increase the risk of statistical hypothesis tests identifying a difference between arms when there is not one, and of failing to conclude there is a difference between arms when there truly is one. We present simulations for 2-arm experiments that explore two algorithms combining the benefits of uniform randomization for statistical analysis with the benefits of reward maximization achieved by Thompson Sampling (TS). First, Top-Two Thompson Sampling adds a fixed amount of uniform random allocation (UR) spread evenly over time. Second, we introduce a novel heuristic algorithm called TS PostDiff (Posterior Probability of Difference). TS PostDiff takes a Bayesian approach to mixing TS and UR: the probability that a participant is assigned using UR allocation is the posterior probability that the difference between the two arms is 'small' (below a certain threshold), allowing for more UR exploration when there is little or no reward to be gained. We find that the TS PostDiff method performs well across multiple effect sizes, and thus does not require tuning based on a guess for the true effect size.
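The TS PostDiff heuristic as described above can be sketched for two Bernoulli arms with Beta posteriors, estimating the posterior probability of a small difference by Monte Carlo. The success/failure counts and threshold below are hypothetical:

```python
import random

def posterior_prob_small_diff(s, f, threshold=0.05, n_draws=2000, rng=random):
    """Monte-Carlo estimate of P(|p1 - p2| < threshold) under Beta posteriors."""
    hits = 0
    for _ in range(n_draws):
        p1 = rng.betavariate(s[0] + 1, f[0] + 1)
        p2 = rng.betavariate(s[1] + 1, f[1] + 1)
        hits += abs(p1 - p2) < threshold
    return hits / n_draws

def ts_postdiff_assign(s, f, threshold=0.05, rng=random):
    """Assign by UR with prob. P(difference is small), otherwise by TS."""
    if rng.random() < posterior_prob_small_diff(s, f, threshold, rng=rng):
        return rng.randrange(2)                       # uniform exploration
    draws = [rng.betavariate(s[i] + 1, f[i] + 1) for i in range(2)]
    return max(range(2), key=lambda i: draws[i])      # exploit via TS

rng = random.Random(1)
# Hypothetical posteriors: arm 0 clearly ahead (80/20 vs. 40/60 successes/failures).
print(posterior_prob_small_diff([80, 40], [20, 60], rng=rng))  # near 0: mostly TS
```

With equal counts on both arms the estimated probability rises, so the algorithm reverts toward uniform randomization exactly when there is little reward to be gained, as the abstract describes.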
... Clickstream data shows us the sequence of activities that learners went through while on the platform. Gaining insights from such data could help us identify different learner behaviors [16,6]; improve the platform design based on learners' interactions with the platform [18,9]; and provide adaptive content to individual learners to better support their learning [7,8]. To gain insights from sequence data, a few techniques have been used in the educational data mining community, such as Association Rule Mining [5], Sequential Pattern Mining [19], Process Mining [17], Graph-Based Analysis [10], and Curriculum Pacing [12]. ...
Poster
Full-text available
In this paper, we describe the core features of the R package seqClustR [14], dedicated to sequence clustering. Sequence clustering is a data mining technique that groups similar sequences into clusters based on their similarities. It is useful when an unknown number of similar sequences needs to be identified to gain valuable insights. The main feature of this package is that it provides easy access to different algorithms, such as Edit Distance with Hierarchical Clustering, Markov Model-Based Clustering, Dynamic Time Warping, and K-Means, to perform sequence clustering. We find that different algorithms can create very different clusters and lead us to very different conclusions. So to get a reliable understanding of the sequences, we need to apply various sequence clustering algorithms and explore the data from multiple points of view. This paper illustrates how you can create different clusters with different algorithms, extract event log data for each cluster, and visualize them. We have provided an example in Section 4 showing a step-by-step procedure to run sequence clustering on the National Assessment of Educational Progress (NAEP) dataset.
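seqClustR is an R package, but one of the algorithms it wraps, edit distance with hierarchical clustering, is language-agnostic. The Python sketch below illustrates that technique (not the package's API); the learner event sequences are hypothetical:

```python
def edit_distance(a, b):
    """Classic Levenshtein distance between two event sequences."""
    dp = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, y in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (x != y))
    return dp[-1]

def single_linkage(seqs, n_clusters):
    """Naive agglomerative clustering with single linkage on edit distance."""
    clusters = [{i} for i in range(len(seqs))]
    d = {(i, j): edit_distance(seqs[i], seqs[j])
         for i in range(len(seqs)) for j in range(i + 1, len(seqs))}
    while len(clusters) > n_clusters:
        # Merge the pair of clusters with the smallest pairwise distance.
        a, b = min(((a, b) for a in range(len(clusters))
                    for b in range(a + 1, len(clusters))),
                   key=lambda ab: min(d[tuple(sorted((i, j)))]
                                      for i in clusters[ab[0]]
                                      for j in clusters[ab[1]]))
        clusters[a] |= clusters.pop(b)
    return clusters

# Hypothetical clickstreams: V=view, Q=quiz, A=answer, H=help request.
seqs = [list("VQQA"), list("VQQAA"), list("HHHV"), list("HHV")]
print(single_linkage(seqs, 2))  # groups quiz-heavy vs. help-heavy learners
```

As the abstract cautions, a different distance or linkage choice can produce quite different groupings, which is why comparing several algorithms matters.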
... Bandit systems are one of the most successful probabilistic approaches to this problem, not only for recommendation systems but also for interface design and adaptation [38]. Each adaptation is modelled as an 'arm' associated with a distribution describing expected gains. ...
Preprint
Adapting an interface requires taking into account both the positive and negative effects that changes may have on the user. A carelessly picked adaptation may impose high costs on the user -- for example, due to surprise or relearning effort -- or "trap" the process in a suboptimal design prematurely. However, effects on users are hard to predict as they depend on factors that are latent and evolve over the course of interaction. We propose a novel approach for adaptive user interfaces that yields a conservative adaptation policy: it finds beneficial changes when there are any and avoids changes when there are none. Our model-based reinforcement learning method plans sequences of adaptations and consults predictive HCI models to estimate their effects. We present empirical and simulation results from the case of adaptive menus, showing that the method outperforms both a non-adaptive and a frequency-based policy.
... There are also some recent studies that predict touchscreen tappability [23] and accessibility [9], but these methods usually make predictions based on existing screens, without investigating how to help designers optimize and generate design solutions. In recent years, there have also been works that utilize machine learning and deep learning to model and predict human performance in a sequence of user interface tasks, such as menu item selection [15], game engagement [13,16], and task completion time [7]. To the best of our knowledge, there is no existing work that utilizes machine learning to assist the user interface design of mobile apps with collective learning. ...
Preprint
A mobile app interface usually consists of a set of user interface modules. Properly designing these user interface modules is vital to achieving user satisfaction for a mobile app. However, there are few methods to determine design variables for user interface modules other than relying on the judgment of designers. Usually, a laborious post-processing step is necessary to verify the key change of each design variable. Therefore, only a very limited number of design solutions can be tested. It is time-consuming and almost impossible to figure out the best design solutions, as there are many modules. To this end, we introduce FEELER, a framework to quickly and intelligently explore design solutions for user interface modules with a collective machine learning approach. FEELER can help designers quantitatively measure the preference score of different design solutions, aiming to help designers conveniently and quickly adjust user interface modules. We conducted extensive experimental evaluations on two real-life datasets to demonstrate its applicability in real-life cases of user interface module design in the Baidu App, which is one of the most popular mobile apps in China.
... In this case study, an online educational game (Battleship Numberline) was designed with the goal of motivating students to practice number line estimation math problems [15]. Following its deployment online, the game attracted several thousand students per day. ...
Conference Paper
Full-text available
Human-centred design (HCD) is a powerful methodology that might play an important role in the development of real-world intelligent systems. However, present conceptualisations of artificial intelligence (AI) tend to emphasise autonomous, algorithmic systems. If humans are not involved in AI system design, what role can HCD play? This paper considers perspectives that reframe the role of AI in smart systems design, with the intention of creating space for human-centred design methodologies. These perspectives naturally give rise to opportunities for HCD by considering human and artificial intelligence in tandem. Informed by cybernetic theory, we define smart systems as "the use of outcome data to inform successful system action". To illustrate the practicality of this view, we share three case studies, each representing a different smart system configuration: artificial intelligence, human intelligence and combined artificial-human intelligence. We describe Battleship Numberline, an educational game with autonomous artificial intelligence. We then describe Zensus, a smart system for health and well-being that leverages human intelligence alone. Finally, we describe FactFlow, educational software that combines artificial and human intelligence. By examining the cybernetic feedback loops observed in these systems, we contribute a practical framework for the use of human-centred design methodology in smart systems design. This framework is intended as both a generative tool for designers and a basis for future research in the field of smart systems.
... They propose a model for the context of use and they adapt the UI with a combinatorial optimization algorithm (an example is shown in Figure 5). There are also works aiming at optimizing interfaces for educational games [14]. They benefit from an on-line game platform where a great number of children play, allowing to do statistics. ...
Chapter
Full-text available
When a user interface (UI) is displayed on a screen, parameters can be set to make it more readable to the user: font size and type, colors, brightness, widgets, etc. The optimal settings are specific to each user. For example, dark backgrounds are better for many visually impaired people who are dazzled. Adjusting the settings manually may be time-consuming and inefficient because of user subjectivity. The proposed approach optimizes them automatically by using a measure of reading performance. After a survey of existing set-ups for optimizing UIs, a new system is proposed, composed of a microphone with voice recognition and an optimization algorithm that performs reinforcement learning (RL). The user reads aloud a text displayed through the UI, and the feedback adaptation signals are the reading performance criteria. The UI parameters are modified while the user is reading, until an optimum is reached.
Preprint
Full-text available
We introduce Dynamic Information Sub-Selection (DISS), a novel framework of AI assistance designed to enhance the performance of black-box decision-makers by tailoring their information processing on a per-instance basis. Black-box decision-makers (e.g., humans or real-time systems) often face challenges in processing all possible information at hand (e.g., due to cognitive biases or resource constraints), which can degrade decision efficacy. DISS addresses these challenges through policies that dynamically select the most effective features and options to forward to the black-box decision-maker for prediction. We develop a scalable frequentist data acquisition strategy and a decision-maker mimicking technique for enhanced budget efficiency. We explore several impactful applications of DISS, including biased decision-maker support, expert assignment optimization, large language model decision support, and interpretability. Empirical validation of our proposed DISS methodology shows superior performance to state-of-the-art methods across various applications.
Article
As the number of selectable items increases, point-and-click interfaces rapidly become complex, leading to a decrease in usability. Adaptive user interfaces can reduce this complexity by automatically adjusting an interface to only display the most relevant items. A core challenge for developing adaptive interfaces is to infer user intent and choose adaptations accordingly. Current methods rely on tediously hand-crafted rules or carefully collected user data. Furthermore, heuristics need to be recrafted and data regathered for every new task and interface. To address this issue, we formulate interface adaptation as a multi-agent reinforcement learning problem. Our approach learns adaptation policies without relying on heuristics or real user data, facilitating the development of adaptive interfaces across various tasks with minimal adjustments needed. In our formulation, a user agent mimics a real user and learns to interact with an interface via point-and-click actions. Simultaneously, an interface agent learns interface adaptations, to maximize the user agent's efficiency, by observing the user agent's behavior. For our evaluation, we substituted the simulated user agent with actual users. Our study involved twelve participants and concentrated on automatic toolbar item assignment. The results show that the policies we developed in simulation effectively apply to real users. These users were able to complete tasks with fewer actions and in similar times compared to methods trained with real data. Additionally, we demonstrated our method's efficiency and generalizability across four different interfaces and tasks.
Article
Computational methods can potentially facilitate user interface design by complementing designer intuition, prior experience, and personal preference. Framing a user interface design task as a multi-objective optimization problem can help with operationalizing and structuring this process at the expense of designer agency and experience. While offering a systematic means of exploring the design space, the optimization process cannot typically leverage the designer’s expertise in quickly identifying that a given ‘bad’ design is not worth evaluating. We here examine a cooperative approach where both the designer and optimization process share a common goal, and work in partnership by establishing a shared understanding of the design space. We tackle the research question: how can we foster cooperation between the designer and a systematic optimization process in order to best leverage their combined strength? We introduce and present an evaluation of a cooperative approach that allows the user to express their design insight and work in concert with a multi-objective design process. We find that the cooperative approach successfully encourages designers to explore more widely in the design space than when they are working without assistance from an optimization process. The cooperative approach also delivers design outcomes that are comparable to an optimization process run without any direct designer input, but achieves this with greater efficiency and substantially higher designer engagement levels.
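A basic primitive in such multi-objective design processes is keeping only the non-dominated (Pareto-optimal) candidates for the designer to consider. The sketch below illustrates that filter; the design names and objective scores are hypothetical:

```python
def dominates(a, b):
    """True if design a is at least as good as b on every objective and
    strictly better on at least one (maximizing all objectives)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(designs):
    """Return the non-dominated subset of (name, objective-vector) pairs."""
    return [(name, obj) for name, obj in designs
            if not any(dominates(other, obj) for _, other in designs)]

# Hypothetical UI candidates scored on (usability, aesthetics); higher is better.
designs = [("A", (0.9, 0.2)), ("B", (0.6, 0.6)),
           ("C", (0.2, 0.9)), ("D", (0.5, 0.5))]
print([name for name, _ in pareto_front(designs)])  # ['A', 'B', 'C']; D is dominated by B
```

In a cooperative process like the one studied here, the designer can then prune this front further, e.g. by vetoing candidates that are formally non-dominated but obviously 'bad' to an expert eye.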
Conference Paper
Ethical deliberation has proved a consistent feature of Human-Computer Interaction (HCI) since its earliest years, spanning the respectful involvement of research participants to design choices impacting fairness, freedom and welfare. Despite growing discussions, applied knowledge and practical approaches for navigating complex moral dilemmas remain challenging to grasp. Motivated by the need for a structured overview, this paper contributes a scoping review of ethics as discussed across 129 full-length SIGCHI papers containing the search term 'ethic*' in their title, abstract or authors' keywords over the last ten years. Findings show increasing prioritisation of the topic, particularly within Artificial Intelligence. Value-Sensitive and Critical Design appear as the most frequently applied orientations, and participatory approaches are more prevalent than those without end-user input. Engaging with a spectrum from personal to societal concerns, the SIGCHI literature thus echoes calls for critical perspectives on user-centred processes and the need to establish more sustainable responsibility structures.
Book
Full-text available
Editorial The RSD10 symposium was held at the faculty of Industrial Design Engineering, Delft University of Technology, 2nd-6th November 2021. After a successful (yet unforeseen) online version of the RSD9 symposium, RSD10 was designed as a hybrid conference. How can we facilitate the physical encounters that inspire our work, ensure easy global access for joining the conference, and at the same time deal well with the ongoing uncertainties of the global COVID pandemic? In hindsight, the theme of RSD10 could not have been a better fit with the conditions in which it had to be organized: “Playing with Tensions: Embracing new complexity, collaboration and contexts in systemic design”. Playing with Tensions Complex systems do not lend themselves to simplification. Systemic designers have no choice but to embrace complexity, and in doing so, embrace opposing concepts and the resulting paradoxes. It is at the interplay of these ideas that they find the most fruitful regions of exploration. The main conference theme explored design and systems thinking practices as mediators to deal fruitfully with tensions. Our human tendency is to relieve the tensions, and in design, to resolve the so-called “pain points.” But tensions reveal paradoxes, the sites of connection, breaks in scale, emergence of complexity. Can we embrace the tensions and paradoxes as valuable social feedback on our path to just and sustainable futures? The symposium took off with two days of well-attended workshops on campus and online. One could sense tensions through embodied experiences in one of the workshops, while reframing systemic paradoxes as fruitful design starting points in another. In the tradition of RSD, a Gigamap Exhibition was organized. The exhibition showcased mind-blowing visuals that reveal the tension between our own desire for order and structure and our desire to capture real-life dynamics and contradicting perspectives.
Many of us enjoyed the high quality and diversity in the keynotes throughout the symposium. As chair of the SDA, Dr. Silvia Barbero opened in her keynote with a reflection on the start and impressive evolution of the Relating Systems thinking and Design symposia. Prof. Dr. Derk Loorbach showed us how transition research conceptualizes shifts in societal systems and gave us a glimpse into their efforts to foster desired ones. Prof. Dr. Elisa Giaccardi took us along a journey of technologically mediated agency. She advocated for a radical shift in design to deal with this complex web of relationships between things and humans. Indy Johar talked about the need to reimagine our relationship with the world as one based on fundamental interdependence. And finally, Prof. Dr. Klaus Krippendorf systematically unpacked the systemic consequences of design decisions. Together these keynote speakers provided important insights into the role of design in embracing systemic complexity, from the micro-scale of our material contexts to the macro-scale of globally connected societies. And of course, RSD10 would not be an RSD symposium if it did not offer a place to connect around practical case examples and discuss how knowledge could improve practice and how practice could inform and guide research. Proceedings RSD10 has been the first symposium in which contributors were asked to submit a full paper: either a short one that presented work-in-progress, or a long one presenting finished work. With the help of an excellent list of reviewers, this set-up allowed us to shape a symposium that offered a stage for high-quality research, providing a platform for critical and fruitful conversations. Short papers were combined around a research approach or methodology, aiming for peer-learning on how to increase the rigour and relevance of our studies. Long papers were combined around commonalities in the phenomena under study, offering state-of-the-art research.
The moderation of engaged and knowledgeable chairs and audience lifted the quality of our discussions. In total, these proceedings cover 33 short papers and 19 long papers from all over the world. From India to the United States, and Australia to Italy. In the table of contents, each paper is represented under its RSD 10 symposium track as well as a list of authors ordered alphabetically. The RSD10 proceedings capture the great variety of high-quality papers yet is limited to only textual contributions. We invite any reader to visit the rsdsymposium.org website to browse through slide-decks, video recordings, drawing notes and the exhibition to get the full experience of RSD10 and witness how great minds and insights have been beautifully captured! Word of thanks Let us close off with a word of thanks to our dean and colleagues for supporting us in hosting this conference, the SDA for their trust and guidance, Dr. Peter Jones and Dr. Silvia Barbero for being part of the RSD10 scientific committee, but especially everyone who contributed to the content of the symposium: workshop moderators, presenters, and anyone who participated in the RSD 10 conversation. It is only in this complex web of (friction-full) relationships that we can further our knowledge on systemic design: thanks for being part of it! Dr. JC Diehl, Dr. Nynke Tromp, and Dr. Mieke van der Bijl-Brouwer Editors RSD10
Article
Full-text available
Although the idea of learning engineering dates back to the 1960s, there has been an explosion of interest in the area in the last decade. This interest has been driven by an expansion in the computational methods available both for scaled data analysis and for much faster experimentation and iteration on student learning experiences. This article describes the findings of a virtual convening brought together to discuss the potential of learning engineering and the key opportunities available for learning engineering over the next decades. We focus the many possibilities into ten key opportunities for the field, which in turn group into three broad areas of opportunity. We discuss the state of the current art in these ten opportunities and key points of leverage. In these cases, a relatively modest shift in the field's priorities and work may have an outsized impact.
Chapter
Adaptive experiments can increase the chance that current students obtain better outcomes from a field experiment of an instructional intervention. In such experiments, the probability of assigning students to conditions changes while more data is being collected, so students can be assigned to interventions that are likely to perform better. Digital educational environments lower the barrier to conducting such adaptive experiments, but they are rarely applied in education. One reason might be that researchers have access to few real-world case studies that illustrate the advantages and disadvantages of these experiments in a specific context. We evaluate the effect of homework email reminders in students by conducting an adaptive experiment using the Thompson Sampling algorithm and compare it to a traditional uniform random experiment. We present this as a case study on how to conduct such experiments, and we raise a range of open questions about the conditions under which adaptive randomized experiments may be more or less useful.
Chapter
A detailed illustration of how large-scale digital learning systems can incrementally reduce the poverty-achievement gap.
Chapter
Full-text available
Learning to read is one of the most important achievements of early childhood, and sets the stage for future success. Even prior to school entry, children’s foundational literacy skills predict their later academic trajectories (Duncan et al., 2007; La Paro & Pianta, 2000; Lloyd, 1969; Lloyd, 1978). Children learn to read with differing levels of ease, with an estimated 5-17% of school-age children who struggle with reading acquisition (Shaywitz, 1998). The individual variation in children’s reading skills can be predicted by genetic, environmental, academic and socio-demographic factors (for review, see Peterson & Pennington, 2015). This chapter focuses on the relationship between reading development and socioeconomic status (SES), with attention to both cognitive outcomes and neural mechanisms. First, we describe SES and its relation to academic achievement in general, and reading development in particular. Second, we examine environmental factors that can potentially give rise to socioeconomic disparities in reading, such as early language/literacy exposure and access to books. Next, we explore the link between SES and reading disability (RD), including a focus on intervention approaches and treatment response. Finally, we summarize remaining questions and propose future research priorities.
Book
Full-text available
Over the last two decades, research in the field of poverty studies has begun to provide evidence that advances our understanding of how early adversity associated with material, social and cultural deprivation modulates brain development. When such evidence is used in other disciplinary contexts, one frequently finds references to early brain development as a predictor of adaptive behavior and economic productivity in adult life, or of the impossibility of such achievements due to the supposed immutability of the long-term negative impacts of childhood poverty. Claims of this kind, which have not only scientific but also political implications, need to be properly analyzed in light of the available evidence, since they could induce misconceptions and overgeneralizations that have the potential to affect the criteria for investment, design, implementation and evaluation of actions in the field of early childhood. This book is intended as a contribution in that direction. Its chapters, written by leading figures in the neuroscientific and cognitive study of poverty, provide evidence that feeds hypotheses and reflections in line with the main questions of this field of study.
Chapter
The year 2018 was the moment of increasing awareness of artificial intelligence (AI) in every imaginable domain. This article examines the question of whether AI technologies and their related hype have already affected interaction design practice in the Western industry. It looks at the potential impact on the tools, processes and products of interaction design. It highlights three potentially significant themes that stand out from the hype: 1) AI utilization as a technology push, 2) generative AI, and 3) the ethics of AI. Each theme is analyzed to derive the claim that, for now, AI has little specific impact on interaction design. On closer observation, generative AI is the theme that promises the biggest change in the longer run, but it is simultaneously an elusive hope that may never lead to anything but AI-augmented creativity support tools, some of which already exist. The article describes how the AI hype is partially positive and partially problematic for the advancement of design goals. The hype contributes positively by helping to surface healthy ethical debates that give more credence to designers' long-term attempts to be user-centered and ethical. This ethical discussion tends to neglect the fact that the ethical burden bestowed upon algorithms is not a new one but was being debated long before the big-data buzz. It is evidently now more critical than ever. This article's intention is to inform academic researchers working on human-computer interaction and interaction design about the need to continue developing human-centered AI for both general audiences and specifically for interaction designers. There are already promising examples of HCI research that show potential to affect design practice. Common to all is that they do not attempt to roll out a full generative AI solution but rather support human designers in iterative and somewhat well-constrained tasks.
Overall, it seems that by 2020 interaction designers will increasingly be working on technology that involves AI elements, but those elements themselves will not yet be at the core of how their work is carried out. Awareness of the ethical challenges and technical affordances related to AI technology and its utilization will become more important than the integration of AI into the interaction design process in the short term. By the end of the 2020s, we may yet be looking at a very different reality from now.
Article
Full-text available
Significance: We show, via a massive (N = 689,003) experiment on Facebook, that emotional states can be transferred to others via emotional contagion, leading people to experience the same emotions without their awareness. We provide experimental evidence that emotional contagion occurs without direct interaction between people (exposure to a friend expressing an emotion is sufficient), and in the complete absence of nonverbal cues.
Conference Paper
Full-text available
Traditional experimental paradigms have focused on executing experiments in a lab setting and eventually moving successful findings to larger experiments in the field. However, data from field experiments can also be used to inform new lab experiments. Now, with the advent of large student populations using internet-based learning software, online experiments can serve as a third setting for experimental data collection. In this paper, we introduce the Super Experiment Framework (SEF), which describes how internet-scale experiments can inform and be informed by classroom and lab experiments. We apply the framework to a research project implementing learning games for mathematics that is collecting hundreds of thousands of data trials weekly. We show that the framework allows findings from the lab-scale, classroom-scale and internet-scale experiments to inform each other in a rapid complementary feedback loop.
Article
Full-text available
School-researcher partnerships and large in vivo experiments help focus on useful, effective, instruction.
Conference Paper
Full-text available
User-oriented research in the game industry is undergoing a change from relying on informal user-testing methods adapted directly from productivity software development to integrating modern approaches to usability and user experience testing. Gameplay metrics analysis forms one of these techniques, being based on instrumentation methods in HCI. Gameplay metrics are instrumentation data about user behavior and user-game interaction, and can be collected during testing, production and the live period of the lifetime of a digital game. The use of instrumentation data is relatively new to commercial game development, and remains a relatively unexplored method of user research. In this paper, the focus is on utilizing game metrics for informing the analysis of gameplay during commercial game production as well as in research contexts. A series of case studies are presented, focusing on the major commercial game titles Kane & Lynch and Fragile Alliance.
Conference Paper
Full-text available
Online controlled experiments are often utilized to make data-driven decisions at Amazon, Microsoft, eBay, Facebook, Google, Yahoo, Zynga, and at many other companies. While the theory of a controlled experiment is simple, and dates back to Sir Ronald A. Fisher's experiments at the Rothamsted Agricultural Experimental Station in England in the 1920s, the deployment and mining of online controlled experiments at scale—thousands of experiments now—has taught us many lessons. These exemplify the proverb that the difference between theory and practice is greater in practice than in theory. We present our learnings as they happened: puzzling outcomes of controlled experiments that we analyzed deeply to understand and explain. Each of these took multiple-person weeks to months to properly analyze and get to the often surprising root cause. The root causes behind these puzzling results are not isolated incidents; these issues generalized to multiple experiments. The heightened awareness should help readers increase the trustworthiness of the results coming out of controlled experiments. At Microsoft's Bing, it is not uncommon to see experiments that impact annual revenue by millions of dollars, thus getting trustworthy results is critical and investing in understanding anomalies has tremendous payoff: reversing a single incorrect decision based on the results of an experiment can fund a whole team of analysts. The topics we cover include: the OEC (Overall Evaluation Criterion), click tracking, effect trends, experiment length and power, and carryover effects.
Article
Full-text available
Design Space Analysis is an approach to representing design rationale. It uses a semiformal notation, called QOC (Questions, Options, and Criteria), to represent the design space around an artifact. The main constituents of QOC are Questions identifying key design issues, Options providing possible answers to the Questions, and Criteria for assessing and comparing the Options. Design Space Analysis also takes account of justifications for the design (and possible alternative designs) that reflect considerations such as consistency, models and analogies, and relevant data and theory. A Design Space Analysis does not produce a record of the design process but is instead a coproduct of design and has to be constructed alongside the artifact itself. Our work is motivated by the notion that a Design Space Analysis will repay the investment in its creation by supporting both the original process of design and subsequent work on redesign and reuse by (a) providing an explicit representation to aid reasoning about the design and about the consequences of changes to it and (b) serving as a vehicle for communication, for example, among members of the design team or among the original designers and later maintainers of a system. Our work to date emphasises the nature of the QOC representation over processes for creating it, so these claims serve as goals rather than objectives we have achieved. This article describes the elements of Design Space Analysis and illustrates them by reference to analyses of existing designs and to studies of the concepts and arguments used by designers during design discussions.
Article
Full-text available
A multi-armed bandit is an experiment with the goal of accumulating rewards from a payoff distribution with unknown parameters that are to be learned sequentially. This article describes a heuristic for managing multi-armed bandits called randomized probability matching, which randomly allocates observations to arms according to the Bayesian posterior probability that each arm is optimal. Advances in Bayesian computation have made randomized probability matching easy to apply to virtually any payoff distribution. This flexibility frees the experimenter to work with payoff distributions that correspond to certain classical experimental designs that have the potential to outperform methods that are ‘optimal’ in simpler contexts. I summarize the relationships between randomized probability matching and several related heuristics that have been used in the reinforcement learning literature. Copyright © 2010 John Wiley & Sons, Ltd.
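For Bernoulli rewards, the randomized probability matching heuristic described above reduces to drawing one sample from each arm's Beta posterior and playing the argmax. A minimal sketch, assuming Beta(1, 1) priors and made-up arm rates and horizon (none of these specifics come from the article):

```python
import random

def thompson_sampling(true_rates, horizon=5000, seed=0):
    """Randomized probability matching (Thompson sampling) for
    Bernoulli-reward arms with Beta(1, 1) priors on each arm."""
    rng = random.Random(seed)
    k = len(true_rates)
    successes = [0] * k   # rewards of 1 observed per arm
    failures = [0] * k    # rewards of 0 observed per arm
    pulls = [0] * k
    for _ in range(horizon):
        # Draw one sample from each arm's Beta posterior and play the argmax;
        # this allocates plays in proportion to P(arm is optimal).
        draws = [rng.betavariate(successes[i] + 1, failures[i] + 1)
                 for i in range(k)]
        arm = draws.index(max(draws))
        reward = 1 if rng.random() < true_rates[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
    return pulls

# Three hypothetical design conditions with success rates 5%, 10%, and 20%.
pulls = thompson_sampling([0.05, 0.10, 0.20])
```

After a few thousand trials, the bulk of the traffic should concentrate on the best arm while the weaker arms still receive occasional exploratory plays.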
Conference Paper
Full-text available
Controlled experiments, also called randomized experiments and A/B tests, have had a profound influence on multiple fields, including medicine, agriculture, manufacturing, and advertising. While the theoretical aspects of offline controlled experiments have been well studied and documented, the practical aspects of running them in online settings, such as web sites and services, are still being developed. As the usage of controlled experiments grows in these online settings, it is becoming more important to understand the opportunities and pitfalls one might face when using them in practice. A survey of online controlled experiments and lessons learned were previously documented in Controlled Experiments on the Web: Survey and Practical Guide (Kohavi, et al., 2009). In this follow-on paper, we focus on pitfalls we have seen after running numerous experiments at Microsoft. The pitfalls include a wide range of topics, such as assuming that common statistical formulas used to calculate standard deviation and statistical power can be applied and ignoring robots in analysis (a problem unique to online settings). Online experiments allow for techniques like gradual ramp-up of treatments to avoid the possibility of exposing many customers to a bad (e.g., buggy) Treatment. With that ability, we discovered that it's easy to incorrectly identify the winning Treatment because of Simpson's paradox.
Conference Paper
Full-text available
Spatial scaffolding is a naturally occurring human teaching behavior, in which teachers use their bodies to spatially structure the learning environment to direct the attention of the learner. Robotic systems can take advantage of simple, highly reliable ...
Conference Paper
Full-text available
A bewildering variety of devices for communication from humans to computers now exists on the market. In order to make sense of this variety, and to aid in the design of new input devices, we propose a framework for describing and analyzing input devices. Following Mackinlay's semantic analysis of the design space for graphical presentations, our goal is to provide tools for the generation and test of input device designs. The descriptive tools we have created allow us to describe the semantics of a device and measure its expressiveness. Using these tools, we have built a taxonomy of input devices that goes beyond earlier taxonomies of Buxton & Baecker and Foley, Wallace, & Chan. In this paper, we build on these descriptive tools, and proceed to the use of human performance theories and data for evaluation of the effectiveness of points in this design space. We focus on two figures of merit, footprint and bandwidth, to illustrate this evaluation. The result is the systematic integration of methods for both generating and testing the design space of input devices.
Article
Full-text available
The web provides an unprecedented opportunity to evaluate ideas quickly using controlled experiments, also called randomized experiments, A/B tests (and their generalizations), split tests, Control/Treatment tests, MultiVariable Tests (MVT) and parallel flights. Controlled experiments embody the best scientific design for establishing a causal relationship between changes and their influence on user-observable behavior. We provide a practical guide to conducting online experiments, where end-users can help guide the development of features. Our experience indicates that significant learning and return-on-investment (ROI) are seen when development teams listen to their customers, not to the Highest Paid Person’s Opinion (HiPPO). We provide several examples of controlled experiments with surprising results. We review the important ingredients of running controlled experiments, and discuss their limitations (both technical and organizational). We focus on several areas that are critical to experimentation, including statistical power, sample size, and techniques for variance reduction. We describe common architectures for experimentation systems and analyze their advantages and disadvantages. We evaluate randomization and hashing techniques, which we show are not as simple in practice as is often assumed. Controlled experiments typically generate large amounts of data, which can be analyzed using data mining techniques to gain deeper understanding of the factors influencing the outcome of interest, leading to new hypotheses and creating a virtuous cycle of improvements. Organizations that embrace controlled experiments with clear evaluation criteria can evolve their systems with automated optimizations and real-time analyses. Based on our extensive practical experience with multiple systems and organizations, we share key lessons that will help practitioners in running trustworthy controlled experiments.
Article
Full-text available
Mainly motivated by the current lack of a qualitative and quantitative entertainment formulation of computer games and the procedures to generate it, this article covers the following issues: It presents the features—extracted primarily from the opponent behavior—that make a predator/prey game appealing; provides the qualitative and quantitative means for measuring player entertainment in real time, and introduces a successful methodology for obtaining games of high satisfaction. This methodology is based on online (during play) learning opponents who demonstrate cooperative action. By testing the game against humans, we confirm our hypothesis that the proposed entertainment measure is consistent with the judgment of human players. As far as learning in real time against human players is concerned, results suggest that longer games are required for humans to notice some sort of change in their entertainment.
Article
Full-text available
Reinforcement learning policies face the exploration versus exploitation dilemma, i.e., the search for a balance between exploring the environment to find profitable actions and exploiting the empirically best action as often as possible. A popular measure of a policy's success in addressing this dilemma is the regret, that is, the loss due to the fact that the globally optimal policy is not followed at all times. One of the simplest examples of the exploration/exploitation dilemma is the multi-armed bandit problem. Lai and Robbins were the first to show that the regret for this problem has to grow at least logarithmically in the number of plays. Since then, policies which asymptotically achieve this regret have been devised by Lai and Robbins and many others. In this work we show that the optimal logarithmic regret is also achievable uniformly over time, with simple and efficient policies, and for all reward distributions with bounded support.
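Among the simple policies this abstract refers to is UCB1: play each arm once, then always play the arm maximizing its empirical mean plus the confidence bonus sqrt(2 ln t / n_i). A sketch under assumed Bernoulli arms (the rates and horizon are illustrative, not from the paper):

```python
import math
import random

def ucb1(true_rates, horizon=5000, seed=0):
    """UCB1: after one forced play per arm, always play the arm maximizing
    empirical mean + sqrt(2 * ln(t) / n_i), achieving logarithmic regret."""
    rng = random.Random(seed)
    k = len(true_rates)
    counts = [0] * k     # plays per arm (n_i)
    totals = [0.0] * k   # summed rewards per arm

    def play(i):
        totals[i] += 1.0 if rng.random() < true_rates[i] else 0.0
        counts[i] += 1

    for i in range(k):           # initialization: play each arm once
        play(i)
    for t in range(k, horizon):  # confidence-bound phase
        bounds = [totals[i] / counts[i]
                  + math.sqrt(2.0 * math.log(t) / counts[i])
                  for i in range(k)]
        play(bounds.index(max(bounds)))
    return counts

# Hypothetical Bernoulli arms; the 0.20 arm should accumulate the most plays.
counts = ucb1([0.05, 0.10, 0.20])
```

The bonus term shrinks as an arm is played more, so suboptimal arms are revisited only logarithmically often, which is the source of the logarithmic regret guarantee.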
Article
Full-text available
Personalized web services strive to adapt their services (advertisements, news articles, etc) to individual users by making use of both content and user information. Despite a few recent advances, this problem remains challenging for at least two reasons. First, web service is featured with dynamically changing pools of content, rendering traditional collaborative filtering methods inapplicable. Second, the scale of most web services of practical interest calls for solutions that are both fast in learning and computation. In this work, we model personalized recommendation of news articles as a contextual bandit problem, a principled approach in which a learning algorithm sequentially selects articles to serve users based on contextual information about the users and articles, while simultaneously adapting its article-selection strategy based on user-click feedback to maximize total user clicks. The contributions of this work are three-fold. First, we propose a new, general contextual bandit algorithm that is computationally efficient and well motivated from learning theory. Second, we argue that any bandit algorithm can be reliably evaluated offline using previously recorded random traffic. Finally, using this offline evaluation method, we successfully applied our new algorithm to a Yahoo! Front Page Today Module dataset containing over 33 million events. Results showed a 12.5% click lift compared to a standard context-free bandit algorithm, and the advantage becomes even greater when data gets more scarce.
Article
Full-text available
Virtual advisors often increase sales for those customers who find such online advice to be convenient and helpful. However, other customers take a more active role in their purchase decisions and prefer more detailed data. In general, we expect that websites are more preferred and increase sales if their characteristics (e.g., more detailed data) match customers' cognitive styles (e.g., more analytic). "Morphing" involves automatically matching the basic "look and feel" of a website, not just the content, to cognitive styles. We infer cognitive styles from clickstream data with Bayesian updating. We then balance exploration (learning how morphing affects purchase probabilities) with exploitation (maximizing short-term sales) by solving a dynamic program (partially observable Markov decision process). The solution is made feasible in real time with expected Gittins indices. We apply the Bayesian updating and dynamic programming to an experimental BT Group (formerly British Telecom) website using data from 835 priming respondents. If we had perfect information on cognitive styles, the optimal "morph" assignments would increase purchase intentions by 21%. When cognitive styles are partially observable, dynamic programming does almost as well—purchase intentions can increase by almost 20%. If implemented system-wide, such increases represent approximately $80 million in additional revenue.
Conference Paper
We use Bayesian optimization methods to design games that maximize user engagement. Participants are paid to try a game for several minutes, at which point they can quit or continue to play voluntarily with no further compensation. Engagement is measured by player persistence, projections of how long others will play, and a post-game survey. Using Gaussian process surrogate-based optimization, we conduct efficient experiments to identify game design characteristics---specifically those influencing difficulty---that lead to maximal engagement. We study two games requiring trajectory planning, the difficulty of each is determined by a three-dimensional continuous design space. Two of the design dimensions manipulate the game in user-transparent manner (e.g., the spacing of obstacles), the third in a subtle and possibly covert manner (incremental trajectory corrections). Converging results indicate that overt difficulty manipulations are effective in modulating engagement only when combined with the covert manipulation, suggesting the critical role of a user's self-perception of competence.
Thesis
Large-scale online experiments can test generalizable theories about how designs affect users. While online software companies run hundreds of thousands of experiments every day, nearly all of these experiments are simple A/B tests structured to identify which software design is better. In contrast, this thesis highlights opportunities for an “interaction design science” where online experiments can test generalizable theories explaining how and why different software designs affect user interactions. To illustrate the basic scientific opportunities inherent within large-scale online design experiments, this thesis deploys over 10,000 variations of an online educational game to more than 100,000 learners in order to test basic psychological theories of motivation. In contrast to dominant theories of motivation, which predict that a moderate level of challenge maximizes motivation, these experiments find that difficulty has a consistently negative effect on motivation, unless accompanied by specific design factors. However, a series of parallel experiments provide evidence that a moderate level of novelty maximizes motivation, while also increasing difficulty. These results suggest that previous theoretical formulations of challenge may be conflating difficulty and novelty. These experiments are conducted within Battleship Numberline, a systematically designed learning game that has been played over three million times. This thesis argues that accelerating the pace of online design experiments can accelerate basic science, particularly the scientific theory underlying interaction design. For instance, a testable taxonomy of motivational design elements is presented, which could be validated through a series of online experiments. Yet, while it may be feasible to run thousands of design experiments, analyzing and learning from this large-scale experimentation is a new and important scientific challenge. 
To address this issue, this thesis investigates the use of multi-armed bandit algorithms to automatically explore (and optimize) the design space of online software. To synthesize these results, this thesis provides a summary table of all 17 tested hypotheses, offers a design pattern for producing online experiments that contribute to generalizable theory and proposes a model that illustrates how online software experiments can accelerate both basic science and data-driven continuous improvement.
Article
The modern service economy is substantively different from the agricultural and manufacturing economies that preceded it. In particular, the cost of experimenting is dominated by opportunity cost rather than the cost of obtaining experimental units. The different economics require a new class of experiments, in which stochastic models play an important role. This article briefly summarizes multi-armed bandit experiments, where the experimental design is modified as the experiment progresses to reduce the cost of experimenting. Special attention is paid to Thompson sampling, which is a simple and effective way to run a multi-armed bandit experiment. Copyright © 2015 John Wiley & Sons, Ltd.
Article
We present a general automatic experimentation and hypothesis generation framework that utilizes a large set of users to explore the effects of different parts of an intervention parameter space on any objective function. We also incorporate importance sampling, allowing us to run these automatic experiments even if we cannot give out the exact intervention distributions that we want. To show the utility of this framework, we present an implementation in the domain of fractions and numberlines, using an online educational game as the source of players. Our system is able to automatically explore the parameter space and generate hypotheses about what types of numberlines lead to maximal short-term transfer; testing on a separate dataset shows the most promising hypotheses are valid. We briefly discuss our results in the context of the wider educational literature, showing that one of our results is not explained by current research on multiple fraction representations, thus proving our ability to generate potentially interesting hypotheses to test.
Article
Thompson sampling is one of the oldest heuristics to address the exploration/exploitation trade-off, but it is surprisingly unpopular in the literature. We present here some empirical results using Thompson sampling on simulated and real data, and show that it is highly competitive. And since this heuristic is very easy to implement, we argue that it should be part of the standard baselines to compare against.
Article
Online games can serve as research instruments to explore the effects of game design elements on motivation and learning. In our research, we manipulated the design of an online math game to investigate the effect of challenge on player motivation and learning. To test the "Inverted-U Hypothesis", which predicts that maximum game engagement will occur with moderate challenge, we produced two large-scale (10K and 70K subjects), multi-factor (2x3 and 2x9x8x4x25) online experiments. We found that, in almost all cases, subjects were more engaged and played longer when the game was easier, which seems to contradict the generality of the Inverted-U Hypothesis. Troublingly, we also found that the most engaging design conditions produced the slowest rates of learning. Based on our findings, we describe several design implications that may increase challenge-seeking in games, such as providing feedforward about the anticipated degree of challenge.
Article
This paper studies how and how much active experimentation is used in discounted or finite-horizon optimization problems with an agent who chooses actions sequentially from a finite set of actions, with rewards depending on unknown parameters associated with the actions. Closed-form approximations are developed for the optimal rules in these ‘multi-armed bandit’ problems. Some refinements and modifications of the basic structure of these approximations also provide a nearly optimal solution to the long-standing problem of incorporating switching costs into multi-armed bandits.
Conference Paper
Standard intelligent tutoring systems give immediate feedback on whether students’ answers are correct. This prevents unproductive floundering, but may also prevent students from engaging deeply with their misconceptions. This paper presents a prototype intelligent tutoring system with grounded feedback that supports students in evaluating and correcting their own errors. In a think-aloud study with five fifth-graders, students used the grounded feedback to self-correct, and solved more fraction addition problems with the tutor than with paper and pencil. These preliminary results are encouraging and motivate experimental work in this area.
Conference Paper
Decision-theoretic optimization is becoming a popular tool in the user interface community, but creating accurate cost (or utility) functions has become a bottleneck --- in most cases the numerous parameters of these functions are chosen manually, which is a tedious and error-prone process. This paper describes ARNAULD, a general interactive tool for eliciting user preferences concerning concrete outcomes and using this feedback to automatically learn a factored cost function. We empirically evaluate our machine learning algorithm and two automatic query generation approaches and report on an informal user study.
Conference Paper
Normally, the primary purpose of an information display is to convey information. If information displays can be aesthetically interesting, that might be an added bonus. This paper considers an experiment in reversing this imperative. It describes the Kandinsky system which is designed to create displays which are first aesthetically interesting, and then as an added bonus, able to convey information. The Kandinsky system works on the basis of aesthetic properties specified by an artist (in a visual form). It then explores a space of collages composed from information bearing images, using an optimization technique to find compositions which best maintain the properties of the artist's aesthetic expression.
Conference Paper
We propose novel multi-armed bandit (explore/exploit) schemes to maximize total clicks on a content module published regularly on Yahoo! Intuitively, one can "explore'' each candidate item by displaying it to a small fraction of user visits to estimate the item's click-through rate (CTR), and then "exploit'' high CTR items in order to maximize clicks. While bandit methods that seek to find the optimal trade-off between explore and exploit have been studied for decades, existing solutions are not satisfactory for Web content publishing applications where dynamic set of items with short lifetimes, delayed feedback and non-stationary reward (CTR) distributions are typical. In this paper, we develop a Bayesian solution and extend several existing schemes to our setting. Through extensive evaluation with nine bandit schemes, we show that our Bayesian solution is uniformly better in several scenarios. We also study the empirical characteristics of our schemes and provide useful insights on the strengths and weaknesses of each. Finally, we validate our results with a "side-by-side'' comparison of schemes through live experiments conducted on a random sample of real user visits to Yahoo!
Conference Paper
The multi-armed bandit problem for a gambler is to decide which arm of a K-slot machine to pull to maximize his total reward in a series of trials. Many real-world learning and optimization problems can be modeled in this way. Several strategies or algorithms have been proposed as a solution to this problem in the last two decades, but, to our knowledge, there has been no common evaluation of these algorithms. This paper provides a preliminary empirical evaluation of several multi-armed bandit algorithms. It also describes and analyzes a new algorithm, Poker (Price Of Knowledge and Estimated Reward) whose performance compares favorably to that of other existing algorithms in several experiments. One remarkable outcome of our experiments is that the most naive approach, the ε-greedy strategy, proves to be often hard to beat.
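The ε-greedy strategy this evaluation finds "often hard to beat" fits in a few lines: with probability ε pick a random arm, otherwise play the arm with the best empirical mean. A minimal sketch, where the ε value, arm rates, and horizon are assumptions chosen for illustration:

```python
import random

def epsilon_greedy(true_rates, epsilon=0.1, horizon=5000, seed=0):
    """epsilon-greedy: explore a uniformly random arm with probability
    epsilon, otherwise exploit the arm with the best empirical mean."""
    rng = random.Random(seed)
    k = len(true_rates)
    counts = [0] * k
    means = [0.0] * k
    for _ in range(horizon):
        if rng.random() < epsilon:
            arm = rng.randrange(k)          # explore
        else:
            arm = means.index(max(means))   # exploit
        reward = 1.0 if rng.random() < true_rates[arm] else 0.0
        counts[arm] += 1
        # Incremental update of this arm's empirical mean.
        means[arm] += (reward - means[arm]) / counts[arm]
    return counts

# Same hypothetical arms as above: success rates of 5%, 10%, and 20%.
counts = epsilon_greedy([0.05, 0.10, 0.20])
```

Unlike UCB-style rules, ε-greedy keeps exploring at a constant rate forever, so its regret grows linearly; in practice a small ε is nonetheless a strong baseline over moderate horizons.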
Article
A class of simple adaptive allocation rules is proposed for the problem (often called the "multi-armed bandit problem") of sampling $x_1, \cdots, x_N$ sequentially from k populations with densities belonging to an exponential family, in order to maximize the expected value of the sum $S_N = x_1 + \cdots + x_N$. These allocation rules are based on certain upper confidence bounds, which are developed from boundary crossing theory, for the k population parameters. The rules are shown to be asymptotically optimal as $N \rightarrow \infty$ from both Bayesian and frequentist points of view. Monte Carlo studies show that they also perform very well for moderate values of the horizon N.
Article
Continuing his exploration of the organization of complexity and the science of design, this new edition of Herbert Simon's classic work on artificial intelligence adds a chapter that sorts out the current themes and tools -- chaos, adaptive systems, genetic algorithms -- for analyzing complexity and complex systems. There are updates throughout the book as well. These take into account important advances in cognitive psychology and the science of design while confirming and extending the book's basic thesis: that a physical symbol system has the necessary and sufficient means for intelligent action. The chapter "Economic Reality" has also been revised to reflect a change in emphasis in Simon's thinking about the respective roles of organizations and markets in economic systems.
Digital Games for Improving Number Sense
  • D Lomas
Lomas, D. (2013). Digital Games for Improving Number Sense. Retrieved from https://pslcdatashop.web.cmu.edu/Files?datasetId=445
Trading Off Scientific Knowledge and User Learning with Multi-Armed Bandits
  • Y Liu
  • T Mandel
  • E Brunskill
  • Z Popovic
Liu, Y., Mandel, T., Brunskill, E., & Popovic, Z. (2014). Trading Off Scientific Knowledge and User Learning with Multi-Armed Bandits. Educational Data Mining.
Duolingo: Learning a Language While Translating the Web
  • S Hacker
Hacker, S. (2014). Duolingo: Learning a Language While Translating the Web. PhD Thesis, Carnegie Mellon University.
Google Content Experiments
  • S Scott
Scott, S. (2014). Google Content Experiments. https://support.google.com/analytics/answer/2844870?hl=en&ref_topic=2844866
The Sciences of the Artificial
  • H Simon
Simon, H. (1969). The Sciences of the Artificial. Cambridge, MA: MIT Press.
Multi-armed bandit experiments in the online service economy
  • S L Scott
Scott, S. L. (2015). Multi-armed bandit experiments in the online service economy. Applied Stochastic Models in Business and Industry, 31(1), 37-45.
Uncontrolled: The surprising payoff of trial-and-error for business, politics, and society
  • J Manzi
Manzi, J. (2012). Uncontrolled: The surprising payoff of trial-and-error for business, politics, and society. Basic Books.
When to Run Bandit Tests Instead of A/B/n Tests
  • A Birkett
Birkett, A. (2015) When to Run Bandit Tests Instead of A/B/n Tests. http://conversionxl.com/bandit-tests/
Ethical issues in advanced artificial intelligence. Science Fiction and Philosophy: From Time Travel to Superintelligence
  • N Bostrom
Bostrom, N. (2003). Ethical issues in advanced artificial intelligence. Science Fiction and Philosophy: From Time Travel to Superintelligence, 277-284.
Technology or People: Putting People Back in Charge
  • D Norman
Norman, D. (in preparation) Technology or People: Putting People Back in Charge. Jnd.org
An empirical evaluation of Thompson sampling. In Advances in Neural Information Processing Systems
  • O Chapelle
  • L Li
Chapelle, O., & Li, L. (2011). An empirical evaluation of Thompson sampling. Advances in Neural Information Processing Systems.
A modern Bayesian look at the multi-armed bandit
  • S L Scott
Scott, S. L. (2010). A modern Bayesian look at the multi-armed bandit. Applied Stochastic Models in Business and Industry, 26(6).
Design Space Sampling for the Optimization of Educational Games
  • D Lomas
  • E Harpstead
Website Morphing
  • J R Hauser
  • G L Urban
  • G Liberali
  • M Braun
Hauser, J.R., Urban, G.L., Liberali, G., & Braun, M. (2009). Website Morphing. Marketing Science, 28(2), 202-223.
Optimizing motivation and learning with large-scale game design experiments
  • D Lomas
Lomas, D. (2014). Optimizing motivation and learning with large-scale game design experiments (Unpublished doctoral dissertation).
Making Interfaces Work for Each Individual #chi4good, CHI 2016, San Jose, CA, USA