Conference Paper

Estimating Quality Ratings from Touch Interactions in Mobile Games


...
  • The need to test how apps impact battery/power consumption [34], [35], [65], [83], [10], [1], [92], [19], [43], [76], [53]
  • Performance bugs are very difficult to detect and reproduce [57], [24], [51]
  • Performance testing is a very time-consuming task [43]
  • Lack of tools or methodologies for performance testing [34], [35], [83], [1], [36]
  • The need to detect memory leaks [6]
  • Poorly performing apps negatively impact the user experience [6], [51], [90]
  • App performance may vary across different mobile platforms [81], [76]
  • Developers do not have proper knowledge about performance, therefore they are not careful with this q.c. [81], [6], [21], [19]
S
  • Developers do not have proper knowledge about security, therefore they are not careful with this q.c. [7], [70], [75], [28], [58], [12], [72], [88], [29], [41], [47], [9], [82], [84], [54], [74], [85]
  • Security testing is not a simple task and it is worth being investigated [9], [82], [54]
  • Programmers developing mobile apps that communicate with a server usually do not follow security guidelines to implement SSL/TLS protocols [75], [58], [84]
  • Security testing is neglected with respect to other q.c.(s), hence there is a need for automated tools to detect vulnerabilities [28]
  • Apps are not tested against known vulnerabilities [12]
  • The usage of third-party apps and frameworks may introduce vulnerabilities [72]
U
  • Usability may impact the success of mobile apps in the market [79], [61], [49], [52], [25]
  • App usability should be tested on devices that have different display characteristics [49], [46], [20], [13]
  • Usability testing usually requires extra cost, specific expertise, and time [38], [79], [18], [52], [25], [26]
  • Developers do not have proper knowledge about usability, therefore they are not careful with this q.c. [38], [26]
P
  • Mobile apps have to work correctly in mobile devices and platforms with very different characteristics. ...
... This is the main strategy used for usability testing, and is proposed in 13 of the 14 studies addressing this q.c. [11,13,18,20,25,26,38,46,49,52,60,61,79]. The analysis of energy and memory usage logs is also used in performance efficiency testing [1,43]. ...
Article
Context: The mobile app market is continually growing, offering solutions for almost all aspects of people’s lives (e.g., healthcare, business, entertainment), as is stakeholders’ demand for apps that are more secure, portable, and easy to use, among other non-functional requirements (NFRs). Therefore, manufacturers should guarantee that their mobile apps achieve high quality levels. A good strategy is to include software testing and quality assurance activities during the whole life cycle of such solutions. Problem: Systematically assuring NFRs is not an easy task for any software product. Software engineers must make important decisions before adopting testing techniques and automation tools to support such endeavors. Proposal: To provide software engineers with a broad overview of existing dynamic techniques and automation tools for testing mobile apps regarding NFRs. Methods: We planned and conducted a Systematic Mapping Study (SMS) following well-established guidelines for executing secondary studies in software engineering. Results: We found 56 primary studies and characterized their contributions based on testing strategies, testing approaches, explored mobile platforms, and the proposed tools. Conclusions: The characterization allowed us to identify and discuss important trends and opportunities that can benefit both academics and practitioners.
Conference Paper
Remote usability evaluation enables the possibility of analysing users' behaviour in their daily settings. We present a method and an associated tool able to identify potential usability issues through the analysis of client-side logs of mobile Web interactions. Such log analysis is based on the identification of specific usability smells. We describe an example set of bad usability smells, and how they are detected. The tool also allows evaluators to add new usability smells not included in the original set. We also report on the tool use in analysing the usability of a real, widely used application accessed by forty people through their smartphones whenever and wherever they wanted.
Conference Paper
Typing based communication applications on smartphones, like WhatsApp, can induce emotional exchanges. The effects of an emotion in one session of communication can persist across sessions. In this work, we attempt automatic emotion detection by jointly modeling the typing characteristics, and the persistence of emotion. Typing characteristics, like speed, number of mistakes, special characters used, are inferred from typing sessions. Self reports recording emotion states after typing sessions capture persistence of emotion. We use this data to train a personalized machine learning model for multi-state emotion classification. We implemented an Android based smartphone application, called TapSense, that records typing related metadata, and uses a carefully designed Experience Sampling Method (ESM) to collect emotion self reports. We are able to classify four emotion states - happy, sad, stressed, and relaxed, with an average accuracy (AUCROC) of 84% for a group of 22 participants who installed and used TapSense for 3 weeks.
Article
The System Usability Scale (SUS) has been widely employed in both the field and the laboratory as a valid and reliable measure of system usability. Although its psychometric properties are relatively well understood, the impact that differences in users’ personality traits have on their perceived usability of products, services, and systems has not been deeply explored—even though people’s scores on personality traits have been shown to be reliable and predict a staggering array of societally important outcomes in work, school, and life domains. In this study, 268 users assessed the usability of 20 different products retrospectively with the SUS. Five broad personality traits were then measured using the Mini-IPIP scale. Results indicated that measured personality traits do correlate with the rated usability of products, where measures of Openness to Experience and Agreeableness have the strongest positive correlations with subjective usability assessment. Implications for practitioners, designers, and researchers are discussed.
Conference Paper
Users' individual differences in their mobile touch behaviour can help to continuously verify identity and protect personal data. However, little is known about the influence of GUI elements and hand postures on such touch biometrics. Thus, we present a metric to measure the amount of user-revealing information that can be extracted from touch targeting interactions and apply it in eight targeting tasks with over 150,000 touches from 24 users in two sessions. We compare touch-to-target offset patterns for four target types and two hand postures. Our analyses reveal that small, compactly shaped targets near screen edges yield the most descriptive touch targeting patterns. Moreover, our results show that thumb touches are more individual than index finger ones. We conclude that touch-based user identification systems should analyse GUI layouts and infer hand postures. We also describe a framework to estimate the usefulness of GUIs for touch biometrics.
Conference Paper
In this paper, a user verification system for mobile phones is proposed. The system is based on a behavioral biometric trait, namely keystroke dynamics derived from a touch keyboard. A mobile application was developed to collect these touch keystroke dynamics. In contrast to other systems, no specific text or numbers are used to build our dataset. The Median Vector Proximity classifier is applied to the touch keystroke data, and the performance of the system is investigated using different numbers of features. The system with 31 features achieved an average EER of 12.9%, while with two extra features (average finger size and pressure) the average EER was 12.2%. This suggests that using more features results in more accurate systems. The proposed system is compared against other systems and shows promising results in the area of dynamic authentication.
Article
The role of affect and emotion in interactive system design is an active and recent research area. The aim is to make systems more responsive to users’ needs and expectations. The first step towards affective interaction is to recognize the user’s emotional state. The literature contains many works on emotion recognition. In those works, facial muscle movement, gestures, postures and physiological signals were used for recognition. These methods are computationally intensive and require extra hardware (e.g., sensors and wires). In this work, we propose a simpler model to predict the affective state of a touch screen user. The prediction is based on the user’s touch input, namely the finger strokes. We defined seven features based on the strokes. A linear combination of these features is proposed as the predictor, which classifies a user’s affective state into one of three states: positive (happy, excited and elated), negative (sad, anger, fear, disgust) and neutral (calm, relaxed and contented). The model alleviates the need for extra setup as well as extensive computation, making it suitable for implementation on mobile devices with limited resources. The model is developed and validated with empirical data involving 57 participants performing 7 touch input tasks. The validation study demonstrates a high prediction accuracy of 90.47%. The proposed model and its empirical development and validation are described in this paper.
Conference Paper
Pointing tasks are commonly studied in HCI research, for example to evaluate and compare different interaction techniques or devices. A recent line of work has modelled user-specific touch behaviour with machine learning methods to reveal spatial targeting error patterns across the screen. These models can also be applied to improve accuracy of touch-screens and keyboards, and to recognise users and hand postures. However, no implementation of these techniques has been made publicly available yet, hindering broader use in research and practical deployments. Therefore, this paper presents a toolkit which implements such touch models for data analysis (Python), mobile applications (Java/Android), and the web (JavaScript). We demonstrate several applications, including hand posture recognition, on touch targeting data collected in a study with 24 participants. We consider different target types and hand postures, changing behaviour over time, and the influence of hand sizes.
Conference Paper
An end-user questionnaire to measure user experience quickly in a simple and immediate way while covering a preferably comprehensive impression of the product user experience was the goal of the reported construction process. An empirical approach for the item selection was used to ensure practical relevance of items. Usability experts collected terms and statements on user experience and usability, including ‘hard’ as well as ‘soft’ aspects. These statements were consolidated and transformed into a first questionnaire version containing 80 bipolar items. It was used to measure the user experience of software products in several empirical studies. Data were subjected to a factor analysis which resulted in the construction of a 26 item questionnaire including the six factors Attractiveness, Perspicuity, Efficiency, Dependability, Stimulation, and Novelty. Studies conducted for the original German questionnaire and an English version indicate a satisfactory level of reliability and construct validity.
Conference Paper
The evaluation of interactive products is an important activity in user-centered design. Questionnaires are one evaluation technique, and they usually focus on a product's quality of use, or "usability". Currently, however, additional so-called "hedonic" quality aspects are being discussed. These are based on the human needs for stimulation and identity, whereas usability (or "pragmatic quality") foregrounds the need for controlled manipulation of the environment. This contribution presents the "AttrakDiff 2" questionnaire, which can measure both perceived pragmatic and perceived hedonic quality. Results on reliability and validity are presented and discussed. AttrakDiff 2 is a first contribution to the measurement of quality aspects that go beyond pure usability.
Book
An Introduction to Statistical Learning provides an accessible overview of the field of statistical learning, an essential toolset for making sense of the vast and complex data sets that have emerged in fields ranging from biology to finance to marketing to astrophysics in the past twenty years. This book presents some of the most important modeling and prediction techniques, along with relevant applications. Topics include linear regression, classification, resampling methods, shrinkage approaches, tree-based methods, support vector machines, clustering, and more. Color graphics and real-world examples are used to illustrate the methods presented. Since the goal of this textbook is to facilitate the use of these statistical learning techniques by practitioners in science, industry, and other fields, each chapter contains a tutorial on implementing the analyses and methods presented in R, an extremely popular open source statistical software platform. Two of the authors co-wrote The Elements of Statistical Learning (Hastie, Tibshirani and Friedman, 2nd edition 2009), a popular reference book for statistics and machine learning researchers. An Introduction to Statistical Learning covers many of the same topics, but at a level accessible to a much broader audience. This book is targeted at statisticians and non-statisticians alike who wish to use cutting-edge statistical learning techniques to analyze their data. The text assumes only a previous course in linear regression and no knowledge of matrix algebra.
Article
Approximately 65% of nursing home residents suffer from Alzheimer’s disease and related disorders (ADRD). Therefore, psychosocial interventions for dealing with ADRD-related symptoms play an important role in residential care. Recent findings suggest that Information and Communication Technologies (ICTs) can be effective tools for supporting dementia care delivery. However, more systematic research is needed on specific benefits of ICTs in dementia care. In order to investigate effects of a tablet-computer-based intervention on Quality of Life and Behavioral Symptoms in nursing home residents with ADRD, a cluster-randomized controlled trial (cRCT) is currently underway in 10 nursing homes (N=200, MMSE<24, 5 experimental-, 5 control-group facilities) in Berlin. Over a period of 8 weeks, experimental group participants engage in 3 supervised 30-minute tablet sessions/week using adaptive tablet-applications targeting cognitive and functional abilities and supporting emotional self-regulation. Control group participants receive an equal number of individual sessions without tablets. Dementia-related Quality of Life, Cognition and Behavioral Symptoms are assessed at baseline and after intervention. Additionally, assessments of mood and behavior are conducted before and after each activation session in both groups. So far, n=29 participants (14 experimental group) have been included in the ongoing cRCT. Preliminary results indicate a significant (p<.01) increase of the ratings for happiness (from 5.4 to 6.0), social behavior (from 5.1 to 5.6), and mood (5.0 to 5.5) in the experimental group. In the final analysis, behavioral data (app parameters) will also be included.
Conference Paper
In this work a first attempt is made to model user experience ratings with recordings of touch interactions. To measure the user experience the AttrakDiff Mini questionnaire is used providing three quality dimensions: pragmatic quality, hedonic quality and attractiveness. The feature selection for linear models shows that it is possible to predict the pragmatic quality for a test set with an \(r^2\) of up to 0.64. The hedonic quality and attractiveness, however, seem not to be reflected in these touch interactions in a linear way. Still, the results from this work constitute a promising direction for future research on assessing or even optimizing the user experience for apps automatically in real time.
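To make the reported measure concrete, the following is a minimal sketch of how a test-set \(r^2\) for a linear model can be computed. It is not the authors' pipeline: the single predictor, the data values, and the helper names `fit_line` and `r_squared` are hypothetical and chosen only for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical data: one touch feature per session and a quality rating.
train_x = [0.2, 0.4, 0.5, 0.7, 0.9]
train_y = [5.8, 5.1, 4.9, 4.0, 3.5]
test_x = [0.3, 0.6, 0.8]
test_y = [5.3, 4.6, 3.6]

# Fit on the training split only, then score on the held-out split.
slope, intercept = fit_line(train_x, train_y)
test_pred = [slope * x + intercept for x in test_x]
test_r2 = r_squared(test_y, test_pred)
```

Scoring on held-out data rather than on the training data is what makes such a value an estimate of predictive power and not only of fit.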
Thesis
The goal of adaptive user interfaces (UI) is offering the opportunity to adapt to changes in the context of use and thus provide potentially improved interaction capabilities for different users in specific situations. But, this poses the challenge of evaluating usability aspects of many different variants of the resulting UI. Consequently, usability evaluations with real users or experts tend to become complex and time-consuming especially in the domain of adaptive UIs. Model-based usability evaluations and specifically automated tools and approaches have proven to correctly predict usability relevant aspects in early stages of development. However, the creation and provision of required models and information tends to be complex and time consuming as well and further requires a high degree of expertise for the specific tool and applied method. This thesis describes an integrated approach that provides automation in model-based usability evaluation based on already existing development models of adaptive UIs. The approach is based on required information for describing the UI surface information and the interaction capabilities of the UI. With the help of this information usability relevant criteria are predicted using specific tools of automated usability evaluation. The implementation of the approach presents integration of an existing runtime framework for adaptive UIs with a cognitive user behavior model for simulation. Information required for simulating interactions is created automatically with the help of the UI development models and by this means saves time and costs when preparing and running simulations. Additionally, with the help of two studies, the resulting predictions are further improved by directly using information encoded in the existing development models without requiring specific expertise from designers and usability experts.
Article
Conforming to W3C specifications, mobile web browsers allow JavaScript code in a web page to access motion and orientation sensor data without the user's permission. The associated risks to user security and privacy are however not considered in W3C specifications. In this work, for the first time, we show how user security can be compromised using these sensor data via browser, despite that the data rate is 3–5 times slower than what is available in app. We examine multiple popular browsers on Android and iOS platforms and study their policies in granting permissions to JavaScript code with respect to access to motion and orientation sensor data. Based on our observations, we identify multiple vulnerabilities, and propose TouchSignatures which implements an attack where malicious JavaScript code on an attack tab listens to such sensor data measurements. Based on these streams, TouchSignatures is able to distinguish the user's touch actions (i.e., tap, scroll, hold, and zoom) and her PINs, allowing a remote website to learn the client-side user activities. We demonstrate the practicality of this attack by collecting data from real users and reporting high success rates using our proof-of-concept implementations. We also present a set of potential solutions to address the vulnerabilities. The W3C community and major mobile browser vendors including Mozilla, Google, Apple and Opera have acknowledged our work and are implementing some of our proposed countermeasures.
Article
Using software for model-based usability evaluation is uncommon today, as the modelling process is considered as overhead to the actual design work. The aim of the present paper is to describe a concept, which may make model-based usability evaluation more worthwhile and feasible. The concept is based on sampling user models based on demographic characteristics; these characteristics may help to estimate the severity of usability problems found with the help of the user model. To sample representative user models, a simple Bayesian network (BN) was constructed, holding information about age and gender distributions, and attitudes towards technology. The results of the simulation suggest that a BN is an appropriate tool to store user information for modelling purposes, and thus may improve model-based usability evaluation.
Article
Smartphone users have their own unique behavioral patterns when tapping on the touch screens. These personal patterns are reflected on the different rhythm, strength, and angle preferences of the applied force. Since smart phones are equipped with various sensors like accelerometer, gyroscope, and touch screen sensors, capturing a user's tapping behaviors can be done seamlessly. Exploiting the combination of four features (acceleration, pressure, size, and time) extracted from smart phone sensors, we propose a non-intrusive user verification mechanism to substantiate whether an authenticating user is the true owner of the smart phone or an impostor who happens to know the pass code. Based on the tapping data collected from over 80 users, we conduct a series of experiments to validate the efficacy of our proposed system. Our experimental results show that our verification system achieves high accuracy with averaged equal error rates of down to 3.65%. As our verification system can be seamlessly integrated with the existing user authentication mechanisms on smart phones, its deployment and usage are transparent to users and do not require any extra hardware support.
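As an illustration of the equal error rate (EER) reported above, here is a minimal sketch of how an EER can be approximated from verification scores. The function name, the convention that higher scores mean "more likely genuine", and the sample scores are assumptions for this example and are not taken from the paper.

```python
def equal_error_rate(genuine, impostor):
    """Approximate the EER by sweeping a threshold over all observed scores.

    Assumes similarity scores: an attempt is accepted if score >= threshold.
    Returns the mean of FAR and FRR at the threshold where they are closest.
    """
    best_gap, eer = None, None
    for t in sorted(set(genuine) | set(impostor)):
        # False rejection rate: genuine attempts scored below the threshold.
        frr = sum(s < t for s in genuine) / len(genuine)
        # False acceptance rate: impostor attempts scored at or above it.
        far = sum(s >= t for s in impostor) / len(impostor)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Hypothetical scores for the owner and for impostors who know the passcode.
genuine_scores = [0.9, 0.8, 0.7, 0.4]
impostor_scores = [0.1, 0.2, 0.3, 0.6]
eer = equal_error_rate(genuine_scores, impostor_scores)
```

A lower EER means the score distributions of genuine users and impostors overlap less.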
Chapter
The evaluation of interactive products is an important activity in user-centered design. Questionnaires are one evaluation technique, and they usually focus on a product's quality of use, or "usability". Currently, however, additional so-called "hedonic" quality aspects are being discussed. These are based on the human needs for stimulation and identity, whereas usability (or "pragmatic quality") foregrounds the need for controlled manipulation of the environment. This contribution presents the "AttrakDiff 2" questionnaire, which can measure both perceived pragmatic and perceived hedonic quality. Results on reliability and validity are presented and discussed. AttrakDiff 2 is a first contribution to the measurement of quality aspects that go beyond pure usability.
Article
The aim of this study is to analyze information that can be revealed from simple touch gestures such as horizontal and vertical scrolling. Touch gestures contain identity information, can reflect the user’s touchscreen experience, and can be used to infer the user’s gender. These statements are based on measurements on a large touch dataset collected from 71 users using 8 different mobile devices, both tablets and phones. Touch data were divided into strokes, and classification measurements were investigated based on single and multiple strokes. Classification results based on a single stroke are inaccurate, but they can be improved by using multiple strokes. Measurements show that identity, gender and the user’s touchscreen experience level can be accurately predicted from a sequence of 10 strokes. In addition to the different classification results, we present a statistical analysis of the collected data in order to reveal basic differences between male and female users as well as between less and more experienced touchscreen users.
Conference Paper
The spread of touch screen based smart phones has been constantly increasing over the last few years. However, there are still many open research questions concerning the basic input properties of these devices. We performed a large scale study to research the users' touch screen behavior on standard UI elements. To do so we programmed and published a quiz game that logs touch data and sends it back for evaluation purposes. Over 14,000 persons have played this game so far and sent back statistical data. We use the collected data to present basic touch properties, such as mean hold time and pressure dynamics, and to show that touch screen based input is individual for each person and that one can identify a specific user in a set of 5 users with a precision of about 80%, based on just a few touch events.
Article
The System Usability Scale (SUS) is an inexpensive, yet effective tool for assessing the usability of a product, including Web sites, cell phones, interactive voice response systems, TV applications, and more. It provides an easy-to-understand score from 0 (negative) to 100 (positive). While a 100-point scale is intuitive in many respects and allows for relative judgments, information describing how the numeric score translates into an absolute judgment of usability is not known. To help answer that question, a seven-point adjective-anchored Likert scale was added as an eleventh question to nearly 1,000 SUS surveys. Results show that the Likert scale scores correlate extremely well with the SUS scores (r=0.822). The addition of the adjective rating scale to the SUS may help practitioners interpret individual SUS scores and aid in explaining the results to non-human factors professionals.
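For readers unfamiliar with how the 0 to 100 score arises, the sketch below applies the standard SUS scoring rule, which is not spelled out in the abstract: ten items answered on a 1 to 5 scale, odd-numbered items contribute (response - 1), even-numbered items contribute (5 - response), and the sum is multiplied by 2.5. The function name `sus_score` is hypothetical.

```python
def sus_score(responses):
    """Compute a SUS score (0-100) from ten responses on a 1-5 scale."""
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 item responses")
    if any(r < 1 or r > 5 for r in responses):
        raise ValueError("each response must be between 1 and 5")
    total = 0
    for position, r in enumerate(responses, start=1):
        # Odd items are positively worded, even items negatively worded.
        total += (r - 1) if position % 2 == 1 else (5 - r)
    return total * 2.5


best = sus_score([5, 1, 5, 1, 5, 1, 5, 1, 5, 1])
neutral = sus_score([3] * 10)
```

The adjective rating described above is a separate eleventh question and does not enter this score.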
Article
In this paper, we compare different approaches for predicting the quality and usability of spoken dialogue systems. The respective models provide estimations of user judgments on perceived quality, based on parameters which can be extracted from interaction logs. Different types of input parameters and different modeling algorithms have been compared using three spoken dialogue databases obtained with two different systems. The results show that both linear regression models and classification trees are able to cover around 50% of the variance in the training data, and neural networks even more. When applied to independent test data, in particular to data obtained with different systems and/or user groups, the prediction accuracy decreases significantly. The underlying reasons for the limited predictive power are discussed. It is shown that – although an accurate prediction of individual ratings is not yet possible with such models – they may still be used for taking decisions on component optimization, and are thus helpful tools for the system developer.
Customizable automatic detection of bad usability smells in mobile accessed web applications
  • F Paternò
  • A G Schiavone
  • A Conte
Handbuch zur fun-ni toolbox - user experience evaluation auf drei ebenen
  • S Diefenbach
  • M Hassenzahl
S. Diefenbach and M. Hassenzahl, "Handbuch zur fun-ni toolbox - user experience evaluation auf drei ebenen," 2011. [Online].
Model-based evaluation
  • D Kieras
D. Kieras, "Model-based evaluation," in Human-Computer Interaction Handbook. CRC Press, 2012, pp. 1299-1318.
Quality engineering: Qualität kommunikationstechnischer systeme
  • S Möller
S. Möller, Quality engineering: Qualität kommunikationstechnischer systeme. Berlin Heidelberg: Springer, 2010.
caret: Classification and regression training
  • M Kuhn
  • J Wing
  • S Weston
  • A Williams
  • C Keefer
  • A Engelhardt
  • T Cooper
  • Z Mayer
  • B Kenkel
  • R Core Team
  • M Benesty
  • R Lescarbeau
  • A Ziem
  • L Scrucca
  • Y Tang
  • C Candan
  • T Hunt
M. Kuhn, J. Wing, S. Weston, A. Williams, C. Keefer, A. Engelhardt, T. Cooper, Z. Mayer, B. Kenkel, the R Core Team, M. Benesty, R. Lescarbeau, A. Ziem, L. Scrucca, Y. Tang, C. Candan, and T. Hunt, "caret: Classification and regression training," 2017. [Online].
Pflegetab: Enhancing quality of life using a psychosocial internet-based intervention for residential dementia care
  • J.-N Antons
  • J O'Sullivan
  • S Arndt
  • P Gellert
  • J Nordheim
  • S Möller
  • A Kuhlmey
J.-N. Antons, J. O'Sullivan, S. Arndt, P. Gellert, J. Nordheim, S. Möller, and A. Kuhlmey, "Pflegetab: Enhancing quality of life using a psychosocial internet-based intervention for residential dementia care," in Proceedings of International Society for Research on Internet Interventions. Sanford NC, USA: International Society for Research on Internet Interventions, 2016.
Touchml: A machine learning toolkit for modelling spatial touch targeting behaviour
  • D Buschek
  • F Alt
  • O Brdiczka
  • P Chau
  • G Carenini
  • S Pan
  • P O Kristensson