Conference Paper

User Acceptance Criteria for Privacy Preserving Machine Learning Techniques

Authors:
  • Continental Automotive Technologies GmbH

Abstract

Users are confronted with a variety of machine learning applications in many domains. To make this possible, especially for applications relying on sensitive data, companies and developers are implementing Privacy Preserving Machine Learning (PPML) techniques, which is already a challenge in itself. This study provides a first step towards answering the question of how to include users' preferences for a PPML technique in the privacy-by-design process when developing a new application. The goal is to support developers and AI service providers in choosing a PPML technique that best reflects the users' preferences. Based on discussions with privacy and PPML experts, we derived a framework that maps the characteristics of PPML to user acceptance criteria.


... can be used on its own or as one decision criterion among many in a more general PPML selection process. To achieve this, we built on the analysis of [45], who elicited User Acceptance Criteria (UAC) that influence users' acceptance of an application based on frequently used models such as the IUIPC [80,50] or APCO [7] model. We use our framework to translate these criteria into PPML Characteristics that differentiate between PPML technologies. ...
... • RO1: Providing a process for connecting PPML technologies with UAC. We contribute by providing precise formulas that apply the mapping suggested by Löbner et al. [45] to calculate a PPML Characteristic preference score from UAC. ...
... Löbner et al. [45] propose a mapping to translate UAC into PPML Characteristics by eliciting influencing relationships through joint coding and expert interviews. However, they do not show how to determine the technology that best suits an application. ...
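The excerpts above mention precise formulas for deriving a PPML Characteristic preference score from UAC, but the formulas themselves are not reproduced here. The following minimal sketch only illustrates the general idea of such a weighted mapping; all criterion names, characteristic names, ratings and influence weights are hypothetical and not taken from the cited papers.

# Hypothetical UAC ratings elicited from users (scale 1-5) and a hypothetical
# influence matrix describing how strongly each UAC affects a PPML characteristic
# (0 = no influence, 1 = strong influence). All values are illustrative only.
uac_ratings = {"perceived_privacy": 5, "response_time": 3, "accuracy": 4}

influence = {
    "added_noise":        {"perceived_privacy": 1.0, "response_time": 0.0, "accuracy": 0.7},
    "computational_cost": {"perceived_privacy": 0.2, "response_time": 1.0, "accuracy": 0.1},
}

def characteristic_score(characteristic: str) -> float:
    """Preference score as the influence-weighted average of the UAC ratings."""
    weights = influence[characteristic]
    total = sum(weights.values()) or 1.0
    return sum(uac_ratings[uac] * w for uac, w in weights.items()) / total

for characteristic in influence:
    print(characteristic, round(characteristic_score(characteristic), 2))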
Preprint
Full-text available
Using Privacy-Enhancing Technologies (PETs) for machine learning often influences the characteristics of a machine learning approach, e.g., the required computational power, the timing of the answers, or how the data can be utilized. When designing a new service, the developer faces the problem that some decisions require a trade-off. For example, the use of a PET may cause a delay in the responses, or adding noise to the data to improve the users' privacy might have a negative impact on the accuracy of the machine learning approach. As of now, there is no structured way in which the users' perception of a machine-learning-based service can contribute to the selection of Privacy Preserving Machine Learning (PPML) methods. This is especially challenging since one cannot assume that users have a deep technical understanding of these technologies. Therefore, they can only be asked about certain attributes that they can perceive when using the service, not directly which PPML technique they prefer. This study introduces a decision support framework with the aim of supporting the selection of PPML technologies based on user preferences. Based on prior work analysing User Acceptance Criteria (UAC), we translate these criteria into differentiating characteristics for various PPML techniques. As a final result, we achieve a technology ranking based on the User Acceptance Criteria while providing technology insights for the developers. We demonstrate its application using the use case of classifying privacy-relevant information. Our contribution is the decision support framework, which consists of a process to connect PPML technologies with UAC, a process for evaluating the characteristics that separate PPML techniques, and a ranking method to determine the best PPML technique for the use case.
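The abstract describes a ranking of PPML technologies driven by user-derived preference scores, without giving the ranking method itself. The sketch below shows one simple way such a ranking could be computed once preference scores for the characteristics exist (for example, scores like those in the previous sketch); the technique names, fulfilment values and scores are hypothetical and not from the paper.

# Hypothetical fulfilment profiles: how well each PPML technique exhibits each
# characteristic (0-1). All names and values are illustrative only.
techniques = {
    "federated_learning":         {"added_noise": 0.2, "computational_cost": 0.6},
    "local_differential_privacy": {"added_noise": 0.9, "computational_cost": 0.3},
    "homomorphic_encryption":     {"added_noise": 0.1, "computational_cost": 0.9},
}

def rank_techniques(preference: dict) -> list:
    """Rank techniques by the preference-weighted sum of their fulfilment values."""
    scores = {
        name: sum(preference[c] * value for c, value in profile.items())
        for name, profile in techniques.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical characteristic preference scores, e.g. derived as in the earlier sketch.
preference_scores = {"added_noise": 4.6, "computational_cost": 3.4}
print(rank_techniques(preference_scores))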
... Next steps are user story elicitation and use case design to close the market gap in smart home privacy and security software and hardware solutions, with a special focus on usability for average users. This especially holds for the integration of AI into IoT smart home devices [10]. ...
Conference Paper
The integration of smart home IoT devices into households promises to improve users' living experiences and increase convenience, but raises manifold privacy and security concerns. This study conducts 11 interviews with expert users in Germany, employing qualitative content analysis to unveil prevalent privacy and security concerns. The objective is to survey smart home usage behavior and user knowledge to identify the need for solutions that educate users on potential threats, and aid them in securing their smart home. The findings illuminate the evolving dynamics of user perspectives within the rapidly advancing smart home landscape of IoT devices and the need for open-source solutions that provide recommendations and data flow control in smart home devices. Although knowledgeable in general, it can be observed that even expert users lack awareness of information flows in smart homes, calling for more education and the creation of software and hardware-based solutions to mitigate threats in the future.
... From an HCI perspective, our study underscores the importance of considering user acceptance when developing mental health assessment tools that utilize smartphone images and machine learning. Recently, there has been a growing interest among researchers to integrate user acceptance into the training phase of machine learning models, as proposed in studies like [10,47]. In a related observation, our feature importance analysis indicated that the right side of the face is more useful in depression detection. ...
Article
Full-text available
Users report that they have regretted accidentally sharing personal information on social media. There have been proposals to help protect the privacy of these users by providing tools which analyze text or images and detect personal information or privacy disclosure, with the objective of alerting the user to a privacy risk and transforming the content. However, these proposals rely on having access to users' data, and users have reported that they have privacy concerns about the tools themselves. In this study, we investigate whether these privacy concerns are unique to privacy tools or whether they are comparable to privacy concerns about non-privacy tools that also process personal information. We conduct a user experiment to compare the level of privacy concern towards privacy tools and non-privacy tools for text and image content, qualitatively analyze the reasons for those privacy concerns, and evaluate which assurances are perceived to reduce that concern. The results show privacy tools are at a disadvantage: participants have a higher level of privacy concern about being surveilled by the privacy tools, and the same level of concern about intrusion and secondary use of their personal information compared to non-privacy tools. In addition, the reasons for these concerns and the assurances that are perceived to reduce privacy concern are also similar. We discuss what these results mean for the development of privacy tools that process user content.
Article
Full-text available
When requesting a web-based service, users often fail to set the website's privacy settings according to their own privacy preferences. Being overwhelmed by the choice of settings, a lack of knowledge of the related technologies, or unawareness of their own privacy preferences are just some of the reasons why users tend to struggle. To address these problems, privacy setting prediction tools are particularly well suited. Such tools aim to lower the burden of setting privacy preferences in line with the owner's actual preferences. To be in line with the increased demand for explainability and interpretability arising from regulatory obligations - such as the General Data Protection Regulation (GDPR) in Europe - this paper introduces an explainable model for default privacy setting prediction. Compared to previous work, we present an improved feature selection, increased interpretability of each step in the model design, and enhanced evaluation metrics to better identify weaknesses in the model's design before it goes into production. As a result, we aim to provide an explainable and transparent tool for default privacy setting prediction which users easily understand and are therefore more likely to use.
Article
Full-text available
Cameras are everywhere, and are increasingly coupled with video analytics software that can identify our face, track our mood, recognize what we are doing, and more. We present the results of a 10-day in-situ study designed to understand how people feel about these capabilities, looking both at the extent to which they expect to encounter them as part of their everyday activities and at how comfortable they are with the presence of such technologies across a range of realistic scenarios. Results indicate that while some widespread deployments are expected by many (e.g., surveillance in public spaces), others are not, with some making people feel particularly uncomfortable. Our results further show that individuals’ privacy preferences and expectations are complicated and vary with a number of factors such as the purpose for which footage is captured and analyzed, the particular venue where it is captured, and whom it is shared with. Finally, we discuss the implications of people’s rich and diverse preferences on opt-in or opt-out rights for the collection and use (including sharing) of data associated with these video analytics scenarios as mandated by regulations. Because of the user burden associated with the large number of privacy decisions people could be faced with, we discuss how new types of privacy assistants could possibly be configured to help people manage these decisions.
Article
Full-text available
The exponential growth of big data and deep learning has increased the data exchange traffic in society. Machine Learning as a Service (MLaaS), which leverages deep learning techniques for predictive analytics to enhance decision-making, has become a hot commodity. However, the adoption of MLaaS introduces data privacy challenges for data owners and security challenges for deep learning model owners. Data owners are concerned about the safety and privacy of their data on MLaaS platforms, while MLaaS platform owners worry that their models could be stolen by adversaries who pose as clients. Consequently, Privacy-Preserving Deep Learning (PPDL) arises as a possible solution to this problem. Recently, several papers about PPDL for MLaaS have been published. However, to the best of our knowledge, no previous paper has summarized the existing literature on PPDL and its specific applicability to the MLaaS environment. In this paper, we present a comprehensive survey of privacy-preserving techniques, starting from classical privacy-preserving techniques to well-known deep learning techniques. Additionally, we present a detailed description of PPDL and address the issue of using PPDL for MLaaS. Furthermore, we undertake detailed comparisons between state-of-the-art PPDL methods. Subsequently, we classify adversarial models on PPDL by highlighting possible PPDL attacks and their potential solutions. Ultimately, our paper serves as a single point of reference for detailed knowledge of PPDL and its applicability to MLaaS environments for both new and experienced researchers.
Article
Full-text available
Smartphone users are often unaware of mobile applications’ (“apps”) third-party data collection and sharing practices, which put them at higher risk of privacy breaches. One way to raise awareness of these practices is by providing unobtrusive but pervasive visualizations that can be presented in a glanceable manner. In this paper, we applied Wogalter et al.’s Communication-Human Information Processing model (C-HIP) to design and prototype eight different visualizations that depict smartphone apps’ data sharing activities. We varied the granularity and type (i.e., data-centric or app-centric) of information shown to users and used the screensaver/lock screen as a design probe. Through interview-based design probes with Android users (n=15), we investigated the aspects of the data exposure visualizations that influenced users’ comprehension and privacy awareness. Our results shed light on how users’ perceptions of privacy boundaries influence their preference regarding the information structure of these visualizations, and the tensions that exist in these visualizations between glanceability and granularity. We discuss how a pervasive, soft paternalistic approach to privacy-related visualization may raise awareness by enhancing the transparency of information flow, thereby, unobtrusively increasing users’ understanding of data sharing practices of mobile apps. We also discuss implications for privacy research and glanceable security.
Conference Paper
Full-text available
Today’s environment of data-driven business models relies heavily on collecting as much personal data as possible. Besides being protected by governmental regulation, internet users can also try to protect their privacy on an individual basis. One of the most famous ways to accomplish this, is to use privacy-enhancing technologies (PETs). However, the number of users is particularly important for the anonymity set of the service. The more users use the service, the more difficult it will be to trace an individual user. There is a lot of research determining the technical properties of PETs like Tor or JonDonym, but the use behavior of the users is rarely considered, although it is a decisive factor for the acceptance of a PET. Therefore, it is an important driver for increasing the user base. We undertake a first step towards understanding the use behavior of PETs employing a mixed-method approach. We conducted an online survey with 265 users of the anonymity services Tor and JonDonym (124 users of Tor and 141 users of JonDonym). We use the technology acceptance model as a theoretical starting point and extend it with the constructs perceived anonymity and trust in the service in order to take account for the specific nature of PETs. Our model explains almost half of the variance of the behavioral intention to use the two PETs. The results indicate that both newly added variables are highly relevant factors in the path model. We augment these insights with a qualitative analysis of answers to open questions about the users’ concerns, the circumstances under which they would pay money and choose a paid premium tariff (only for JonDonym), features they would like to have and why they would or would not recommend Tor/JonDonym. Thereby, we provide additional insights about the users’ attitudes and perceptions of the services and propose new use factors not covered by our model for future research.
Article
Full-text available
Deep learning is one of the advanced approaches of machine learning and has attracted growing attention in recent years. It is used nowadays in different domains and applications such as pattern recognition, medical prediction, and speech recognition. Unlike traditional learning algorithms, deep learning can overcome the dependency on hand-designed features. The deep learning experience is particularly improved by leveraging powerful infrastructures such as clouds and adopting collaborative learning for model training. However, this comes at the expense of privacy, especially when sensitive data are processed during the training and prediction phases, as well as when the trained model is shared. In this paper, we provide a review of the existing privacy-preserving deep learning techniques and propose a novel multi-level taxonomy, which categorizes the current state-of-the-art privacy-preserving deep learning techniques on the basis of privacy-preserving tasks at the top level and key technological concepts at the base level. This survey further summarizes evaluation results of the reviewed solutions with respect to defined performance metrics. In addition, it derives a set of learned lessons from each privacy-preserving task. Finally, it highlights open research challenges and provides some recommendations as future research directions.
Article
Full-text available
Due to an increasing collection of personal data by internet companies and several data breaches, research related to privacy has gained importance in recent years in the information systems domain. Privacy concerns can strongly influence users' decision to use a service. The Internet Users' Information Privacy Concerns (IUIPC) construct is one operationalization to measure the impact of privacy concerns on the use of technologies. However, when applied to a privacy-enhancing technology (PET) such as an anonymization service, the original rationales do not hold anymore. In particular, an inverted impact of trusting and risk beliefs on behavioral intentions can be expected. We show that the IUIPC model needs to be adapted for the case of PETs. In addition, we extend the original causal model by including trusting beliefs in the anonymization service itself as well as a measure for privacy literacy. A survey among 124 users of the anonymization service Tor shows that trust in Tor has a statistically significant effect on the actual use behavior of the PET. In addition, the results indicate that privacy literacy has a negative impact on trusting beliefs in general and a positive effect on trust in Tor.
Conference Paper
Full-text available
Defects in requirements specifications can have severe consequences during the software development lifecycle. Some of them result in overall project failure due to incorrect or missing quality characteristics such as security. There are several concerns that make security difficult to deal with; for instance, (1) when stakeholders discuss general requirements in (review) meetings, they are often not aware that they should also discuss security-related topics, and (2) they typically do not have enough security expertise. These concerns become even more challenging in agile development contexts, where lightweight documentation is typically involved. The goal of this paper is to design and evaluate an approach to support reviewing security-related aspects in agile requirements specifications of web applications. The designed approach considers user stories and security specifications as input and relates those user stories to security properties via Natural Language Processing (NLP) techniques. Based on the related security properties, our approach then identifies high-level security requirements from the Open Web Application Security Project (OWASP) to be verified and generates a focused reading technique to support reviewers in detecting defects. We evaluate our approach via two controlled experiment trials. We compare the effectiveness and efficiency of novice inspectors verifying security aspects in agile requirements using our reading technique against using the complete list of OWASP high-level security requirements. The (statistically significant) results indicate that using the reading technique has a positive impact (with very large effect size) on the performance of inspectors in terms of effectiveness and efficiency.
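The abstract does not detail the NLP step that relates user stories to security properties. As a rough, purely illustrative sketch of how such a relation could be established, the snippet below matches user stories to short security-property descriptions via TF-IDF cosine similarity; the example stories, property descriptions, and the similarity threshold are all assumptions and not taken from the paper.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical inputs: agile user stories and short security-property descriptions.
user_stories = [
    "As a user I want to reset my password via email so that I can regain access.",
    "As a visitor I want to browse products without creating an account.",
]
security_properties = {
    "authentication": "verify user identity, credentials, password reset, session handling",
    "confidentiality": "protect personal data from disclosure, encryption of data in transit",
}

vectorizer = TfidfVectorizer()
documents = user_stories + list(security_properties.values())
tfidf = vectorizer.fit_transform(documents)
story_vectors = tfidf[: len(user_stories)]
property_vectors = tfidf[len(user_stories):]

# Relate each story to the security properties whose similarity exceeds a threshold.
similarities = cosine_similarity(story_vectors, property_vectors)
for story, row in zip(user_stories, similarities):
    related = [prop for prop, score in zip(security_properties, row) if score > 0.1]
    print(story[:45], "->", related)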
Article
Full-text available
The concept of cloud computing relies on central large datacentres with huge amounts of computational power. The rapidly growing Internet of Things with its vast amount of data showed that this architecture produces costly, inefficient and in some cases infeasible communication. Thus, fog computing, a new architecture with distributed computational power closer to the IoT devices was developed. So far, this decentralised fog-oriented architecture has only been used for performance and resource management improvements. We show how it could also be used for improving the users’ privacy. For that purpose, we map privacy patterns to the IoT / fog computing / cloud computing architecture. Privacy patterns are software design patterns with the focus to translate “privacy-by-design” into practical advice. As a proof of concept, for each of the used privacy patterns we give an example from a smart vehicle scenario to illustrate how the patterns could improve the users’ privacy.
Article
Full-text available
Incredible amounts of data are being generated by various organizations like hospitals, banks, e-commerce, retail and supply chain companies by virtue of digital technology. Not only humans but also machines contribute data, in the form of closed-circuit television streams, website logs, etc. Tons of data are generated every minute by social media and smartphones. The voluminous data generated from these various sources can be processed and analyzed to support decision making. However, data analytics is prone to privacy violations. One application of data analytics is recommendation systems, which are widely used by e-commerce sites like Amazon and Flipkart to suggest products to customers based on their buying habits, which can lead to inference attacks. Although data analytics is useful in decision making, it can lead to serious privacy concerns. Hence, privacy-preserving data analytics has become very important. This paper examines various privacy threats, privacy preservation techniques and models with their limitations, and also proposes a modern data-lake-based privacy preservation technique to handle privacy preservation in unstructured data.
Conference Paper
Full-text available
Trends show that privacy concerns are rising, but end users are not armed with enough mechanisms to protect themselves. Privacy enhancing technologies (PETs) or more specifically, tools (PET-tools) are one of the mechanisms that could help users in this sense. These tools, however, reportedly have low adoption rates, and users tend to be reluctant to integrate them into their daily use of the Internet. Detailed scrutiny of current research on PET-tools, however, can guide future research to help overcome low adoption of these tools. We conducted a literature review on PET-tools to enumerate the types of tools available and how they are being evaluated, in order to shed more light on the missing elements in their evaluations. We reviewed and coded 72 articles in the PET-tool literature. Our results highlight two important issues: 1. Evaluation of most tools is performed using only artificial, summative and ex-post strategies; 2. While usability evaluation is quite common, evaluation of enhanced privacy is lacking. This research hopes to contribute to better PET-tool development, and encourage the inclusion of users in the evaluation and design process.
Article
Full-text available
Cloud computing is a revolutionary mechanism that is changing the way enterprises design and procure hardware and software. Because of the cloud's simplicity, many organizations are moving data and application software to cloud data centers. The cloud service provider (CSP) should ensure integrity, availability, privacy and confidentiality, but CSPs do not always provide reliable data services to customers or for the customer data they store. This study identifies issues related to cloud data storage, such as data breaches, data theft, and unavailability of cloud data. Finally, we provide possible solutions to the respective issues in the cloud.
Article
Full-text available
This study attempts to answer the main research question: ‘Do security and privacy perceptions affect customers’ trust to accept and use internet banking technology to perform their banking transactions?’ This study examined the factors that affected Jordanian customers’ trust to accept internet banking services. Path analysis was used to analyze 198 responses, and the results suggested that the hypothesized model was an accurate reflection of the factors that affect trust to accept and use internet banking services. Results indicated that trust has a positive effect on behavioral intention to use internet banking services, and that perceived usefulness, security and privacy perceptions significantly influenced perceived trust. Finally, perceived ease of use failed to predict Jordanians’ intention to use internet banking.
Article
Full-text available
Automated data-driven decision systems are ubiquitous across a wide variety of online services, from online social networking and e-commerce to e-government. These systems rely on complex learning methods and vast amounts of data to optimize the service functionality, satisfaction of the end user and profitability. However, there is a growing concern that these automated decisions can lead to user discrimination, even in the absence of intent. In this paper, we introduce fairness constraints, a mechanism to ensure fairness in a wide variety of classifiers in a principled manner. Fairness prevents a classifier from outputting predictions correlated with certain sensitive attributes in the data. We then instantiate fairness constraints on three well-known classifiers -- logistic regression, hinge loss and support vector machines (SVM) -- and evaluate their performance in a real-world dataset with meaningful sensitive human attributes. Experiments show that fairness constraints allow for an optimal trade-off between accuracy and fairness.
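To make the mechanism more concrete, the following sketch trains a logistic regression while penalising the covariance between a sensitive attribute and the signed distance to the decision boundary, which is the intuition behind the fairness constraints described above. It uses a soft penalty rather than the paper's constrained formulation, and the synthetic data, penalty weight, and variable names are assumptions for illustration only.

import numpy as np
from scipy.optimize import minimize

# Synthetic data: z is a sensitive attribute that is correlated with the label.
rng = np.random.default_rng(0)
n = 500
z = rng.integers(0, 2, n)
x = np.c_[rng.normal(z, 1.0, n), rng.normal(0, 1, n), np.ones(n)]  # 2 features + bias
y = (x[:, 0] + 0.5 * x[:, 1] + rng.normal(0, 0.5, n) > 0).astype(float)

def loss(theta, lam):
    """Logistic loss plus a penalty on Cov(z, theta^T x), the fairness proxy."""
    logits = x @ theta
    log_loss = np.mean(np.logaddexp(0.0, -(2 * y - 1) * logits))
    fairness_penalty = np.abs(np.mean((z - z.mean()) * logits))
    return log_loss + lam * fairness_penalty

theta_plain = minimize(loss, np.zeros(3), args=(0.0,)).x   # unconstrained classifier
theta_fair = minimize(loss, np.zeros(3), args=(10.0,)).x   # fairness-penalised classifier

for name, theta in [("plain", theta_plain), ("fair", theta_fair)]:
    cov = np.mean((z - z.mean()) * (x @ theta))
    acc = np.mean((x @ theta > 0) == y)
    print(f"{name}: accuracy={acc:.2f}, covariance with z={cov:.3f}")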
Article
Full-text available
This paper extends the unified theory of acceptance and use of technology (UTAUT) to study acceptance and use of technology in a consumer context. Our proposed UTAUT2 incorporates three constructs into UTAUT: hedonic motivation, price value, and habit. Individual differences — namely, age, gender, and experience — are hypothesized to moderate the effects of these constructs on behavioral intention and technology use. Results from a two-stage online survey, with technology use data collected four months after the first survey, of 1,512 mobile Internet consumers supported our model. Compared to UTAUT, the extensions proposed in UTAUT2 produced a substantial improvement in the variance explained in behavioral intention (56 percent to 74 percent) and technology use (40 percent to 52 percent). The theoretical and managerial implications of these results are discussed.
Article
Full-text available
The lack of consumer confidence in information privacy has been identified as a major problem hampering the growth of e-commerce. Despite the importance of understanding the nature of online consumers' concerns for information privacy, this topic has received little attention in the information systems community. To fill the gap in the literature, this article focuses on three distinct, yet closely related, issues. First, drawing on social contract theory, we offer a theoretical framework on the dimensionality of Internet users' information privacy concerns (IUIPC). Second, we attempt to operationalize the multidimensional notion of IUIPC using a second-order construct, and we develop a scale for it. Third, we propose and test a causal model on the relationship between IUIPC and behavioral intention toward releasing personal information at the request of a marketer. We conducted two separate field surveys and collected data from 742 household respondents in one-on-one, face-to-face interviews. The results of this study indicate that the second-order IUIPC factor, which consists of three first-order dimensions--namely, collection, control, and awareness--exhibited desirable psychometric properties in the context of online privacy. In addition, we found that the causal model centering on IUIPC fits the data satisfactorily and explains a large amount of variance in behavioral intention, suggesting that the proposed model will serve as a useful tool for analyzing online consumers' reactions to various privacy threats on the Internet.
Article
Full-text available
The kappa statistic was introduced to measure nominal scale agreement between a fixed pair of raters. Here, kappa is generalized to the case where each of a sample of 30 patients is rated on a nominal scale by the same number of psychiatrist raters (n = 6), but where the raters rating one patient are not necessarily the same as those rating another. Large-sample standard errors are derived.
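As a worked illustration of this generalized (Fleiss) kappa for many raters, the snippet below computes agreement from a small, made-up rating matrix; the counts and the number of categories are purely hypothetical.

import numpy as np

# counts[i, j] = how many raters placed subject i in category j.
# The matrix below is invented for demonstration; each subject has 6 raters.
counts = np.array([
    [6, 0, 0],
    [3, 3, 0],
    [1, 4, 1],
    [0, 0, 6],
    [2, 2, 2],
])
n, k = counts.shape
m = counts.sum(axis=1)[0]                 # raters per subject (assumed constant)

p_j = counts.sum(axis=0) / (n * m)        # overall category proportions
P_i = np.sum(counts * (counts - 1), axis=1) / (m * (m - 1))  # per-subject agreement
P_bar = P_i.mean()                        # mean observed agreement
P_e = np.sum(p_j ** 2)                    # agreement expected by chance

kappa = (P_bar - P_e) / (1 - P_e)
print(round(kappa, 3))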
Article
Full-text available
Ph.D. thesis, Massachusetts Institute of Technology, Sloan School of Management, 1986. Includes bibliographical references (leaves 233-250).
Technical Report
The aim of this study is the technical evaluation of de-identification methods based on decentralization, in particular methods of distributed and federated learning for personal data in concrete use cases in the mobility domain. The General Data Protection Regulation (GDPR) has significantly increased the incentive and effort for companies to process personal data in compliance with the law. This includes the creation, distribution, storage and deletion of personal data. Non-compliance with the GDPR and other legislation now poses a significant financial risk to companies that work with personal data. With a substantial increase in computing power on the users' side, distributed and federated learning techniques provide a promising path for the de-identification of personal data. Such methods and techniques enable organizations to store and process sensitive user data locally. To do so, a sub-model of the main model that processes data is stored in the local environment of the users. Since only the necessary updates are transmitted between the sub-model and the main model, two advantages can be achieved with this approach. First, there is no central database, which makes it immensely difficult for potential attackers to obtain large amounts of data. Second, only fragments of the locally stored data are transferred to the main model. In the first work package of this report, suitable use cases for this study are identified through a scientific literature review. The following use cases are identified and analyzed with regard to data, benefits, model and sensitive data: traffic flow prediction, energy demand prediction, eco-routing, autonomous driving, vehicular object detection, and parking space estimation.
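To illustrate the decentralised training idea sketched in this report summary, the snippet below implements a minimal federated-averaging loop in which raw data never leaves the clients and only parameter updates are aggregated by a server. The linear model, client data, and hyperparameters are made up for demonstration and do not reflect the report's actual use cases.

import numpy as np

# Synthetic clients: each holds its own features X and targets y locally.
rng = np.random.default_rng(1)
true_w = np.array([2.0, -1.0, 0.5])
clients = []
for _ in range(4):
    X = rng.normal(size=(50, 3))
    y = X @ true_w + rng.normal(0, 0.1, 50)
    clients.append((X, y))

global_w = np.zeros(3)
for _ in range(20):                           # communication rounds
    updates = []
    for X, y in clients:                      # local training on each client
        w = global_w.copy()
        for _ in range(5):                    # a few local gradient steps
            grad = 2 * X.T @ (X @ w - y) / len(y)
            w -= 0.05 * grad
        updates.append(w)                     # only parameters leave the client
    global_w = np.mean(updates, axis=0)       # server aggregates (federated averaging)

print(np.round(global_w, 2))                  # approaches [ 2. -1.  0.5]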
Conference Paper
Vehicles are becoming interconnected and autonomous while collecting, sharing and processing large amounts of personal and private data. When developing a service that relies on such data, ensuring privacy-preserving data sharing and processing is one of the main challenges. Often, several entities are involved in these steps and the interested parties are manifold. To ensure data privacy, a variety of different de-identification techniques exist that all exhibit unique peculiarities to be considered. In this paper, we show, using the example of a location-based weather prediction service for an energy grid operator, how the different de-identification techniques can be evaluated. With this, we aim to provide a better understanding of state-of-the-art de-identification techniques and the pitfalls to consider during implementation. Finally, we find that the optimal technique for a specific service depends highly on the scenario specifications and requirements.
Chapter
Data privacy regulations pose an obstacle to healthcare centres and hospitals to share medical data with other organizations, which in turn impedes the process of building deep learning models in the healthcare domain. Distributed deep learning methods enable deep learning models to be trained without the need for sharing data from these centres while still preserving the privacy of the data at these centres. In this paper, we compare three privacy-preserving distributed learning techniques: federated learning, split learning, and SplitFed. We use these techniques to develop binary classification models for detecting tuberculosis from chest X-rays and compare them in terms of classification performance, communication and computational costs, and training time. We propose a novel distributed learning architecture called SplitFedv3, which performs better than split learning and SplitFedv2 in our experiments. We also propose alternate mini-batch training, a new training technique for split learning, that performs better than alternate client training, where clients take turns to train a model.
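The chapter compares federated learning, split learning, and SplitFed variants; the snippet below sketches only the basic split-learning idea, in which the client holds the data and the first layer while the server completes the forward and backward pass at the cut layer. It is not the chapter's SplitFedv3 architecture, and the tiny network, synthetic data and hyperparameters are assumptions for illustration.

import numpy as np

# Synthetic binary classification data held by the client.
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 8))
y = (X[:, 0] > 0).astype(float).reshape(-1, 1)

W1 = rng.normal(scale=0.1, size=(8, 4))   # client-side layer
W2 = rng.normal(scale=0.1, size=(4, 1))   # server-side layer
lr = 0.5

for _ in range(200):
    # Client: forward through its layer, send only the activations ("smashed data").
    h = np.tanh(X @ W1)
    # Server: finish the forward pass, compute the loss gradient, backprop to the cut.
    p = 1 / (1 + np.exp(-(h @ W2)))
    grad_logits = (p - y) / len(y)             # d(binary cross-entropy)/d(logits)
    grad_W2 = h.T @ grad_logits
    grad_h = grad_logits @ W2.T                # gradient sent back to the client
    W2 -= lr * grad_W2
    # Client: continue backpropagation locally; raw X never leaves the client.
    grad_W1 = X.T @ (grad_h * (1 - h ** 2))
    W1 -= lr * grad_W1

print("training accuracy:", ((p > 0.5) == y).mean())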
Chapter
The German Corona-Warn-App (CWA) is one of the most controversial tools to mitigate the Corona virus spread with roughly 25 million users. In this study, we investigate individuals’ knowledge about the CWA and associated privacy concerns alongside different demographic factors. For that purpose, we conducted a study with 1752 participants in Germany to investigate knowledge and privacy concerns of users and non-users of the German CWA. We investigate the relationship between knowledge and privacy concerns and analyze the demographic effects on both.
Chapter
[Motivation] Artificial intelligence (AI) creates many opportunities for public institutions, but the unethical use of AI in public services can reduce citizens’ trust. [Question] The aim of this study was to identify what kind of requirements citizens have for trustworthy AI services in the public sector. The study included 21 interviews and a design workshop of four public AI services. [Results] The main finding was that all the participants wanted public AI services to be transparent. This transparency requirement covers a number of questions that trustworthy AI services must answer, such as about their purposes. The participants also asked about the data used in AI services and from what sources the data were collected. They pointed out that AI must provide easy-to-understand explanations. We also distinguished two other important requirements: controlling personal data usage and involving humans in AI services. [Contribution] For practitioners, the paper provides a list of questions that trustworthy public AI services should answer. For the research community, it illuminates the transparency requirement of AI systems from the perspective of citizens.
Conference Paper
Privacy sensitive information (PSI) detection tools have the potential to help users protect their privacy when posting information online, i.e., they can identify when a social media post contains information that users could later regret sharing. However, although users consider this type of tool useful, previous research indicates that the intention to use them is not very high. In this paper, we conduct a user survey (n=147) to investigate the factors that influence the intention to use a PSI detection tool. The results of a logistic regression analysis indicate a positive association of intention to use a PSI detection tool with performance expectation, social influence, and perception of accuracy of the tool. In addition, intention is negatively associated with privacy concerns related to the tool itself and with the participants' self-perceived ability to protect their own privacy. On the other hand, we did not find a significant association with the participants' demographic characteristics or social media posting experience. We discuss these findings in the context of the design and development of PSI detection tools.
Article
The growing number of mobile and IoT devices has nourished many intelligent applications. In order to produce high-quality machine learning models, they constantly access and collect rich personal data such as photos, browsing history and text messages. However, direct access to personal data has raised increasing public concerns about privacy risks and security breaches. To address these concerns, there are two emerging solutions to privacy-preserving machine learning, namely local differential privacy and federated machine learning. The former is a distributed data collection strategy where each client perturbs data locally before submitting to the server, whereas the latter is a distributed machine learning strategy to train models on mobile devices locally and merge their output (e.g., parameter updates of a model) through a control protocol. In this paper, we conduct a comparative study on the efficiency and privacy of both solutions. Our results show that in a standard population and domain setting, both can achieve an optimal misclassification rate lower than 20% and federated machine learning generally performs better at the cost of higher client CPU usage. Nonetheless, local differential privacy can benefit more from a larger client population (> 1k). As for privacy guarantee, local differential privacy also has flexible control over the data leakage.
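The article contrasts local differential privacy with federated machine learning. The sketch below illustrates only the local differential privacy side using randomized response, where each client perturbs a single binary attribute before submitting it and the server de-biases the aggregate; the epsilon value, data and population size are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(42)
epsilon = 1.0
p = np.exp(epsilon) / (np.exp(epsilon) + 1)   # probability of reporting the true bit

true_values = rng.integers(0, 2, 10_000)      # private bits held on the clients
keep = rng.random(10_000) < p
reported = np.where(keep, true_values, 1 - true_values)  # what the server receives

# Server-side unbiased estimate of the true proportion of ones.
observed = reported.mean()
estimate = (observed - (1 - p)) / (2 * p - 1)
print(round(true_values.mean(), 3), round(estimate, 3))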
Chapter
[Context and motivation] Modern software engineering processes have shifted from traditional upfront requirements engineering (RE) to a more continuous way of conducting RE, particularly including data-driven approaches. [Question/problem] However, current research on data-driven RE focuses more on leveraging certain techniques such as natural language processing or machine learning than on making the concept fit for facilitating its use in the entire software development process. [Principal ideas/results] In this paper, we propose a research agenda composed of six distinct research directions. These include a data-driven RE infrastructure, embracing data heterogeneity, context-aware adaptation, data analysis and decision support, privacy and confidentiality, and finally process integration. Each of these directions addresses challenges that impede the broader use of data-driven RE. [Contribution] For researchers, our research agenda provides topics relevant to investigate. For practitioners, overcoming the underlying challenges with the help of the proposed research will allow to adopt a data-driven RE approach and facilitate its seamless integration into modern software engineering. For users, the proposed research will enable the transparency, control, and security needed to trust software systems and software providers.
Article
This study aims to examine the key factors that may hinder or facilitate the adoption of mobile banking services in a cross-cultural context. A conceptual framework was developed by extending the Unified Theory of Acceptance and Use of Technology (UTAUT2) with three additional constructs, namely trust (TR), security (PS) and privacy (PP). Data were collected using an online survey and a self-administered questionnaire from 901 mobile banking users who were either Lebanese or English. These were analysed using structural equation modelling based on AMOS 23.0. The results of this analysis indicated that behavioural intention towards adoption of mobile banking services was influenced by habit (HB), perceived security (PS), perceived privacy (PP) and trust (TR) for both the Lebanese and English consumers. In addition, performance expectancy (PE) was a significant predictor in Lebanon but not in England, whereas price value (PV) was significant in England but not in Lebanon. Contrary to our expectation, Social Influence (SI) and Hedonic Motivation (HM) were insignificant for both the Lebanese and English consumers. Overall, the proposed model achieved acceptable fit and explained 78% of the variance for the Lebanese sample and 83% for the English sample – both of which are higher than for the original UTAUT2. These findings are expected to help policy makers and bank directors understand the issues facing mobile banking adoption in different cultural settings. Subsequently, they will help guide them in formulating appropriate strategies to improve the uptake of mobile banking activities. As the low mobile banking adoption rate in Lebanon can be attributed to the novelty of this technology, the Lebanese banking sector stands to greatly benefit from this study.
Article
For privacy concerns to be addressed adequately in today's machine-learning (ML) systems, the knowledge gap between the ML and privacy communities must be bridged. This article aims to provide an introduction to the intersection of both fields with special emphasis on the techniques used to protect the data.
Conference Paper
The recent breakthroughs in Artificial Intelligence (AI) have allowed individuals to rely on automated systems for a variety of reasons. Some of these systems are the currently popular voice-enabled systems like Echo by Amazon and Home by Google, also called Intelligent Personal Assistants (IPAs). Though there are rising concerns about privacy and ethical implications, users of these IPAs seem to continue using these systems. We aim to investigate to what extent users are concerned about privacy and how they handle these concerns while using IPAs. By utilizing reviews posted online along with the responses to a survey, this paper provides a set of insights about the detected markers related to user interests and privacy challenges. The insights suggest that users of these systems, irrespective of their concerns about privacy, are generally positive about utilizing IPAs in their everyday lives. However, there is a significant percentage of users who are concerned about privacy and take further actions to address related concerns. Some users expressed that they did not have any privacy concerns, but when they learned about the "always listening" feature of these devices, their concern about privacy increased.
Article
Interactive Machine Learning (IML) seeks to complement human perception and intelligence by tightly integrating these strengths with the computational power and speed of computers. The interactive process is designed to involve input from the user but does not require the background knowledge or experience that might be necessary to work with more traditional machine learning techniques. Under the IML process, non-experts can apply their domain knowledge and insight over otherwise unwieldy datasets to find patterns of interest or develop complex data-driven applications. This process is co-adaptive in nature and relies on careful management of the interaction between human and machine. User interface design is fundamental to the success of this approach, yet there is a lack of consolidated principles on how such an interface should be implemented. This article presents a detailed review and characterisation of Interactive Machine Learning from an interactive systems perspective. We propose and describe a structural and behavioural model of a generalised IML system and identify solution principles for building effective interfaces for IML. Where possible, these emergent solution principles are contextualised by reference to the broader human-computer interaction literature. Finally, we identify strands of user interface research key to unlocking more efficient and productive non-expert interactive machine learning applications.
Article
This study extends privacy concerns research by providing a test of a model inspired by the ‘Antecedents – Privacy Concerns – Outcomes’ (APCO) framework. Focusing at the individual level of analysis, the study examines the influences of privacy awareness (PA) and demographic variables (age, gender) on concern for information privacy (CFIP). It also considers CFIP’s relationship to privacy-protecting behaviours and incorporates trust and risk into the model. These relationships are tested in a specific, Facebook-related context. Results strongly support the overall model. PA and gender are important explanators for CFIP, which in turn explains privacy-protecting behaviours. We also find that perceived risk affects trust, which in turn affects behaviours in the studied context. The results yield several recommendations for future research as well as some implications for management.
Conference Paper
We describe the theoretical development of a user acceptance model for anonymous credentials and its evaluation in a real-world trial. Although anonymous credentials and other advanced privacy-enhancing technologies (PETs) have reached technical maturity, they are not widely adopted so far, such that understanding user adoption factors is one of the most important goals on the way to better privacy management with the help of PETs. Our model integrates the Technology Acceptance Model (TAM) with considerations that are specific to security- and privacy-enhancing technologies, in particular with their "secondary goal" property, which means that these technologies are expected to work in the background, facilitating the execution of users' primary, functional goals. We introduce five new constructs into the TAM: Perceived Usefulness for the Primary Task (PU1), Perceived Usefulness for the Secondary Task (PU2), Situation Awareness, Perceived Anonymity and Understanding of the PET. We conduct an evaluation of our model in the concrete scenario of a university course evaluation. Although the sample size (30 participants) is prohibitively small for deeper statistical analysis such as multiple regression or structural equation modeling, we are still able to derive useful conclusions from the correlation analysis of the constructs in our model. In particular, PU1 is the most important factor of user adoption, outweighing the usability and the usefulness of the deployed PET (PU2). Moreover, correct Understanding of the underlying PET seems to play a much less important role than a user interface of the system that clearly conveys to the user which data are transmitted when and to which party (Situation Awareness).
Article
Scholars in various disciplines have considered the causes, nature, and effects of trust. Prior approaches to studying trust are considered, including characteristics of the trustor, the trustee, and the role of risk. A definition of trust and a model of its antecedents and outcomes are presented, which integrate research from multiple disciplines and differentiate trust from similar constructs. Several research propositions based on the model are presented.
Article
Several recent surveys conclude that people are concerned about privacy and consider it to be an important factor in their online decision making. This paper reports on a study in which (1) user concerns were analysed more deeply and (2) what users said was contrasted with what they did in an experimental e-commerce scenario. Eleven independent variables were shown to affect the online behavior of at least some groups of users. Most significant were trust marks present on web pages and the existence of a privacy policy, though users seldom consulted the policy when one existed. We also find that many users have inaccurate perceptions of their own knowledge about privacy technology and vulnerabilities, and that important user groups, like those similar to the Westin “privacy fundamentalists”, do not appear to form a cohesive group for privacy-related decision making. In this study we adopt an experimental economic research paradigm, a method for examining user behavior which challenges the current emphasis on survey data. We discuss these issues and the implications of our results on user interpretation of trust marks and interaction design. Although broad policy implications are beyond the scope of this paper, we conclude by questioning the application of the ethical/legal doctrine of informed consent to online transactions in the light of the evidence that users frequently do not consult privacy policies.
Conference Paper
Long-term personal GPS data is useful for many UbiComp services such as traffic monitoring and environmental impact assessment. However, inference attacks on such traces can reveal private information including home addresses and schedules. We asked 32 participants from 12 households to collect 2 months of GPS data, and showed it to them in visualizations. We explored if they understood how their individual privacy concerns mapped onto 5 location obfuscation schemes (which they largely did), which obfuscation schemes they were most comfortable with (Mixing, Deleting data near home, and Randomizing), how they monetarily valued their location data, and if they consented to share their data publicly. 21/32 gave consent to publish their data, though most households' members shared at different levels, which indicates a lack of awareness of privacy interrelationships. Grounded in real decisions about real data, our findings highlight the potential for end-user involvement in obfuscation of their own location data.
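For context on the kind of obfuscation schemes participants compared, the snippet below sketches two of them, deleting data near home and randomizing, on a synthetic GPS trace; the coordinates, deletion radius and noise scale are arbitrary assumptions rather than the study's actual parameters.

import numpy as np

rng = np.random.default_rng(7)
trace = rng.normal(loc=[52.52, 13.40], scale=0.05, size=(1_000, 2))  # synthetic lat/lon points
home = np.array([52.52, 13.40])

# Scheme 1: delete points close to the home location (~0.005 degrees here).
far_from_home = np.linalg.norm(trace - home, axis=1) > 0.005
deleted = trace[far_from_home]

# Scheme 2: randomize by adding Gaussian noise to every remaining point.
randomized = deleted + rng.normal(scale=0.002, size=deleted.shape)

print(len(trace), "->", len(randomized), "points after obfuscation")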
Article
Information privacy has been called one of the most important ethical issues of the information age. Public opinion polls show rising levels of concern about privacy among Americans. Against this backdrop, research into issues associated with information privacy is increasing. Based on a number of preliminary studies, it has become apparent that organizational practices, individuals' perceptions of these practices, and societal responses are inextricably linked in many ways. Theories regarding these relationships are slowly emerging. Unfortunately, researchers attempting to examine such relationships through confirmatory empirical approaches may be impeded by the lack of validated instruments for measuring individuals' concerns about organizational information privacy practices. To enable future studies in the information privacy research stream, we developed and validated an instrument that identifies and measures the primary dimensions of individuals' concerns about organizational information privacy practices. The development process included examinations of privacy literature; experience surveys and focus groups; and the use of expert judges. The result was a parsimonious 15-item instrument with four sub-scales tapping into dimensions of individuals' concerns about organizational information privacy practices. The instrument was rigorously tested and validated across several heterogeneous populations, providing a high degree of confidence in the scales' validity, reliability, and generalizability.
Article
This paper presents a general statistical methodology for the analysis of multivariate categorical data arising from observer reliability studies. The procedure essentially involves the construction of functions of the observed proportions which are directed at the extent to which the observers agree among themselves and the construction of test statistics for hypotheses involving these functions. Tests for interobserver bias are presented in terms of first-order marginal homogeneity and measures of interobserver agreement are developed as generalized kappa-type statistics. These procedures are illustrated with a clinical diagnosis example from the epidemiological literature.
Privacy-preserving techniques of genomic data - a survey
Md Momin Al Aziz, Md Nazmus Sadat, Dima Alhadidi, Shuang Wang, Xiaoqian Jiang, Cheryl L. Brown, and Noman Mohammed. 2019. Privacy-preserving techniques of genomic data - a survey. Briefings in Bioinformatics 20, 3 (2019), 887-895.
Fundamentals of machine learning for predictive data analytics: algorithms, worked examples, and case studies
John D. Kelleher, Brian Mac Namee, and Aoife D'Arcy. 2020. Fundamentals of machine learning for predictive data analytics: algorithms, worked examples, and case studies. MIT Press.
A generic framework for privacy preserving deep learning
Theo Ryffel, Andrew Trask, Morten Dahl, Bobby Wagner, Jason Mancuso, Daniel Rueckert, and Jonathan Passerat-Palmbach. 2018. A generic framework for privacy preserving deep learning. arXiv preprint arXiv:1811.04017 (2018).
Are All Internet Users' Information Privacy Concerns (IUIPC) Created Equal?
Miaoyi Zeng, Shuaifu Lin, and Deborah Armstrong. 2020. Are All Internet Users' Information Privacy Concerns (IUIPC) Created Equal? AIS TRR 6, 1 (2020), 3.
Privacy-preserving machine learning through data obfuscation
Tianwei Zhang, Zecheng He, and Ruby B. Lee. 2018. Privacy-preserving machine learning through data obfuscation. arXiv preprint arXiv:1807.01860 (2018).