Article

The cost of reading privacy policies

Authors: Aleecia M. McDonald, Lorrie Faith Cranor

... Such hindrances to user understanding are aggravated further by the fact that policies are often drafted with flexibility. Another challenge associated with reading policies is the requirement of significant time commitment from users [72]. ...
... We gathered a total of 109 papers, of which 82 were explicitly tailored to the field of privacy policies. The remaining 27 papers were devoted to the use of NLP in general or to its use in fields similar to privacy, and are analyzed to aid in identifying research potential for NLP in the privacy domain. [A table spilled into the text here; it lists accessibility-related works: Jensen and Potts [58] 2004; Milne et al. [75] 2006; McDonald and Cranor [72] 2008; Schwartz and Solove [101] 2009; Inglesant and Sasse [55] 2010; Meiselwitz [73] 2013; Ermakova et al. [36] 2015; Reidenberg et al. [92] 2015; Reidenberg et al. [91] 2016; Fabian et al. [37] 2017; Libert [67] 2018; Habib et al. [47] 2020.] During our analysis, we identified five significant categories of privacy policy research: 'Comprehension challenges, ' 'Non-NLP solutions, ' 'Dataset creation and analysis, ' 'NLP solutions, ' and 'Word embedding model. ...
... Readability and comprehensibility. Privacy policies are arduously long, averaging over 2,500 words [72], complicated, and low in both readability and comprehensibility. This discourages users from attempting to read and understand them. ...
Preprint
Natural Language Processing (NLP) is an essential subset of artificial intelligence. It has become effective in several domains, such as healthcare, finance, and media, to identify perceptions, opinions, and misuse, among others. Privacy is no exception, and initiatives have been taken to address the challenges of usable privacy notifications to users with the help of NLP. To this end, we conduct a literature review by analyzing 109 papers at the intersection of NLP and privacy policies. First, we provide a brief introduction to privacy policies and discuss various facets of associated problems, which necessitate the application of NLP to elevate the current state of privacy notices and disclosures to users. Subsequently, we a) provide an overview of the implementation and effectiveness of NLP approaches for better privacy policy communication; b) identify the methodologies that can be further enhanced to provide robust privacy policies; and c) identify the gaps in the current state-of-the-art research. Our systematic analysis reveals that several research papers focus on annotating and classifying privacy texts for analysis but do not adequately address other aspects of NLP applications, such as summarization. More specifically, ample research opportunities exist in this domain, covering aspects such as corpus generation, summarization vectors, contextualized word embedding, identification of privacy-relevant statement categories, fine-grained classification, and domain-specific model tuning.
... This metric is a means of measuring the difficulty of reading a text written in the English language. With regard to the time taken to read a policy, this study adopts the approach of McDonald and Cranor, who "assumed an average reading rate of 250 words per minute" [28]. ...

... In 2008, McDonald and Cranor posed an intriguing question: "If website users were to read the privacy policy for each site they visit just once a year, what would their time be worth?" [28]. By analyzing the "word count of the 75 most popular websites", they determined that the "national opportunity cost" of reading privacy policies would be $781 billion. ...

... Across policies for 207,000 sites, the average number of words per policy is 1,404. Using "an average reading rate of 250 words per minute", an average website policy would require 5.6 minutes to read [28]. This is lower than the average of 10 minutes found by McDonald and Cranor in 2008. ...
Preprint
A dominant regulatory model for web privacy is "notice and choice". In this model, users are notified of data collection and provided with options to control it. To examine the efficacy of this approach, this study presents the first large-scale audit of disclosure of third-party data collection in website privacy policies. Data flows on one million websites are analyzed and over 200,000 websites' privacy policies are audited to determine if users are notified of the names of the companies which collect their data. Policies from 25 prominent third-party data collectors are also examined to provide deeper insights into the totality of the policy environment. Policies are additionally audited to determine if the choice expressed by the "Do Not Track" browser setting is respected. Third-party data collection is widespread, but fewer than 15% of attributed data flows are disclosed. The third parties most likely to be disclosed are those with consumer services users may be aware of; those without consumer services are less likely to be mentioned. Policies are difficult to understand and the average time requirement to read both a given site's policy and the associated third-party policies exceeds 84 minutes. Only 7% of first-party site policies mention the Do Not Track signal, and the majority of such mentions are to specify that the signal is ignored. Among third-party policies examined, none offer unqualified support for the Do Not Track signal. Findings indicate that current implementations of "notice and choice" fail to provide notice or respect choice.
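The reading-time figures quoted in the excerpts above follow from simple arithmetic on word counts and an assumed reading rate. Below is a minimal Python sketch using the numbers reported in those excerpts; the aggregate-cost helper is a hypothetical illustration of the "opportunity cost" idea, not the original study's exact model.

```python
# Reading-time arithmetic behind the figures quoted above.
# Assumption: a reading rate of 250 words per minute, as in McDonald and Cranor.
READING_RATE_WPM = 250

def minutes_to_read(word_count: int, wpm: int = READING_RATE_WPM) -> float:
    """Minutes needed to read a policy of `word_count` words."""
    return word_count / wpm

print(round(minutes_to_read(1404), 1))  # average policy in the 207,000-site audit -> 5.6
print(round(minutes_to_read(2500), 1))  # ~2,500-word average from the 2008 study -> 10.0

def opportunity_cost(hours_per_person_per_year: float,
                     population: int,
                     hourly_value_usd: float) -> float:
    """Hypothetical aggregate cost of reading policies (illustrative only)."""
    return hours_per_person_per_year * population * hourly_value_usd
```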
... Already in 2008, it was estimated that each US citizen would spend between 181 and 304 hours per year to read (or between 81 and 293 hours per year to skim) privacy notices for each new website they visit (McDonald and Cranor 2008). Some progress has been made to improve privacy notices, for instance by improving the information offered (e.g., Reidenberg et al. 2016, Sánchez et al. 2021) or the user interfaces for communicating the information (e.g., Karegar et al. 2020, Schaub et al. 2017). ...
... Establishing coverage is a necessary but not sufficient metarequirement for establishing TIPP. As demonstrated by the practice of posting privacy notices (McDonald and Cranor 2008, Sunyaev et al. 2015), presenting consumers with a lot of information leads to information overload and prevents consumers from retrieving information of interest to them (McDonald and Cranor 2008, Milne and Culnan 2004, Sheng and Simpson 2014). Consequently, establishing TIPP also requires adaptivity. ...
Preprint
Full-text available
The rising diffusion of information systems (IS) throughout society poses an increasingly serious threat to privacy as a social value. One approach to alleviating this threat is to establish transparency of information privacy practices (TIPP) so that consumers can better understand how their information is processed. However, the design of transparency artifacts (e.g., privacy notices) has clearly not followed this approach, given the ever-increasing volume of information processing. Hence, consumers face a situation where they cannot see the 'forest for the trees' when aiming to ascertain whether information processing meets their privacy expectations. A key problem is that overly comprehensive information presentation results in information overload and is thus counterproductive for establishing TIPP. We depart from the extant design logic of transparency artifacts and develop a theoretical foundation (TIPP theory) for transparency artifact designs useful for establishing TIPP from the perspective of privacy as a social value. We present TIPP theory in two parts to capture the sociotechnical interplay. The first part translates abstract knowledge on the IS artifact and privacy into a description of social subsystems of transparency artifacts, and the second part conveys prescriptive design knowledge in the form of a corresponding IS design theory. TIPP theory establishes a bridge from the complexity of the privacy concept to a metadesign for transparency artifacts that is useful to establish TIPP in any IS. In essence, transparency artifacts must accomplish more than offering comprehensive information; they must also be adaptive to the current information needs of consumers.
... Motivations. Unfortunately, prior studies indicate that 91% of users typically agree to privacy policies by simply clicking the checkbox, without reading their contents [24,43]. Worse still, in the current VR ecosystem, our preliminary analysis (Section 3) reveals that for a significant proportion of VR apps, their privacy policies neither meet the legal requirements nor satisfy user expectations, especially in the following three aspects. ...
... Early stage of privacy policy in the VR ecosystem. It is reported that 91% of users agree to the privacy policy by clicking on the checkbox without necessarily reading it [24,43]. This phenomenon will have a more negative impact in VR scenarios: firstly, VR apps harvest more information about the user than existing mobile apps; secondly, some mainstream VR platforms pay little attention to vetting privacy policies, and several platforms do not even require app developers to provide privacy policies when publishing apps. ...
Preprint
Full-text available
Virtual reality (VR) apps can harvest a wider range of user data than web/mobile apps running on personal computers or smartphones. Existing law and privacy regulations emphasize that VR developers should inform users of what data are collected/used/shared (CUS) through privacy policies. However, privacy policies in the VR ecosystem are still in their early stages, and many developers fail to write appropriate privacy policies that comply with regulations and meet user expectations. In this paper, we propose VPVet to automatically vet privacy policy compliance issues for VR apps. VPVet first analyzes the availability and completeness of a VR privacy policy and then refines its analysis based on three key criteria: granularity, minimization, and consistency of CUS statements. Our study establishes the first and currently largest VR privacy policy dataset named VRPP, consisting of privacy policies of 11,923 different VR apps from 10 mainstream platforms. Our vetting results reveal severe privacy issues within the VR ecosystem, including the limited availability and poor quality of privacy policies, along with their coarse granularity, lack of adaptation to VR traits, and the inconsistency between CUS statements in privacy policies and the apps' actual behaviors. We open-source the VPVet system along with our findings at the repository https://github.com/kalamoo/PPAudit, aiming to raise awareness within the VR community and pave the way for further research in this field.
... Social media networks in particular display network effects which have made it impossible for a real marketplace of choices to operate, displacing consumer choice. Privacy policies have become notorious for inordinate lengthiness (Loos & Luzak, 2016) and for requiring reading comprehension abilities at university level (Edwards & Brown, 2013; Jensen & Potts, 2004; McDonald & Cranor, 2008), and users have no incentive to read them anyway (Obar & Oeldorf-Hirsch, 2020), as they often change frequently without additional consent being sought. ...
Article
Full-text available
Large or “foundation” models are now being widely used to generate not just text and images but also video, music and code from prompts. Although this “generative AI” revolution is clearly driving new opportunities for innovation and creativity, it is also enabling easy and rapid dissemination of harmful speech and potentially infringing existing laws. Much attention has been paid recently to how we can draft bespoke legislation to control these risks and harms; however, private ordering by generative AI providers, via user contracts, licenses and privacy policies, has so far attracted less attention. Drawing on the extensive history of study of the terms and conditions (T&C) and privacy policies of social media companies, this paper reports the results of pilot empirical work conducted in January–March 2023, in which T&C were mapped across a representative sample of generative AI. With the focus on copyright and data protection, our early findings indicate the emergence of a “platformisation paradigm,” in which providers of generative AI attempt to position themselves as neutral intermediaries. This study concludes that new laws targeting “big tech” must be carefully reconsidered to avoid repeating past power imbalances between users and platforms.
... Although privacy policies have become the primary privacy notice approach for mobile applications [21,30,31,36,37,44,65], their presentation and readability have always been criticized [28,57]. To improve usability, Kelley et al. [41,42,43] introduced privacy nutrition labels, or privacy labels, designed to facilitate consumers' understanding of how their information is collected and utilized in a concise and structured manner. ...
Preprint
Full-text available
Privacy regulations mandate that developers must provide authentic and comprehensive privacy notices, e.g., privacy policies or labels, to inform users of their apps' privacy practices. However, due to a lack of knowledge of privacy requirements, developers often struggle to create accurate privacy notices, especially for sophisticated mobile apps with complex features and in crowded development teams. To address these challenges, we introduce Privacy Bills of Materials (PriBOM), a systematic software engineering approach that leverages different development team roles to better capture and coordinate mobile app privacy information. PriBOM facilitates transparency-centric privacy documentation and specific privacy notice creation, enabling traceability and trackability of privacy practices. We present a pre-fill of PriBOM based on static analysis and privacy notice analysis techniques. We demonstrate the perceived usefulness of PriBOM through a human evaluation with 150 diverse participants. Our findings suggest that PriBOM could serve as a significant solution for providing privacy support in DevOps for mobile apps.
... Most participants admitted that they did not read the terms of service, license agreements, or privacy policies of VR apps, a known phenomenon also found in users' interactions with websites and mobile apps [58,85]. A few participants acknowledged that they did not pay attention to the permissions they were granting to VR apps. ...
Article
Full-text available
The immersive nature of Virtual Reality (VR) and its reliance on sensory devices like head-mounted displays introduce privacy risks to users. While earlier research has explored users' privacy concerns within VR environments, less is known about users' comprehension of VR data practices and protective behaviors; the expanding VR market and technological progress also necessitate a fresh evaluation. We conducted semi-structured interviews with 20 VR users, showing their diverse perceptions regarding the types of data collected and their intended purposes. We observed privacy concerns in three dimensions: institutional, social, and device-specific. Our participants sought to protect their privacy through considerations when selecting the device, scrutinizing VR apps, and selective engagement in different VR interactions. We contrast our findings with observations from other technologies and ecosystems, shedding light on how VR has altered the privacy landscape for end-users. We further offer recommendations to alleviate users' privacy concerns, rectify misunderstandings, and encourage the adoption of privacy-conscious behaviors.
... When evaluating whether to try a new product, users cannot easily understand terms of service (TOS) and privacy policies, which are often incomprehensible in practice, exceeding reasonable standards of length and readability [96]. Research has shown that very few people read privacy policies and terms of service, and that reading these for all the apps that a person uses would take hundreds of hours a year [102]. Even if users did somehow read all the available information about how their data is being handled, it would be unrealistic to expect them to assimilate this information and act in their best interest based on it, given the vast amounts of data generated over an extended period of time [3]. ...
Preprint
End-to-end encryption (E2EE) has become the gold standard for securing communications, bringing strong confidentiality and privacy guarantees to billions of users worldwide. However, the current push towards widespread integration of artificial intelligence (AI) models, including in E2EE systems, raises some serious security concerns. This work performs a critical examination of the (in)compatibility of AI models and E2EE applications. We explore this on two fronts: (1) the integration of AI "assistants" within E2EE applications, and (2) the use of E2EE data for training AI models. We analyze the potential security implications of each, and identify conflicts with the security guarantees of E2EE. Then, we analyze legal implications of integrating AI models in E2EE applications, given how AI integration can undermine the confidentiality that E2EE promises. Finally, we offer a list of detailed recommendations based on our technical and legal analyses, including: technical design choices that must be prioritized to uphold E2EE security; how service providers must accurately represent E2EE security; and best practices for the default behavior of AI features and for requesting user consent. We hope this paper catalyzes an informed conversation on the tensions that arise between the brisk deployment of AI and the security offered by E2EE, and guides the responsible development of new AI features.
... Indeed, Obar and Oeldorf-Hirsch find that the vast majority of people do not even read such documents [40], with all participants in a user study accepting terms including handing over their first-born child to use a social network site. McDonald and Cranor measure the economic cost of reading lengthy policies [31], noting the inequity of expecting people to spend an average of ten minutes of their time reading and comprehending a complex document in order to use a service. Friedman et al. caution that simply including more information and more frequent consent interventions can be counter-productive, by frustrating people and leading them to make more complacent consent decisions [13]. ...
Preprint
Companies and academic researchers may collect, process, and distribute large quantities of personal data without the explicit knowledge or consent of the individuals to whom the data pertains. Existing forms of consent often fail to be appropriately readable and ethical oversight of data mining may not be sufficient. This raises the question of whether existing consent instruments are sufficient, logistically feasible, or even necessary, for data mining. In this chapter, we review the data collection and mining landscape, including commercial and academic activities, and the relevant data protection concerns, to determine the types of consent instruments used. Using three case studies, we use the new paradigm of human-data interaction to examine whether these existing approaches are appropriate. We then introduce an approach to consent that has been empirically demonstrated to improve on the state of the art and deliver meaningful consent. Finally, we propose some best practices for data collectors to ensure their data mining activities do not violate the expectations of the people to whom the data relate.
... However, data producers are not always aware of how their data are used and processed. Terms of Use are shown to be limited and ineffective [4,32]. Security and privacy of users' data depend entirely on data consumers and as a result misuse of personal information is possible, for instance, discrimination or limited freedom and autonomy by personalized persuasive systems [22,7,24,21]. ...
Preprint
The pervasiveness of the Internet of Things results in vast volumes of personal data generated by smart devices of users (data producers) such as smart phones, wearables and other embedded sensors. It is a common requirement, especially for Big Data analytics systems, to transfer these large-scale and distributed data to centralized computational systems for analysis. Nevertheless, third parties that run and manage these systems (data consumers) do not always guarantee users' privacy. Their primary interest is to improve utility, which is usually a metric related to performance, costs and the quality of service. There are several techniques that mask user-generated data to ensure privacy, e.g., differential privacy. Setting up a process for masking data, referred to in this paper as a 'privacy setting', decreases the utility of data analytics on the one hand, while increasing privacy on the other. This paper studies parameterizations of privacy settings that regulate the trade-off between maximum utility, minimum privacy and minimum utility, maximum privacy, where utility refers to the accuracy in the approximations of aggregation functions. Privacy settings can be universally applied as system-wide parameterizations and policies (homogeneous data sharing). Nonetheless, they can also be applied autonomously by each user or decided under the influence of (monetary) incentives (heterogeneous data sharing). This latter diversity in data sharing by informational self-determination plays a key role in the privacy-utility trajectories, as shown in this paper both theoretically and empirically. A generic and novel computational framework is introduced for measuring privacy-utility trade-offs and their optimization. The framework computes a broad spectrum of such trade-offs that form privacy-utility trajectories under homogeneous and heterogeneous data sharing.
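The privacy-utility trade-off this abstract describes can be made concrete with one of the masking techniques it mentions, differential privacy. The following is a minimal Python/NumPy sketch, not the paper's actual framework: each user perturbs their reading with Laplace noise before sharing, and the noise scale (set by epsilon) acts as the "privacy setting" that trades aggregation accuracy for privacy.

```python
# Generic illustration of a privacy-utility trade-off via local Laplace noise.
# This is a differential-privacy-style sketch with illustrative parameters,
# not the computational framework proposed in the paper.
import numpy as np

rng = np.random.default_rng(0)
true_values = rng.uniform(0, 100, size=10_000)  # user-generated readings

def noisy_mean(values: np.ndarray, epsilon: float, sensitivity: float = 100.0) -> float:
    """Each user adds Laplace noise locally; the data consumer averages the result."""
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon, size=values.shape)
    return float(np.mean(values + noise))

exact = float(np.mean(true_values))
for eps in (0.1, 1.0, 10.0):  # smaller epsilon = stronger privacy, lower utility
    error = abs(noisy_mean(true_values, eps) - exact)
    print(f"epsilon={eps:>4}: error of aggregated mean ~ {error:.2f}")
```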
... Despite increased concerns over data practices, Data Protection Authorities (DPAs) and consumers often fail to detect GDPR violations, since privacy policies (PPs) are characterised by vagueness, complexities and legal technicalities [1,2]. Their analysis is a highly complex and time-consuming task [3]. Studies, including Zaeem et al. [4], reveal that, while the GDPR has improved certain compliance metrics, privacy policies have become longer and more ambiguous, further hindering user understanding. ...
Conference Paper
Full-text available
Despite some improvements in compliance metrics after the implementation of the European General Data Protection Regulation (GDPR), privacy policies have become longer and more ambiguous. They often fail to fully meet GDPR requirements, thus leaving users without a reliable way to understand how their data is processed. We present a novel corpus composed of 30 privacy policies of online platforms and a new set of annotation guidelines to assess the level of comprehensiveness of information. We focus on the processed categories of data, classifying each clause either as fully informative or as insufficiently informative. In our experimental evaluation, we perform 6 different classification and detection tasks, comparing BERT models and generative Large Language Models.
... When we think about how much time it would take to read them all, these figures are not surprising. A 2008 study estimated that it would take the average person 244 hours to read all the privacy policies they encountered in a year (McDonald & Cranor, 2008). An estimate given today would no doubt be much higher, given the greater number of devices we use that connect to the internet (e.g., smartwatches, smart fridges, and so on), and the growing list of stakeholders who buy and sell our personal data. ...
... McDonald and Cranor have shown that the average time that would be required to read privacy policies adds up to 244 hours per year [41]. Coupled with the shorter attention spans of users, it is undeniable that the cost of becoming familiar with the information is too high for consumers. Hence, consumers often agree to the terms without reading them, as they may believe that, because many other consumers previously accepted the same terms and conditions, these cannot be 'that' harmful. ...
Chapter
Full-text available
This book provides an analysis of whether and how the current European legal framework adequately deals with personal conditions in which digital technologies might prove particularly disruptive. It furthermore assesses how the existing policies and rules could be reinterpreted, reimagined and reshaped. In doing so, it offers a remarkable symbiosis between policy-oriented legal academic work and more theoretical and philosophical scholarship. In particular, this book provides a more concrete meaning to the fluid concept of digital vulnerability, clarifying how this emerging paradigm may be applied on both a descriptive and a prescriptive level; identifies effective measures/remedies to ensure the utmost protection of those who may be digitally vulnerable against emerging technological threats; and helps to reconsider traditional private law micro- and macro-categories revolving around the notion of digital vulnerability, challenging traditional legal taxonomies. With contributions by Prof. Dr. Mateja Durovic | Prof. Dr. Fabrizio Esposito | Prof. Dr. Catalina Goanta | Prof. Dr. Giovanni de Gregorio | Prof. Dr. Eleni Kaprou | Prof. Dr. Emilia Mišćenić | Prof. Dr. Frank Pasquale | Prof. Dr. Oreste Pollicino | Prof. Dr. Jerry Spanakis
... Service providers ask users to go through a user agreement at setup or install time so that they can inspect their data practices before using their service. However, in practice, the terms of service, privacy policies and other agreements are long, complex, and tend to be ignored and forgotten after the initial sign up [6,19]. We address this problem by understanding user perspectives of and preferences for such agreements. ...
Preprint
Full-text available
This paper aims to cover and summarize the field of IoT and related privacy concerns through the lens of privacy by design. With the ever-increasing incorporation of technology within our daily lives and ever-growing active research into smart devices and technologies, privacy concerns are inevitable. We intend to briefly cover the broad topic of privacy in the IoT space, the inherent challenges and risks in such systems, and a few recent techniques that intend to resolve these issues at both the subdomain level and the system scale. We then proceed to approach this situation through design thinking and privacy-by-design, given that most of the prior efforts are based on resolving privacy concerns on technical grounds with system-level design. We participated in a co-design workshop on the privacy of a content creation platform and used those findings to deploy a survey-based mechanism to tackle some key concern areas for user groups and to formulate design principles for privacy that promote transparent, user-centered, and awareness-provoking privacy design.
... Despite these advancements, privacy policies are still often long and complex, making them difficult for users to fully understand. McDonald and Cranor (McDonald & Cranor, 2008) estimated that reading all privacy policies a user encounters would take 201 hours per year, highlighting the need for more user-friendly formats. ...
Preprint
This paper presents a novel application of large language models (LLMs) to enhance user comprehension of privacy policies through an interactive dialogue agent. We demonstrate that LLMs significantly outperform traditional models in tasks like Data Practice Identification, Choice Identification, Policy Summarization, and Privacy Question Answering, setting new benchmarks in privacy policy analysis. Building on these findings, we introduce an innovative LLM-based agent that functions as an expert system for processing website privacy policies, guiding users through complex legal language without requiring them to pose specific questions. A user study with 100 participants showed that users assisted by the agent had higher comprehension levels (mean score of 2.6 out of 3 vs. 1.8 in the control group), reduced cognitive load (task difficulty ratings of 3.2 out of 10 vs. 7.8), increased confidence in managing privacy, and completed tasks in less time (5.5 minutes vs. 15.8 minutes). This work highlights the potential of LLM-based agents to transform user interaction with privacy policies, leading to more informed consent and empowering users in the digital services landscape.
... Such applications might force users to disclose more personal data and impose difficulties in assessing the costs and benefits from the user's viewpoint due to a lack of transparency or understandability. In 2008, it was calculated that it would take an average internet user between 181 and 304 hours every year to read the privacy policies of all the web services they use [23]. The GDPR [5] also caused the complexity and length of privacy policies to increase [8]. ...
Preprint
Full-text available
The growing use of Machine Learning and Artificial Intelligence (AI), particularly Large Language Models (LLMs) like OpenAI's GPT series, leads to disruptive changes across organizations. At the same time, there is a growing concern about how organizations handle personal data. Thus, privacy policies are essential for transparency in data processing practices, enabling users to assess privacy risks. However, these policies are often long and complex. This might lead to user confusion and consent fatigue, where users accept data practices against their interests, and abusive or unfair practices might go unnoticed. LLMs can be used to assess privacy policies for users automatically. In this interdisciplinary work, we explore the challenges of this approach in three pillars, namely technical feasibility, ethical implications, and legal compatibility of using LLMs to assess privacy policies. Our findings aim to identify potential for future research, and to foster a discussion on the use of LLM technologies for enabling users to fulfil their important role as decision-makers in a constantly developing AI-driven digital economy.
... Various studies indicate that, if informed by privacy notices, users are empowered to choose IT systems that match their preferences, typically those with high data security and privacy standards, and avoid less secure ones [16,17]. But the current formats used for privacy notices, most commonly privacy policies, tend to provide rather detailed information and often use legal jargon [18-22], which aims to maximize legal protection of IT providers rather than to transparently inform users [23]. Research has shown that overly lengthy and complex privacy policies may ultimately serve as a 'red flag', leading users to lose trust in the provider, if not to discontinue technology use altogether. ...
Preprint
Full-text available
Background: The German electronic health record (EHR) aims to enhance patient care and reduce costs, but users often worry about data security. To mitigate disease-related privacy concerns, for instance those surrounding stigmatized diseases, we test the effect of privacy fact sheets (PFSs) - a concise but comprehensive transparency feature - on increasing EHR usage. Objective: We investigate whether displaying a PFS shortly before upload decisions are made mitigates disease-related privacy concerns and makes uploads more likely. Methods: In an online user study, 393 German participants were asked to interact with a randomly assigned medical report that varied systematically in terms of disease-related stigma (high vs. low) and time course (acute vs. chronic). They were then asked to decide whether to upload the report to the EHR, while we systematically varied the presentation of privacy information (PFS vs. no PFS). Results: The results show that, in general, upload behavior is negatively influenced by disease-related stigma (OR 0.130, p<.001) and positively influenced when a PFS is given (OR 4.527, p<.001). This increase was particularly pronounced for stigmatized diseases (OR 5.952, p=.006). The time course of diseases had no effect. Conclusions: Our results demonstrate that PFSs help to increase EHR uploads by mitigating privacy concerns related to stigmatized diseases. This indicates that a PFS is mainly relevant and effective for users with increased privacy risk perceptions, while it does not hurt other users. Thus, implementing PFSs can increase the likelihood that more patients, even those with increased privacy concerns due to stigmatized diseases, upload their data to the EHR, ultimately increasing health equity. That is, PFSs may help to realize EHR benefits such as more efficient healthcare processes, improved treatment outcomes, and reduced costs for more users.
... A typical privacy policy describes different aspects of the collection and processing of personal data by an online service, e.g., what personal data are collected, how they are collected, why they are collected, how such data are protected, how such data are stored, and what data are shared with third parties. Although accepting the content of a privacy policy is often made mandatory before starting to use an online service, most users tend to skip reading privacy policies because they are often too long to read quickly and too difficult to understand due to the legal and formal wording used [1]. Despite the existence of data protection laws and regulations in many countries, it has been found that service providers' privacy policies often do not fully comply with such laws and regulations, leading to concerns from online users, researchers and privacy advocates [2]. ...
Preprint
Full-text available
Machine learning based classifiers that take a privacy policy as the input and predict relevant concepts are useful in different applications, such as (semi-)automated compliance analysis against requirements of the EU GDPR. In all past studies, such classifiers produce a concept label per segment (e.g., sentence or paragraph), and their performance was evaluated using a dataset of labeled segments without considering the privacy policy they belong to. However, such an approach could overestimate the performance in real-world settings, where all segments in a new privacy policy are supposed to be unseen. Additionally, we also observed other research gaps, including the lack of a more complete GDPR taxonomy and insufficient consideration of hierarchical information in privacy policies. To fill these research gaps, we developed a more complete GDPR taxonomy, created the first corpus of labeled privacy policies with hierarchical information, and conducted the most comprehensive performance evaluation of GDPR concept classifiers for privacy policies. Our work leads to multiple novel findings, including the confirmed inappropriateness of splitting training and test sets at the segment level, the benefits of considering hierarchical information, the limitations of the "one size fits all" approach, and the significance of testing cross-corpus generalizability.
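The paper's point that segment-level splits overestimate performance can be illustrated with a grouped train/test split that keeps all segments of a policy on the same side. A hedged scikit-learn sketch follows; the variable names and split ratios are illustrative assumptions, not the paper's exact setup.

```python
# Sketch: policy-level vs. segment-level splitting of labeled privacy-policy
# segments. Field names (segments, labels, policy_ids) are hypothetical.
from sklearn.model_selection import GroupShuffleSplit, train_test_split

def segment_level_split(segments, labels):
    # Naive split: segments from the same policy can end up in both train and
    # test sets, which tends to overestimate real-world performance.
    return train_test_split(segments, labels, test_size=0.2, random_state=0)

def policy_level_split(segments, labels, policy_ids):
    # Group-aware split: every policy's segments fall entirely in one set,
    # mimicking evaluation on entirely unseen policies.
    splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
    train_idx, test_idx = next(splitter.split(segments, labels, groups=policy_ids))
    return train_idx, test_idx
```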
... Privacy policies are a crucial mechanism for organizations to communicate their data habits with external entities, including consumers and regulators. Unfortunately, they have been known to fall short of meaningfully achieving these goals due to their inaccessibility or incomprehensibility (McDonald and Cranor 2009). In response, privacy advocates and researchers have focused on developing Natural Language Processing tools to make privacy policies more usable for consumers and regulators (Harkous et al. 2018;Zimmeck and Bellovin 2014). ...
Preprint
Full-text available
The development of tools and techniques to analyze and extract organizations' data habits from privacy policies is critical for scalable regulatory compliance audits. Unfortunately, these tools are becoming increasingly limited in their ability to identify compliance issues and fixes. After all, most were developed using regulation-agnostic datasets of annotated privacy policies obtained from a time before the introduction of landmark privacy regulations such as the EU's GDPR and California's CCPA. In this paper, we describe the first open regulation-aware dataset of expert-annotated privacy policies, C3PA (CCPA Privacy Policy Provision Annotations), aimed to address this challenge. C3PA contains over 48K expert-labeled privacy policy text segments associated with responses to CCPA-specific disclosure mandates from 411 unique organizations. We demonstrate that the C3PA dataset is uniquely suited for aiding automated audits of compliance with CCPA-related disclosure mandates.
... Privacy policies are ubiquitous and required in many settings [35][36][37]64], and for better or worse, are an important tool for communicating about the behavior of systems. Natural language policies have many shortcomings and are full of technical details and jargon that significantly impact their usability as a tool to inform users clearly about the behaviors and data management practices [28,58]. Privacy nutrition labels, or privacy labels, offer an alternative to both simplify and standardize the communication of privacy behavior similar to food nutrition labels [20,51]. ...
Article
Full-text available
Apple introduced privacy labels in Dec. 2020 as a way for developers to report the privacy behaviors of their apps. While Apple does not validate labels, they also require developers to provide a privacy policy, which offers an important comparison point. In this paper, we fine-tuned BERT-based language models to extract privacy policy features for 474,669 apps on the iOS App Store, comparing the output to the privacy labels. We identify discrepancies between the policies and the labels, particularly as they relate to data collected linked to users. We find that 228K apps' privacy policies may indicate more data collection linked to users than what is reported in the privacy labels. More alarming, a large number (97%) of the apps with a Data Not Collected privacy label have a privacy policy indicating otherwise. We provide insights into potential sources for discrepancies, including the use of templates and confusion around Apple's definitions and requirements. These results suggest that significant work is still needed to help developers more accurately label their apps. Our system can be incorporated as a first-order check to inform developers when privacy labels are possibly misapplied.
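At its core, the label-versus-policy comparison described above is a coverage check: do the data types extracted from the policy appear in the declared label? Below is a minimal sketch of that first-order check; the set-based representation and example values are assumptions for illustration, not the paper's BERT-based pipeline.

```python
# First-order consistency check between a declared privacy label and data
# practices extracted from the policy text. The example sets are illustrative.
def label_discrepancies(label_declared: set[str], policy_extracted: set[str]) -> set[str]:
    """Data types the policy suggests are collected but the label omits."""
    return policy_extracted - label_declared

declared = set()                                 # app claims "Data Not Collected"
extracted = {"email_address", "device_id"}       # data types found in the policy
print(label_discrepancies(declared, extracted))  # {'email_address', 'device_id'}
```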
... In practice, many people pursue what Daniel Solove terms "security through obscurity", believing their data is secure online because it would not be of interest to anyone (Hartzog & Stutzman, 2013). The ability to transparently understand the uses to which their data may be put is not merely a matter of reading the privacy statement, though one study put the opportunity cost of every user reading the privacy policy of every website they use at least once a year at 54 billion hours in the United States alone (this compares with the 3.4 billion hours American taxpayers collectively spent completing income tax returns around the time of the study) (McDonald & Cranor, 2008). Rather, even if individuals were to consent to the uses to which their personal data were put by one website or another, it is in the sale and aggregation of this data and its potential secondary uses that transparency becomes near impossible. ...
Article
Full-text available
Big Data, understood as high-volume, high-velocity and/or high-variety information assets that enable insight, decision-making, and process automation (Gartner, 2015), offers both opportunities and challenges in all aspects of human life. As Higher Education serves as preparation not only for economic but also for health, welfare, social and civic participation, these changes are imbricated in many aspects of academic endeavor. In relation to research ethics, this change represents a normative difference in degree rather than a difference in kind. Data is more messy, more rapid, more difficult to predict and more difficult to identify owners, but the principles of informed consent, confidentiality and prevention of harm apply equally to digital as to traditional research data. Central to applying these principles, however, is the recognition that technologies are not inherently value neutral, and that data collection, aggregation, and its use in decision making can both create and intensify inequities and harms. A data justice approach to research ethics extends concern with voice and authenticity into the digital domain. The transparency and ethics of our research processes have wider significance, as they determine the creation of new knowledge, and the processes by which this is disseminated to students. Universities provide an important role as gatekeepers to professional accreditation in a number of fields, including software engineering, and the relation between academic freedom of enquiry, state and corporate interests in the Big Data age raises important questions about power and control in the academy, which in turn have implications for the norms of research governance.
... Users are increasingly concerned about online privacy [13], yet empirical studies consistently show that privacy policy documents have become substantially longer over the past two decades. With median word counts ranging from 1,500 [14] to 2,500 [15], these documents take too long to read [14], [16], resulting in users making little effort to read and understand them [17]. Further, the writing and presentation of these documents often make them inaccessible [7], with the end result often being uninformed consent [18]. ...
Preprint
While many online services provide privacy policies for end users to read and understand what personal data are being collected, these documents are often lengthy and complicated. As a result, the vast majority of users do not read them at all, leading to data collection under uninformed consent. Several attempts have been made to make privacy policies more user friendly by summarising them, providing automatic annotations or labels for key sections, or by offering chat interfaces to ask specific questions. With recent advances in Large Language Models (LLMs), there is an opportunity to develop more effective tools to parse privacy policies and help users make informed decisions. In this paper, we propose an entailment-driven LLM based framework to classify paragraphs of privacy policies into meaningful labels that are easily understood by users. The results demonstrate that our framework outperforms traditional LLM methods, improving the F1 score on average by 11.2%. Additionally, our framework provides inherently explainable and meaningful predictions.
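Entailment-driven labeling of the kind this abstract describes can be approximated with an off-the-shelf natural language inference model, where each candidate label is posed as a hypothesis about the paragraph. A hedged sketch using the Hugging Face zero-shot pipeline follows; the model choice and label set are assumptions for illustration, not the paper's framework or taxonomy.

```python
# Sketch: entailment-style labeling of a privacy-policy paragraph with a
# generic NLI model. Model and labels are illustrative placeholders.
from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

labels = ["first-party data collection", "third-party sharing",
          "data retention", "user choice and control"]

paragraph = ("We may share your device identifiers with advertising partners "
             "to personalise the content you see.")

result = classifier(paragraph, candidate_labels=labels)
print(result["labels"][0], round(result["scores"][0], 2))  # most entailed label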
... Besides ongoing criticism, privacy policies remain one of the primary sources for informing users about privacy practices. The most prominent points of criticism are their length [46] and abstract legal language [44,56], making it hard for users to engage with them meaningfully. Hence, research also proposed several improvements, the most promising one being privacy labels that provide privacy information in a more condensed and digestible manner [33]. ...
Article
Users frequently use their smartphones in combination with other smart devices, for example, when streaming music to smart speakers or controlling smart appliances. During these interconnected interactions, user data gets handled and processed by several entities that employ different data protection practices or are subject to different regulations. Users need to understand these processes to inform themselves in the right places and make informed privacy decisions. We conducted an online survey (N=120) to investigate whether users have accurate mental models about interconnected interactions. We found that users consider scenarios more privacy-concerning when multiple devices are involved. Yet, we also found that most users do not fully comprehend the privacy-relevant processes in interconnected interactions. Our results show that current privacy information methods are insufficient and that users must be better educated to make informed privacy decisions. Finally, we advocate for restricting data processing to the app layer and better encryption to reduce users' data protection responsibilities.
... While aiming to inform users about their rights and the extent of their data's utilization, these documents detail the data collection, use, and sharing practices. However, the complexity, length, and legal jargon within these policies often render them impenetrable to the general public [11,12,36]. This disconnect undermines the policies' intended purpose of promoting transparency and informed consent, highlighting a significant gap in digital privacy governance [43]. ...
Preprint
Full-text available
Privacy policies are often obfuscated by their complexity, which impedes transparency and informed consent. Conventional machine learning approaches for automatically analyzing these policies demand significant resources and substantial domain-specific training, causing adaptability issues. Moreover, they depend on extensive datasets that may require regular maintenance due to changing privacy concerns. In this paper, we propose, apply, and assess PAPEL (Privacy Policy Analysis through Prompt Engineering for LLMs), a framework harnessing the power of Large Language Models (LLMs) through prompt engineering to automate the analysis of privacy policies. PAPEL aims to streamline the extraction, annotation, and summarization of information from these policies, enhancing their accessibility and comprehensibility without requiring additional model training. By integrating zero-shot, one-shot, and few-shot learning approaches and chain-of-thought prompting in creating predefined prompts and prompt templates, PAPEL guides LLMs to efficiently dissect, interpret, and synthesize the critical aspects of privacy policies into user-friendly summaries. We demonstrate the effectiveness of PAPEL with two applications: (i) annotation and (ii) contradiction analysis. We assess the ability of several LLaMA and GPT models to identify and articulate data handling practices, offering insights comparable to existing automated analysis approaches while reducing training efforts and increasing the adaptability to new analytical needs. The experiments demonstrate that the LLMs PAPEL utilizes (LLaMA and ChatGPT models) achieve robust performance in privacy policy annotation, with F1 scores reaching 0.8 and above (using the OPP-115 gold standard), underscoring the effectiveness of simpler prompts across various advanced language models.
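Prompt-based annotation of the kind PAPEL automates can be sketched as a zero-shot prompt sent to a chat-completion endpoint. The prompt wording, model name, and category list below are assumptions for illustration only, not PAPEL's actual templates.

```python
# Illustrative zero-shot prompt for annotating a privacy-policy clause with an
# LLM. Model name and category list are placeholders; PAPEL's prompt templates
# and chain-of-thought variants are described in the paper itself.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

CATEGORIES = ["data collection", "third-party sharing", "data retention", "user rights"]

def annotate_clause(clause: str, model: str = "gpt-4o-mini") -> str:
    prompt = (
        "Classify the following privacy-policy clause into exactly one of these "
        f"categories: {', '.join(CATEGORIES)}.\n\n"
        f"Clause: {clause}\n\nCategory:"
    )
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return response.choices[0].message.content.strip()
```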
... On the other hand, users are often unaware of or misunderstand privacy settings and find them too intimidating to configure [32], and they may not be aware of all the SDKs they have installed [33]. While the use of SDK geolocation data is legal under current legislation when explicit consent is granted by the user, the difficulty of reading and comprehending privacy notices [34,35] may raise concerns about the use of such data for certain sensitive cases. In general, further research is needed into how smartphone users exert meaningful control over their privacy without bogging down the user experience. ...
Article
Full-text available
The ubiquity and pervasiveness of mobile network technologies has made them so deeply ingrained in our everyday lives that by interacting with them for very simple purposes (e.g., messaging or browsing the Internet), we produce an unprecedented amount of data that can be analyzed to understand our behavior. While this practice has been extensively adopted by telcos and big tech companies in the last few years, this condition, which was unimaginable just 20 years ago, has only been mildly exploited to fight the COVID-19 pandemic. In this paper, we discuss the possible alternatives that we could leverage in the current mobile network ecosystem to provide regulators and epidemiologists with the right understanding of our mobility patterns, to maximize the efficiency and extent of the introduced countermeasures. To validate our analysis, we dissect a fine-grained dataset of user positions in two major European countries severely hit by the pandemic. The potential of using these data, harvested employing traditional mobile network technologies, is unveiled through two exemplary cases that tackled macro and microscopic aspects.
... There are many reasons data subjects struggle with online consent processes. The length and complexity of service terms are a longstanding concern (McDonald and Cranor, 2008; Reidenberg et al., 2015; Obar, 2022a, 2022b). The problematic user interface designs of digital services can also make it difficult to realize information protections (Acquisti et al., 2017). ...
Chapter
Full-text available
This chapter proposes the development of an Accelerator for Cognitive Economics to help lower these barriers. It invites readers to join in and to speed the development of cognitive economics. It invites readers to complete a survey to indicate areas of particular interest. It also invites participation in a series of meet-ups, both virtual and in person. This includes meetings of interdisciplinary teams of researchers with grant officers, policy makers, business leaders, community leaders, and other interested parties.
Chapter
Full-text available
This chapter introduces cognitive economics. It explains its interdisciplinary nature and the fundamental role of utility functions, subjective beliefs, costs of learning, and Blackwell experiments. It outlines the need for economic data engineering to separate out beliefs from preferences and thereby to scientifically study decision-making mistakes. It also discusses distinctions and connections between cognitive economics and behavioral economics.
Chapter
Full-text available
This chapter highlights important next steps in cognitive economic research in relation to the cognitive revolution. Particular topics discussed are the design of human-AI interactions, ensuring that workers and students are as well informed as possible about the impact of AI on their future earnings, and how to test students in the era of Large Language Models. It also stresses business and policy applications of cognitive economic research in relation to financial planning, cognitive decline, and online consumer protection.
Article
The article addresses the topical problem of the relationship between information security and human rights. In Russia, the concept of "information security" is generally defined through "the state of protection of the vital interests of the individual, society, and the state." In the United States and the European Union, the definition of "information security" is linked to the legal principles of confidentiality, integrity, and availability of information and information systems. Implementing these principles makes it possible to balance the interests of different parties to legal relations, thereby serving as a guarantee of human rights. The influence of modern technologies manifests itself primarily in the sphere of personal rights, among which the right to privacy occupies a special place. On the one hand, measures aimed directly at protecting the right to privacy act as guarantees of that right. On the other hand, when restrictions on the right to privacy are imposed, appropriate safeguards must prevent possible abuses. In addition to the principles of confidentiality, integrity, and availability, special legal principles for protecting the right to privacy have been developed. The main mechanisms for protecting the right to privacy expressed in these principles are the data subject's consent to the processing of personal data and notification of such processing. At the same time, with the development of Internet technologies, these mechanisms prove insufficient to uphold the human right to privacy. Guarantees of this right are being developed through the creation of additional confidentiality protection mechanisms, expressed as specific requirements for the collection and processing of personal information. Technological advances also give rise to new threats to national security and to the information security of the state as one of its components. The need to protect them becomes a ground for restricting the right to privacy. The proportionality of restrictions to the purposes for which they are imposed serves as a guarantee of human rights and of the information security of the individual. In states with democratic legal regimes, human rights are, as a rule, given priority over the protection of national security.
Article
Full-text available
The current unilateral and bilateral governance agreements cannot solve problems such as the large gaps in strength, loose organisation, and cultural diversity in cross-border data flows in Asia. Therefore, we are in urgent need of structuring a multilateral governance mechanism, that is, of building a mutual trust platform for cross-border data flows in Asia. From a digital technology perspective, the Asian Cross-Border Data Flow Trust Platform is a blockchain-based digital technology architecture. From the perspective of the organisational model, Asian cross-border data flow governance based on the mutual trust platform can be understood as a network in which multiple Asian countries cooperate to make cross-border data decisions. As a necessary medium to eliminate the complexity of the cooperation network, legal procedures will transform the chaos on the Asian Cross-Border Data Flow Mutual Trust Platform into order by simplifying communication between the multiple agents.
Preprint
This paper explores the importance of accountability to data protection, and how it can be built into the Internet of Things (IoT). The need to build accountability into the IoT is motivated by the opaque nature of distributed data flows, inadequate consent mechanisms, and lack of interfaces enabling end-user control over the behaviours of internet-enabled devices. The lack of accountability precludes meaningful engagement by end-users with their personal data and poses a key challenge to creating user trust in the IoT and the reciprocal development of the digital economy. The EU General Data Protection Regulation 2016 (GDPR) seeks to remedy this particular problem by mandating that a rapidly developing technological ecosystem be made accountable. In doing so it foregrounds new responsibilities for data controllers, including data protection by design and default, and new data subject rights such as the right to data portability. While GDPR is technologically neutral, it is nevertheless anticipated that realising the vision will turn upon effective technological development. Accordingly, this paper examines the notion of accountability, how it has been translated into systems design recommendations for the IoT, and how the IoT Databox puts key data protection principles into practice.
Conference Paper
Providing personalised recommendations has become standard practice across social and streaming services, online news aggregators, and various other media platforms. While success metrics usually paint a picture of user satisfaction and steer development towards further personalisation, these do not directly articulate users' experiences of and opinions towards personalised content. This paper presents a mixed-methods investigation into the benefits, harms, and comfort levels regarding personalised media perceived by 211 people in the UK. Overall, participants believe that the benefits of personalisation outweigh the harms. However, they reveal conflicted feelings in relation to their comfort levels. Participants advocated for more agency and provider transparency, including around data collection and handling. Given the high likelihood of accelerating media personalisation, we conclude that it is imperative to emphasise user-centric design in personalisation development and provide a set of design recommendations.
Article
Privacy notice and choice has largely failed us so far because we are not giving it the legal and technical support it needs.
Article
An organization's privacy policy states how it collects, stores, processes, and shares its users' personal information. The growing number of data protection laws and regulations, as well as the numerous sectors in which organizations are collecting user information, has led to the investigation of privacy policies with regard to their accessibility, readability, completeness, comparison with organizations' actual data practices, use of machine learning/natural language processing for automated analysis, and comprehension/perception/concerns of end-users via summarization/visualization tools and user studies. However, there is limited work on systematically reviewing the existing research on this topic. We address this gap by conducting a systematic review of the existing privacy policy literature. To this end, we compiled and analyzed 202 papers (published until 31 December 2023) that investigated privacy policies. Our work advances the field of privacy policies by summarizing the analysis techniques that have been used to study them, the data protection laws/regulations explored, and the sectors to which these policies pertain. We provide actionable insights for organizations to achieve better end-user privacy.