Technical Report

Data Donation for Impactful Insights: A Framework for Platform Selection and its Application to the Use Case of German Loyalty Card Providers

Abstract

To streamline the process of platform selection for data donations, we have created a framework to structure the evaluation of candidate platforms that provide Data Download Packages (DDPs). By compiling these insights, this report endeavors to offer orientation in the area of loyalty card data and enable impactful data donation initiatives in the future.

... Section 2 of this report contains the framework for platform selection [6], which provides a list of factors that are important to consider. Version A of this report has evaluated several German loyalty card providers [7], paving the way for future data donation initiatives utilizing shopping data. ...
Research
To conduct studies involving learning analytics data from the platform Duolingo, we evaluated the available data access methods. The report's structure is based on an evaluation framework for providing platforms. By compiling these insights, this report endeavors to offer orientation for the value creation from Duolingo user data and enable impactful data donation initiatives in the future.
Article
We propose a novel approach to detecting road defects by leveraging smartphones. This approach presents an automatic data collection mechanism and a deep learning model for road defect detection on smartphones. The automatic data collection mechanism provides a practical and reliable way to collect and label data for road defect detection research, significantly facilitating the execution of investigations in this research field. By leveraging the automatically collected data, we designed a CNN-based model to classify speed bumps, manholes, and potholes, which outperforms conventional models in both accuracy and processing speed. The proposed system represents a highly practical and scalable technology that can be implemented using commercial smartphones, thereby presenting substantial promise for real-world applications.
Article
Data donations represent a user-centered approach to data collection where researchers ask EU participants to exercise their right of access (GDPR) vis-à-vis intermediaries and to donate the digital trace data they receive to academic research. These data donations are often combined with survey data to gain deeper insights into the questions under investigation. Although initially promising, this process is complex for respondents and involves serious methodological, ethical, and legal challenges for researchers. A series of recently developed software solutions facilitate and streamline data donation studies. However, these stand-alone systems work separately from survey software. As a result, respondents typically face two platforms, one for the survey and one for the data donation. To facilitate their combination, we integrated two existing software solutions for online surveys (SoSci Survey) and data donations (OSD2F). We present our integrated solution and report on experiences with the approach from two exemplary studies.
Article
Combining surveys and digital trace data can enhance the analytic potential of both data types. We present two studies that examine factors influencing data sharing behaviour of survey respondents for different types of digital trace data: Facebook, Twitter, Spotify and health app data. Across those data types, we compared the relative impact of four factors on data sharing: data sharing method, respondent characteristics, sample composition and incentives. The results show that data sharing rates differ substantially across data types. Two particularly important factors predicting data sharing behaviour are the incentive size and data sharing method, which are both directly related to task difficulty and respondent burden. In sum, the paper reveals systematic variation in the willingness to share additional data, which needs to be considered in research designs linking surveys and digital traces.
Article
Studies assessing the effects of social media use are largely based on measures of time spent on social media. In recent years, scholars have increasingly called for more insight into the social media activities and content people engage with. Data Download Packages (DDPs), the archives of social media platforms that each European user has the right to download, provide a new and promising method to collect timestamped and content-based information about social media use. In this paper, we first detail the experiences and insights of a data collection of 110 Instagram DDPs gathered from 102 adolescents. We then discuss the challenges and opportunities of collecting and analyzing DDPs to help future researchers in their consideration of whether and how to use DDPs. DDPs provide tremendous opportunities to gain insight into the frequency, range, and content of social media activities, from browsing to searching and posting. Yet, collecting, processing, and analyzing DDPs is also complex and laborious, and demands numerous procedural and analytical choices and decisions. © 2022 The Author(s). Published with license by Taylor & Francis Group, LLC.
Article
Approximately one-third of all food produced for human consumption is either lost or wasted. Given the central position of retailers in the supply chain, they have the potential to effectively reduce consumer food waste by implementing targeted interventions. To do so, however, they should target distinct consumer groups. In this research, we use a unique data set comprising the grocery shopping data of customers who use loyalty cards, complemented with food waste reports, to derive three distinct target groups: traditionals, time-constrained, and convenience lovers. Based on the general behavioral change literature, we discuss diverse target group-specific interventions that retailers can implement to reduce consumer food waste. Overall, we pave a research path to examine how retailers and marketing can effectively shift consumer behavior toward more sustainable food and shopping practices and assume responsibility within the food supply chain.
Preprint
Routinely collected clinical patient data are a valuable resource for data-driven medical innovation. Such secondary data use for medical research purposes depends on the patient's consent. To gain an understanding of patients' values and needs regarding medical data donations, we developed a participatory workshop method, integrating approaches from value-sensitive and reflective design to explore patients' values and translate them into hypothetical, ideal design solutions. The data gathered in the workshop are used to derive practicable design requirements for patient-oriented data donation technologies. In this paper, we introduce the workshop process and evaluate its application.
Article
The rapid development of community group buying platforms has attracted considerable attention from both practitioners and academics. Although previous research has explored how community group buying platforms influence customers’ purchase intention, few studies have examined how customers’ purchase intention is influenced by their participation behavior. Therefore, based on social identity theory, this study constructs a theoretical model in which consumer participation influences users’ purchase intention through community identity in the community group buying context. Drawing on privacy concern theory, it also examines the moderating role of users’ privacy concerns in this process, to systematically explore the effect of consumer participation on purchase intention and its boundary conditions. Data from 532 valid samples are analyzed using structural equation modeling. The results show that customer engagement behavior has a significant effect on purchase intention through the mediation of community identity, and that privacy concerns negatively moderate the effect of community identity on purchase intention. The study reveals the intrinsic mechanism by which customer engagement influences purchase intention, along with its boundary conditions, and provides suggestions for the marketing management and business practice of community group buying platforms.
Article
This preview article discusses PORT, a data donation software newly developed by Boeschoten et al., against the background of three core data donation principles: privacy protection, meaningful data extraction, and securing user agency.
Article
In light of the globally increasing prevalence of diet-related chronic diseases, new scalable and non-invasive dietary monitoring techniques are urgently needed. Automatically collected digital receipts from loyalty cards promise to serve as an objective and automatically traceable digital marker for individual food choice behavior and do not require users to manually log individual meal items. With the introduction of the General Data Protection Regulation in the European Union, millions of consumers gained the right to access their shopping data in a machine-readable form, representing a historic chance to leverage shopping data for scalable monitoring of food choices. Multiple quantitative indicators for evaluating the nutritional quality of food shopping have been suggested, but so far, no study has validated the potential of these alternative indicators within a comparative setting. This manuscript thus represents the first study to compare the calibration capacity and to validate the discrimination potential of previously suggested food shopping quality indicators for the nutritional quality of shopped groceries, including the Food Standards Agency Nutrient Profiling System Dietary Index (FSA-NPS DI), Grocery Purchase Quality Index-2016 (GPQI), Healthy Eating Index-2015 (HEI-2015), Healthy Trolley Index (HETI) and Healthy Purchase Index (HPI), checking if any of them performs differently from the others. The hypothesis is that some food shopping quality indicators outperform the others in calibrating and discriminating individual actual dietary intake. To assess the indicators’ potentials, 89 eligible participants completed a validated food frequency questionnaire (FFQ) and donated their digital receipts from the loyalty card programs of the two leading Swiss grocery retailers, which represent 70% of the national grocery market.
Compared to absolute food and nutrient intake, correlations between density-based relative food and nutrient intake and food shopping data are stronger. The FSA-NPS DI has the best calibration and discrimination performance in classifying participants’ consumption of nutrients and food groups, and seems to be a superior indicator to estimate nutritional quality of a user’s diet based on digital receipts from grocery shopping in Switzerland.
Article
The General Data Protection Regulation (GDPR) grants all natural persons the right to access their personal data if this is being processed by data controllers. The data controllers are obliged to share the data in an electronic format and often provide the data in a so-called Data Download Package (DDP). These DDPs contain all data collected by public and private entities during the course of a citizen’s digital life and form a treasure trove for social scientists. However, the data can be deeply private. To protect the privacy of research participants while using their DDPs for scientific research, we developed a de-identification algorithm that is able to handle typical characteristics of DDPs. These include regularly changing file structures, visual and textual content, differing file formats, differing file structures and private information like usernames. We investigate the performance of the algorithm and illustrate how the algorithm can be tailored towards specific DDP structures.
Article
Although food retailers have embraced organic certified food products as a way to reduce their environmental loading, organic sales only make up a small proportion of total sales worldwide. Most consumers have positive attitudes towards organic food, but attitudes are not reflected in behaviour. This article addresses consumers’ attitude–behaviour gap regarding their purchase of organic food and reports on how visualization of personal shopping data may encourage them to buy more organic food. Through the design of the visualization tool, the EcoPanel, and through an empirical study of its use, we provide evidence on the potential of the tool to promote sustainable food shopping practices. Of the 65 users who tested the EcoPanel for five months, nine took part in in-depth interviews. The test users increased their purchase of organic food by 23%. The informants used the EcoPanel to reflect on their shopping behaviour and to increase their organic shopping. We conclude that the visualization of food purchases stimulates critical reflection and the formation of new food shopping practices. This implies that food retailers may increase sales of organic food by making a visualization tool available to their customers. In this way, these retailers may decrease their environmental impact.
Article
Privacy-preserving data markets are one approach to restore users’ online privacy and informational self-determination and to build reliable data markets for companies and research. We empirically analyze internet users’ preferences for privacy in data sharing, combining qualitative and quantitative empirical methods. Study I aimed at uncovering users’ mental models of privacy and preferences for data sharing. Study II quantified and confirmed motives, barriers, and conditions for privacy in data markets. Finally, in a conjoint study, trade-offs between decisive attributes that shape the decision to share data are analyzed. Additionally, differences between user groups with high and with low privacy concerns are observed. The results show that the anonymization level has the greatest impact on the willingness to share data, followed by the type of data. Users with higher privacy concerns are less willing to share data in data markets and want more privacy protection. The results contribute to an understanding of how privacy-preserving data markets could be designed to suit users’ preferences.
Article
Background: Patients generate large amounts of digital data through devices, social media applications, and other online activities. Little is known about patients' perception of the data they generate online and its relatedness to health, their willingness to share data for research, and their preferences regarding data use. Methods: Patients at an academic urban emergency department were asked if they would donate any of 19 different types of data to health researchers and were asked about their views on data types' health relatedness. Factor analysis was used to identify the structure in patients' perceptions of willingness to share different digital data, and their health relatedness. Results: Of 595 patients approached, 206 agreed to participate, of whom 104 agreed to share at least one type of digital data immediately, and 78% agreed to donate at least one data type after death. EMR, wearable, and Google search histories (80%) had the highest percentage of reported health relatedness. 72% of participants wanted to know the results of any analysis of their shared data, and half wanted their healthcare provider to know. Conclusion: Patients in this study were willing to share a considerable amount of personal digital data with health researchers. They also recognize that digital data from many sources reveal information about their health. This study opens up a discussion around reconsidering US privacy protections for health information to reflect current opinions and to include their relatedness to health.
Article
Objective: To study the characteristics of large-scale loyalty card data obtained in Finland, and to evaluate their potential and challenges in health research. Methods: We contacted the holders of a certain loyalty card living in a specific region in Finland via email, and requested their electronic informed consent to obtain their basic background characteristics and grocery expenditure data from 2016 for health research purposes. Non-participation and the characteristics and expenditure of the participants were mainly analysed using summary statistics and figures. Results: The data on expenditure came from 14,595 (5.6% of those contacted) consenting loyalty card holders. A total of 68.5% of the participants were women, with an average age of 46 years. Women and residents of Helsinki were more likely to participate. Both young and old participants were underrepresented in the sample. We observed that annual expenditure represented roughly two-thirds of the nationally estimated annual averages. Customers and personnel differed in their characteristics and expenditure, but not so much in their most frequently bought items. Conclusions: Loyalty card data from a major retailer enabled us to reach a large, heterogeneous sample with fewer resources than conventional surveys of the same magnitude. The potential of the data was great because of their size, coverage, objectivity, and long periods of dynamic data collection, which enables timely investigations. The challenges included bias due to non-participation, purchases in other stores, the level of detail in product grouping, and the knowledge gaps in what is being consumed and by whom. Loyalty card data are an underutilised resource in research, and could be used not only in retailers’ activities, but also for societal benefit.
Article
Using data donations to collect digital trace data holds great potential for communication research, which has not yet been fully realized. Besides limited awareness and expertise among researchers, a central challenge is to motivate people to donate their personal data. Therefore, this article investigates which factors affect people’s willingness to donate across different platforms and data types. The study applies a multilevel approach that explains the reported willingness to donate different types of data (level 1) belonging to different platforms (level 2) from potential data donors with individual characteristics (level 3) to a hypothetical research project. The analysis is based on data collected through a national online survey (n = 833). We find higher willingness to donate YouTube data compared to Facebook, Instagram, or Google, as well as relevant influencing factors at all three levels. Greater willingness is found for lower perceived sensitivity and higher perceived relevance of the data (level of data type), greater perceived behavioral control to request and submit the data (platform level), more favorable attitudes toward data donation and the donation purpose, as well as lower contextual privacy concerns (individual level). Based on these findings, practical implications for future data donation studies are proposed.
Article
People’s activities and opinions recorded as digital traces online, especially on social media and other web-based platforms, offer increasingly informative pictures of the public. They promise to allow inferences about populations beyond the users of the platforms on which the traces are recorded, representing real potential for the social sciences and a complement to survey-based research. But the use of digital traces brings its own complexities and new error sources to the research enterprise. Recently, researchers have begun to discuss the errors that can occur when digital traces are used to learn about humans and social phenomena. This article synthesizes this discussion and proposes a systematic way to categorize potential errors, inspired by the Total Survey Error (TSE) framework developed for survey methodology. We introduce a conceptual framework to diagnose, understand, and document errors that may occur in studies based on such digital traces. While there are clear parallels to the well-known error sources in the TSE framework, the new “Total Error Framework for Digital Traces of Human Behavior on Online Platforms” (TED-On) identifies several types of error that are specific to the use of digital traces. By providing a standard vocabulary to describe these errors, the proposed framework is intended to advance communication and research about using digital traces in scientific social research.
Article
Purpose: This study aims to explicate the behavioral factors that determine willingness to share personal health data for secondary uses. Design/methodology/approach: A theoretical model is developed and tested with structural equation modeling using survey data from Finland. Findings: It is shown that attitude toward information sharing is the strongest factor contributing to the willingness to share personal health information (PHI). Trust and control serve as mediating factors between the attitude and willingness to share PHI. Research limitations/implications: The measures of the model need further refinement to cover the various aspects of the behavioral concepts. Practical implications: The model provides useful insights into the factors that affect the willingness for information sharing in health care and in other areas where personal information is distributed. Social implications: Sharing of PHI for secondary purposes can offer social benefits through improvements in health-care performance. Originality/value: Broad-scale empirical data give a unique view of attitudes toward sharing of PHI in one national setting.
Article
Health research using electronic health records (EHR) has gained popularity, but misclassification of EHR‐derived disease status and lack of representativeness of the study sample can result in substantial bias in effect estimates and can impact power and type I error. In this paper, we develop new strategies for handling disease status misclassification and selection bias in EHR‐based association studies. We first focus on each type of bias separately. For misclassification, we propose three novel likelihood‐based bias correction strategies. A distinguishing feature of the EHR setting is that misclassification may be related to patient‐varying factors, and the proposed methods leverage data in the EHR to estimate misclassification rates without gold standard labels. For addressing selection bias, we describe how calibration and inverse probability weighting methods from the survey sampling literature can be extended and applied to the EHR setting. Addressing misclassification and selection biases simultaneously is a more challenging problem than dealing with each on its own, and we propose several new strategies. For all methods proposed, we derive valid standard error estimators and provide software for implementation. We provide a new suite of statistical estimation and inference strategies for addressing misclassification and selection bias simultaneously that is tailored to problems arising in EHR data analysis. We apply these methods to data from The Michigan Genomics Initiative, a longitudinal EHR‐linked biorepository.
Article
The increasing interest in sustainable consumption has led several scholars to investigate the determinants that drive the consumption of organic food. Most of this research is based on consumers' self‐reports of their purchasing behavior by exploring declared behavioral intentions. There is a lack of understanding concerning the determinants of organic food consumption based on actual purchasing behavior. To fill this gap, this study is based on a combination of actual purchasing data and self‐reported data from a sample of 79 Italian consumers. The determinants of organic food consumption are explored by analyzing the effects of subjective norms, attitude, perceived behavioral control, intention to buy, organic knowledge, and health consciousness on actual purchasing behavior. Our results suggest that actual purchasing behavior is positively influenced by intention to buy and negatively by subjective norms. Although attitude towards buying organics is positively affected by health consciousness and perceived behavioral control, consumer knowledge about organics is found to influence purchase intentions. Theoretical and managerial implications, along with avenues for future research, are discussed.
Conference Paper
This study applied learning analytics to data on the learning of 2582 students on an online learning platform. Through correlation and regression analysis, the effects of 13 indicators of behavioral engagement on learning performance were investigated. The results showed that nine of the 13 indicators were significantly correlated with the pass rate of lesson quizzes. Among them, eight were significant contributors, which explained 59.8% of the variability in the pass rate of lesson quizzes. Ten indicators were significantly correlated with the mean score of unit tests. Among them, seven were significant contributors, which explained 65.4% of the variability. The results also showed that learning motivation was a crucial factor in the quality of online learning engagement. The quality of online learning engagement of the non-degree students who had internal motivation was better than that of degree students.
Regulation (EU) 2022/1925 of the European Parliament and of the Council of 14 September 2022 on contestable and fair markets in the digital sector (Digital Markets Act). 2022, pp. 1-66. doi: 10.5040/9781782258674.
N. De Schipper, "Data donation checklist", GitHub. Accessed: 6 June 2024. [Online]. Available: https://github.com/d3i-infra/data-donation-task/wiki/Data-donation-checklist
L. Lohmeier, "Kundenkarten und Bonuspunktekarten nach Anzahl der Nutzer in Deutschland im Jahr 2022", Statista, 2023. Accessed: 13 May 2024. [Online]. Available: https://de.statista.com/statistik/daten/studie/1351411/umfrage/kundenkarten-nach-anzahl-dernutzer-in-deutschland/
S. Ahrens, "Marktanteile der führenden Unternehmen im LEH 2022", Statista, 2023. Accessed: 13 May 2024. [Online]. Available: https://de.statista.com/statistik/daten/studie/4916/umfrage/marktanteile-der-5-groesstenlebensmitteleinzelhaendler/
Statista Research Department, "Number of people in Germany who purchase groceries for their household on the internet or from online shops from 2019 to 2023", 2023. Accessed: 24 January 2024. [Online]. Available: https://www.statista.com/statistics/989726/purchasing-groceries-on-the-internet-or-fromonline-shops/
D. Belevan, "Duolingo Shareholder Letter Q4/FY 2023", 2024. Accessed: 17 May 2024.