
Data Donation for Impactful Insights: A Framework for Platform Selection and its Application to the Use Case of Duolingo Data Donation Methods

To conduct studies involving learning analytics data from the platform Duolingo, we evaluated the available data access methods. The report's structure is based on an evaluation framework for providing platforms. By compiling these insights, this report endeavors to offer orientation for the value creation from Duolingo user data and enable impactful data donation initiatives in the future.
Last Updated: September 25, 2024
Version B: Evaluating Duolingo Data Donation Methods
Corresponding Author:
Leonie Manzke
Project „DataDonations4SustainableChange“
Friedrich-Alexander-Universität Erlangen-Nürnberg
Institute of Information Systems (IS)
Tenure-Track Professorship for Digital Transformation
Lange Gasse 20
90403 Nürnberg
e-mail: leonie.manzke@fau.de
With contributions of:
Philipp Hartl
Project „DataDonations4SustainableChange“
University of Regensburg
Faculty of Informatics and Data Science
Chair of Machine Learning and Uncertainty Quantification
Maria Klose
Leibniz Institute for Educational Trajectories (LIfBi), Bamberg
Research Group: Education in a Digital World
Elisabeth Schmidbauer
Project „DataDonations4SustainableChange“
Ludwig-Maximilian University of Munich
Department for Media and Communication
Funding
The project DataDonations4SustainableChange is funded by the Bavarian Institute for Digital
Transformation (bidt), an institute of the Bavarian Academy of Sciences and Humanities.
The working group “DataDuo” comprises all authors and is also supported by the bidt.
Contents
1. Introduction
2. Framework for Platform Selection
3. Evaluations of Duolingo Data Donation Methods
3.1. Donation of Data Download Package (DDP)
3.2. Donation of Username + Public API Query
3.3. Donation of Username + Internal API Query
3.4. Donation of Public API Query Results (while logged in)
3.5. Donation of Internal API Query Results (while logged in)
3.6. Donation of JSON Web Token (JWT)
3.7. Summary of Data Access Methods
3.8. Sensitivity of Related Data
3.9. Limitations
4. Outlook
5. References
6. Appendix
A. Screenshots of Deidentified Data Samples
B. “How-To”s for Duolingo Data Access Methods
1. Introduction
Data donations are a powerful method to enhance research on human behavior, creating new
opportunities for validation and measurement. The approach is based on individuals’ right to
data access, granted by Art. 15 of the GDPR [1]. It obliges data controllers to provide data
subjects with a copy of their personal data upon request. As per Art. 20 of the GDPR, the right
to data portability creates a legal framework to enable citizens to donate their data to
researchers for a purpose in the public interest. This opens the way for research to utilize this
type of data in different domains.
It takes substantial efforts to set up a data donation study [2]. To justify these efforts, it is
necessary to evaluate the data provided by data-controlling platforms beforehand to consider
the feasibility, scalability, and impact potential of data donation studies in the given context.
Prior research has consolidated such experiences with using Data Download Packages
(DDPs) from specific platforms (e.g. Instagram: [3], [4]) and provided error frameworks to
navigate data donation studies in all their stages [4], [5], with a social media setting in mind.
Section 2 of this report contains the framework for platform selection [6], which provides a list
of factors that are important to consider. Version A of this report has evaluated several German
loyalty card providers [7], paving the way for future data donation initiatives utilizing shopping
data.
Other fields like educational research are dealing with unmet needs related to data availability.
The investigation of non-formal learning behavior, i.e. learning behavior that takes place
outside of formal, institutional settings (e.g. schools, university), often relies on self-reports [8].
Therefore, user data of non-formal digital learning platforms may meaningfully extend
researchers’ toolboxes. For example, the donation of user data enables the examination of log
data to identify learning behavior patterns to better understand the non-formal learning process
[9]. The commercial language-learning platform Duolingo regularly funds studies to examine
the effectiveness of its programs (such as [10]).
Researchers may also pursue a user-centered data collection approach, such as with data
donations, and may therefore act independently from companies granting them access. There
are several different ways in which users can retrieve usage data, and subsequently donate it.
These methods for accessing data differ in terms of the contents of the data, transaction costs
and users’ (perceived) privacy risks. Therefore, we aim to answer two questions:
→ Are these different types of retrievable Duolingo user data suitable for data donation
studies?
→ How to determine the best retrieval method to balance feasibility and scalability of
prospective data donation studies?
This report contains the framework for platform selection, as well as its application to different
retrieval methods of Duolingo user data. By compiling these insights, this report endeavors to
offer guidance and facilitate effective data donation initiatives in the future.
Abbreviation: DDP = Data Download Package
2. Framework for Platform Selection
Aspects of Platform Compliance and Considerations Regarding Feasibility
| Criterion | Underlying Properties | Exemplary Features Occurring with Evaluated Data Access Methods | GDPR [1] Requirement* |
|---|---|---|---|
| Machine-Readability | Provided file format; single vs. multiple files | Open formats (.json, .csv). DDP containing multiple .csv files. | Art. 20(1) |
| Transparency | Understandability of variable names and values, possibly including documentation | Self-explanatory variable names vs. variables with empty values and/or unclear meaning. Internal identifiers for language skills impossible to decipher without associated documentation. | Art. 5(1)(a), Art. 12(1) |
| Data Quality | Consistency within and between data streams | Match between activity timestamps and their actual timing. Consistent data for the same user across access methods. | Art. 15(1)(a)-(h) |
| | Granularity, relevant content, and inclusion of meta information | Provision of full activity history incl. timestamps vs. minimal information on current learning progress. | |
| DDP Retrieval | Number of necessary steps and efficiency of the process | Data donors providing their username vs. requesting their own data for active donation. | Art. 12(3) |
| | Duration between request and retrieval | Mostly instantaneous, up to several hours. | |
*In the case of Duolingo, only the DDP
provided by the platform has to fulfill regulations, as the company holds no external responsibility for its internal APIs.
Considerations Regarding Scalability and Impact Potential
Criterion
Underlying Properties
Exemplary Features Occurring with
Evaluated Data Access Methods
Challenges in Research and Application
Data Engineering
Use of standardized notations
Unix timestamps, valid formats, such as
datetime.
o Hurdles for efficient data engineering may increase the
time and effort necessary to analyze donated data,
particularly at scale. For example: lack of transparency
in how individual values are encoded.
o Deidentification needs to be conceptualized for every
type of DDP [14].
Identifiability of personal data
Personal information is stored in
multiple files and/or repeated
(e.g. “id” and “username”)
Donation
Opportunities
Repeated donations
All data access methods are automated
on Duolingo’s side, enabling repeated
access.
o Heterogeneous levels of automation in request handling:
manually processed requests can only be made
once a year per data subject.
o Continuous data donations requiring ex-ante consent
may induce more privacy concerns than one-time data
donations [15].
Continuous donations
Data access via API may be utilized in a
design based on ex-ante consent.
Quantity &
Representativeness
of Potential Data
Donors
Provider market share
Number and demographics of
active platform users
88 million monthly, and 28 million daily
active users, most of which belong to a
younger demographic. A large share is
US-based.
o DDPs of more impactful platform providers may not be
suitable for efficient processing and vice versa.
o There are multiple sources of error to the
representativeness of data donation studies [5].
o Self-selection is one of these possible biases [16].
Sensitivity of DDP
contents
Perceived sensitivity of
specific kinds of data, DDPs
may contain multiple types
The types of data included depend on the
chosen access method and may evoke
different levels of privacy concerns. For
an overview, see table 1.
o Perceived data sensitivity affects privacy concerns.
o These can influence the willingness to donate [15].
3. Evaluations of Duolingo Data Donation Methods
There are a number of different ways Duolingo users may donate their data to researchers. It
is possible for users to request a download link for a DDP (evaluated in section 3.1). There are
also application programming interfaces (API) which allow the display and subsequent
download of user data from the browser, and in a different structure compared to the DDP. To
do so, the username of the account must be known.
There are two different APIs available, which we will henceforth refer to as the “public API”,
which can be queried even if the requesting party is not part of Duolingo’s ecosystem, and
Duolingo’s “internal API”, the use of which is more restricted. Both can be used with either
‘unauthenticated’ or ‘authenticated’ access. The former requires researchers to know a
username. In the latter case, users log into their accounts, query the data for their own
account, and subsequently donate this data. Finally, it is possible to extract users’ login
tokens (JSON Web Tokens, JWT) from browsers’ developer consoles via JavaScript or a
browser add-on. With this token, researchers can access the respective user data directly.
As long as users do not change their password, the JWT remains valid, and researchers retain
continuous access to the data of the consenting participant. Figure 1 shows how these options
compare with each other and in which sections they will be evaluated.
Figure 1: Overview of data retrieval processes and enabling technologies.
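To make the token-based option more tangible, the following minimal sketch shows how the payload of a generic JWT can be inspected. It relies only on the general JWT layout (three base64url segments: header, payload, signature). Whether Duolingo's tokens carry the standard `exp` (expiry) claim is an assumption, and the demo token is built locally with placeholder claims. The sketch does not verify the signature.

```python
import base64
import json
from datetime import datetime, timezone


def b64url_decode(segment: str) -> bytes:
    """Decode a base64url segment, restoring the padding that JWTs omit."""
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)


def read_jwt_payload(token: str) -> dict:
    """Return the unverified payload claims of a JWT."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("a JWT consists of header.payload.signature")
    return json.loads(b64url_decode(parts[1]))


def expiry(claims: dict):
    """Return the expiry as an aware UTC datetime, or None without an 'exp' claim."""
    exp = claims.get("exp")
    return None if exp is None else datetime.fromtimestamp(exp, tz=timezone.utc)


def make_demo_token(claims: dict) -> str:
    """Build an unsigned placeholder token for illustration only."""
    def enc(obj) -> str:
        raw = json.dumps(obj, separators=(",", ":")).encode()
        return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()
    return f"{enc({'alg': 'none', 'typ': 'JWT'})}.{enc(claims)}.signature"


if __name__ == "__main__":
    # Placeholder claims; real tokens may contain different fields.
    token = make_demo_token({"sub": "12345", "exp": 1735689600})
    print(expiry(read_jwt_payload(token)))
```

Because such a token grants ongoing access, a study design would need to treat it like a credential: store it encrypted, and discard it once the consented collection period ends.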
More detailed outlines for the data retrieval processes are included in Appendix B in the form
of instructions given to test donors. For each data retrieval method, we evaluated test cases
from four to five Duolingo users.
The framework for platform selection, included in section 2, suggests evaluating the factor
“Quantity & representativeness of potential data donors”. Certain demographics may be
disproportionately affected by increased transaction costs, e.g., older study participants may be
more affected by technical difficulties during the donation process. However, aside from this,
the aspects of potential donor quantity and representativeness lie more in the selection of the
learning platform, rather than the way to retrieve data from it.
In Q3 of 2023, Duolingo reported about 88 million monthly and 28 million daily active
users [17]. Statistics from 2021 indicated that the main target audience is young people
between 16 and 22 years, but more than two thirds of its user base belong to age categories
older than that [18]. Most of the platform’s user base is located in the USA (about 25 % of
users) [18].
3.1. Donation of Data Download Package (DDP)
Criterion
Underlying DDP
Properties
Examined DDP Properties
Status: 01-22-2024
Machine-
Readability
Provided file format
single vs. multiple files
o ZIP folder containing multiple CSV and image files.
Transparency
Understandability of
variable names and
values by being self-
explanatory or including
documentation
o Many CSV files contain self-explanatory variables, or
their meaning can be inferred.
o Some variables are puzzling, for example the setting
“disable_mature_words” is set as “True” in all trial
datasets, despite there being no such setting visible in
the settings UI.
o Some values are internal identifiers in the form of
strings of numbers and letters, without documentation
to make sense of them.
o No documentation is provided with the DDP.
Data Quality
Consistency within and
between data streams
o Some values appear suspicious. For example, the time
since the last activity does not match up with the user’s
last instance of using the app; therefore, the variable
in question could represent something different.
Granularity, relevant
content, and inclusion of
meta information
o Information included: Current streak; Purchase history;
History of leaderboard positions and XP progression;
total earned points, number of completed lessons,
active days (for each language); extent of social
function usage (number of followers and follows).
o No information on learned content.
o An exception is the documentation of “story”
completions, but the story titles are not enough to infer
their content. For users of language courses which do
not support stories, the corresponding files are
included, but empty.
o No information on app usage in terms of daily learning
time or error rate.
DDP Retrieval
Number of necessary
steps and efficiency of
the process
o When logging into a user account in a web browser, the
option “request my data” is available in a sub-menu of
the settings, albeit in a grayed-out font color.
o Issuing the request triggers an email with a link that
needs to be clicked. After that, a download link is
provided in another email.
Duration between
request and retrieval
o According to Duolingo, a download link is provided
“within 30 days” of confirming the request.
o However, it took less than a day in all trials.
Data Engineering
Use of standardized
notations
o Many internal or encrypted value notations, without
documentation to decode them.
Identifiability of
personal data
o Out of the 14 CSV files in the DDP, half of them contain
some kind of identifiable information (username, email,
etc.).
o Avatar images are included in the DDP, both as image
files in multiple sizes, and as URLs to access them.
o In small sample sizes, data donor identities may be
inferred from the language(s) they are learning in
combination with their time zone and estimated
birthdate.
Donation
Opportunities
Repeated donations
Continuous donations
o Another manual request in the settings is necessary for
a repeated donation.
o Since the process is automated on Duolingo’s side, this
can be repeated for an indefinite number of times.
Sensitivity of DDP
contents
Perceived sensitivity of
specific kinds of data
o Type of donated data: Usage statistics, learning
progress, purchase history, history of leaderboard
rankings
o Users donate the DDP actively. Continuous data
access is not enabled this way.
o However, this would only be the case if the donor’s
username is redacted in the donated files, otherwise
continuous access to learning progress is possible with
the method described in 3.2.
Domain-specific
Challenges in Value
Creation
Comparability of users
o Information on active A/B-Testing is missing, so users
may experience different features.
o Courses in different languages are structured very
differently and can vary in terms of quality. Therefore,
they may not be comparable in some dimensions.
Highlights
o Easy data retrieval process that can be repeated.
o Some information on learning progress, but other
methods yield more detailed histories.
o ZIP folder should not be transmitted without local pre-
processing and anonymization, as avatar images may
contain sensitive personal information.
o Many values with unclear meaning.
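The local pre-processing recommended above can be sketched as follows. This is a minimal illustration rather than a complete anonymization protocol. It copies the CSV files of a DDP into a new ZIP archive, removes columns whose names suggest direct identifiers, and leaves out image files. The column names in `IDENTIFYING_COLUMNS` are hypothetical and would have to be replaced after inspecting a real DDP, since no documentation is provided. Quasi-identifiers (learned languages, time zone, estimated birthdate) and avatar URLs stored under other column names are not handled here.

```python
import csv
import io
import zipfile

# Hypothetical identifier columns; adapt after inspecting a real DDP.
IDENTIFYING_COLUMNS = {"username", "email", "id", "name"}
IMAGE_SUFFIXES = (".png", ".jpg", ".jpeg", ".gif")


def deidentify_ddp(src: zipfile.ZipFile, dst: zipfile.ZipFile) -> list[str]:
    """Copy CSV files from src to dst without identifying columns.

    Image files and unknown file types are not copied. Returns the names
    of all files that were left out.
    """
    dropped = []
    for name in src.namelist():
        if name.endswith("/"):
            continue  # directory entry
        lower = name.lower()
        if lower.endswith(IMAGE_SUFFIXES) or not lower.endswith(".csv"):
            dropped.append(name)  # avatar images may show the donor
            continue
        text = src.read(name).decode("utf-8-sig")
        reader = csv.DictReader(io.StringIO(text))
        keep = [c for c in (reader.fieldnames or [])
                if c.lower() not in IDENTIFYING_COLUMNS]
        out = io.StringIO()
        writer = csv.DictWriter(out, fieldnames=keep, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(reader)
        dst.writestr(name, out.getvalue())
    return dropped


if __name__ == "__main__":
    # Build a tiny stand-in DDP in memory to demonstrate the call.
    src_buf, out_buf = io.BytesIO(), io.BytesIO()
    with zipfile.ZipFile(src_buf, "w") as z:
        z.writestr("profile.csv", "username,email,streak\nalice,a@example.org,12\n")
        z.writestr("avatar.png", b"\x89PNG")
    with zipfile.ZipFile(src_buf) as src, zipfile.ZipFile(out_buf, "w") as dst:
        print(deidentify_ddp(src, dst))
```

Running such a step on the donor's own device, before anything is transmitted, keeps the identifying files out of researchers' hands altogether.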
3.2. Donation of Username + Public API Query
Criterion
Underlying Properties
of
Data Access
Examined Properties
Status: 09-06-2024
Machine-
Readability
Provided file format
single vs. multiple files
o An API query results in a JSON file being displayed in the
browser. This file can be exported.
o Researchers can do this without the input of the data
donor.
Transparency
Understandability of
variable names and
values by being self-
explanatory or including
documentation
o Little information is available with this type of API access.
o Most variables related to app usage behavior are self-
explanatory.
o Some variables would need context to make them
interpretable, such as “hasRecentActivity15”. The
referenced timeframe of this Boolean variable is
unknown.
o There are fields such as “bio” that have been empty in all
test cases, so their meaning remains unclear.
Data Quality
Consistency within and
between data streams
o Values of variables with the same names in a logged-in
API query (process evaluated in 3.4) match up.
o App updates have led to changing data structures. For
example, newer users were asked what motivated them
to learn a language. For older users, this value is empty.
Granularity, relevant
content, and inclusion of
meta information
o General information: motivation (if applicable), date of
account creation, and if it is a premium account.
o Information on course progress is limited to current
streak length, total earned XP, the daily XP, and “crowns”
(completed sections) per language.
o Multiple languages can be learned at the same time.
More detailed information on course progress is only
available for the most recently used one.
DDP Retrieval
Number of necessary
steps and efficiency of the
process
o Researchers receive data donors’ usernames.
o API query by completing the call (see [19]) with the given
username.
Duration between request
and retrieval
o Instantaneous download is possible.
Data
Engineering
Use of standardized
notations
o Dates are given as Unix timestamps that can be
converted.
o Language codes are given in ISO-639 format.
o The resulting JSON file is valid and indicates the use of
standard datatypes, e.g., date formats.
Identifiability of
personal data
o There is no identifiable information included in the API
query result aside from the username. That information
has to be given to researchers in any case.
o There is an invite URL
Donation
Opportunities
Repeated donations
Continuous donations
o As long as the account name is not changed, the API
query can be repeated for an indefinite number of times,
without additional input from the data donor.
o This process may be automated in a study design based
on users’ ex-ante consent, giving researchers
continuous access to donors’ data without requiring any
further action from them.
o Alternatively, a browser extension that automates the API
query and subsequent JSON retrieval could facilitate the
process for users actively donating their data.
Sensitivity of
DDP contents
Perceived sensitivity of
specific kinds of data
o Type of donated data: Learning progress
o Username donation implies continuous access to usage
statistics unless an anonymization protocol is followed
before transmitting the to-be-donated data to
researchers.
Domain-specific
Challenges in
Value Creation
Comparability of users
o Information on active A/B-Testing is not included, so
users may experience different features.
o Courses in different languages are structured very
differently and can vary in terms of quality. Therefore,
they may not be comparable in some dimensions.
Highlights
o Requires next to no effort from data donors.
o Allows for study designs with ex-ante consent.
o Not as much information on learning progress available.
o API access may change over time.
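The standardized notations noted above (Unix timestamps, ISO-639 language codes) simplify data engineering. The sketch below converts timestamp fields to ISO 8601 strings and checks the shape of language codes. The field names `creationDate` and `learningLanguage` are hypothetical placeholders, not confirmed names from the API output. Treating very large values as milliseconds is likewise an assumption.

```python
import re
from datetime import datetime, timezone

# Two- or three-letter lowercase codes, the shape used by ISO 639.
# This checks the format only, not whether a code is actually assigned.
LANGUAGE_CODE = re.compile(r"^[a-z]{2,3}$")


def to_utc(timestamp: float) -> datetime:
    """Convert a Unix timestamp to an aware UTC datetime.

    Values too large to be plausible seconds are assumed to be milliseconds.
    """
    if timestamp > 1e11:
        timestamp /= 1000
    return datetime.fromtimestamp(timestamp, tz=timezone.utc)


def normalize(record: dict, date_fields: set, language_fields: set) -> dict:
    """Return a copy with ISO 8601 dates and validated language codes."""
    result = dict(record)
    for key in record.keys() & date_fields:
        result[key] = to_utc(record[key]).isoformat()
    for key in record.keys() & language_fields:
        if not LANGUAGE_CODE.match(str(record[key])):
            raise ValueError(f"{key}: unexpected language code {record[key]!r}")
    return result


if __name__ == "__main__":
    # Hypothetical record; real query results will use other field names.
    sample = {"creationDate": 1704067200, "learningLanguage": "es", "streak": 5}
    print(normalize(sample, {"creationDate"}, {"learningLanguage"}))
```

Passing the field names in explicitly keeps the helper usable if the API's data structure changes, which the report notes has already happened after app updates.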
3.3. Donation of Username + Internal API Query
Criterion
Underlying Properties
of
Data Access
Examined Properties
Status: 09-06-2024
Machine-
Readability
Provided file format
single vs. multiple files
o An API query results in a JSON file being displayed in the
browser. This file can be exported.
o Researchers can do this without the input of the data
donor.
Transparency
Understandability of
variable names and
values by being self-
explanatory or including
documentation
o Most variables related to app usage behavior are self-
explanatory.
o There are fields such as “bio” that have been empty in all
test cases, so their meaning remains unclear.
Data Quality
Consistency within and
between data streams
o Values of variables with the same names in a logged-in
API query (process evaluated in 3.5) match up.
o Time stamps of learning activities appear to be valid.
Granularity, relevant
content, and inclusion of
meta information
o General information: motivation (if applicable), date of
account creation, and if it is a premium account.
o Many indicators of learning progress and behavior:
earned XP, number of learned skills, reminder settings,
daily XP goal, and many more.
o A timeline of learning activities is available but limited to
the previous two weeks.
o Multiple languages can be learned at the same time.
More detailed information on course progress is only
available for the most recently used one.
DDP Retrieval
Number of necessary
steps and efficiency of the
process
o Researchers receive data donors’ usernames.
o API query by completing the call (see [19]) with the given
username.
Duration between request
and retrieval
o Instantaneous download is possible.
Data
Engineering
Use of standardized
notations
o Dates are given as Unix timestamps that can be
converted.
o Language codes are given in ISO-639 format.
o The resulting JSON file is valid and indicates the use of
standard datatypes, e.g., date formats.
Identifiability of
personal data
o Besides the username, the API query result includes
further identifiable information (e.g., the field “fullname”,
if provided). That information has to be anonymized by
researchers.
o A link leading to the associated user profile is included,
but this could have also been found within the app using
the username.
Donation
Opportunities
Repeated donations
Continuous donations
o As long as the account name is not changed, the API
query can be repeated for an indefinite number of times,
without additional input from the data donor.
o This process may be automated in a study design based
on users’ ex-ante consent, giving researchers
continuous access to donors’ data without requiring any
further action from them.
o Alternatively, a browser extension that automates the API
query and subsequent JSON retrieval could facilitate a
process for users to actively donate their data, if such a
user-centric approach were pursued.
Sensitivity of
DDP contents
Perceived sensitivity of
specific kinds of data
o Type of donated data: Usage statistics, learning progress
o Username donation implies continuous access to usage
statistics, unless an anonymization protocol is followed
before transmitting to-be-donated data to researchers.
Domain-specific
Challenges in
Value Creation
Comparability of users
o Information on active A/B-Testing is not included, so
users may experience different features.
o Courses in different languages are structured very
differently and can vary in terms of quality. Therefore,
they may not be comparable in some dimensions.
Highlights
o Requires next to no effort from data donors.
o Allows for study designs with ex-ante consent.
o Allows for insights into many different indicators for
learning progress.
o Some information is limited to more recent activities.
o API access may change over time.
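The anonymization step mentioned under "Identifiability of personal data" could look roughly like the sketch below. It walks a parsed JSON result, drops fields that directly identify the donor, and replaces the username with a keyed hash, so that repeated donations from the same account can still be linked. The key names other than “fullname” and “username” are assumptions, and the secret is a placeholder that researchers would need to store separately from the data.

```python
import hashlib
import hmac

# "fullname" and "username" are named in the report; the remaining key
# names are hypothetical and must be adapted to real query results.
DROP_KEYS = {"fullname", "email", "picture"}
PSEUDONYMIZE_KEYS = {"username"}


def pseudonym(value: str, secret: bytes) -> str:
    """Derive a stable pseudonym with a keyed hash (HMAC-SHA256)."""
    digest = hmac.new(secret, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def scrub(node, secret: bytes):
    """Recursively drop or pseudonymize identifying keys in parsed JSON."""
    if isinstance(node, dict):
        cleaned = {}
        for key, value in node.items():
            if key in DROP_KEYS:
                continue
            if key in PSEUDONYMIZE_KEYS and isinstance(value, str):
                cleaned[key] = pseudonym(value, secret)
            else:
                cleaned[key] = scrub(value, secret)
        return cleaned
    if isinstance(node, list):
        return [scrub(item, secret) for item in node]
    return node


if __name__ == "__main__":
    raw = {"username": "alice", "fullname": "Alice A.", "courses": [{"xp": 10}]}
    print(scrub(raw, b"placeholder-secret"))
```

A key-based filter like this does not catch identifiers embedded in other values, such as a profile link that contains the username, so a manual review of sample outputs remains necessary.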
3.4. Donation of Public API Query Results (while logged in)
Criterion
Underlying Properties
of Data Access
Examined Properties
Status: 09-06-2024
Machine-
Readability
Provided file format
single vs. multiple files
o API query as described in 3.2 but it is done while users
are logged in with their own account.
o Results are displayed in the browser and can be
exported as a JSON file and donated.
Transparency
Understandability of
variable names and
values by being self-
explanatory or including
documentation
o The meaning of many variables is self-explanatory or
can be inferred through context.
o Long list of entries about active A/B-Testing
involvement of the user, but no information about what
these tests entail.
o There are fields such as “bio” that have been empty in
all test cases, so their meaning remains unclear.
Data Quality
Consistency within and
between data streams
o Some values do not make sense with intuitive
interpretation of variable names alone, for example
“hasRecentActivity15” had the value “false” for a test
donor who very recently had used the app.
o Values of variables with the same names in a public
API query match up.
o Different language courses use their own data
structures. Therefore, not every course is necessarily
structured in “skills” or makes use of the
“progressQuizHistory”.
o App updates have led to changing data structures. For
example, newer users were asked what motivated them
to learn a language. For older users, this value is
empty.
Granularity, relevant
content, and inclusion of
meta information
o Multiple languages can be learned at the same time.
More detailed information on course progress is only
available for the most recently used one.
o Information on course progress is included in several
different ways: number and titles of completed lessons,
quiz scores and time taken, etc.
o Included variables differ between languages as the
courses have different structures. Usually, courses with
English as the base language include more information
on progress indicators or course contents.
DDP Retrieval
Number of necessary
steps and efficiency of
the process
o Before the query, users need to log into Duolingo in a
browser.
o API query by completing the call (see [19]) with the
given username in the used browser’s address bar.
o Users can then save the displayed results with the
respective functionality the used browser provides. This
is also possible for some mobile browsers.
[The method described in 3.5 is essentially the same
process, but with a different API.]
Duration between
request and retrieval
o The API is a first-party service and is therefore always
available as long as Duolingo itself is online. The
output is instantaneous.
o The process of querying the API may be automated for
a better user experience for data donors.
Data Engineering
Use of standardized
notations
o Dates are given as Unix timestamps that can be
converted.
o Language codes are given in ISO-639 format.
o The resulting JSON file is valid and indicates the use of
standard datatypes, e.g., date formats.
Identifiability of
personal data
o Identifiable information aside from the username is
included, e.g., in the form of the account email.
o IDs from linked Google and Facebook accounts may
also be included, albeit encrypted.
Donation
Opportunities
Repeated donations
Continuous donations
o Another manual query is necessary for a repeated
donation, while making sure the user is logged into their
account at the same time.
o Developing a browser extension that automates the
JSON retrieval can facilitate this process for users.
o Since the process is automated on Duolingo’s side, this
can be repeated for an indefinite number of times as
long as the API stays open.
Sensitivity of DDP
contents
Perceived sensitivity of
specific kinds of data
o Type of donated data: Usage statistics, learning
progress
o Users may donate the JSON file actively, and therefore
not allow for continuous data access.
o However, this would only be the case if the donor’s
username, profile picture, and invite URL are redacted in
the donated files; otherwise, continuous access to
learning progress is possible with the method described
in 3.2.
Domain-specific
Challenges in Value
Creation
Comparability of users
o Information on active A/B-Testing is included, but it is
hard to understand the impact of these tests without
documentation. Users may experience different
features.
o Courses in different languages are structured very
differently and can vary in terms of quality. Therefore,
they may not be comparable in some dimensions.
Highlights
o Wide range of learning progress indicators available.
o If implemented at scale, researchers should expend
effort to improve data donors’ user experience and
develop an automated solution for data retrieval.
o Higher transaction costs compared to users providing
their username for researchers to query the API.
o API access may change over time.
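The consistency check reported under "Data Quality" (same-named variables matching between logged-in and logged-out queries) can be automated along the following lines. The sketch flattens two parsed JSON results into dotted paths and reports shared paths whose values differ, as well as paths present in only one result. The sample records and their field names are invented for illustration.

```python
def flatten(node, prefix=""):
    """Flatten nested JSON into {dotted.path: scalar} pairs."""
    flat = {}
    if isinstance(node, dict):
        for key, value in node.items():
            flat.update(flatten(value, f"{prefix}{key}."))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            flat.update(flatten(value, f"{prefix}{index}."))
    else:
        flat[prefix.rstrip(".")] = node
    return flat


def compare(result_a, result_b):
    """Return (differing shared paths, paths only in a, paths only in b)."""
    a, b = flatten(result_a), flatten(result_b)
    shared = a.keys() & b.keys()
    mismatches = {p: (a[p], b[p]) for p in sorted(shared) if a[p] != b[p]}
    return mismatches, sorted(a.keys() - b.keys()), sorted(b.keys() - a.keys())


if __name__ == "__main__":
    # Invented sample results; real outputs have different structures.
    logged_out = {"streak": 12, "totalXp": 900}
    logged_in = {"streak": 12, "totalXp": 905, "email": "a@example.org"}
    print(compare(logged_out, logged_in))
```

Since values such as XP totals change whenever the donor uses the app, both queries should be taken at nearly the same time for mismatches to be meaningful.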
3.5. Donation of Internal API Query Results (while logged in)
Criterion
Underlying Properties
of Data Access
Examined Properties
Status: 09-06-2024
Machine-
Readability
Provided file format
single vs. multiple files
o API query as described in 3.3, but performed while users
are logged in with their own account.
o Results are displayed in the browser, can be exported
as a JSON file, and then donated.
Transparency
Understandability of
variable names and
values by being self-
explanatory or including
documentation
o The meaning of many variables is self-explanatory or
can be inferred through context.
o There are variables where it is unclear what they mean,
such as the Boolean field “can_transliterate” or how the
“language strength” score (a value between 0 and 1) is
calculated.
o Long list of entries about active A/B-Testing
involvement of the user, but no information about what
these tests entail.
o There are fields such as “bio” that have been empty in
all test cases, so their meaning remains unclear.
Data Quality
Consistency within and
between data streams
o Values of variables with the same names in a logged-
out API query (process evaluated in 3.3) match up.
o App updates have led to changing data structures. For
example, newer users were asked what motivated them
to learn a language. For older users, this value is
empty.
Granularity, relevant
content, and inclusion of
meta information
o Multiple languages can be learned at the same time.
More detailed information on course progress is only
available for the most recently used one.
o Information on course progress is included in several
different ways: number and titles of completed lessons,
quiz scores and time taken, etc.
o A timeline of learning activities is available but limited to
the previous two weeks.
DDP Retrieval
Number of necessary
steps and efficiency of
the process
[Essentially the same process as described in 3.4, but with
a different API.]
o Before the query, users need to log into Duolingo in a
browser.
o API query by completing the call (see [19]) with the
given username in the used browser’s address bar.
o Users can then save the displayed results with the
respective functionality the used browser provides. This
is also possible for some mobile browsers.
Duration between
request and retrieval
o The API is a first-party service and is therefore
available as long as Duolingo itself is online. The
output is instantaneous.
o The process of querying the API may be automated for
a better user experience for data donors.
Data Engineering
Use of standardized
notations
o Dates are given as Unix timestamps that can be
converted.
o Language codes are given in ISO 639 format.
o The resulting JSON file is valid and indicates the use of
standard datatypes, e.g., date formats.
Identifiability of
personal data
o Identifiable information aside from the username is
included, e.g., in the form of the account email.
o IDs from linked Google and Facebook accounts may
also be included, albeit encrypted.
o A link leading to the associated user profile is included,
but this could have also been found within the app
using the username.
Donation
Opportunities
Repeated donations
Continuous donations
o Another manual query is necessary for a repeated
donation, while making sure the user is logged into their
account at the same time.
o Developing a browser extension that automates the
JSON retrieval can facilitate this process for users.
o Since the process is automated on Duolingo’s side, this
can be repeated for an indefinite number of times as
long as the API stays open.
Sensitivity of DDP
contents
Perceived sensitivity of
specific kinds of data
o Type of donated data: Usage statistics, learning
progress
o Users donate the JSON file actively and therefore do
not grant continuous data access.
o However, this only holds if the donor’s username,
profile picture, and invite URL are redacted in the
donated files. Otherwise, continuous access to
learning progress is possible with the methods
described in 3.2 and 3.3.
Domain-specific
Challenges in Value
Creation
Comparability of users
o Information on active A/B-Testing is included, but it is
hard to understand the impact of these tests without
documentation. Users may experience different
features.
o Courses in different languages are structured very
differently and can vary in terms of quality. Therefore,
they may not be comparable in some dimensions.
Highlights
o Wide range of learning progress indicators available.
o If implemented at scale, researchers should invest
effort in improving the data donors’ user experience and
developing an automated solution for data retrieval.
o Higher transaction costs compared to users providing
their username for researchers to query the API.
o API access may change over time.
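The Unix timestamps mentioned under "Data Engineering" can be converted with standard tooling. The following is a minimal Python sketch, not the authors' processing code. It assumes that the "calendar" section of the donated JSON (see Appendix A) is a list of entries, and that each entry carries a timestamp under a key named "datetime" and gained points under "improvement". Both key names, the millisecond heuristic, and the sample values are assumptions for illustration.

```python
from datetime import datetime, timezone


def to_utc_datetime(timestamp):
    """Convert a Unix timestamp (seconds or milliseconds) to an aware UTC datetime."""
    # Assumption: values above 1e11 are milliseconds rather than seconds.
    if timestamp > 1e11:
        timestamp = timestamp / 1000
    return datetime.fromtimestamp(timestamp, tz=timezone.utc)


def daily_xp(calendar):
    """Sum gained points per UTC calendar day from a list of activity entries."""
    totals = {}
    for entry in calendar:
        day = to_utc_datetime(entry["datetime"]).date().isoformat()
        totals[day] = totals.get(day, 0) + entry.get("improvement", 0)
    return totals


# Made-up sample entries, not real donor data.
sample_calendar = [
    {"datetime": 1717660800000, "improvement": 10},
    {"datetime": 1717664400000, "improvement": 15},
    {"datetime": 1717747200000, "improvement": 20},
]

print(daily_xp(sample_calendar))
```

Aggregating by UTC day is a design choice of this sketch. A study interested in donors' local daily routines would need to convert to the donor's time zone first.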
3.6. Donation of JSON Web Token (JWT)
Criterion
Underlying Properties
of Data Access
Examined Properties
Status: 09-06-2024
Machine-
Readability
Provided file format
single vs. multiple files
o Real-time API access to a user’s account data via the
combination of the public API (used in 3.2 and 3.4)
and Duolingo’s internal API (used in 3.3 and 3.5) [19].
o There are no limits to what kind of data can be queried.
From a technical perspective, it is essentially account
access.
Transparency
Understandability of
variable names and
values by being self-
explanatory or including
documentation
o No documentation on structure or variable contents
available.
o Many variables are self-explanatory (“lastStreak”,
“KnownWords”) or understandable in context.
o Note: The capabilities of the software to access the API
[19] are limited by what the open-source developers of
the API are interested in or have knowledge of.
Data Quality
Consistency within and
between data streams
o Consistency with other forms of Duolingo user data
donation in all trial cases.
Granularity, relevant
content, and inclusion of
meta information
o Extensive information on known topics, learned skills,
daily progress, and more.
o Timeseries of daily activities.
o Detailed information on multiple languages is available
by switching the currently active language.
o Includes all information available through the methods
described in 3.2 through 3.5.
DDP Retrieval
Number of necessary
steps and efficiency of
the process
Test samples were retrieved by having donors obtain the
JWT themselves. This involved the following steps:
o Log in to the account in an internet browser.
o Open the browser’s developer console and run a line of
JavaScript code to obtain the JWT, a long string.
o Donate this string and the corresponding
username.
o With it, researchers can access the data of that
account.
At scale, automation of data retrieval is needed. This could
be done by having users install a browser add-on that
retrieves the token and streamlines the donation
for the user.
Duration between
request and retrieval
o Instant and continuous access.
o Depends on the user’s technical skills and willingness
to provide the token (if no supporting application is
provided).
Data Engineering
Use of standardized
notations
o Timeseries data with standardized timestamps.
o Includes all standardized formats mentioned in 3.2
and 3.3.
Identifiability of
personal data
o Anonymization impossible upon donation, as
username and access information are required. Users
may withdraw their consent by changing their
password.
o After the relevant information has been retrieved, it
may be pseudonymized before analysis.
o Follower names are accessible. They cannot be
redacted if continuous account access was granted
but may be anonymized/pseudonymized by the
researchers.
Donation
Opportunities
Repeated donations
Continuous donations
o Requires ex-ante consent to continuous tracking.
o Repeated data donations do not require any action
from donors.
o A script could be implemented to locally query
specific data and only transmit the (anonymized)
data, and not the access information to researchers.
Sensitivity of DDP
contents
Perceived sensitivity of
specific kinds of data
o Type of donated data: Usage statistics, learning
progress, purchase history, usage of social features.
o The donation of a JWT is associated with
continuous access to usage data.
Domain-specific
Challenges in Value
Creation
Comparability of users
o Information on all active A/B-Testing is included, but it
is hard to understand the impact of these tests
without documentation. Users may experience
different features.
o Courses in different languages are structured very
differently and can vary in terms of quality. Therefore,
they may not be comparable in some dimensions.
Highlights
o Currently high transaction costs in retrieval, unless a
supporting application is provided to data donors.
o Such an application could also enable data minimization and
anonymization, which is not possible when a JWT is
donated to grant continuous access.
o Most extensive overview on learning activities.
o Opportunity for continuous data donations.
o Most substantial privacy concerns compared to other
donation options.
o API access may change over time.
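The pseudonymization of usernames and follower names mentioned under "Identifiability of personal data", and the idea of a local script that transmits only de-identified data, can be illustrated with a keyed hash. The following is a minimal Python sketch under stated assumptions, not the procedure used in this evaluation. The field names ("email", "picture", "inviteURL", "googleId", "facebookId", "followers") and the record layout are hypothetical, and the key shown is a placeholder.

```python
import copy
import hashlib
import hmac

# Hypothetical field names; the real export may use different keys.
DIRECT_IDENTIFIERS = ("email", "picture", "inviteURL", "googleId", "facebookId")


def pseudonym(name, secret_key):
    """Return a stable keyed hash, so the same name always maps to the same pseudonym."""
    digest = hmac.new(secret_key, name.encode("utf-8"), hashlib.sha256).hexdigest()
    return "user_" + digest[:16]


def pseudonymize(record, secret_key):
    """Return a copy of the record with direct identifiers removed and names replaced."""
    cleaned = copy.deepcopy(record)
    for field in DIRECT_IDENTIFIERS:
        cleaned.pop(field, None)
    if "username" in cleaned:
        cleaned["username"] = pseudonym(cleaned["username"], secret_key)
    if "followers" in cleaned:
        cleaned["followers"] = [pseudonym(n, secret_key) for n in cleaned["followers"]]
    return cleaned


# Made-up sample record, not real donor data.
sample = {
    "username": "user_name123",
    "email": "someone@example.org",
    "streak": 12,
    "followers": ["friend_a", "friend_b"],
}
key = b"replace-with-a-secret-project-key"

print(pseudonymize(sample, key))
```

A keyed hash (HMAC) is used here instead of a plain hash because usernames are easy to guess. Without the secret key, a pseudonym cannot be confirmed by simply hashing candidate usernames.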
3.7. Summary of Data Access Methods
When conducting a study with Duolingo user data, the kind of information included in the data
depends on the chosen method of data access. Table 1 summarizes the variables and
constructs that may be investigated with this data, and the data access methods through
which each of them is available.
Measures and Related Constructs
3.1
3.2
3.3
3.4
3.5
3.6
Time series
Leaderboard positions
x
x1
Learning activities
x2
x2
x
Gained language proficiency
Current course level
x
x
x
Experience points
x
x
x
x
x
x
Language “strength” score
x
x
x
Current leaderboard league
x
x
x
Quiz or placement test scores and
time taken
x
x
x
x
Total number of words learned so far
x
x
x
x
Acquired language skills
Number and names of completed
Stories
x
x
Completed lessons (internally called
“Skills”)
x
x
x
x
x
Complete list of learned words
x
x
x
x
Other aspects
Streak length: Persistency
x
x
x
x
x
x
Reminder settings: Employed tools to
ensure consistent learning
x
x
x
Daily XP Goal: Learning intensity
intentions
x
x
x
x
x
Motivation for language learning
x
x
x
x
x
Number of follows and followers:
Usage of social features
x
x
x
x
Table 1: Contents of the evaluated data access methods.
1Can be constructed by making use of continuous data access.
2Limited to activities from the previous two weeks.
3.8. Sensitivity of Related Data
It has been established that citizens’ willingness to share their data partly depends on the type
of data researchers are asking for [15]. Anonymization can alleviate privacy concerns and thus,
increase willingness to share data [20]. Another important factor is user trust, which can be
fostered with clear and transparent information about storage and processing of the data [21].
Future studies need to investigate other relevant factors that influence participant willingness
to donate data in order to achieve impactful sample sizes.
Results of an unpublished pretest survey conducted by the author team indicate that
self-reported willingness to donate is highest for methods where only the username has to be
given (such as the methods described in 3.2 and 3.3), and much lower for methods that require
more user input, such as donating a DDP (3.1) or providing a JWT (3.6), the latter of which
also grants more extensive access to account data. Reported reasons for reluctance to
donate highlight a combination of privacy concerns and transaction costs, which future
research will need to explore in greater detail.
3.9. Limitations
The present evaluation was performed to explore opportunities in utilizing Duolingo user data,
in order to facilitate decisions regarding future research conducted within this context. Applying
the framework [6] helped structure the process to organize information gained from these first
experiences. While doing so, we did not reach out to the Duolingo research team or data
protection office to request documentation for understanding the given data structure.
Individual data donors may experience the app differently. Even from a small number of trial
cases, we saw how factors such as the time of initial registration, or the currently active
language course affect how the retrieved data was structured. In order to use Duolingo user
data at scale, these differences need to be considered.
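One way to account for such structural differences is to map every donated record onto a fixed set of columns and to treat absent and empty values alike. The following is a minimal Python sketch of that idea. The field names are hypothetical and chosen only for illustration; they are not taken from the Duolingo data.

```python
# Hypothetical column names; adapt to the fields actually used in a study.
EXPECTED_FIELDS = ("username", "streak", "motivation", "creation_date")


def normalize_profile(raw):
    """Map a raw profile dict onto fixed columns, using None for absent or empty values."""
    row = {}
    for field in EXPECTED_FIELDS:
        value = raw.get(field)
        # Empty strings, lists, and dicts count as missing; 0 is kept as a valid value.
        row[field] = None if value in ("", [], {}) else value
    return row


# Made-up records: an older account without a stored motivation, and a newer one.
older_user = {"username": "a", "streak": 0, "motivation": ""}
newer_user = {"username": "b", "streak": 7, "motivation": "travel", "creation_date": 1700000000}

print(normalize_profile(older_user))
print(normalize_profile(newer_user))
```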
Finally, the APIs used are unofficial and not supported by Duolingo. This means that their
functioning is not guaranteed and, just like DDP contents in general, may be subject to change at
the company’s discretion. On the other hand, their open-source nature allows for continuous
adaptation and extension of functionality.
4. Outlook
This report has demonstrated how the framework for platform selection can be applied to data
retrieval methods on a digital learning platform.
Researchers interested in non-formal learning behavior may benefit considerably from studying
learning platform user data, as it gives access to aspects of directly observable learning
behavior that were previously inaccessible. Moreover, this type of data is less susceptible to
self-report biases. Therefore, data donations may pave the way for the application of learning
analytics, i.e. analyzing log data to identify patterns for understanding and optimizing learning
behavior [22], in commercial non-formal settings while being independent from the company’s
explicit cooperation.
We found multiple ways in which Duolingo user data can be retrieved and put to use to
advance educational research in the public interest. Researchers need to weigh the available
options and select the one that is most suited to their needs and resource availability, and we
hope that this report can provide them with the information they need to make that decision.
5. References
[1] Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the
protection of natural persons with regard to the processing of personal data and on the free
movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation)
(Text with EEA relevance). 2016. Accessed: Nov. 08, 2023. [Online]. Available:
https://eur-lex.europa.eu/eli/reg/2016/679/oj/eng
[2] T. C. Carrière, L. Boeschoten, B. Struminskaya, H. Janssen, N. C. de Schipper, and T. Araujo,
“Best practices for studies using digital data donation,” 2023, OSF. doi: 10.31219/osf.io/3vhbj.
[3] I. I. Van Driel, A. Giachanou, J. L. Pouwels, L. Boeschoten, I. Beyens, and P. M. Valkenburg,
“Promises and Pitfalls of Social Media Data Donations,” Communication Methods and Measures,
vol. 16, no. 4, pp. 266–282, Oct. 2022, doi: 10.1080/19312458.2022.2109608.
[4] L. Boeschoten, J. Ausloos, J. E. Möller, T. Araujo, and D. L. Oberski, “A framework for privacy
preserving digital trace data collection through data donation,” Computational Communication
Research, vol. 4, no. 2, pp. 388–423, Oct. 2022, doi: 10.5117/CCR2022.2.002.BOES.
[5] I. Sen, F. Flöck, K. Weller, B. Weiß, and C. Wagner, “A Total Error Framework for Digital Traces
of Human Behavior on Online Platforms,” Public Opinion Quarterly, vol. 85, no. S1, pp. 399–422,
Sep. 2021, doi: 10.1093/poq/nfab018.
[6] L. Manzke, “Data Donation for Impactful Insights: A Framework for Platform Selection,” 2024,
ResearchGate. doi: 10.13140/RG.2.2.15236.74882.
[7] L. Manzke and P. Hartl, “Data Donation for Impactful Insights: A Framework for Platform
Selection and its Application to the Use Case of German Loyalty Card Providers,” May 2024.
[8] S. Li, C. Yu, J. Hu, and Y. Zhong, “Exploring the Effect of Behavioral Engagement on Learning
Achievement in Online Learning Environment: Learning Analytics of Non-degree Online Learning
Data,” in 2016 International Conference on Educational Innovation through Technology (EITT),
Tainan: IEEE, Sep. 2016, pp. 246–250. doi: 10.1109/EITT.2016.56.
[9] D. Gašević, S. Dawson, and G. Siemens, “Let’s not forget: Learning analytics are about
learning,” TechTrends, vol. 59, no. 1, pp. 64–71, Jan. 2015, doi:
10.1007/s11528-014-0822-x.
[10] E. Sudina and L. Plonsky, “The effects of frequency, duration, and intensity on L2 learning
through Duolingo: A natural experiment,” Journal of Second Language Studies, vol. 7, no. 1, pp.
1–43, Mar. 2024, doi: 10.1075/jsls.00021.plo.
[11] Regulation (EU) 2022/1925 of the European Parliament and of the Council of 14 September
2022 on contestable and fair markets in the digital sector (Digital Markets Act). 2022, pp. 1–66.
doi: 10.5040/9781782258674.
[12] M. Haim, D. Leiner, and V. Hase, “Integrating Data Donations in Online Surveys,” M&K, vol. 71,
no. 1–2, pp. 130–137, 2023, doi: 10.5771/1615-634X-2023-1-2-130.
[13] H. Silber et al., “Linking surveys and digital trace data: Insights from two studies on determinants
of data sharing behaviour,” Journal of the Royal Statistical Society: Series A (Statistics in
Society), vol. 185, pp. S387–S407, Dec. 2022, doi: 10.1111/rssa.12954.
[14] L. Boeschoten, R. Voorvaart, R. Van Den Goorbergh, C. Kaandorp, and M. De Vos, “Automatic
de-identification of data download packages,” Data Science, vol. 4, no. 2, pp. 101–120, Jan.
2021, doi: 10.3233/DS-210035.
[15] N. Pfiffner and T. N. Friemel, “Leveraging Data Donations for Communication Research:
Exploring Drivers Behind the Willingness to Donate,” Communication Methods and Measures,
vol. 17, no. 3, pp. 227–249, Jul. 2023, doi: 10.1080/19312458.2023.2176474.
[16] L. J. Beesley and B. Mukherjee, “Statistical inference for association studies using electronic
health records: handling both selection bias and outcome misclassification,” Biometrics, vol. 78,
no. 1, pp. 214–226, Mar. 2022, doi: 10.1111/biom.13400.
[17] D. Belevan, “Duolingo Shareholder Letter Q4/FY 2023,” 2024. Accessed: May 17, 2024.
[Online]. Available:
https://investors.duolingo.com/static-files/06cda5ae-c66f-4d99-82ec-da764ecb1034
[18] SignHouse, “Duolingo Users and Growth Statistics (2024).” Accessed: Aug. 02, 2024. [Online].
Available: https://usesignhouse.com/blog/duolingo-stats/
[19] P. Hartl, Duolingo API for Python. (Feb. 02, 2024). Python. Accessed: Aug. 30, 2024. [Online].
Available: https://github.com/phHartl/Duolingo
[20] E.-M. Schomakers, C. Lidynia, and M. Ziefle, “All of me? Users’ preferences for privacy-
preserving data markets and the importance of anonymity,” Electron Markets, vol. 30, no. 3, pp.
649–665, Sep. 2020, doi: 10.1007/s12525-020-00404-9.
[21] J. Juga, J. Juntunen, and T. Koivumäki, “Willingness to share personal health information:
impact of attitudes, trust and control,” Records Management Journal, vol. 31, no. 1, pp. 48–59,
Jan. 2021, doi: 10.1108/RMJ-02-2020-0005.
[22] G. Siemens and D. Gašević, “Special issue on learning and knowledge analytics,” Educational
Technology & Society, vol. 15, no. 3, pp. 1163, 2012.
6. Appendix
A. Screenshots of Deidentified Data Samples
3.1. Donation of Data Download Package (DDP)
Format: multiple CSV files
Example: from leaderboards.csv
3.2. Donation of Username + Public API Query
Format: Username given → API query → JSON file
Example: Current XP, crown score for all courses, and current streak length
3.3. Donation of Username + Internal API Query
Format: Log in to any Duolingo account → API query → JSON file
Example: Summary statistics of a currently learned language
Example: Description of a learned skill.
3.4. Donation of Public API Query Results (while logged in)
Format: Login → API query → JSON file
Example: Performance metrics in quizzes
Example: Description of acquired skill
3.5. Donation of Internal API Query Results (while logged in)
Format: Login → API query → JSON file
Example: Summary statistics for an actively learned language
Example: Section “calendar” represents a time series of learning activities during the last two weeks.
3.6. Donation of JSON Web Token (JWT)
Format: Full access to user data via the Unofficial Duolingo API for Python
Example: Explicit progress in a specific section of a course
Example: Detailed timestamps and skills completed in the last two weeks
B. How-Tos for Duolingo Data Access Methods
Instructions are included for the required user input for the data access methods described in
sections 3.1, 3.4 and 3.6. The process for the method described in 3.5 is identical to 3.4, except
for a different API call.
Screenshots in the first guide show a web interface whose UI has since been updated, but
the position of the relevant button has remained the same.
1. How to download a DDP from the Duolingo user profile settings
(Only possible logged into a desktop browser)
In the menu on the left:
Scroll all the way down on the next page, “Account”.
Click “export my data”.
Clicking this button triggers a message to your linked email address, asking you to confirm
the request. Within a day, you should receive a download link by email.
2. How to download a JSON file from a username query, after logging in with that
profile
1. Log in to Duolingo in the browser of your choice (https://www.duolingo.com). This is
also possible on mobile devices.
2. Note your own Duolingo username, e.g., user_name123.
3. Go to https://www.duolingo.com/2017-06-30/users?username={username} in the
same browser session. In the example this would be:
https://www.duolingo.com/2017-06-30/users?username=user_name123
[The process described in 3.5 would look similar, with a different API endpoint used in step 3.]
4. Depending on the selected browser, save the data displayed on the page.
a. Firefox:
b. Chrome/Edge (right click, “Save as”):
c. Chrome Mobile (3-dot menu, then download):
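Steps 2 and 3 above amount to inserting the username into a fixed URL pattern, and step 4 yields a JSON file. As a minimal Python sketch, the query URL can be built and a saved file inspected as shown here. Only the URL pattern is taken from step 3. The helper names are hypothetical, and the sample JSON is made up and does not claim to reflect the real response structure.

```python
import json
from urllib.parse import urlencode

# URL pattern from step 3 of this guide.
BASE_URL = "https://www.duolingo.com/2017-06-30/users"


def build_query_url(username):
    """Return the URL a donor opens in the browser session where they are logged in."""
    return BASE_URL + "?" + urlencode({"username": username})


def top_level_keys(json_text):
    """Parse the saved file content and list its top-level keys as a quick sanity check."""
    data = json.loads(json_text)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    return sorted(data.keys())


print(build_query_url("user_name123"))

# Made-up content standing in for a saved file.
saved_text = '{"users": [{"username": "user_name123", "streak": 12}]}'
print(top_level_keys(saved_text))
```

The sketch does not send any request itself. The query has to run in the donor's own browser session, because the result depends on being logged in.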
3. How to extract a JWT from your browser’s developer console
1. Log in to Duolingo (https://www.duolingo.com) in the desktop browser of your choice and
make a note of your account login name.
2. Open the developer console of the respective browser:
a. Chrome/Edge (https://developer.chrome.com/docs/devtools/open?hl=en)
b. Firefox (https://firefox-source-docs.mozilla.org/devtools-user/)
3. Enter the following JavaScript code in the respective console:
document.cookie.match(new RegExp('(^| )jwt_token=([^;]+)'))[2];
4. The console now shows the Duolingo JWT for the current account.
What we need:
JWT (a long character string)
Username of the associated account
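Before a donated string is used, it can help to check that it has the general shape of a JWT. A JWT consists of three dot-separated, base64url-encoded parts (header, payload, signature). The following minimal Python sketch extracts a "jwt_token" value from a cookie string and decodes the payload without verifying the signature. The function names and the demo token are made up for illustration, and which claims Duolingo's token actually contains is not documented here.

```python
import base64
import json
import re


def extract_jwt(cookie_string):
    """Return the value of the jwt_token cookie, or None if it is not present."""
    match = re.search(r"(?:^|;\s*)jwt_token=([^;]+)", cookie_string)
    return match.group(1) if match else None


def decode_jwt_payload(token):
    """Decode the payload (middle part) of a JWT without verifying its signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("a JWT consists of three dot-separated parts")
    payload = parts[1]
    payload += "=" * (-len(payload) % 4)  # restore the stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload))


def _encode_part(obj):
    """Helper for the demo only: base64url-encode a dict the way JWT parts are encoded."""
    raw = json.dumps(obj).encode("utf-8")
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


# Made-up demo token with a fake signature; not a real Duolingo token.
demo_token = _encode_part({"alg": "HS256"}) + "." + _encode_part({"sub": 42}) + ".signature"
demo_cookies = "lang=en; jwt_token=" + demo_token + "; theme=dark"

print(decode_jwt_payload(extract_jwt(demo_cookies)))
```

Since the token grants account access, it should be handled like a password, for example by never writing it to logs.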