Towards Open Data in Digital Education Platforms
Joana Soares Machado, Juan Carlos Farah, Denis Gillet
School of Engineering
École Polytechnique Fédérale de Lausanne
Lausanne, Switzerland
{joana.machado,juancarlos.farah,denis.gillet}@epfl.ch
María Jesús Rodríguez-Triana
School of Digital Technologies
Tallinn University
Tallinn, Estonia
mjrt@tlu.ee
Abstract—Despite the traction gained by the open data movement and the rise of big data and learning analytics in education, there is limited support for researchers in education to generate, access, and share experimental data using openly-available digital education platforms. To explore how this gap could be addressed and elicit requirements, we conducted a survey with 40 researchers in the field of technology-enhanced learning, examining their experience and needs handling research data. Drawing on the results of our survey, we devised a set of features that educational platforms should provide to address the identified requirements, enabling researchers in education to run studies within typical learning environments, adhere to legal and ethical frameworks concerning privacy, and share their data confidently with a wider audience. We then categorized these features into five stages that represent the user flow, namely (1) Bootstrapping Research Studies, (2) Ensuring Consent, (3) Gathering Data, (4) Managing Data Sets, and (5) Supporting Open Research and Collaboration. Our aim is to guide forthcoming research and developments to relieve researchers of the burdens of conducting data-sensitive experiments, support the adoption of best practices, and pave the way for open data policies in digital education.
Keywords—open data, data management, data sharing, privacy, education, learning analytics, open research
I. INTRODUCTION
In recent years, there have been a number of initiatives to encourage the adoption of open data policies across research institutions [1], [2], encompassing different subjects such as data science [3], genomics [4], and physics [5]. Bolstered
by the adoption of digital technologies and the proliferation
of information generated by educational platforms [6], the
field of education appears poised to take part in the open
data movement. In fact, open data has already been used
for evidence-based research in learning analytics (LA) [7].
Nevertheless, there are ethical and privacy concerns associated
with handling data in education [6], [8], which may amplify
as the volume of and access to information increases. These
concerns are mainly addressed by regulations such as the
European Union’s (EU) General Data Protection Regulation
(GDPR), which stipulate ethical and legal requirements for
data privacy protection.
In this paper, we propose a set of features to enhance
educational platforms that would allow researchers in edu-
cation to conduct studies more easily, follow best practices
when handling sensitive data, and publish their anonymized
data sets in the spirit of open data. The paper is structured
as follows. Section II provides the motivation behind our
approach and highlights related work. In Section III we present
our requirements elicitation process, which we conducted as
a survey. In Section IV we discuss the results of our survey
and the analysis through which we selected key features with
which to support researchers. Section V presents these features
and how we categorized them into the proposed user flow. We
discuss the results in Section VI, drawing the conclusions that
drive our future work.
II. MOTIVATION AND RELATED WORK
Due to regulations such as the GDPR, there is a need for
greater transparency in the way digital education platforms
gather data from their users [9]. This requirement extends
to researchers, who are often also subject to codes of con-
duct. Ensuring that research follows both ethical and legal
frameworks is a challenge for researchers handling sensitive
data [10]. Furthermore, the lack of confidence in whether the protocols followed are appropriate, together with the bureaucracy that ethics and privacy entail, hinders the path towards open data.
The main challenges of open data in education arise due
to dispersion, unclear licensing, insufficient standardization of
data, lack of incentives and infrastructure for data sharing, as
well as ethical and data privacy issues [11]. Recent attempts to
address these challenges include a data integration and sharing
platform for digital education [12], standard data models for
data collection in e-learning [13], as well as techniques for
privacy-preserving LA [14].
Furthermore, as teachers have been shown to play an
important role in selecting the tools used in their class-
rooms [15], researchers in education also face the potential
challenge of having to adapt to platforms already in use by
teachers. However, to the best of our knowledge, there is no
educational platform that supports both learning processes in
digital settings and transparent data handling policies in a way
that could foster open data. This motivates our approach to
encourage research and open data in education by focusing
on enhancing existing learning technologies.
III. REQUIREMENTS ELICITATION
In order to understand how to enhance learning platforms
with research functionalities, we conducted an online survey1.
As our focus was on learning platforms already being ex-
ploited by teachers and students, we identified researchers as
the key stakeholders in the requirements elicitation process.
We therefore distributed the survey between December 2018 and January 2019 to 40 researchers in technology-enhanced learning, mainly from European institutions. Building on the challenges identified in Section II, the survey asked participants about their experience in the following areas: (A) Usage of Open Data in Research, (B) Sharing Research Data, (C) Data Management and Sharing Features, and (D) Ethics and Data Privacy. The survey combined multiple-choice and open-ended questions to better understand the rationale behind responses through quantitative and qualitative data.

1 Online Survey: http://bit.ly/2PcKH4G
IV. SURVEY RESULTS AND ANALYSIS
In this section we analyze the results of the survey following
the areas listed in Section III. Within each area, we present
the requirements that emerged.
A. Usage of Open Data in Research
In our survey, 53% of participants used open data in their
research. Specifically, they used open data to explore publicly
available data (70%), to complement their own data (52%),
to conduct secondary analysis (48%), to create visualizations
(48%), and to reproduce other research results (35%). From
these results we can infer that tools for data exploration,
visualization, and analysis are needed in open data platforms
(Requirement A1). The aspects of open data that respondents
found most problematic were that the data format is not
always easy to use (78%), the license for using the data is
not always clear (53%), and data authenticity cannot always
be ensured (45%). Platforms should thus support interoperable
data formats, clear licensing, and data authenticity certificates
(Requirement A2).
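To make Requirement A2 concrete, the following sketch (written in Python purely for illustration) shows one way a platform could attach an interoperable export format, an explicit license, and a verifiable checksum to a published data set; all class, field, and function names are assumptions rather than part of any existing platform.

```python
# A minimal sketch of Requirement A2 (illustrative only): exporting a data set
# in an interoperable format together with an explicit license and a checksum
# that lets third parties verify data authenticity.
import csv
import hashlib
import json
from dataclasses import dataclass, asdict


@dataclass
class DatasetMetadata:
    identifier: str     # platform-internal dataset identifier (hypothetical)
    title: str
    license: str        # e.g., an SPDX identifier such as "CC-BY-4.0"
    data_format: str    # e.g., "text/csv"
    sha256: str         # checksum of the exported file


def export_with_metadata(rows, path, title, license_id):
    """Export rows to CSV and return metadata with a verifiable checksum."""
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    with open(path, "rb") as handle:
        digest = hashlib.sha256(handle.read()).hexdigest()
    return DatasetMetadata(identifier=path, title=title, license=license_id,
                           data_format="text/csv", sha256=digest)


if __name__ == "__main__":
    metadata = export_with_metadata(
        [{"learner": "anon-001", "activity": "quiz", "score": 7}],
        "activity_traces.csv", "Sample activity traces", "CC-BY-4.0")
    print(json.dumps(asdict(metadata), indent=2))  # machine-readable metadata
```

Publishing such metadata alongside the data would give consumers a machine-readable statement of the format and license, as well as a means to check that the downloaded file matches what the researcher released.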
B. Sharing Research Data
Research data was shared by 48% of respondents. Their
motivations for sharing included improving the transparency
and accountability of their research (80%), increasing the
exposure of their work (70%), allowing others to reuse and
reinterpret their data (65%), fostering collaborations (60%),
getting feedback on their research (60%), and allowing others
to reproduce their work (55%). These motivations emphasize
the need for transparency and accountability when sharing data
(Requirement B1), for enabling collaboration in data platforms,
and for linking new contributions to open data sets (Require-
ment B2). The reasons highlighted for not sharing research
data included ethical and legal constraints (67%), the lack of
standards and data infrastructure for data sharing (57%), the
cost of preparing data and documentation for sharing (52%),
and the lack of training to manage data effectively (38%).
Also, 73% of participants agreed that they would be more
inclined to share their research data if platforms provided
guidelines and tools for data management and sharing. Thus,
platforms should integrate features that help researchers handle
the sharing process in a more efficient and automated way
(Requirement B3). Finally, 80% of respondents would be
willing to share their research data with colleagues from their
research group, with the participants of the study in question
(55%), and with trusted peers (50%), while 45% would want to
share data openly with the public. Platforms should therefore
support not only open data in the broad sense, but also sharing
data with different degrees of exposure (Requirement B4).
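As an illustration of Requirement B4, the sketch below models the degrees of exposure mentioned by respondents as an ordered scale; the enumeration and function names are hypothetical and not tied to any particular platform.

```python
# A minimal sketch of Requirement B4: sharing a data set with different degrees
# of exposure rather than only fully private or fully public. Names are
# illustrative assumptions, not an existing API.
from enum import IntEnum


class Exposure(IntEnum):
    PRIVATE = 0         # the researcher who owns the data set
    RESEARCH_GROUP = 1  # colleagues from the same research group
    PARTICIPANTS = 2    # participants of the study in question
    TRUSTED_PEERS = 3   # named external collaborators
    PUBLIC = 4          # anyone, in the spirit of open data


def can_access(viewer_relationship: Exposure, dataset_exposure: Exposure) -> bool:
    """A data set is visible to the audience it is shared with and to anyone
    with a closer relationship to the research (a lower value on the scale)."""
    return viewer_relationship <= dataset_exposure


if __name__ == "__main__":
    # A data set shared with trusted peers is not visible to the general public.
    print(can_access(Exposure.PUBLIC, Exposure.TRUSTED_PEERS))          # False
    print(can_access(Exposure.RESEARCH_GROUP, Exposure.TRUSTED_PEERS))  # True
```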
C. Data Management and Sharing Features
On a scale of 1 (not interested) to 5 (very interested),
participants were asked to rate the features presented in Fig.
1. Considering ratings 4 and 5, the following requirements
emerge. A total of 78% of participants were interested in plat-
forms providing a way to ask users to participate in research
studies within the platform itself (Requirement C1). A consent
management tool was considered useful by 81% of respon-
dents, indicating that data platforms are in a suitable position
to help researchers comply with existing ethical and legal
regulations (Requirement C2). Moreover, 71% were interested
in the automated removal of data when consent is withdrawn
and 43% in allowing students to disable tracking, showing
the need for tools designed to help researchers manage data
privacy more effectively (Requirement C3). These results also
confirmed Requirements A1, A2, and B2, as participants re-
ported their interest in an open data repository (68%), in tools
for interacting with data sets (63%), in tools to import (63%)
and export (83%) data in multiple formats, and in certifying
data authenticity (58%). Finally, 53% of participants showed
interest in tagging data sets with a Digital Object Identifier
(DOI) (Requirement C4).
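The consent-related findings above (Requirements C2 and C3) can be illustrated with a small sketch of how a platform might record consent and automatically remove a participant's traces when consent is withdrawn; the in-memory store and all names are hypothetical.

```python
# A minimal sketch of Requirements C2 and C3: tracking consent per participant
# and automatically removing a participant's data when consent is withdrawn.
# The in-memory store and all names are illustrative assumptions.
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class ConsentRecord:
    participant_id: str
    study_id: str
    granted_at: datetime
    withdrawn_at: Optional[datetime] = None


@dataclass
class StudyStore:
    consents: dict = field(default_factory=dict)  # participant_id -> ConsentRecord
    traces: dict = field(default_factory=dict)    # participant_id -> activity traces

    def grant_consent(self, participant_id: str, study_id: str) -> None:
        self.consents[participant_id] = ConsentRecord(
            participant_id, study_id, granted_at=datetime.now(timezone.utc))

    def record_trace(self, participant_id: str, trace: dict) -> None:
        # Data is only gathered while consent is active.
        record = self.consents.get(participant_id)
        if record is None or record.withdrawn_at is not None:
            return
        self.traces.setdefault(participant_id, []).append(trace)

    def withdraw_consent(self, participant_id: str) -> None:
        # Withdrawal is logged and triggers automatic removal of the data
        # collected about the participant (Requirement C3).
        record = self.consents.get(participant_id)
        if record is not None:
            record.withdrawn_at = datetime.now(timezone.utc)
        self.traces.pop(participant_id, None)
```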
D. Ethics and Data Privacy
While 58% of respondents followed an explicit code of
conduct in their research, such as their institutional or national
codes, 35% were not sure, and 7% did not follow any code of
conduct at all. These results reveal a lack of awareness of eth-
ical and legal requirements guiding research practices, which
was reinforced by the fact that only 62% of participants had
an ethics committee in their institution. These figures support
Requirement C2. Furthermore, 50% of respondents tracked the
consent given by subjects for their research studies, mostly
on paper (80%). Only 40% of respondents had strategies or methods in place to handle data privacy-related processes in a reproducible way, and 28% did not allow subjects to access information collected about them, confirming the need for Requirement C3. Researchers also expressed the need to store experiment data in custom locations (40%), so platforms should provide the option to specify where the collected data is kept (Requirement D1).

Fig. 1. Perceived interest in data management features (1 = not interested; 5 = very interested).

TABLE I
PROCESSES AND FEATURES TO SUPPORT THE REQUIREMENTS THAT EMERGED FROM THE SURVEY RESULTS.

Bootstrapping Research Studies (Requirement C1)
- Allow researchers to ask teachers to participate in research studies.

Ensuring Consent (Requirements C2, C3)
- Allow participants to provide consent.
- Allow researchers and participants to view signed consent forms.
- Allow participants to withdraw consent.

Gathering Data (Requirement C1)
- Configure the data-related parameters of an experiment.
- Collect data generated inside the educational platform.
- Provide contextual information.

Managing Data Sets (Requirements A2, C2, C3, D1)
- Dedicated access for participants to view data collected about them.
- Automatic removal of data from participants who withdraw consent.
- Compliance with data privacy protection requirements.
- Store data in a custom location.
- Verification of the authenticity of the data generated in the platform.

Supporting Open Research and Collaboration (Requirements A1, A2, B1, B2, B3, B4, C4)
- Repository to expose data sets and associated resources.
- Different levels of exposure and granularity to share data sets.
- Citable identifier for data sets.
- Data export in multiple formats.
- Data import from external repositories and new contributions.
- Clear specification of rights and terms of use of the data set.
- Interaction with data sets in the repository.
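Two of the Managing Data Sets features listed in Table I, storing data in a custom location (Requirement D1) and verifying the authenticity of platform-generated data, could be combined as in the following sketch; the file layout and function names are assumptions made for illustration.

```python
# A minimal sketch of two Managing Data Sets features from Table I: writing
# experiment data to a researcher-specified location (Requirement D1) and
# verifying its authenticity later via a checksum stored at export time.
# Paths and function names are illustrative assumptions.
import hashlib
import json
from pathlib import Path


def store_experiment_data(records, storage_dir, experiment_id):
    """Write experiment data to a custom location and store its checksum."""
    target = Path(storage_dir) / f"{experiment_id}.json"
    target.parent.mkdir(parents=True, exist_ok=True)
    payload = json.dumps(records, sort_keys=True).encode("utf-8")
    target.write_bytes(payload)
    target.with_suffix(".sha256").write_text(hashlib.sha256(payload).hexdigest())
    return target


def verify_authenticity(data_file):
    """Recompute the checksum and compare it with the one stored at export time."""
    data_file = Path(data_file)
    expected = data_file.with_suffix(".sha256").read_text().strip()
    actual = hashlib.sha256(data_file.read_bytes()).hexdigest()
    return expected == actual


if __name__ == "__main__":
    path = store_experiment_data(
        [{"participant": "anon-001", "event": "login"}], "./my_storage", "exp-42")
    print(verify_authenticity(path))  # True unless the file was modified
```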
V. USER FLOW
With a focus on the scientific method’s steps of conducting
experiments and communicating results, we defined the user
flow of a digital education platform supporting experimen-
tal research according to the following five processes: (1)
Bootstrapping Research Studies, (2) Ensuring Consent, (3)
Gathering Data, (4) Managing Data Sets, and (5) Support-
ing Open Research and Collaboration. We then mapped the
requirements that emerged from the survey to features that
would address them within the context of our user flow. Table
I highlights this mapping. In supporting a user flow with these
features, a digital education platform could relieve researchers
of the burdens of conducting data-sensitive experiments, which
could be limiting the adoption of open data practices in
education. Moreover, these features are meant to empower all
stakeholders, as recommended in [8]: first, by creating a direct communication channel between researchers and teachers that increases the transparency of how research studies are framed; second, by allowing students to provide consent directly, temporarily disable tracking, and potentially withdraw consent; and finally, by enabling external contributors to participate in research in the spirit of the open data movement.
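The mapping between the five processes of the user flow and the requirements they address could also be kept as a simple data structure that a platform consults when tracking the state of a study; the dictionary below merely transcribes Table I, and the structure itself is only an illustration.

```python
# A minimal sketch transcribing the Table I mapping between user-flow processes
# and the survey requirements they address; the data structure is illustrative.
USER_FLOW = {
    "Bootstrapping Research Studies": ["C1"],
    "Ensuring Consent": ["C2", "C3"],
    "Gathering Data": ["C1"],
    "Managing Data Sets": ["A2", "C2", "C3", "D1"],
    "Supporting Open Research and Collaboration": ["A1", "A2", "B1", "B2", "B3", "B4", "C4"],
}


def requirements_for(process: str) -> list:
    """Return the survey requirements addressed by a given user-flow process."""
    return USER_FLOW[process]


if __name__ == "__main__":
    for process in USER_FLOW:
        print(f"{process}: {', '.join(requirements_for(process))}")
```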
VI. CONCLUSIONS AND FUTURE WORK
In this paper, we identified a set of features necessary
to encourage open research and foster open data in digital
education platforms. The results of our survey showed that
around half of respondents used open data and shared their
own research data with others. Nevertheless, the majority of
respondents had a number of concerns regarding data sharing,
which were mostly linked to ethical and legal requirements.
Our findings suggest that researchers in education are willing
to participate in the open data movement, but require support
tools to manage and share their research data.
Although our analysis is constrained by the sample size of
our survey, we aim to build on our work by implementing
an architecture that supports the aforementioned features, and
conducting a usability study. This will allow us to evaluate
which features are more valuable for researchers, validate the
user interface, and receive direct feedback from stakeholders.
ACKNOWLEDGMENT
This research has been partially funded by the European
Union (grant agreement nos. 731685 and 669074).
REFERENCES
[1] A. Burton, D. Groenewegen, C. Love, A. Treloar, and R. Wilkinson, “Making research data available in Australia,” IEEE IS, 2012.
[2] “EPFL Open Science Fund,” https://research-office.epfl.ch/epflopensciencefund/, accessed: Jan 2019.
[3] A. C. Bart, J. Tibau, E. Tilevich, C. A. Shaffer, and D. Kafura, “BlockPy: An open access data-science environment for introductory programmers,” Computer, 2017.
[4] “Human Genome Project,” https://web.ornl.gov/sci/techresources/Human Genome, accessed: Jan 2019.
[5] “CERN Open Data Portal,” http://opendata.cern.ch, accessed: Jan 2019.
[6] A. Pardo and G. Siemens, “Ethical and privacy principles for learning analytics,” BJET, 2014.
[7] S. Kellogg and A. Edelmann, “Massively open online course for educators (MOOC-Ed) network dataset,” BJET, 2015.
[8] N. Sclater, “Developing a code of practice for learning analytics,” JLA, 2016.
[9] J. C. Farah, A. Vozniuk, M. J. Rodríguez-Triana, and D. Gillet, “A teacher survey on educational data management practices: Tracking and storage of activity traces,” EP4LA Workshop @ EC-TEL, 2017.
[10] J. P. Daries et al., “Privacy, anonymity, and big data in the social sciences,” Communications of the ACM, 2014.
[11] S. Dietze, G. Siemens, D. Taibi, and H. Drachsler, “Datasets for learning analytics,” JLA, 2016.
[12] J. Nicholson and I. Tasker, “DataExchange: Privacy by design for data sharing in education,” in IEEE FADS, 2017.
[13] A. del Blanco, A. Serrano, M. Freire, I. Martínez-Ortiz, and B. Fernández-Manjón, “E-learning standards and learning analytics. Can data collection be improved by using standard data models?” in IEEE EDUCON, 2013.
[14] M. E. Gursoy, A. Inan, M. E. Nergiz, and Y. Saygin, “Privacy-preserving learning analytics: challenges and techniques,” TLT, 2017.
[15] M. J. Rodríguez-Triana, A. Martínez-Monés, and S. Villagrá-Sobrino, “Learning analytics in small-scale teacher-led innovations: ethical and data privacy issues,” JLA, 2016.