Learning Analytics and Educational Data Mining: Towards Communication and Collaboration
George Siemens
Technology Enhanced Knowledge Research Institute
Athabasca University
gsiemens@athabascau.ca
Ryan S.J.d. Baker
Department of Social Science and Policy Studies
Worcester Polytechnic Institute
rsbaker@wpi.edu
ABSTRACT
Growing interest in data and analytics in education, teaching, and
learning raises the priority for increased, high-quality research
into the models, methods, technologies, and impact of analytics.
Two research communities, Educational Data Mining (EDM)
and Learning Analytics and Knowledge (LAK), have developed
separately to address this need. This paper argues for increased
and formal communication and collaboration between these
communities in order to share research, methods, and tools for
data mining and analysis in the service of developing both LAK
and EDM fields.
Categories and Subject Descriptors
H.2.8 [Database Applications]: Data Mining
General Terms
Algorithms, Human Factors, Measurements.
Keywords
Educational data mining, learning analytics and knowledge,
collaboration
1. INTRODUCTION
In education, the emergence of “big data” from new, extensive
educational media, combined with advances in computation [1],
holds promise for improving learning processes in formal
education and beyond. Increasingly, very large data sets
are available from students’ interactions with educational software
and online learning, among other sources, and public data
repositories help researchers obtain these data [2].
Two distinct research communities, Educational Data Mining
(EDM) and Learning Analytics and Knowledge (LAK), have
developed in response.
The first workshop on Educational Data Mining was held in 2005,
in Pittsburgh, Pennsylvania. This was followed by annual
workshops and, in 2008, the 1st International Conference on
Educational Data Mining, held in Montreal, Quebec. Annual
conferences on EDM were joined by the Journal of Educational
Data Mining, which published its first issue in 2009, with Kalina
Yacef as Editor. The first Handbook of Educational Data Mining
was published in 2010 [7]. In the summer of 2011, the
International Educational Data Mining Society (IEDMS)
(http://www.educationaldatamining.org/) was formed to “promote
scientific research in the interdisciplinary field of educational data
mining”; the society organizes the conferences and journal and
provides free open-access publication of conference and journal
articles. The EDM community brings together computer scientists,
learning scientists, psychometricians, and researchers from other
traditions. A first review of research in EDM was presented by
Romero & Ventura [3], followed by a theoretical model proposed
by Baker & Yacef [4]. A comprehensive review of EDM research
can be found in [6].
The Learning Analytics and Knowledge conference series was
initiated in early summer 2010 with the development of global
steering and program committees
(https://tekri.athabascau.ca/analytics/node/5). The conference
explicitly emphasized its role in bridging the computer science and
the sociology/psychology of learning, declaring that the “technical,
pedagogical, and social domains must be brought into dialogue with
each other to ensure that interventions and organizational systems
serve the needs of all stakeholders.” The first conference, held in
Banff, Canada, attracted over 100 participants, and its proceedings
were published by ACM [5], validating interest in interdisciplinary
approaches to analytics in learning. In the summer of 2011, the
Society for Learning Analytics Research (SoLAR,
http://www.solaresearch.org/) was formed to provide oversight for
the conference, develop and advance a research agenda in learning
analytics, and advocate for, and educate in, the use of analytics in
learning.
With growing research interest in learning analytics and
educational data mining, as well as the rapid development of
software and analytics methods, it is important for researchers and
educators to recognize the unique attributes of each community.
While LAK and EDM share many attributes and have similar
goals and interests, they have distinct technological, ideological,
and methodological orientations. As schools, universities, and
corporate learning and curriculum organizations begin to adopt
data mining and analytics, each community can benefit from
building on work occurring in the other. This paper
details the overlap between these different communities and
discusses the benefits of increased communication and
collaboration.
2. SIMILARITIES BETWEEN COMMUNITIES
The EDM and LAK communities are defined in relatively similar
ways. The International Educational Data Mining Society defines
EDM as follows: “Educational Data Mining is an emerging
discipline, concerned with developing methods for exploring the
unique types of data that come from educational settings, and
using those methods to better understand students, and the settings
which they learn in.”
The Society for Learning Analytics Research defines Learning
Analytics as: “… the measurement, collection, analysis and
reporting of data about learners and their contexts, for purposes of
understanding and optimizing learning and the environments in
which it occurs.”
EDM and LAK both reflect the emergence of data-intensive
approaches to education. In sectors such as government, health
care, and industry, data mining and analytics have become
increasingly prominent for gaining insight into organizational
activities. Drawing value from data in order to guide planning,
interventions, and decision-making is an important and
fundamental shift in how education systems function. LAK and
EDM share the goal of improving education by improving
assessment, the understanding of problems in education, and the
planning and selection of interventions. As administrators,
educators, and learners make extensive use of the data produced
during the educational process, the need for research-based
models and strategies grows. Both communities have the goal of
improving the quality of analysis of large-scale educational data,
to support both basic research and practice in education.
3. KEY DISTINCTIONS BETWEEN COMMUNITIES
The similarities between EDM and LAK suggest numerous areas
of research overlap. Additionally, the organizational deployment
of EDM and LAK requires similar data and researcher skill-sets.
However, these two communities have different roots and some
distinctions are important to note. Table 1 shows some of the key
differences between the communities. It is important to note that
these distinctions are meant to represent broad trends in the two
communities; many EDM researchers conduct research that could
be placed on the LAK side of each of these distinctions, and many
LAK researchers conduct research that could be placed on the
EDM side of these distinctions. By naming these distinctions,
we hope to identify places where the two communities can learn
from each other, rather than to define the communities in an
exclusive fashion. Communities that grow organically, as these
two have, will not have rigid boundaries between the work that
appears in each.
One key distinction is found in the type of discovery that is
prioritized. In both communities, research can be found that uses
automated discovery and research can be found that leverages
human judgment through visualization and other methods.
However, EDM has a considerably greater focus on automated
discovery, and LAK has a considerably greater focus on
leveraging human judgment. Even in research which combines
these two directions, this preference can be seen; EDM research
which leverages human judgment in many cases does so to
provide labels for classification, while LAK research which uses
automated discovery often does so in the service of informing
humans who make final decisions.
This difference is associated with another difference between the
two communities: the type of adaptation and personalization
typically supported by the two communities. In line with the
greater focus on automated discovery in EDM, EDM models are
more often used as the basis of automated adaptation, conducted
by a computer system such as an intelligent tutoring system. By
contrast, LAK models are more often designed to inform and
empower instructors and learners.
A third difference, and an important one, is the distinction
between holistic and reductionistic frameworks. It is much more
typical in EDM to see research that reduces phenomena to
components and analyzes individual components and the
relationships between them. The “discovery with models”
paradigm for EDM research discussed in [4] is a clear example of
this approach. By contrast, LAK researchers typically place a
stronger emphasis on attempting to understand systems as wholes,
in their full complexity. The debate between reductionist and
holistic paradigms has often paralyzed discussion between
education researchers from different “camps”; encouraging
discussion between EDM and LAK researchers is a key way to
keep this common split from limiting what the two communities
can learn from one another.
Two other differences are in the most common origins and
methods of researchers in these two communities. Researchers’
origins tend to drive the preferred approaches discussed above,
and these preferred approaches in turn drive preferred methods.
Greater detail on these issues is given in Table 1.
Table 1: A brief comparison of the two fields

| | LAK | EDM |
|---|---|---|
| Discovery | Leveraging human judgment is key; automated discovery is a tool to accomplish this goal | Automated discovery is key; leveraging human judgment is a tool to accomplish this goal |
| Reduction & Holism | Stronger emphasis on understanding systems as wholes, in their full complexity | Stronger emphasis on reducing to components and analyzing individual components and relationships between them |
| Origins | LAK has stronger origins in the semantic web, "intelligent curriculum," outcome prediction, and systemic interventions | EDM has strong origins in educational software and student modeling, with a significant community in predicting course outcomes |
| Adaptation & Personalization | Greater focus on informing and empowering instructors and learners | Greater focus on automated adaptation (e.g., by the computer with no human in the loop) |
| Techniques & Methods | Social network analysis, sentiment analysis, influence analytics, discourse analysis, learner success prediction, concept analysis, sensemaking models | Classification, clustering, Bayesian modeling, relationship mining, discovery with models, visualization |
4. CALL FOR COMMUNICATION AND COLLABORATION: EDM and LAK
There is value in having different communities engaged with
how to exploit “big data” to improve education. In particular,
each community holds different standards and values for “good
research” and “important research,” allowing creativity and
advancement that might not otherwise occur in a single,
monolithic research culture. For example, EDM researchers have
placed greater focus on issues of model generalizability (e.g.,
multi-level cross-validation, replication across data sets). By
contrast, LAK researchers have placed greater focus on
addressing the needs of multiple stakeholders with information
drawn from data. Each of these issues is important for the
long-term success of both fields, and each is a key opportunity
for the two communities to learn from one another.
Friendly competition between the two communities will keep both
vigorous and is generally beneficial for science. This type of
competition has occurred in the past, for example in the split
between the International Conference on the Learning Sciences
and the International Conference on Artificial Intelligence in
Education in 1992. Research networks are increasingly global, as
reflected by the multi-national executive committees of IEDMS/EDM
and SoLAR/LAK, but the two networks still draw on different
nations to a significant degree. Hence, the existence of both
communities increases the number of researchers working and
collaborating in the broader area of data-driven discovery in
education. At the same time, it is very important to keep
competition healthy. Healthy competition requires that both
communities disseminate their research to each other through their
respective conferences and journals to ensure awareness of
important ideas and advances occurring in the other community.
The two communities must communicate, in order to bring the
greatest possible benefits to educational practice and the science
of learning.
5. CONCLUSION
Given the overlaps in research interests, goals, and approaches
between the EDM and LAK communities, the authors of this
paper recommend that the executive committees of SoLAR and
IEDMS formalize approaches for disseminating research and
establishing cross-community ties. A formal relationship will allow
each community to continue developing its specialized and
distinct research methods and tools, while simultaneously
increasing opportunities for collaborative research and sharing of
research findings between the communities.
This alliance would also strengthen our opportunities to influence
non-academic research and practice. A particular concern now
facing both EDM and LAK is the rapid development of analytics
and data mining tools by commercial organizations that do not
build on either community’s expertise, algorithms, and
research results. To give one example, there is increasing
consensus in the EDM community that cross-validation needs to
be conducted at multiple levels (in particular the student level, but
also the classroom and lesson/unit levels), so that models are
evaluated on students, classrooms, or lessons absent from the
training data. However, many of the data mining/analytics tools
now emerging do not directly support this goal. To the extent that
EDM and LAK can jointly articulate quality standards for research
in this area, it may be possible to communicate these standards
more effectively to the wider community of tool developers and
analytics practitioners, as well as the broader research community.
In this way, both communities would be better able to communicate
their vision for data-driven science and practice in the field of
education.
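Student-level cross-validation, the first of the levels mentioned above, can be illustrated with a short sketch. The Python below is a minimal illustration under stated assumptions, not a procedure prescribed by the authors: the helper `student_level_folds` is hypothetical, each data row is assumed to carry a student identifier, and students are assigned to folds round-robin in sorted order (a seeded random shuffle is more common in practice). Because every row from a given student lands in the same fold, each test fold contains only students whose data the model never saw during training. Passing classroom or lesson identifiers instead of student identifiers would give the classroom- or lesson-level variant.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Sequence, Tuple


def student_level_folds(
    student_ids: Sequence[str], n_folds: int
) -> Iterator[Tuple[List[int], List[int]]]:
    """Hypothetical helper: yield (train_rows, test_rows) index pairs for
    k-fold cross-validation in which all rows of a student share one fold."""
    # Group row indices by student so no student's rows are split.
    rows_by_student: Dict[str, List[int]] = defaultdict(list)
    for row, sid in enumerate(student_ids):
        rows_by_student[sid].append(row)

    students = sorted(rows_by_student)
    if not 2 <= n_folds <= len(students):
        raise ValueError("n_folds must be between 2 and the number of students")

    # Assumption: assign students (not rows) to folds round-robin in sorted
    # order; a seeded random shuffle is the more common choice in practice.
    fold_of = {sid: i % n_folds for i, sid in enumerate(students)}

    for fold in range(n_folds):
        test = [r for s in students if fold_of[s] == fold for r in rows_by_student[s]]
        train = [r for s in students if fold_of[s] != fold for r in rows_by_student[s]]
        yield sorted(train), sorted(test)


if __name__ == "__main__":
    # One row per logged interaction, tagged with a made-up student id.
    rows = ["s1", "s1", "s2", "s3", "s3", "s3", "s4"]
    for fold, (train, test) in enumerate(student_level_folds(rows, n_folds=2)):
        print(f"fold {fold}: train={train} test={test}")
    # fold 0: train=[2, 6] test=[0, 1, 3, 4, 5]
    # fold 1: train=[0, 1, 3, 4, 5] test=[2, 6]
```

A plain row-level split of the same seven rows could place some of s3's interactions in training and others in testing, which tends to overstate how well a model generalizes to new students. General-purpose libraries offer comparable grouped splitters (for example, scikit-learn's `GroupKFold`), which is one way tool developers could support the standard described above.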
Both the LAK and EDM communities anticipate that the impact
of data and analytics within education will be transformative at
primary, secondary, and post-secondary levels. An open,
transparent research environment is vital to driving forward this
important work. As connected, but distinct, research disciplines,
EDM and LAK can provide a strong voice and force for
excellence in research in this area, guiding policy makers,
administrators, educators, and curriculum developers toward the
deployment of best practices in the upcoming era of data-driven
education.
6. ACKNOWLEDGMENTS
Our thanks to Jaclyn Ocumpaugh and the anonymous reviewers
for their valuable input and assistance on this paper.
7. REFERENCES
[1] Mayer, M. (2009) Innovation at Google: The physics of data
[PARC forum] (11 August, 2009: 3:59 mark). Available from
http://www.slideshare.net/PARCInc/innovation-at-google-the-physics-of-data
[2] Koedinger, K.R., Baker, R.S.J.d., Cunningham, K.,
Skogsholm, A., Leber, B., Stamper, J. (2010) A Data Repository
for the EDM community: The PSLC DataShop. In Romero, C.,
Ventura, S., Pechenizkiy, M., Baker, R.S.J.d. (Eds.) Handbook of
Educational Data Mining. Boca Raton, FL: CRC Press, pp. 43-56.
[3] Romero, C., Ventura, S. (2007). Educational Data Mining: A
Survey from 1995 to 2005. Expert Systems with Applications, 33,
125-146.
[4] Baker, R.S.J.d., Yacef, K. (2009) The State of Educational
Data Mining in 2009: A Review and Future Visions. Journal of
Educational Data Mining, 1 (1), 3-17.
[5] Gasevic, D., Conole, G., Siemens, G., Long, P. (Eds). (2011)
LAK11: International Conference on Learning Analytics and
Knowledge, Banff, Canada, 27 February - 1 March 2011.
[6] Romero, C., Ventura, S. (2010) Educational Data Mining: A
Review of the State-of-the-Art. IEEE Transactions on Systems,
Man, and Cybernetics, Part C: Applications and Reviews, 40 (6),
601-618.
[7] Romero, C., Ventura, S., Pechenizkiy, M., Baker, R.S.J.d. (Eds.)
(2010) Handbook of Educational Data Mining. Boca Raton, FL:
Chapman and Hall/CRC Press, Taylor & Francis Group. Data
Mining and Knowledge Discovery Series.