EVALUATING THE LEARNING, DESIGN AND ENGAGEMENT VALUE
OF MOBILE APPLICATIONS: THE MOBILE APP EVALUATION
SCALE
R. Kay, A. LeSage, D. Tepylo
University of Ontario Institute of Technology (CANADA)
Abstract
As of 2017, an estimated 750,000 apps were available in the domain of education. Consequently,
teachers have a monumental task in evaluating and selecting effective educational apps to be used in
their classrooms. A number of studies have proposed, discussed and examined frameworks or
classification schemes for evaluating mobile apps, but to date, no studies have developed and tested a
scale for evaluating the quality of educational mobile apps. The purpose of the current study was to
develop a scale to evaluate the design (n=4 items), engagement (n=4 items) and learning value (n=5
items) of educational mobile apps. The scale was tested with 722 grade 7 to 10 students (female = 339,
male = 382), 33 teachers (female=25, male=8), and 32 unique mobile apps focusing on mathematics
and science. The analysis revealed that the Mobile App Evaluation Scale (MAES) demonstrated good
internal reliability and construct validity for each of the three constructs. Limited support for convergent
validity and predictive validity was observed.
Keywords: mobile apps, scale, evaluation, secondary school, mathematics, science, STEM.
1 INTRODUCTION
An educational app is a software application that works on a mobile device and is designed to support
learning [1, 2]. These apps have considerable potential for supporting teaching, learning and
achievement [3]. As of 2017, over three-quarters of a million apps, free and paid, were available to
teachers [4, 5], making it exceedingly difficult for teachers to select the most effective tools.
Since 2011, at least 11 papers have proposed classification frameworks for apps in general education
[6-8], mathematics [9-12], science [13], augmented reality [14], and higher education [15]. Over 30
distinct categories have been proposed, making it challenging and somewhat confusing for educators to
efficiently choose apps for specific educational needs. Other key problems with the categorization
systems put forth include the absence of strong theoretical grounding [6, 8, 10, 11, 12], dated
frameworks [8], limited analysis of classification schemes [13-16], insufficient category descriptions
[7, 8, 10, 11], and dated examples of educational apps [6, 8, 9, 11, 12].
Of particular concern with previous educational app classification schemes is the absence of a reliable
and valid metric [6, 8, 9, 10, 11, 14, 15]. Furthermore, only one of the 11 app categorization papers [13]
drew on prior research to support the design and creation of its app categories.
The purpose of the current study, then, was to develop and assess a scale created to evaluate the
design, engagement, and learning value of educational mobile apps. The framework for this scale
focusses on three key areas: design, engagement and learning. This framework was developed and
tested previously on learning objects, but not on mobile apps [17-19].
2 METHODOLOGY
2.1 Participants
After obtaining consent from their parents, 722 (382 females, 339 males, 1 other) students in grade 7
(n=191), grade 8 (n=142), grade 9 (n=346) and grade 10 (n=43) participated in this study, ranging in
age from 11 to 17 years old (M= 13.4, SD=1.0). Students had a mean score of 17.8 out of 21 (SD=3.2)
on a three-item Likert scale assessing comfort level with computers (r=0.81). Thirty-three mathematics
(n=15) and science (n=18) teachers, (25 females, 8 males) with 5 to 23 years of teaching experience
(M= 6.3, SD=6.4) participated in the study.
2.2 Context, Procedure and Data Collection
Twenty-six unique mathematics or science-based apps were used in 33 classrooms. Each educational
app was used for 30 to 90 minutes. The apps were used by students working in pairs (n=17, 52%),
students working on their own (n=9, 27%), or teachers demonstrating the app in front of the class (n=7,
21%). Most teachers (n=24, 73%) followed the pre-made lesson plan that came with the educational
app.
Student learning performance was determined by pre- and post-tests developed by the instructor or
provided by the educational app. Students were given a pre-test before they used an app and a post-
test after. After completing the post-test, students were asked to complete a Likert scale survey,
grounded in the work of Kay & Knaack [17-19]. The Mobile App Evaluation Scale for students (MAES-
S) consisted of 13 seven-point Likert scale items focusing on app design (n=4 items), engagement (n=4
items), and learning (n=5 items). Teachers also assessed the apps using the Mobile App Evaluation
Scale for teachers (MAES-T), based on Kay & Knaack’s preliminary research [20]. The MAES-T
consisted of 11 seven-point Likert scale items focusing on app design (n=3 items), engagement (n=4
items), and learning (n=4 items).
2.3 Data Analysis
To establish the reliability and validity of the MAES-S, the following tests were conducted (a computational sketch follows the list):
- internal reliability for each of the three MAES-S constructs (Cronbach’s alpha coefficient);
- construct validity (factor analysis of the MAES-S scale items and correlations among constructs);
- convergent validity (correlations between MAES-S and MAES-T constructs);
- predictive validity (correlations between the three MAES-S constructs and learning performance).
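The following sketch illustrates how the internal reliability portion of this analysis could be computed. It is a minimal Python example, not the analysis script used in the study; the item column names (d1–d4, e1–e4, l1–l5) and the file name are hypothetical placeholders for the 13 MAES-S items.

```python
import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    """Cronbach's alpha for a block of Likert items (one column per item)."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Hypothetical item columns for the three MAES-S constructs
constructs = {
    "design":     ["d1", "d2", "d3", "d4"],
    "engagement": ["e1", "e2", "e3", "e4"],
    "learning":   ["l1", "l2", "l3", "l4", "l5"],
}

# Hypothetical file: one row per student, one column per MAES-S item
responses = pd.read_csv("maes_s_responses.csv")

for name, cols in constructs.items():
    items = responses[cols].dropna()
    score = items.sum(axis=1)  # construct score = sum of its items
    print(f"{name}: alpha = {cronbach_alpha(items):.2f}, "
          f"M = {score.mean():.1f}, SD = {score.std():.1f}")
```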
3 RESULTS
3.1 Overview
The main purpose of this study was to develop and assess the reliability and validity of the Mobile App
Evaluation Scale for students (MAES-S). Both student and teacher perspectives were assessed to
establish a more comprehensive evaluation scale. The reliability and validity metrics are discussed
below.
3.2 Internal Reliability
The internal reliability metrics for the MAES-S constructs, based on Cronbach’s alpha, were r=0.85 for
design, r=0.91 for engagement, and r=0.91 for learning (Table 1). These values are considered
acceptable for constructs developed in the domain of the social sciences (Kline, 1999; Nunnally, 1978).
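For reference, Cronbach’s alpha for a construct with k items is defined as

\[ \alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma^2_{Y_i}}{\sigma^2_X}\right) \]

where \(\sigma^2_{Y_i}\) is the variance of item i and \(\sigma^2_X\) is the variance of the summed construct score.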
Table 1. Description of the MAES-S Scale (n=715).

Construct     No. of Items   Possible Range   Mean Score (SD)   Internal Reliability
Design        4              7 to 28          21.6 (4.5)        r=0.85
Engagement    4              7 to 28          19.7 (5.8)        r=0.91
Learning      5              7 to 35          25.5 (6.5)        r=0.91
3.3 Construct Validity
3.3.1 Principal Components Analysis
A principal components analysis was conducted to explore whether the MAES-S constructs (design,
engagement and learning) were three distinct factors. The Kaiser–Meyer–Olkin measure of sampling
adequacy (0.936) and Bartlett’s test of sphericity (p < .001) indicated that the sample was suitable for
factor analysis. The principal components analysis was set to extract three factors (Table 2). The
resulting rotation supported the assumption that the MAES-S constructs were distinct.
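A minimal sketch of this step, assuming the 13 item responses are stored in the hypothetical columns introduced earlier, could use the third-party factor_analyzer package; this is an illustrative implementation under those assumptions, not the study’s actual code.

```python
import pandas as pd
from factor_analyzer import FactorAnalyzer
from factor_analyzer.factor_analyzer import calculate_bartlett_sphericity, calculate_kmo

# Hypothetical item columns, as in the earlier sketch
item_cols = ["l1", "l2", "l3", "l4", "l5",
             "e1", "e2", "e3", "e4",
             "d1", "d2", "d3", "d4"]
items = pd.read_csv("maes_s_responses.csv")[item_cols].dropna()

# Sampling adequacy checks reported in the text
chi_square, bartlett_p = calculate_bartlett_sphericity(items)
_, kmo_overall = calculate_kmo(items)
print(f"KMO = {kmo_overall:.3f}, Bartlett p = {bartlett_p:.4f}")

# Principal components extraction, three factors, varimax rotation
fa = FactorAnalyzer(n_factors=3, rotation="varimax", method="principal")
fa.fit(items)
loadings = pd.DataFrame(fa.loadings_, index=item_cols,
                        columns=["Factor 1", "Factor 2", "Factor 3"])
print(loadings.round(3))

# Variance summary per factor (eigenvalue-style variance, proportion, cumulative)
print(fa.get_factor_variance())
```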
Table 2. Varimax rotated loadings on the MAES-S Scale.

Scale Item                               Factor 1   Factor 2   Factor 3
L1 – Feedback helped learning            .814
L2 – Using app helped learning           .770
L3 – Graphics helped learning            .737
L4 – Overall app helped learning         .724
L5 – Helped review previous concepts     .576
E1 – Made learning fun                              .828
E2 – Would like to use again                         .803
E3 – Engaging                                        .789
E4 – Like overall theme                              .743
D1 – Easy to Use                                                .850
D2 – Clear instructions                                         .826
D3 – Well Organized                                             .729
D4 – Help features were useful                                  .576

Factor   Eigenvalue   % of Variance   Cumulative %
1        7.68         59.1            59.1
2        1.20          9.2            68.3
3        0.83          6.4            74.7
3.3.2 Correlations Among MAES-S Constructs
The correlations among the design, engagement and learning constructs ranged from r=.61 to r=.74
(Table 3). The shared variances, ranging from 37% to 55%, were small enough to support the
assumption that each construct measured something distinct.
Table 3. Correlations Among MAES-S Constructs.

Construct     Design   Engagement   Learning
Design        *        0.61         0.68
Engagement             *            0.74
Learning                            *
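Because shared variance is the square of the correlation coefficient, the reported range follows directly from the correlations in Table 3:

\[ r^2_{\min} = 0.61^2 \approx 0.37 \;(37\%), \qquad r^2_{\max} = 0.74^2 \approx 0.55 \;(55\%). \]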
3.4 Convergent Validity
Mean student perceptions of design were significantly correlated with teacher ratings of design (r=.19,
n=717, p < .001), mean student perceptions of engagement were significantly correlated with teacher
ratings of engagement (r=.16, n=717, p < .001), and mean student perceptions of learning were
significantly correlated with teacher ratings of learning (r=.19, n=714, p < .001). Overall, the correlations
were small, indicating only a modest degree of consistency between student and teacher evaluations.
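As an illustrative sketch (not the study’s actual code), convergent-validity correlations of this kind can be computed by pairing each student’s construct scores with their teacher’s MAES-T ratings through a shared class identifier; all file and column names below are hypothetical.

```python
import pandas as pd
from scipy.stats import pearsonr

# Hypothetical files: per-student MAES-S scores and per-class MAES-T ratings
students = pd.read_csv("maes_s_scores.csv")   # columns: class_id, design, engagement, learning
teachers = pd.read_csv("maes_t_scores.csv")   # columns: class_id, design, engagement, learning

merged = students.merge(teachers, on="class_id", suffixes=("_student", "_teacher"))

for construct in ["design", "engagement", "learning"]:
    pair = merged[[f"{construct}_student", f"{construct}_teacher"]].dropna()
    r, p = pearsonr(pair.iloc[:, 0], pair.iloc[:, 1])
    print(f"{construct}: r = {r:.2f}, n = {len(pair)}, p = {p:.4f}")
```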
3.5 Predictive Validity
Increased learning performance, as measured by the difference between pre- and post-test scores, was
significantly and positively correlated with student perceptions of engagement (r=.09, n=591, p < .05)
and learning (r=.13, n=589, p < .01), but not design (r=.08, n=593, n.s.). However, these correlations
were small.
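Similarly, a sketch of the predictive-validity correlations computes a gain score (post-test minus pre-test) and correlates it with each construct rating; again, the file and column names are hypothetical placeholders.

```python
import pandas as pd
from scipy.stats import pearsonr

# Hypothetical file: per-student pre/post test scores and MAES-S construct scores
data = pd.read_csv("maes_s_with_tests.csv")   # columns: pre_test, post_test, design, engagement, learning
data["gain"] = data["post_test"] - data["pre_test"]

for construct in ["design", "engagement", "learning"]:
    pair = data[["gain", construct]].dropna()
    r, p = pearsonr(pair["gain"], pair[construct])
    print(f"{construct}: r = {r:.2f}, n = {len(pair)}, p = {p:.4f}")
```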
4 CONCLUSIONS
This study evaluated the reliability and validity of a scale created to assess the quality of educational
mobile apps. Specifically, Kay & Knaack’s [18-20] framework for assessing the quality of learning
objects was used, focussing on three constructs: design, engagement and learning. Each of the
constructs appeared to be internally reliable, with alpha coefficients between 0.85 and 0.91. Furthermore,
each of the constructs appeared to measure a distinct quality based on the principal components factor
analysis. Construct ratings were significantly correlated with each other, but the relatively low shared
variance also supported the assumption that the constructs were unique.
Convergent validity for the MAES-S was not strongly supported. Correlations between student and
teacher perceptions of design, engagement and learning were significant but quite low. In other words,
students and teachers were not aligned with respect to their ratings of these three constructs. Since
the impetus for creating the MAES-S was to help teachers select high-quality educational apps for
student learning, the MAES-S may need to be revised. At the very least, qualitative data, perhaps in
the form of interviews, needs to be collected and analysed to determine why students and teachers rate
mobile apps differently.
Perhaps the most concerning finding is that the correlations between learning performance and student
ratings of design, engagement and learning, while significant for engagement and learning, were quite
small. Ideally, one would want a scale for selecting high-quality educational mobile apps to predict
learning success. However, the pre- and post-tests used in the current study were pre-designed by the
creators of the educational apps and typically used a multiple-choice format. Future studies should use
a more rigorous and varied set of learning assessment metrics specifically aligned with the intended
learning goals for using the educational mobile apps in question. Additionally, interview data examining
students’ perceptions of learning might offer insights into how a specific mobile app is or is not
supporting learning.
Creating the MAES-S is a first step toward developing a reliable and valid metric for assessing the quality
of educational mobile apps. Future studies need to focus on refining the scale items. We argue that
interview data could significantly improve the revision process by collecting evidence about why students
and teachers differ in their assessments of app quality and about what students are learning when they
use mobile apps.
REFERENCES
[1] E. C. Bouck, R. Satsangi & S. Flanagan, “Focus on inclusive education: Evaluating apps for
students with disabilities,” Childhood Education, vol. 92, no. 4, pp. 324-328, 2016.
[2] S. Papadakis, M. Kalogiannakis & N. Zaranis, “Designing and creating an educational app rubric
for preschool teachers,” Education and Information Technologies, pp. 1-19, 2017.
[3] F. Martin & J. Ertzberger, “Here and now mobile learning: An experimental study on the use of
mobile technology,” Computers & Education, vol. 68, pp. 76-85, 2013.
[4] Statista, “Compound annual growth rate of free and paid education app downloads worldwide from
2012 to 2017,” 2018. Retrieved from https://www.statista.com/statistics/273971/cagr-of-free-and-
paid-education-app-downloads-worldwide/
[5] Technavio, “Global education apps market- market study 2015-2019”, 2015. Retrieved from
http://www.reportsnreports.com/reports/426935-global-education-apps-market-market-study-
2015-2019.html
[6] O. Chergui, A. Begdouri & D. Groux-Leclet, “A classification of educational mobile use for learners
and teachers,” International Journal of Information and Education Technology, vol. 7, no. 5, pp. 324-
330, 2017.
[7] T. Cherner, J. Dix & C. Lee, “Cleaning up that mess: A framework for classifying educational
apps,” Contemporary Issues in Technology and Teacher Education, vol. 14, no. 2, pp. 158-193,
2014.
[8] O. T. Murray & N. R. Olcese, “Teaching and learning with iPads, ready or not?” TechTrends, vol.
55, no. 6, pp. 42-48, 2011.
[9] S. Alon, H. An & D. Fuentes, “Teaching mathematics with Tablet PCs: A professional development
program targeting primary school teachers” in Tablets in K-12 Education: Integrated Experiences
and Implications (G. Christou, S. Maromoustakos, K. Mavrou, M. Meletiou-Mavrothers, & G.
Stylianou eds.), pp. 175-197, Hershey, PA: IGI Global, 2015.
[10] M. Ebner, “Mobile applications for math education – how should they be done?” in Mobile Learning
and Mathematics. Foundations, Design, and Case Studies (H. Crompton & J. Traxler eds.), pp. 20-
32, New York: Routledge, 2015.
[11] N. Grandgenett, J. Harris & M. Hofer, “An activity-based approach to technology integration in
the mathematics classroom,” NCSM Journal of Mathematics Education Leadership, vol. 13, no. 1,
pp. 19–28, 2011.
[12] B. Handal, C. Campbell, M. Cavanagh & P. Petocz, “Characterising the perceived value of
mathematics educational apps in preservice teachers,” Mathematics Education Research Journal,
vol. 28, no. 1, pp. 199–221, 2016.
[13] J. M. Zydney & Z. Warner, “Mobile apps for science learning: Review of research,” Computers
& Education, vol. 94, pp. 1-17, 2016.
[14] P. M. O’Shea & J. B. Elliot, “Augmented reality in education: An exploration and analysis of currently
available educational apps” in Immersive Learning Research Network. iLRN 2016. Communications
in Computer and Information Science, Vol 621 (C. Allison, L. Morgado, J. Pirker, D. Beck, J. Richter
& C. Gütl eds.), pp. 147-159, Switzerland: Springer, 2016.
[15] E. Pechenkina, “Developing a typology of mobile apps in Higher Education: A national case-study,”
Australasian Journal of Educational Technology, vol. 33, no. 4, pp. 134-146, 2017.
[16] T. Orehovacki, G. Bubas & A. Kovacic, “Taxonomy of web 2.0 applications with educational
potential” in Transformation in Teaching: Social Media Strategies in Higher Education (C. Cheal, J.
Coughlin & S. Moore eds.), pp. 43-72, Santa Rosa, CA: Informing Science Press, 2012.
[17] R. H. Kay, “Evaluating learning, design, and engagement in web-based learning tools (WBLTs):
The WBLT Evaluation Scale,” Computers in Human Behavior, vol. 27, no. 5, pp. 1849-1856, 2011.
[18] R. H. Kay & L. Knaack, “Assessing learning, quality and engagement in learning objects: the
learning object evaluation scale for students (LOES-S),” Educational Technology Research and
Development, vol. 57, no. 2, pp. 147-168, 2009.
[19] R. H. Kay & L. Knaack, “A multi-component model for assessing learning objects: The learning
object evaluation metric (LOEM),” Australasian Journal of Educational Technology, vol. 24, no. 5,
pp. 574-591, 2008.
[20] R. H. Kay, L. Knaack & D. Petrarca, “Exploring teacher perceptions of web-based learning tools,”
Interdisciplinary Journal of E-Learning and Learning Objects, vol. 5, pp. 27-50, 2009.