
Evaluating the Learning, Design and Engagement Value of Mobile Applications: The Mobile App Evaluation Scale


R. Kay, A. LeSage, D. Tepylo
University of Ontario Institute of Technology (CANADA)
Abstract

As of 2017, an estimated 750,000 apps were available in the domain of education. Consequently,
teachers have a monumental task in evaluating and selecting effective educational apps to be used in
their classrooms. A number of studies have proposed, discussed and examined frameworks or
classification schemes for evaluating mobile apps, but to date, no studies have developed and tested a
scale for evaluating the quality of educational mobile apps. The purpose of the current study was to
develop a scale to evaluate the design (n=4 items), engagement (n=4 items) and learning value (n=5
items) of educational mobile apps. The scale was tested with 722 grade 7 to 10 students (female = 339,
male = 382), 33 teachers (female=25, male=8), and 32 unique mobile apps focusing on mathematics
and science. The analysis revealed that the Mobile App Evaluation Scale (MAES) demonstrated good
internal reliability and construct validity for each of the three constructs. Limited support for convergent
validity and predictive validity was observed.
Keywords: mobile apps, scale, evaluation, secondary school, mathematics, science, STEM.
1 INTRODUCTION

An educational app is a software application that runs on a mobile device and is designed to support learning [1, 2]. These apps have considerable potential for supporting teaching, learning and achievement [3]. As of 2017, over three-quarters of a million apps, free and paid, were available to teachers [4, 5], making it exceedingly difficult for teachers to select the most effective tools.
Since 2011, at least 11 papers have proposed classification frameworks for apps in general education [6-8], mathematics [9-12], science [13], augmented reality [14], and higher education [15]. Over 30 distinct categories have been proposed, making it challenging and somewhat confusing for educators to efficiently choose apps for specific educational needs. Other key problems with the proposed categorization systems include the absence of strong theoretical grounding [6, 8, 10, 11, 12], dated frameworks [8], limited analysis of classification schemes [13-16], insufficient category descriptions [7, 8, 10, 11] and dated examples of educational apps [6, 8, 9, 11, 12].
Of particular concern with previous educational app classification schemes is the absence of a reliable and valid metric [6, 8, 9, 10, 11, 14, 15]. Furthermore, only one of the 11 app categorization papers [13] included prior research to support the design and creation of its app categories.
The purpose of the current study, then, was to develop and assess a scale created to evaluate the design, engagement, and learning value of educational mobile apps. The framework for this scale focuses on three key areas: design, engagement and learning. This framework was developed and tested previously on learning objects, but not mobile apps [17-19].
2 METHOD

2.1 Participants
After obtaining consent from their parents, 722 (382 females, 339 males, 1 other) students in grade 7
(n=191), grade 8 (n=142), grade 9 (n=346) and grade 10 (n=43) participated in this study, ranging in
age from 11 to 17 years old (M= 13.4, SD=1.0). Students had a mean score of 17.8 out of 21 (SD=3.2)
on a three-item Likert scale assessing comfort level with computers (r=0.81). Thirty-three mathematics (n=15) and science (n=18) teachers (25 females, 8 males) with 5 to 23 years of teaching experience (M=6.3, SD=6.4) participated in the study.
2.2 Context, Procedure and Data Collection
Twenty-six unique mathematics or science-based apps were used in 33 classrooms. Each educational
app was used for 30 to 90 minutes. The apps were used by students working in pairs (n=17, 52%),
students working on their own (n=9, 27%), or teachers demonstrating the app in front of the class (n=7,
Most teachers (n=24, 73%) followed the pre-made lesson plan that came with the educational app.
Student learning performance was determined by pre- and post-tests developed by the instructor or
provided by the educational app. Students were given a pre-test before they used an app and a post-
test after. After completing the post-test, students were asked to complete a Likert scale survey,
grounded in the work of Kay & Knaack [17-19]. The Mobile App Evaluation Scale for students (MAES-S) consisted of 13 seven-point Likert scale items focusing on app design (n=4 items), engagement (n=4 items), and learning (n=5 items). Teachers also assessed the apps using the Mobile App Evaluation Scale for teachers (MAES-T), based on Kay & Knaack’s preliminary research [20]. The MAES-T comprised 11 seven-point Likert scale items focusing on app design (n=3 items), engagement (n=4 items), and learning (n=4 items).
2.3 Data Analysis
To establish the reliability and validity of the MAES-S, the following tests were conducted:
- internal reliability for each of the three MAES-S constructs (Cronbach’s alpha; see the sketch following this list);
- construct validity (factor analysis of the MAES-S scale items and correlations among constructs);
- convergent validity (correlations between MAES-S and MAES-T constructs);
- predictive validity (correlations between the three MAES-S constructs and learning performance).
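To make the reliability step concrete, the following is a minimal sketch in Python of how Cronbach's alpha could be computed for one construct; it is not the authors' original tooling, and the file and column names are hypothetical.

```python
# Minimal sketch (not the authors' original tooling): Cronbach's alpha
# for one MAES-S construct. The file and column names are hypothetical.
import pandas as pd


def cronbach_alpha(items: pd.DataFrame) -> float:
    """Cronbach's alpha for a block of Likert items (rows = respondents)."""
    k = items.shape[1]                          # number of items in the construct
    item_variances = items.var(axis=0, ddof=1)  # variance of each item
    total_variance = items.sum(axis=1).var(ddof=1)  # variance of summed scores
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)


# Example usage with hypothetical column names for the four design items:
# responses = pd.read_csv("maes_s_responses.csv")
# print(cronbach_alpha(responses[["d1", "d2", "d3", "d4"]]))
```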
3 RESULTS

3.1 Overview

The main purpose of this study was to develop and assess the reliability and validity of the Mobile App Evaluation Scale for students (MAES-S). Both student and teacher perspectives were assessed to establish a more comprehensive evaluation scale. The reliability and validity metrics are discussed below.
3.2 Internal Reliability
The internal reliability metrics for the MAES-S constructs, based on Cronbach’s alpha, were 0.85 for design, 0.91 for engagement, and 0.91 for learning (Table 1). These values are considered acceptable for constructs developed in the domain of social sciences (Kline, 1999; Nunnally, 1978).
Table 1. Descriptions of MAES-S Scale (n=715).

Construct    No. of Items   Possible Range   Mean Score (SD)   Internal Reliability
Design       4              7 to 28          21.6 (4.5)        0.85
Engagement   4              7 to 28          19.7 (5.8)        0.91
Learning     5              7 to 35          25.5 (6.5)        0.91
3.3 Construct Validity
3.3.1 Principal Components Analysis
A principal components analysis was conducted to explore whether the MAES-S constructs (design, engagement and learning) were three distinct factors. The Kaiser-Meyer-Olkin measure of sampling adequacy (0.936) and Bartlett’s test of sphericity (p < .001) indicated that the sample was suitable for factor analysis. The principal components analysis was set to extract three factors (Table 2). The resulting rotation supported the assumption that the MAES-S constructs were distinct.
Table 2. Varimax rotated loading on MAES-S Scale.
Scale Item
Factor 1
Factor 2
Factor 3
L1 – Feedback helped learning
L2 – Using app helped learning
L3 – Graphics helped learning
L4 – Overall app helped learning
L5 – Helped review previous concepts
E1 – Made learning fun
E2 – Would like to use again
E3 – Engaging
E4 – Like overall theme
D1 – Easy to Use
D2 – Clear instructions
D3 – Well Organized
D4 – Help features were useful
Eigenvalue
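One possible way to reproduce this kind of analysis (the authors' actual software is not specified) is with the third-party factor_analyzer package, as in the hedged sketch below; the input file is hypothetical, with rows as students and columns as the 13 MAES-S items.

```python
# Hedged sketch of the construct-validity checks using the third-party
# factor_analyzer package (one possible tool; not necessarily what the
# authors used). The input file is hypothetical.
import pandas as pd
from factor_analyzer import FactorAnalyzer
from factor_analyzer.factor_analyzer import (
    calculate_bartlett_sphericity,
    calculate_kmo,
)

responses = pd.read_csv("maes_s_responses.csv")  # hypothetical file

# Sampling adequacy (KMO) and Bartlett's test of sphericity, as reported above
chi_square, p_value = calculate_bartlett_sphericity(responses)
kmo_per_item, kmo_total = calculate_kmo(responses)
print(f"Bartlett p = {p_value:.4f}, KMO = {kmo_total:.3f}")

# Three-factor principal components extraction with a varimax rotation,
# mirroring the structure of Table 2
fa = FactorAnalyzer(n_factors=3, rotation="varimax", method="principal")
fa.fit(responses)
loadings = pd.DataFrame(fa.loadings_, index=responses.columns)
print(loadings.round(2))
```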
3.3.2 Correlations Among MAES-S Constructs
The correlations among the design, engagement and learning constructs ranged from r=.61 to r=.74 (Table 3). The shared variances, ranging from 37% to 55% (e.g., r=.74 corresponds to r² ≈ .55, or 55% shared variance), were small enough to support the assumption that each construct measured something distinct.

Table 3. Correlations Among MAES-S Constructs.
3.4 Convergent Validity
Mean student perceptions of design were significantly correlated with teacher ratings of design (r=.19,
n=717, p < .001), mean student perceptions of engagement were significantly correlated with teacher
ratings of engagement (r=.16, n=717, p < .001) and mean student perceptions of learning were
significantly correlated with teacher ratings of learning (r=.19, n=714, p < .001). Overall, the correlations, while significant, were small, indicating only a modest degree of consistency between student and teacher evaluations.
3.5 Predictive Validity
Increased learning performance, as measured by the difference between pre- and post-test scores, was significantly and positively correlated with student perceptions of engagement (r=.09, n=591, p < .05) and learning (r=.13, n=589, p < .01), but not design (r=.08, n=593, n.s.). However, even the significant correlations were small.
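A minimal sketch of this predictive-validity step follows: compute each student's pre/post gain and correlate it with the construct scores. The column and file names are hypothetical, and dropping rows with missing tests is one plausible way the n values above could differ across constructs.

```python
# Minimal sketch of the predictive-validity step: correlate pre/post
# learning gains with each MAES-S construct score. Column names are
# hypothetical.
import pandas as pd
from scipy.stats import pearsonr

data = pd.read_csv("maes_s_with_tests.csv")  # hypothetical file
data["gain"] = data["post_test"] - data["pre_test"]

for construct in ["design", "engagement", "learning"]:
    subset = data[["gain", construct]].dropna()  # keep complete cases only
    r, p = pearsonr(subset["gain"], subset[construct])
    print(f"{construct}: r = {r:.2f}, n = {len(subset)}, p = {p:.3f}")
```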
4 DISCUSSION

This study evaluated the reliability and validity of a scale created to assess the quality of educational mobile apps. Specifically, Kay & Knaack’s [18-20] framework for assessing the quality of learning objects was used, focusing on three constructs: design, engagement and learning. Each of the constructs appeared to be internally reliable, with alpha coefficients between 0.85 and 0.91. Furthermore, each of the constructs appeared to measure a distinct quality based on the principal components factor analysis. Construct ratings were significantly correlated with each other, but the relatively low shared variance also supported the assumption that the constructs were unique.
Convergent validity for the MAES-S was not strongly supported. Correlations between student and teacher perceptions of design, engagement and learning were significant but quite low. In other words, students and teachers were not aligned with respect to their ratings of these three constructs. Since the impetus for creating the MAES-S was to help teachers select high-quality educational apps for student learning, the MAES-S may need to be revised. At the very least, qualitative data, perhaps in the form of interviews, needs to be collected and analysed to determine why students and teachers rate mobile apps differently.
Perhaps the most concerning finding is that the correlations between learning performance and student ratings of design, engagement and learning were, even where significant, quite small. Ideally, one would want a scale for selecting high-quality educational mobile apps to predict learning success. However, the pre- and post-tests used for the current study were pre-designed by the creators of the educational apps and typically used a multiple-choice format. Future studies should use a more rigorous and varied set of learning assessment metrics specifically aligned with the intended learning goals for using the educational mobile apps in question. Additionally, interview data examining students’ perceptions of learning might offer insights into how a specific mobile app is or is not supporting learning.
Creating the MAES-S is a first step toward developing a reliable and valid metric for assessing the quality of educational mobile apps. Future studies need to focus on refining the scale items. We argue that interview data could significantly improve the revision process by revealing why students and teachers differ in their assessments of app quality and what students are learning when they use mobile apps.
REFERENCES

[1] E. C. Bouck, R. Satsangi & S. Flanagan, “Focus on inclusive education: Evaluating apps for students with disabilities: Supporting academic access and success,” Childhood Education, vol. 92, no. 4, pp. 324-328, 2016.
[2] S. Papadakis, M. Kalogiannakis & N. Zaranis, “Designing and creating an educational app rubric for preschool teachers,” Education and Information Technologies, pp. 1-19, 2017.
[3] F. Martin & J. Ertzberger, “Here and now mobile learning: An experimental study on the use of mobile technology,” Computers & Education, vol. 68, pp. 76-85, 2013.
[4] Statista, “Compound annual growth rate of free and paid education app downloads worldwide from 2012 to 2017,” 2018. Retrieved from
[5] Technavio, “Global education apps market - market study 2015-2019,” 2015. Retrieved from
[6] O. Chergui, A. Begdouri & D. Groux-Leclet, “A classification of educational mobile use for learners and teachers,” International Journal of Information and Education Technology, vol. 7, no. 5, pp. 324-330, 2017.
[7] T. Cherner, J. Dix & C. Lee, “Cleaning up that mess: A framework for classifying educational apps,” Contemporary Issues in Technology and Teacher Education, vol. 14, no. 2, pp. 158-193, 2014.
[8] O. T. Murray & N. R. Olcese, “Teaching and learning with iPads, ready or not?” TechTrends, vol. 55, no. 6, pp. 42-48, 2011.
[9] S. Alon, H. An & D. Fuentes, “Teaching mathematics with Tablet PCs: A professional development program targeting primary school teachers” in Tablets in K-12 Education: Integrated Experiences and Implications (G. Christou, S. Maromoustakos, K. Mavrou, M. Meletiou-Mavrothers & G. Stylianou eds.), pp. 175-197, Hershey, PA: IGI Global, 2015.
[10] M. Ebner, “Mobile applications for math education - how should they be done?” in Mobile Learning and Mathematics: Foundations, Design, and Case Studies (H. Crompton & J. Traxler eds.), pp. 20-32, New York: Routledge, 2015.
[11] N. Grandgenett, J. Harris & M. Hofer, “An activity-based approach to technology integration in the mathematics classroom,” NCSM Journal of Mathematics Education Leadership, vol. 13, no. 1, pp. 19-28, 2011.
[12] B. Handal, C. Campbell, M. Cavanagh & P. Petocz, “Characterising the perceived value of mathematics educational apps in preservice teachers,” Mathematics Education Research Journal, vol. 28, no. 1, pp. 199-221, 2016.
[13] J. M. Zydney & Z. Warner, “Mobile apps for science learning: Review of research,” Computers & Education, vol. 94, pp. 1-17, 2016.
[14] P. M. O’Shea & J. B. Elliot, “Augmented reality in education: An exploration and analysis of currently available educational apps” in Immersive Learning Research Network. iLRN 2016. Communications in Computer and Information Science, vol. 621 (C. Allison, L. Morgado, J. Pirker, D. Beck, J. Richter & C. Gütl eds.), pp. 147-159, Switzerland: Springer, 2016.
[15] E. Pechenkina, “Developing a typology of mobile apps in higher education: A national case-study,” Australasian Journal of Educational Technology, vol. 33, no. 4, pp. 134-146, 2017.
[16] T. Orehovacki, G. Bubas & A. Kovacic, “Taxonomy of web 2.0 applications with educational
potential” in Transformation in Teaching: Social Media Strategies in Higher Education (C. Cheal, J.
Coughlin & S. Moore eds.), pp. 43-72, Santa Rosa, CA: Informing Science Press, 2012.
[17] R. H. Kay, “Evaluating learning, design, and engagement in web-based learning tools (WBLTs): The WBLT Evaluation Scale,” Computers in Human Behavior, vol. 27, no. 5, pp. 1849-1856, 2011.
[18] R. H. Kay & L. Knaack, “Assessing learning, quality and engagement in learning objects: The Learning Object Evaluation Scale for Students (LOES-S),” Educational Technology Research and Development, vol. 57, no. 2, pp. 147-168, 2009.
[19] R. H. Kay & L. Knaack, “A multi-component model for assessing learning objects: The learning
object evaluation metric (LOEM),” Australasian Journal of Educational Technology, vol. 24, no. 5,
pp. 574-591, 2008.
[20] R. H. Kay, L. Knaack & D. Petrarca, “Exploring teacher perceptions of web-based learning tools,” Interdisciplinary Journal of E-Learning and Learning Objects, vol. 5, pp. 27-50, 2009.