On the Automatic Assessment of
Computational Thinking Skills: A
Comparison with Human Experts
Jesús Moreno-León
Programamos & Universidad
Rey Juan Carlos
Seville, Spain
jesus.moreno@programamos.es
Casper Harteveld
Northeastern University
Boston, MA, USA
c.harteveld@neu.edu
Marcos Román-González
Universidad Nacional de
Educación a Distancia
Madrid, Spain
mroman@edu.uned.es
Gregorio Robles
Universidad Rey Juan Carlos
Fuenlabrada (Madrid), Spain
grex@gsyc.urjc.es
Permission to make digital or hard copies of part or all of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for profit or commercial advantage and that copies bear this notice and the full citation
on the first page. Copyrights for third-party components of this work must be honored.
For all other uses, contact the owner/author(s). Copyright is held by the
author/owner(s).
CHI’17 Extended Abstracts, May 6–11, 2017, Denver, CO, USA.
ACM ISBN 978-1-4503-4656-6/17/05.
http://dx.doi.org/10.1145/3027063.3053216
Abstract
Programming and computational thinking skills are promoted in schools worldwide. However, there is still a lack of tools that assist learners and educators in the assessment of these skills. We have implemented an assessment tool, called Dr. Scratch, that analyzes Scratch projects with the aim to assess the level of development of several aspects of computational thinking. One of the issues to address in order to show its validity is to compare the (automatic) evaluations provided by the tool with the (manual) evaluations by (human) experts. In this paper we compare the assessments provided by Dr. Scratch with over 450 evaluations of Scratch projects given by 16 experts in computer science education. Our results show strong correlations between automatic and manual evaluations. As there is ample debate among educators on the use of this type of tool, we discuss the implications and limitations, and provide recommendations for further research.
Author Keywords
Computational thinking; programming; assessment; Scratch
ACM Classification Keywords
K.3.2 [Computer and Information Science Education]: Computer science education; D.2.4 [Software/Program Verification]: Validation
Introduction
In recent years, programming has been promoted in schools worldwide [2, 9] as part of a movement that aims to foster Computational Thinking (CT) skills among young learners [3]. CT is the process involved in formulating a problem and expressing its solution so that a computer can effectively carry it out. It is based on an iterative process with three stages: formulation of a problem (abstraction), expression of a solution (automation, which includes its implementation by means of programming), and execution and evaluation of a solution (analysis). The term computational thinking was first used by Papert [19] in the 1980s, although it has been popularized more recently by Wing [24].
Level of development for each CT dimension assessed by Dr. Scratch [16]

I. Logical Thinking:
1. If
2. If else
3. Logic operations

II. Data Representation:
1. Modifiers of object properties
2. Variables
3. Lists

III. User Interactivity:
1. Green flag
2. Keyboard, mouse, ask and wait
3. Webcam, input sound

IV. Flow Control:
1. Sequence of blocks
2. Repeat, forever
3. Repeat until

V. Abstraction and Problem Decomposition:
1. More than one script and more than one sprite
2. Use of custom blocks
3. Use of 'clones' (instances of sprites)
Many efforts have been devoted to the creation of tools to teach these skills, such as Alice [7], Kodu [14] or Scratch [20], a block-based, visual programming environment with more than 13 million users and over 16 million projects shared in its open online repository. However, not much attention has been paid so far to the development of learning support tools, such as assessment and recommender systems. In the opinion of the authors, learners and educators can be supported with tools that help and guide them, in the same way that professional software developers benefit from using tools (e.g., lint-like tools and other checkers that analyze code quality [12, 23]). That is why we have built Dr. Scratch [16], a free/libre/open source web tool that allows users to analyze Scratch projects to assess the level of development in several aspects of computational thinking (abstraction and problem decomposition, data representation, user interactivity, parallelism, synchronization, logical thinking and algorithmic notions of flow control) by statically inspecting the source code of the projects.
Although it is already used by several thousand learners and educators monthly, the automatic assessment provided by Dr. Scratch has not been completely validated in a scientific manner. For this reason we explore the following question in this preliminary study: What is the correlation between the automatic score provided by Dr. Scratch and the one provided (manually) by expert evaluators? We address this question by analyzing the results of over 450 evaluations of Scratch projects by 16 experts acting as members of the jury of a programming contest with over 50 participants. In consequence, in terms of contribution to Human-Computer Interaction research, the general implications of this investigation concern the building of automatic assessment tools, in particular for educational programming environments.
The structure of the paper is as follows: in the next section, we offer some background information on Dr. Scratch, the automatic assessment tool. Then, we present the methodology used in our comparison. We end by discussing our results, including their limitations and possible paths for future research.
Background
Dr. Scratch is inspired by Scrape [25], a visualizer of the blocks used in Scratch projects, and is based on Hairball [4], a static code analyzer for Scratch projects that detects potential issues in the programs, such as code that is never executed, messages that no sprite receives, or object attributes that are not correctly initialized [8].

The Hairball architecture, based on plug-ins, is ideal for adding new features. For instance, Hermans et al. have studied how bad smells negatively affect the learning process [10, 1]; we have thus developed two plug-ins to detect several bad programming habits (i.e., bad smells) that educators frequently encounter in their work as instructors with high school students [15].
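To illustrate the kind of check such a plug-in performs, the following minimal Python sketch flags verbatim duplicated scripts, one of the bad smells targeted in [15]. It is only an illustration under simplified assumptions: the Project, Script and DuplicateScriptPlugin classes are hypothetical and do not reflect Hairball's actual API or data model.

# A minimal, hypothetical sketch of a plug-in-style static check over a
# Scratch project, in the spirit of the Hairball plug-ins described above.
# The Project/Script classes and the duplicate-code check are illustrative
# assumptions only, not Hairball's real interface.
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass
class Script:
    blocks: List[str]          # simplified: a script is a list of block names


@dataclass
class Project:
    scripts: List[Script]


class DuplicateScriptPlugin:
    """Flags scripts whose block sequences are repeated verbatim,
    one of the bad smells mentioned in [15]."""

    def analyze(self, project: Project) -> List[str]:
        counts = Counter(tuple(s.blocks) for s in project.scripts)
        return [
            f"Duplicated script ({n} copies): {' -> '.join(blocks)}"
            for blocks, n in counts.items() if n > 1
        ]


if __name__ == "__main__":
    demo = Project(scripts=[
        Script(["whenGreenFlag", "forever", "moveSteps"]),
        Script(["whenGreenFlag", "forever", "moveSteps"]),   # copy-pasted
        Script(["whenKeyPressed", "changeXBy"]),
    ])
    for warning in DuplicateScriptPlugin().analyze(demo):
        print(warning)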
However, the fact that Hairball is run from the command line, as it is a set of Python scripts that evaluators have to run manually, makes it unsuitable for many educators who are not comfortable with such an environment, let alone for young students. For this reason, we decided to create a web-based service called Dr. Scratch that facilitates the analysis of Scratch projects.
Level of development for each CT dimension assessed by Dr. Scratch [16]

VI. Parallelism:
1. Two scripts on green flag
2. Two scripts on key pressed or sprite clicked
3. Two scripts on receive message, video/audio input, backdrop change

VII. Synchronization:
1. Wait
2. Message broadcast, stop all, stop program
3. Wait until, when backdrop changes, broadcast and wait

Figure 1: Source code of 'Catch me if you can'. Available at https://scratch.mit.edu/projects/138397021/
In addition, we reviewed prior work proposing theories and methods for assessing the development of programming and CT skills of learners [22, 21, 5, 13] to come up with a CT score system. As summarized in the sidebar, the CT score that Dr. Scratch assigns to projects is based on the degree of development of seven dimensions of the CT competence: abstraction and problem decomposition, logical thinking, synchronization, parallelism, algorithmic notions of flow control, user interactivity and data representation. These dimensions are statically evaluated by inspecting the source code of the analyzed project, and each is scored from 0 to 3, resulting in a total evaluation (mastery score) that ranges from 0 to 21 when all seven dimensions are aggregated. Figure 1, which shows the source code of a single-script Scratch project, can be used to illustrate the assessment of the tool. Dr. Scratch would assign 6 points of mastery score to this project: 2 points for flow control, because it includes a forever loop; 2 points for user interactivity, as players interact with the sprite by using the mouse; 1 point for logical thinking, because of the if statement; and 1 point for data representation, since the orientation and position properties of the sprite are modified. The rest of the CT dimensions would be scored with 0 points.
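The aggregation of the mastery score can be summarized with a short sketch. The snippet below is not Dr. Scratch's actual implementation, only an illustration of the scoring scheme described above applied to the 'Catch me if you can' example; the per-dimension levels are assumed to have already been detected by static inspection.

# A simplified sketch of how the mastery score described above is aggregated:
# each of the seven CT dimensions receives a level from 0 to 3, and the total
# therefore ranges from 0 to 21. The detection of each level from the
# project's source code is not shown here.
CT_DIMENSIONS = (
    "abstraction_and_decomposition", "logical_thinking", "synchronization",
    "parallelism", "flow_control", "user_interactivity", "data_representation",
)

def mastery_score(levels: dict) -> int:
    """Sum the per-dimension levels (each clamped to the 0..3 range)."""
    return sum(min(max(levels.get(dim, 0), 0), 3) for dim in CT_DIMENSIONS)

# 'Catch me if you can' (Figure 1): forever loop, mouse interaction,
# an if statement, and modified sprite properties.
example = {
    "flow_control": 2,
    "user_interactivity": 2,
    "logical_thinking": 1,
    "data_representation": 1,
}
print(mastery_score(example))  # -> 6, out of a maximum of 21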
As part of the initial validation process of Dr. Scratch, workshops have been organized in schools and high schools to study how students of different ages and with diverse backgrounds use the tool [16]. The scores of Dr. Scratch have also been compared with classic software engineering complexity metrics [17]. The results of these investigations have been very promising and justify additional validations, such as the one presented in this paper.
Making an analogy between writing programs and writing text, the current study is remarkably similar to an investigation that examined the reliability of a system for the automatic evaluation of written texts [18]. In that study, two judges independently assessed 78 summaries written by students. The judges' scores were averaged and then compared with the assessments provided by an automatic system, finding a correlation of r = +.82.
Methodology
In order to answer our research question, we organized a programming contest for Spanish students from October to December 2015, in collaboration with the Spanish Foundation for Science and Technology (FECyT) and Google.

The main objective of this contest was to promote scientific vocations among youngsters. Participating students, from primary and secondary education, had to create a Scratch project explaining a scientific concept. To stimulate and encourage students, the following questions were provided in the instructions of the contest: "Could you come up with a great way to explain the water cycle with Scratch? Or perhaps you can create a project to present the planets of the solar system? Or maybe you can author a game to help us know more about the noble elements?". Although 87 projects were submitted, only 53 of them met the criteria to participate (the two requirements were that projects had to be publicly shared in the repository and that they should be, at least vaguely, related to some scientific concept). Figures 2 and 3 show screenshots of two of the projects accepted for the contest, which respectively address the energy of objects in movement and the density of solids and liquids.
Figure 2: S: a game in which players have to place a ball on a base by interacting with its speed, height, kinetic and potential energy. Available at https://scratch.mit.edu/projects/88281683/

Figure 3: To float or not to float: a game in which players have to guess if different objects would float on diverse liquids based on their density. Available at https://scratch.mit.edu/projects/85723084/
The jury that evaluated the Scratch projects was formed by 16 specialists from different backgrounds with solid knowledge of computer science education: policy makers, non-profit organizations promoting CT skills in schools, primary and secondary teachers, researchers, and companies with programming and robotics programs.
We formed four groups of experts based on their experience with Scratch, so that the average experience with Scratch of each group ranged from 3.5 to 4 years. Then, we divided the projects among the groups in such a way that each project was assessed by at least six experts (at least three experts from each of two different groups). The evaluations were performed over two weeks, with a different approach in each week. With the aim of not influencing the experts' open evaluation during the first week, we did not offer many guidelines, letting them perform the evaluation in a very open way based on the criteria that they, as experts, deemed appropriate. During the second week we asked them to grade the projects based on criteria that we chose. Thus, during the first week, experts were asked to evaluate a set of projects by filling out a form with only two fields: an overall score and general comments. During the second week, we asked them to evaluate a different set of projects and to provide, in addition to the overall score, an assessment of the technical mastery of the project, its creativity and originality, and the use of aesthetic and sensory effects. All of these fields could be scored from 1 to 10, and a text box allowed experts to add comments if desired. The instructions and recommendations given to the experts were minimal so as not to bias their assessment, as we wanted them to establish the criteria they considered most appropriate based on their experience. We recorded 317 overall score evaluations and 160 technical complexity evaluations of the 53 projects that were considered valid. Some experts did not complete their "assignments"; on average we have around 6 global evaluations per project and 3 technical mastery evaluations per project. These have been compared with the score provided by Dr. Scratch for each of the projects.
A list with the URLs of the Scratch projects, information on the experts (name, affiliation and experience with Scratch), how we grouped the experts, an English translation of the emails sent to the experts, the assessment questionnaires, and the results of the evaluations are publicly available in the replication package of the paper (https://github.com/kgblll/ReplicationPackage-2017-CHI-LBW).
Findings
In this section we study the relationship between the Dr. Scratch scores and the evaluations provided by the experts, both for the overall score and for technical mastery. Dr. Scratch does not assess creativity or aesthetics, so these aspects are out of the scope of this work. Even though the variables can be considered quasi-interval, the data do not follow a normal distribution for all variables, so the analysis is based on Spearman's rho non-parametric correlations.
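As an illustration of this analysis, the following sketch computes Spearman's rho over paired expert and Dr. Scratch scores with SciPy; the values shown are hypothetical, not the study data, which are available in the replication package.

# A sketch of the correlation analysis, assuming two paired lists: one expert
# overall score (1-10) and the corresponding Dr. Scratch mastery score (0-21)
# per evaluation. The numbers below are made up for illustration.
from scipy.stats import spearmanr

expert_scores = [4, 7, 9, 3, 6, 8, 5, 7]          # hypothetical values
drscratch_scores = [7, 14, 19, 5, 11, 17, 9, 15]  # hypothetical values

rho, p_value = spearmanr(expert_scores, drscratch_scores)
print(f"Spearman's rho = {rho:.3f}, p = {p_value:.4f}")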
Relationship between the Dr. Scratch score and the experts' overall score
The 317 evaluations from experts collected in weeks 1 and 2 had scores that ranged from 1 to 10 points. These have been compared with the scores provided by Dr. Scratch, which range from 0 to 21 points, resulting in a strong correlation: r = .682, p(r) < .0001. Figure 4 presents the scatter plot for this relationship, where a monotonic direct correlation between Dr. Scratch scores and experts' evaluations is depicted.

Table 1: Summary statistics (mean, minimum, maximum and quartiles) of the standard deviations of the overall evaluations provided by experts.

Statistic   Value
mean        1.121
min         0
25%         0.898
50%         1.072
75%         1.380
max         2.250
Scores provided by the experts, who have different backgrounds and expertise, are not completely uniform. Table 1 shows the summary statistics of the standard deviation of the evaluations for each project, where we can see that there are projects on which the experts coincided in their assessments (min = 0). In other cases the standard deviation is as high as 2.250 points, with the mean of the standard deviations being M = 1.121.

If we compute the mean of the experts' evaluations for each of the projects, and then compare these means with the Dr. Scratch scores, a strong correlation is found: r = .834, p(r) < .0001. The monotonic positive correlation between the variables is clearly shown in Figure 5.
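The sketch below shows how this per-project aggregation can be performed with pandas; the column names and values are hypothetical, and only the procedure (per-project standard deviations as in Table 1, then per-project means correlated with the Dr. Scratch scores) mirrors the analysis in the paper.

# A sketch of the per-project aggregation behind Table 1 and Figure 5,
# assuming a table of individual expert evaluations with hypothetical column
# names ('project', 'expert_score') and a mapping from project to its
# Dr. Scratch score. The data here are made up for illustration.
import pandas as pd
from scipy.stats import spearmanr

evaluations = pd.DataFrame({
    "project":      ["p1", "p1", "p1", "p2", "p2", "p2", "p3", "p3", "p3"],
    "expert_score": [6,     7,    8,    3,    3,    4,    9,    8,   10],
})
drscratch = pd.Series({"p1": 12, "p2": 6, "p3": 18})

per_project = evaluations.groupby("project")["expert_score"]
print(per_project.std().describe())   # summary of per-project SDs (cf. Table 1)

means = per_project.mean()            # one averaged expert score per project
rho, p = spearmanr(means, drscratch[means.index])
print(f"rho = {rho:.3f}, p = {p:.4f}")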
Figure 4: Scatter plot for the experts' overall evaluation (x-axis) and the Dr. Scratch assessment (y-axis). Darker points represent a higher number of cases (i.e., more evaluations).

Figure 5: Scatter plot for the mean of the experts' overall evaluation (x-axis) and the Dr. Scratch assessment (y-axis). Darker points represent a higher number of cases (i.e., more evaluations).
Relationship between the Dr. Scratch score and the experts' technical mastery score
During the second week of the investigation, members of the jury were asked to evaluate the technical mastery of the projects. If we compare these 160 evaluations with the Dr. Scratch scores, a strong correlation is found: r = .779, p(r) < .0001. Figure 6 presents the scatter plot for this relationship, showing a monotonic direct correlation between the variables.

If the means of the expert evaluations are computed and compared with the Dr. Scratch scores, the correlation is stronger: r = .824, p(r) < .0001. As can be seen in Figure 7, which presents the scatter plot of this relationship, there is a positive correlation between these measurements.
Discussion, limitations and future research
The organization of a programming contest has allowed us to compare the scores of an automatic assessment tool, Dr. Scratch, with the evaluations of a group of experts in computer science education. This analysis showed strong correlations, which could be considered a validation of the metrics used by the tool. We argue, in consequence, that this investigation represents a step in the validation of Dr. Scratch as a tool to support learners, educators and researchers in the assessment of programming and CT skills.

When the experts' evaluations are considered individually, the relationship with the Dr. Scratch assessments is stronger for the technical mastery of the projects than for the overall score. However, when the project evaluations are averaged, the relationship is slightly stronger for the overall score, although the difference is minor. This is explained by the strong relationship between the overall and technical scores; in fact, an ad-hoc calculation of this relationship indicates a very strong correlation, r = .942, which could be of interest for future research. In any case, according to the assessment research literature [6], Dr. Scratch can be considered convergent with the expert evaluators, as the correlation found is greater than r = .70 when comparing the experts' technical mastery scores with the ones provided by Dr. Scratch.
Several fundamental aspects of programming, such as debugging, design or remixing skills, are not assessed by Dr. Scratch. Other crucial aspects of CT skills, such as originality, creativity or correctness, are not taken into account either. Furthermore, the fact that a programming construct appears in a project does not necessarily mean that the author understands it [5], as it could have been copied from another project, for instance. In consequence, Dr. Scratch should not be understood as a replacement for evaluators or mentors, but as a supporting tool that assists them in some of the assessment tasks. The automatic assessment of learning outcomes in programming is an emerging area that needs further attention from educators and researchers.
Figure 6: Scatter plot for the experts' technical mastery evaluation (x-axis) and the Dr. Scratch assessment (y-axis). Darker points represent a higher number of cases (i.e., more evaluations).

Figure 7: Scatter plot for the mean of the experts' technical mastery evaluation (x-axis) and the Dr. Scratch assessment (y-axis). Darker points represent a higher number of cases (i.e., more evaluations).
Several cases were found in which there is a notable difference between the assessments. These discrepancies were discussed in an unstructured way with some members of the jury. The main reason for the detected differences was the functionality of the projects. While the experts took into account whether projects achieved the goals stated in the instructions, as well as their usability, Dr. Scratch is unable to evaluate such issues. In addition, at the time of running the contest, Dr. Scratch measured all the scripts in a project, even the ones that were never executed. The experts, on the contrary, stated that those scripts should not be considered, as they have no influence on the functionality. This has been modified in newer versions of Dr. Scratch [11].
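As a rough illustration of that change, the sketch below drops scripts that cannot be triggered because they lack an event ("hat") block. This is only a hypothetical simplification of the behavior described in [11], not the tool's actual implementation; the block names are illustrative.

# A simplified sketch of excluding never-executed scripts before scoring.
# Assumption: a script can only run if it starts with an event "hat" block;
# the list of hat blocks below is illustrative, not Dr. Scratch's actual code.
HAT_BLOCKS = {
    "whenGreenFlag", "whenKeyPressed", "whenClicked",
    "whenIReceive", "whenBackdropSwitches", "whenCloned",
}

def executable_scripts(scripts):
    """Keep only scripts whose first block is an event hat block."""
    return [s for s in scripts if s and s[0] in HAT_BLOCKS]

scripts = [
    ["whenGreenFlag", "forever", "moveSteps"],
    ["say", "Hello"],                      # loose blocks, never triggered
    ["whenIReceive", "playSound"],
]
print(executable_scripts(scripts))  # the loose 'say' script is dropped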
Being early research, this investigation is based on a low number of projects, limited to 53, which can be seen as a threat to the generalizability of the results (external validity), and it did not include formal interviews with the judges (internal validity). Further research should address these issues to offer more solid scientific evidence.

Thus, we plan to replicate the investigation with a higher number of participating students. We also intend to involve more judges. We would like to study potential differences in the correlations between measurements in terms of the type of project (projects ranked highly vs. lowly by the experts), the years of experience of the expert, his/her background field of expertise, etc. In addition to the scores provided by the experts, we would like to carry out semi-structured interviews with them to discuss the discrepancies between their assessments and the one by Dr. Scratch, in the hope that these interviews may provide us with reliable, comparable qualitative data.
All in all, in this paper we have presented evidence that the automatic assessment of CT skills is a promising area. We think that further research could result in tools that assist learners and educators, in the same way professional software developers benefit from modern development-supporting tools.
Acknowledgments
We would like to thank the participants in the competition, the expert judges and, as sponsors of the contest, FECyT and Google. This work has been funded in part by the Region of Madrid under the project "eMadrid - Investigación y Desarrollo de tecnologías para el e-learning en la Comunidad de Madrid" (S2013/ICE-2715). The last author also wants to acknowledge the EU-funded SENECA project, a Marie Skłodowska-Curie action.
References
[1] Efthimia Aivaloglou and Felienne Hermans. 2016. How
Kids Code and How We Know: An Exploratory Study
on the Scratch Repository. In Proceedings of the 2016
ACM Conference on International Computing Educa-
tion Research (ICER ’16). ACM, New York, NY, USA,
53–61. http://dx.doi.org/10.1145/2960310.2960325
[2] Anja Balanskat and Katja Engelhardt. 2015. Comput-
ing Our Future: Computer Programming and Coding-
Priorities, School Curricula and Initiatives Across Eu-
rope. Technical Report. European Schoolnet.
[3] Valerie Barr and Chris Stephenson. 2011. Bringing
computational thinking to K-12: What is Involved and
what is the role of the Computer Science education
community? ACM Inroads 2, 1 (2011), 48–54.
[4] Bryce Boe, Charlotte Hill, Michelle Len, Greg
Dreschler, Phillip Conrad, and Diana Franklin. 2013.
Hairball: Lint-inspired Static Analysis of Scratch
Projects. In Proceeding of the 44th ACM Technical
Symposium on Computer Science Education (SIGCSE
’13). ACM, New York, NY, USA, 215–220.
[5] Karen Brennan and Mitchel Resnick. 2012. New
frameworks for studying and assessing the develop-
ment of Computational Thinking. In Proceedings of
the 2012 annual meeting of the American Educational
Research Association, Vancouver, Canada. 1–25.
[6] Kevin D Carlson and Andrew O Herdman. 2012. Un-
derstanding the impact of convergent validity on re-
search results. Organizational Research Methods 15,
1 (2012), 17–32.
[7] Stephen Cooper, Wanda Dann, and Randy Pausch.
2000. Alice: a 3-D tool for introductory programming
concepts. Journal of Computing Sciences in Colleges
15, 5 (2000), 107–116.
[8] Diana Franklin, Phillip Conrad, Bryce Boe, Katy Nilsen,
Charlotte Hill, Michelle Len, Greg Dreschler, Gerardo
Aldana, Paulo Almeida-Tanaka, Brynn Kiefer, Chelsea
Laird, Felicia Lopez, Christine Pham, Jessica Suarez,
and Robert Waite. 2013. Assessment of Computer
Science Learning in a Scratch-based Outreach Pro-
gram. In Proceeding of the 44th ACM Technical Sym-
posium on Computer Science Education (SIGCSE
’13). ACM, New York, NY, USA, 371–376.
[9] Google. 2015. Searching for Computer Science: Ac-
cess and Barriers in U.S. K-12 Education. Technical
Report. Gallup. https://services.google.com/fh/files/misc/
searching-for-computer-science_report.pdf
[10] Felienne Hermans and Efthimia Aivaloglou. 2016.
Do code smells hamper novice programming? A
controlled experiment on Scratch programs. In
2016 IEEE 24th International Conference on Pro-
gram Comprehension (ICPC). IEEE, 1–10.
http://dx.doi.org/10.1109/ICPC.2016.7503706
[11] Amy K. Hoover, Jackie Barnes, Borna Fatehi, Jesús
Moreno-León, Gillian Puttick, Eli Tucker-Raymond, and
Casper Harteveld. 2016. Assessing Computational
Thinking in Students’ Game Designs. In Proceedings
of the 2016 Annual Symposium on Computer-Human
Interaction in Play Companion Extended Abstracts
(CHI PLAY Companion ’16). ACM, New York, NY,
USA, 173–179.
[12] Stephen C Johnson. 1977. Lint, a C program checker.
Technical Report Computer Science 65. Bell Laborato-
ries.
[13] Kyu Han Koh, Ashok Basawapatna, Vicki Bennett, and
Alexander Repenning. 2010. Towards the Automatic
Recognition of Computational Thinking for Adaptive
Visual Language Learning. In 2010 IEEE Symposium
on Visual Languages and Human-Centric Computing.
59–66. http://dx.doi.org/10.1109/VLHCC.2010.17
[14] Matt MacLaurin. 2009. Kodu: End-user Programming
and Design for Games. In Proceedings of the 4th Inter-
national Conference on Foundations of Digital Games
(FDG ’09). ACM, New York, NY, USA, Article 2.
http://dx.doi.org/10.1145/1536513.1536516
[15] Jesús Moreno and Gregorio Robles. 2014. Automatic
detection of bad programming habits in Scratch: A
preliminary study. In 2014 IEEE Frontiers in Education
Conference (FIE) Proceedings. 1–4. http://dx.doi.
org/10.1109/FIE.2014.7044055
[16] Jesús Moreno-León, Gregorio Robles, and Marcos
Román-González. 2015. Dr. Scratch: Automatic Anal-
ysis of Scratch Projects to Assess and Foster Com-
putational Thinking. RED. Revista de Educación a
Distancia 15, 46 (2015), 23.
[17] Jesús Moreno-León, Gregorio Robles, and Marcos
Román-González. 2016. Comparing computational
thinking development assessment scores with soft-
ware complexity metrics. In 2016 IEEE Global Engi-
neering Education Conference (EDUCON). 1040–
1045. http://dx.doi.org/10.1109/EDUCON.2016.
7474681
[18] Ricardo Olmos, Guillermo Jorge-Botana, José M.
Luzón, Jesús I. Martín-Cordero, and José Antonio
León. 2016. Transforming {LSA} space dimensions
into a rubric for an automatic assessment and feed-
back system. Information Processing & Management
52, 3 (2016), 359–373.
[19] Seymour Papert. 1980. Mindstorms: Children, com-
puters, and powerful ideas. Basic Books, Inc.
[20] Mitchel Resnick, John Maloney, Andrés Monroy-
Hernández, Natalie Rusk, Evelyn Eastmond, Karen
Brennan, Amon Millner, Eric Rosenbaum, Jay Silver,
Brian Silverman, and Yasmin Kafai. 2009. Scratch:
Programming for All. Commun. ACM 52, 11 (Nov.
2009), 60–67.
[21] Linda Seiter and Brendan Foreman. 2013. Modeling
the Learning Progressions of Computational Think-
ing of Primary Grade Students. In Proceedings of the
Ninth Annual International ACM Conference on Inter-
national Computing Education Research (ICER ’13).
ACM, New York, NY, USA, 59–66.
[22] Amanda Wilson, Thomas Hainey, and Thomas Con-
nolly. 2012. Evaluation of computer games developed
by primary school children to gauge understanding of
programming concepts. In European Conference on
Games Based Learning. Academic Conferences Inter-
national Limited, 549.
[23] Cindy Wilson and Leon J Osterweil. 1985. Omega–A
Data Flow Analysis Tool for the C Programming Lan-
guage. IEEE Transactions on Software Engineering
11, 9 (1985), 832.
[24] Jeannette M Wing. 2006. Computational Thinking.
Commun. ACM 49, 3 (2006), 33–35.
[25] Ursula Wolz, Christopher Hallberg, and Brett Tay-
lor. 2011. Scrape: A tool for visualizing the code of
Scratch programs. In Poster presented at the 42nd
ACM Technical Symposium on Computer Science Ed-
ucation. Dallas, TX.
Under the Curriculum for Excellence (CfE) in Scotland, newer approaches such as games-based learning and games-based construction are being adopted to motivate and engage students. Construction of computer games is seen by some to be a highly motivational and practical approach at engaging children at Primary Education (PE) level in computer programming concepts. Games-based learning (GBL) and gamesbased construction both suffer from a dearth of empirical evidence supporting their validity as teaching and learning approaches. To address this issue, this paper will present the findings of observational research at PE level using Scratch as a tool to develop computer games using rudimentary programming concepts. A list of criteria will be compiled for reviewing the implementation of each participant to gauge the level of programming proficiency demonstrated. The study will review 29 games from Primary 4 to Primary 7 level and will present the overall results and results for each individual year. This study will contribute to the empirical evidence in gamesbased construction by providing the results of observational research across different levels of PE and will provide pedagogical guidelines for assessing programming ability using a games-based construction approach.