
Complementary Tools for Computational Thinking Assessment



Computational thinking (CT) is emerging as a key set of problem-solving skills that must be developed by the new generations of digital learners. However, there is still a lack of consensus on a formal CT definition, on how CT should be integrated in educational settings, and especially on how CT can be properly assessed. The latter is an extremely relevant and urgent topic because, without reliable and valid assessment tools, CT might lose its potential to make its way into educational curricula. In response, this paper presents the convergent validity of one of the major recent attempts to assess CT from a summative-aptitudinal perspective: the Computational Thinking Test (CTt). The convergent validity of the CTt is studied in Spanish middle school samples with respect to two other CT assessment tools, which come from different perspectives: the Bebras Tasks, built from a skill-transfer approach; and Dr. Scratch, an automated tool designed from a formative-iterative approach. Our results show statistically significant, positive, and moderately strong correlations between the CTt and a selected set of Bebras Tasks (r=0.52), and between the CTt and Dr. Scratch (predictive value r=0.44; concurrent value r=0.53). These results support the statement that the CTt is partially convergent with the Bebras Tasks and with Dr. Scratch. Finally, we discuss whether these three tools are complementary and may be combined in middle school.
1 Universidad Nacional de Educación a Distancia, Spain
2 & Universidad Rey Juan Carlos, Spain
3 Universidad Rey Juan Carlos, Spain
Keywords: Computational thinking assessment, Computational
Thinking Test, Dr. Scratch, Bebras Tasks, middle school.
Computational thinking (CT) is considered in many
countries as a key set of problem-solving skills that must
be acquired and developed by today’s generation of
learners (Bocconi et al., 2016). However, there is still a
lack of consensus on a formal CT definition (Kalelioglu,
Gülbahar, & Kukul, 2016), on how CT should be
integrated in educational settings (Lye & Koh, 2014), and
especially on how CT can be properly assessed (Grover,
2015; Grover & Pea, 2013). Regarding the latter, even
though computing is being included in K-12 schools all
around the world, the issue of assessing students’ CT
remains a thorny one (Grover, Cooper, & Pea, 2014).
Hence, CT assessment is an extremely relevant and urgent
topic to address, because “without attention to assessment,
CT can have little hope of making its way successfully into
any K-12 curriculum”, and consequently “measures that
would enable educators to assess what the child has learned
need to be validated” (Grover & Pea, 2013, p. 41).
Moreover, from a psychometric approach, CT is still a
poorly defined psychological construct as its nomological
network has not been completely established; that is, the
correlations between CT and other psychological
constructs have not been completely reported by the
scientific community yet (Román-González, Pérez-
González, & Jiménez-Fernández, 2016). Furthermore,
there is still a large shortage of tests relating to CT that have
undergone a comprehensive psychometric validation
process (Mühling, Ruf, & Hubwieser, 2015). As Buffum et
al. (2015) say: “developing (standardized) assessments of
student learning is an urgent area of need for the relatively
young computer science education community” (p. 622).
In order to shed some light on this issue, one of the major
attempts to develop a solid psychometric tool for CT
assessment is the Computational Thinking Test (CTt)
(Román-González, 2015). This is a multiple-choice test
that has been shown to be valid and reliable (α=0.80;
rxx=0.70) in middle school subjects, and which has
contributed to the nomological network of CT in regard to
other cognitive (Román-González, Pérez-González, &
Jiménez-Fernández, 2016) and non-cognitive (Román-
González, Pérez-González, Moreno-León, & Robles, 2016)
key psychological constructs. Continuing this research line,
now we investigate the convergent validity of the CTt, that
is, the correlations between this test and other tools aimed
at assessing CT. Thus, our general research question is:
RQ (general): What is the convergent validity of the CTt?
1.1. Computational thinking assessment tools
Focusing on K-12 education, especially in middle school
and without being exhaustive, we find several CT
assessment tools developed from different perspectives:
CT Summative tools. We can differentiate between: a)
Aptitudinal tests such as the aforementioned
Computational Thinking Test (which is further described in
2.1.), the Test for Measuring Basic Programming Abilities
(Mühling et al., 2015), or the Commutative Assessment
Test (Weintrop & Wilensky, 2015). And b) Content-
knowledge assessment tools such as the summative tools of
Meerbaum-Salant et al. (2013) in the Scratch context, or
those used for measuring the students’ understanding of
computational concepts after introducing a new computing
curriculum (e.g., in Israel, Zur-Bargury, Pârv, & Lanzberg, 2013).
CT Formative-iterative tools. They provide feedback,
usually in an automatic way, for learners to improve their
CT skills. These tools are specifically designed for a
particular programming environment. Thus, we find Dr.
Scratch (Moreno-León & Robles, 2015) or Ninja Code
Village (Ota, Morimoto, & Kato, 2016) for Scratch; the
ongoing work of Grover et al. (2016) for Blockly; or the
Computational Thinking Patterns CTP-Graph (Koh,
Basawapatna, Bennett, & Repenning, 2010) for
CT Skill-Transfer tools. They are aimed at assessing the
students’ transfer of their CT skills to different types of
problems: for example, the Bebras Tasks (Dagiene &
Futschek, 2008) focused on measuring transfer to ‘real-
life’ problems; or the CTP-Quiz (Basawapatna, Koh,
Repenning, Webb, & Marshall, 2011), which evaluates the
transfer of CT to the context of scientific simulations.
CT Perceptions-Attitudes scales, such as the
Computational Thinking Scales (CTS) (Korkmaz, Çakir, &
Özden, 2017), which uses five-point Likert scales and has
been recently validated with Turkish students.
CT Vocabulary assessments. They are aimed at measuring
elements and dimensions of CT verbally expressed by
children (i.e. ‘computational thinking language’; e.g.
Grover, 2011).
Using only one type of the aforementioned assessment
tools can lead to a misunderstanding of the development of CT
skills by students. In this sense, Brennan and Resnick
(2012) have stated that looking at student-created programs
alone could provide an inaccurate sense of students’
computational competencies, and they underscore the need
for multiple means of assessment. Therefore, as has been
pointed out by relevant researchers (Grover, 2015; Grover
et al., 2014), in order to reach a comprehensive
understanding of the CT of our students, different types of
complementary assessment tools must be systematically
combined (into so-called “systems of assessments”).
Following this idea, our paper is specifically aimed at
studying the convergent validity of the CTt with respect to
other assessment tools that come from different
perspectives. Thus, our specific research questions are:
RQ (specific-1): What is the convergent validity between CTt and Bebras Tasks?
RQ (specific-2): What is the convergent validity between CTt and Dr. Scratch?
Although the three instruments involved in our research are
aimed at assessing the same construct (i.e. CT), they
approach the measurement from different perspectives, so we
expect not a total convergence (r>0.7) among them but
a partial one (0.4<r<0.7) (Carlson & Herdman, 2012).
Answering the aforementioned questions may contribute to
developing a comprehensive “system of assessment” for CT in
middle school settings.
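The cut-offs quoted above can be expressed as a small helper. This is only an illustrative sketch: the function name `classify_convergence` and the "unclassified" label are assumptions made here, since the text does not name the range at or below 0.4 or the exact boundary value 0.7.

```python
def classify_convergence(r: float) -> str:
    """Label a correlation coefficient using the cut-offs quoted in the text.

    r > 0.7        -> "total" convergence
    0.4 < r < 0.7  -> "partial" convergence
    anything else  -> "unclassified" (hypothetical label; the text does not
                      name values at or below 0.4, nor r exactly equal to 0.7)
    """
    if r > 0.7:
        return "total"
    if 0.4 < r < 0.7:
        return "partial"
    return "unclassified"


# Correlations reported in the abstract of this paper.
reported = {
    "CTt * Bebras Tasks": 0.52,
    "CTt pre-test * Dr. Scratch": 0.44,
    "CTt post-test * Dr. Scratch": 0.53,
}
for pair, r in reported.items():
    print(pair, classify_convergence(r))
```

The strict inequalities mirror the way the thresholds are written in the text; a different treatment of the boundary values would be an equally defensible choice.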
2.1. Computational Thinking Test (CTt)
The Computational Thinking Test¹ (CTt) is a multiple-choice
instrument composed of 28 items, which are
administered on-line (via non-mobile or mobile electronic
devices) in a maximum time of 45 minutes. Each item of
the CTt is presented either in a ‘maze’ or in a ‘canvas’
interface; and is designed according to the following three
dimensions (Román-González, 2015; Román-González,
Pérez-González, & Jiménez-Fernández, 2016):
Computational concept addressed: each item
addresses one or more of the following seven
computational concepts, ordered in increasing
difficulty: Basic directions and sequences; Loops–
repeat times; Loops–repeat until; If–simple
conditional; If/else–complex conditional; While
conditional; Simple functions. These
‘computational concepts’ are progressively nested
along the test, and are aligned with the CSTA
Computer Science Standards for the 7th and 8th
grade (Seehorn et al., 2011).
Style of answers: in each item, responses are
presented in any of these two styles: ‘visual
arrows’ or ‘visual blocks’.
Required task: each item requires one of three
cognitive tasks to be solved: ‘sequencing’, i.e.
stating in an orderly manner a set of commands;
‘completion’ of an incomplete set of commands;
or ‘debugging’ an incorrect set of commands.
We show an example of a CTt item translated into English
in Figure 1, with its specifications detailed below.
Figure 1. CTt, item nº 8 (‘maze’): loops-repeat times
(nested); visual blocks; sequencing.
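The three design dimensions of a CTt item can be pictured as a small record type. The sketch below is only an illustration of that specification, not the authors' implementation: the names `CTtItem`, `Interface`, `AnswerStyle`, `Task`, the `nested` flag and the `hardest_concept` helper are all hypothetical. Only the category values and the item nº 8 specification come from the text.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class Interface(Enum):
    MAZE = "maze"
    CANVAS = "canvas"


class AnswerStyle(Enum):
    VISUAL_ARROWS = "visual arrows"
    VISUAL_BLOCKS = "visual blocks"


class Task(Enum):
    SEQUENCING = "sequencing"
    COMPLETION = "completion"
    DEBUGGING = "debugging"


# The seven computational concepts, in the increasing-difficulty order given in the text.
CONCEPTS = (
    "Basic directions and sequences",
    "Loops-repeat times",
    "Loops-repeat until",
    "If-simple conditional",
    "If/else-complex conditional",
    "While conditional",
    "Simple functions",
)


@dataclass(frozen=True)
class CTtItem:
    number: int
    interface: Interface
    concepts: Tuple[str, ...]   # one or more entries of CONCEPTS
    answer_style: AnswerStyle
    task: Task
    nested: bool = False        # hypothetical flag for items marked '(nested)'

    def __post_init__(self) -> None:
        if not self.concepts:
            raise ValueError("an item addresses at least one concept")
        for concept in self.concepts:
            if concept not in CONCEPTS:
                raise ValueError(f"unknown concept: {concept}")

    def hardest_concept(self) -> str:
        """Return the addressed concept that comes latest in the difficulty order."""
        return max(self.concepts, key=CONCEPTS.index)


# Item nº 8 as specified in the caption of Figure 1.
item_8 = CTtItem(
    number=8,
    interface=Interface.MAZE,
    concepts=("Loops-repeat times",),
    answer_style=AnswerStyle.VISUAL_BLOCKS,
    task=Task.SEQUENCING,
    nested=True,
)
print(item_8.hardest_concept())
```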
1 Sample copy available at:
2.2. Bebras Tasks
The Bebras Tasks are a set of activities designed within the
context of the Bebras International Contest², a competition
born in Lithuania in 2003 which aims to promote the
interest and excellence of primary and secondary students
around the world in the field of Computer Science from a
CT perspective (Dagiene & Futschek, 2008; Dagiene &
Stupuriene, 2015). Each year, the contest launches a set of
Bebras Tasks, whose overall approach is the resolution of
‘real-life’ and significant problems, through the transfer
and projection of the students’ CT. These Bebras Tasks are
independent from any particular software or hardware, and
can be administered to individuals without any prior
programming experience. Given these features, the Bebras
Tasks have been pointed out as a likely embryo of a
future PISA (Programme for International
Student Assessment) test in the field of Computer Science
(Hubwieser & Mühling, 2014). As an example, one of the
Bebras Tasks used in our research is shown in Figure 2.
Figure 2. Example of a Bebras Task (‘Water Supply’).
2.3. Dr. Scratch
Dr. Scratch³ (Moreno-León & Robles, 2015) is a free and
open source web application designed to analyze, in an
automated way, projects programmed with Scratch. In
addition, the tool provides feedback that middle school
students can use to improve their programming and CT
skills (Moreno-León, Robles, & Román-González, 2015).
Therefore, Dr. Scratch is an automated tool for the
formative assessment of Scratch projects.
As summarized in Table 1, the CT score that Dr. Scratch
assigns to a project is based on the level of development of
seven dimensions of the CT competence. These dimensions
are statically evaluated by inspecting the source code of the
analyzed project and given a score from 0 to 3,
resulting in a total evaluation (‘mastery score’) that ranges
from 0 to 21 when all seven dimensions are aggregated.
Figure 3, which shows the source code of a Scratch project,
can be used to illustrate the assessment of the tool. Dr.
Scratch would assign 8 points of ‘mastery score’ to this
project: 2 points for logical thinking, since it includes an
‘if-else’ statement; 2 points for user interactivity, as players
interact with the sprite by using the mouse; 2 points for
data representation, because the project makes use of a
variable; 1 point for abstraction and problem
decomposition, since there are two scripts in the project;
and 1 point for flow control, because the scripts consist
of a sequence of instructions with no loops.
The parallelism and synchronization dimensions would each
receive 0 points.
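The aggregation described above can be sketched in a few lines. This is not Dr. Scratch's actual code: the function `mastery_score` and the dictionary layout are assumptions made for illustration. The per-dimension levels for the example project are those stated in the text.

```python
from typing import Dict

# The seven CT dimensions scored by Dr. Scratch, as named in the text.
DIMENSIONS = (
    "abstraction and problem decomposition",
    "logical thinking",
    "synchronization",
    "parallelism",
    "flow control",
    "user interactivity",
    "data representation",
)


def mastery_score(levels: Dict[str, int]) -> int:
    """Sum the per-dimension levels (0 to 3 each) into a 0 to 21 'mastery score'.

    Dimensions missing from `levels` are treated as 0 (an assumption of this sketch).
    """
    unknown = set(levels) - set(DIMENSIONS)
    if unknown:
        raise ValueError(f"unknown dimensions: {sorted(unknown)}")
    total = 0
    for dimension in DIMENSIONS:
        level = levels.get(dimension, 0)
        if level not in (0, 1, 2, 3):
            raise ValueError(f"{dimension}: level must be 0, 1, 2 or 3")
        total += level
    return total


# Levels assigned to 'Catch me if you can 2' in the worked example of the text.
catch_me_if_you_can_2 = {
    "logical thinking": 2,
    "user interactivity": 2,
    "data representation": 2,
    "abstraction and problem decomposition": 1,
    "flow control": 1,
    "parallelism": 0,
    "synchronization": 0,
}
print(mastery_score(catch_me_if_you_can_2))  # 8, matching the text
```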
Table 1. Dr. Scratch’s score assignment ([…] marks text lost in extraction).
CT dimension | Competence level (1 point) | Competence level (2 points) | Competence level (3 points)
Abstraction and problem decomposition | More than one script […] | Use of custom […] | Use of ’clones’ ([…] of sprites)
Logical thinking | If | If else | Logic operations
Synchronization | Wait | […] broadcast, stop all, stop […] | Wait until, when […] changes, broadcast and wait
Parallelism | […] scripts on green flag | Two scripts on […] pressed or sprite […] | Two scripts on […] video/audio input, backdrop change
Flow control | Sequence of blocks | Repeat, forever | Repeat until
User interactivity | Green flag | […] mouse, ask and […] | Webcam, input […]
Data representation | […] of object […] | Variables | Lists
Figure 3. Source code of ‘Catch me if you can 2’.
Available at
Dr. Scratch is currently undergoing validation, although
its convergent validity with respect to other traditional
metrics of software complexity has already been reported
(Moreno-León, Robles, & Román-González, 2016).
The convergent validity of the CTt with respect to Bebras
Tasks and Dr. Scratch was investigated through two
different correlational studies, with two independent samples.
3.1. First study: CTt * Bebras Tasks
Within the context of a broader pre-post evaluation of courses, the CTt and a selection of three Bebras
Tasks were concurrently administered to a sample of
n=179 Spanish middle school students (Table 2). This
occurred only in pre-test condition, i.e. students without
prior formal experience in programming and before
starting with
Table 2. Sample of the first study
7th Grade 8th Grade Total
Boys 88 15 103
Girls 60 16 76
Total 148 31 179
The three Bebras Tasks⁴ were selected according to the
following criteria: the activities were aimed at students in
the range of 11-14 y/o, and focused on different aspects of
CT. In Table 3, the correlations between the CTt score
(which ranges from 0 to 28), the score in each of the
Bebras Tasks (0 to 1), and the overall Bebras score for all
of them (0 to 3) are shown. As the normality of the
variables is not assured [p-value(Zk-s)>0.05], non-parametric
correlations are calculated (Spearman’s r).
Table 3. Correlations CTt * Bebras Tasks (n=179)
    | Task #1: ‘Water Supply’ | Task #2: ‘Fast Laundry’ | Task #3: ‘Abacus’ | Whole set of Bebras Tasks
CTt | .419** | .042 | .490** | .519**
** p-value < 0.01
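For readers unfamiliar with the statistic, a Spearman correlation is a Pearson correlation computed on ranks, with tied values sharing their average rank. The sketch below uses only the Python standard library. The function names and the toy scores are made up for illustration: they are not the study data, and this is not the software the authors used.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long samples with at least 2 points")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy scores (NOT the study data): CTt totals (0 to 28)
# and overall Bebras scores (0 to 3) for five imaginary students.
ctt_scores = [10, 14, 14, 20, 25]
bebras_scores = [0, 1, 1, 2, 3]
print(round(spearman(ctt_scores, bebras_scores), 3))
```

Because only ranks enter the computation, the coefficient does not rely on the scores being normally distributed, which is why it suits the situation described in the text.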
As can be seen, the CTt has a positive, moderate, and
statistically significant correlation (r=0.52) with the whole
set of Bebras Tasks (Figure 4); and with Tasks #1 (‘Water
Supply’, related to logic-binary structures) and #3
(‘Abacus’, related to abstraction, decomposition and
algorithmic thinking). No correlation is found between the
CTt and Task #2 (‘Fast Laundry’, related to parallelism),
which is consistent with the fact that CTt does not involve
3.2. Second study: CTt * Dr. Scratch
The context of this study is an 8-week coding course on
the Scratch platform, following the Creative Computing
(Brennan, Balch, & Chung, 2014) curriculum and
involving three Spanish middle schools, with a total sample
of n=71 students from the 8th Grade (33 boys and 38 girls).
Before starting with the course, the CTt was administered
to the students in pre-test conditions (i.e. students without
prior formal experience in programming). After the coding
course, students took a post-test with the CTt and teachers
selected the most advanced project of each student, which
was analyzed with Dr. Scratch. These three measures
offered us the possibility to analyze the convergent validity
of the CTt and Dr. Scratch in predictive terms (CTt pre-test *
Dr. Scratch) and in concurrent terms (CTt post-test * Dr.
Scratch). As the normality of the variables is not assured
either [p-value(Zk-s)>0.05], non-parametric correlations
(Spearman’s r) are calculated again (Table 4).
4 The Bebras Tasks used in our research, and their specifications,
can be reviewed with more detail in:
Table 4. Correlations CTt * Dr. Scratch (n=71)
CTt Pre-test CTt Post-test
Dr. Scratch (‘mastery score’) .444** .526**
** p-value (r) < 0.01
As can be seen, the CTt has a positive, moderate, and
statistically significant correlation with Dr. Scratch, both in
predictive (r=0.44) and concurrent terms (r=0.53, see
Figure 5). As expected, the concurrent value is slightly
higher because no time elapses between the two measurements.
Figure 4. Scatterplot CTt * Set of Bebras Tasks.
Figure 5. Scatterplot CTt post-test*Dr. Scratch.
Returning to our specific research questions, we have
found that the CTt is partially convergent with the Bebras
Tasks and with Dr. Scratch (0.4<r<0.7). As we expected,
the convergence is not total (r>0.7) because, although the
three tools are assessing the same psychological construct
(i.e. CT), they do it from different perspectives:
summative-aptitudinal (CTt), skill-transfer (Bebras Tasks),
and formative-iterative (Dr. Scratch). On the one hand,
these empirical findings imply that none of these tools
should be used instead of any of the others, as the different
scores are only moderately correlated (i.e. a measure from
one of the tools cannot completely substitute for the others);
rather, the three tools might be combined in middle
school contexts. On the other hand, from a theoretical point
of view, the three tools seem to be complementary, as the
weaknesses of each are the strengths of the others.
The CTt has some strengths such as: it can be collectively
administered in pure pre-test conditions, so it can be used
in massive screenings and early detection of students with
high abilities (or special needs) for programming tasks; and
it can be utilized for collecting quantitative data in pre-post
evaluations of the efficacy of curricula aimed at fostering
CT. However, it also has some obvious weaknesses: it
provides a static and decontextualized assessment, and it is
strongly focused on computational ‘concepts’ (Brennan &
Resnick, 2012), ignoring ‘practices’ and ‘perspectives’.
To counterbalance these weaknesses, the Bebras
Tasks provide a naturalistic and significant assessment,
which is contextualized in ‘real-life’ problems that can be
used not only for measuring but also for teaching and
learning CT. However, the psychometric properties of
these tasks are still far from being demonstrated, and some of
them are at risk of being too tangential to the core of CT.
Finally, Dr. Scratch complements the CTt as the former
includes ‘computational practices’ (Brennan & Resnick,
2012) that the others do not, such as iterating, testing,
remixing or modularizing. However, Dr. Scratch lacks the
possibility of being used in pure pre-test conditions, as it is
applied to Scratch projects after the student has learnt at
least some coding for a certain time.
All of the above leads us to affirm the complementarity of
the CTt, Bebras Tasks and Dr. Scratch in middle school
settings, and the possibility of building a “system of
assessments” (Grover, 2015; Grover et al., 2014) with all
of them. Furthermore, we find evidence to consider an
analogous progression between Bloom’s (revised)
taxonomy of cognitive processes (Krathwohl, 2002) and
the three assessment tools considered throughout this paper
(Figure 6).
Regarding the convergent validity of the CTt, another
correlation value might have been found with Bebras Tasks
if the researchers had selected a different set of them; also,
another correlation value might have been found with Dr.
Scratch if the teachers had selected a different set of
projects. Further research should lead us to explore the
convergent validity of the CTt with other assessment tools.
For example, we are currently designing an investigation to
study the convergence between the CTt and the
Computational Thinking Scales (CTS) (Korkmaz et al.,
2017), and another one that will study the convergence
between Dr. Scratch and Ninja Code Village (Ota et al.,
2016). As a major result of this future series of studies, it
will be possible to draw a map of the convergence
values between the main CT assessment tools around
the world, which would ultimately help CT to be taken
seriously as a psychological construct.
Figure 6. Bloom’s taxonomy and CT assessment tools.
Basawapatna, A., Koh, K. H., Repenning, A., Webb, D. C.,
& Marshall, K. S. (2011). Recognizing
computational thinking patterns. In Proceedings of
the 42nd ACM technical symposium on Computer
science education (pp. 245–250).
Bocconi, S., Chioccariello, A., Dettori, G., Ferrari, A.,
Engelhardt, K., & others. (2016). Developing
Computational Thinking in Compulsory Education-
Implications for policy and practice.
Brennan, K., Balch, C., & Chung, M. (2014). Creative
computing. Harvard Graduate School of Education.
Brennan, K., & Resnick, M. (2012). New frameworks for
studying and assessing the development of
computational thinking. In Proceedings of the 2012
annual meeting of the American Educational
Research Association, Vancouver, Canada (pp. 1–25).
Buffum, P. S., Lobene, E. V, Frankosky, M. H., ... , &
Lester, J. C. (2015). A practical guide to developing
and validating computer science knowledge
assessments with application to middle school. In
Proceedings of the 46th ACM Technical Symposium
on Computer Science Education (pp. 622–627).
Carlson, K. D., & Herdman, A. O. (2012). Understanding
the impact of convergent validity on research results.
Organizational Research Methods, 15(1), 17–32.
Dagiene, V., & Futschek, G. (2008). Bebras international
contest on informatics and computer literacy: Criteria
for good tasks. In International Conference on
Informatics in Secondary Schools-Evolution and
Perspectives (pp. 19–30).
Dagiene, V., & Stupuriene, G. (2015). Informatics
education based on solving attractive tasks through a
contest. KEYCIT 2014: Key Competencies in
Informatics and ICT, 7, 97.
Grover, S. (2011). Robotics and engineering for middle
and high school students to develop computational
thinking. In annual meeting of the American
Educational Research Association, New Orleans, LA.
Grover, S. (2015). “Systems of Assessments” for Deeper
Learning of Computational Thinking in K-12. In
Proceedings of the 2015 Annual Meeting of the
American Educational Research Association (pp.
Grover, S., Bienkowski, M., Niekrasz, J., & Hauswirth, M.
(2016). Assessing Problem-Solving Process At Scale.
In Proceedings of the Third (2016) ACM Conference
on Learning@ Scale (pp. 245–248).
Grover, S., Cooper, S., & Pea, R. (2014). Assessing
computational learning in K-12. In Proceedings of
the 2014 conference on Innovation & technology in
computer science education (pp. 57–62).
Grover, S., & Pea, R. (2013). Computational Thinking in
K-12: A Review of the State of the Field.
Educational Researcher, 42(1), 38–43.
Hubwieser, P., & Mühling, A. (2014). Playing PISA with
bebras. In Proceedings of the 9th Workshop in
Primary and Secondary Computing Education (pp.
Kalelioglu, F., Gülbahar, Y., & Kukul, V. (2016). A
Framework for Computational Thinking Based on a
Systematic Research Review. Baltic Journal of
Modern Computing, 4(3), 583.
Koh, K. H., Basawapatna, A., Bennett, V., & Repenning,
A. (2010). Towards the automatic recognition of
computational thinking for adaptive visual language
learning. In Visual Languages and Human-Centric
Computing, 2010 IEEE Symposium on (pp. 59–66).
Korkmaz, Ö., Çakir, R., & Özden, M. Y. (2017). A validity
and reliability study of the Computational Thinking
Scales (CTS). Computers in Human Behavior.
Krathwohl, D. R. (2002). A revision of Bloom’s taxonomy:
An overview. Theory into Practice, 41(4), 212–218.
Lye, S. Y., & Koh, J. H. L. (2014). Review on teaching
and learning of computational thinking through
programming: What is next for K-12? Computers in
Human Behavior, 41, 51–61.
Meerbaum-Salant, O., Armoni, M., & Ben-Ari, M. (2013).
Learning computer science concepts with scratch.
Computer Science Education, 23(3), 239–264.
Moreno-León, J., & Robles, G. (2015). Dr. Scratch: A web
tool to automatically evaluate Scratch projects. In
Proceedings of the Workshop in Primary and
Secondary Computing Education (pp. 132–133).
Moreno-León, J., Robles, G., & Román-González, M.
(2015). Dr. Scratch: automatic analysis of scratch
projects to assess and foster computational thinking.
RED. Revista de Educación a Distancia, 15(46).
Moreno-León, J., Robles, G., & Román-González, M.
(2016). Comparing computational thinking
development assessment scores with software
complexity metrics. In Global Engineering
Education Conference, 2016 IEEE (pp. 1040–1045).
Mühling, A., Ruf, A., & Hubwieser, P. (2015). Design and
first results of a psychometric test for measuring
basic programming abilities. In Proceedings of the
Workshop in Primary and Secondary Computing
Education (pp. 2–10).
Ota, G., Morimoto, Y., & Kato, H. (2016). Ninja code
village for scratch: Function samples/function
analyser and automatic assessment of computational
thinking concepts. In Visual Languages and Human-
Centric Computing (VL/HCC), 2016 IEEE
Symposium on (pp. 238–239).
Román-González, M. (2015). Computational Thinking
Test: Design Guidelines and Content Validation. In
Proceedings of the 7th Annual International
Conference on Education and New Learning
Technologies (EDULEARN 2015) (pp. 2436–2444).
Román-González, M., Pérez-González, J.-C., & Jiménez-
Fernández, C. (2016). Which cognitive abilities
underlie computational thinking? Criterion validity
of the Computational Thinking Test. Computers in
Human Behavior.
Román-González, M., Pérez-González, J.-C., Moreno-
León, J., & Robles, G. (2016). Does Computational
Thinking Correlate with Personality?: The Non-
cognitive Side of Computational Thinking. In
Proceedings of the Fourth International Conference
on Technological Ecosystems for Enhancing
Multiculturality (pp. 51–58).
Seehorn, D., Carey, S., Fuschetto, B., Lee, I., Moix, D.,
O’Grady-Cunniff, D., … Verno, A. (2011). CSTA
K-12 Computer Science Standards: Revised 2011.
Weintrop, D., & Wilensky, U. (2015). Using Commutative
Assessments to Compare Conceptual Understanding
in Blocks-based and Text-based Programs. In ICER
(Vol. 15, pp. 101–110).
Zur-Bargury, I., Pârv, B., & Lanzberg, D. (2013). A
nationwide exam as a tool for improving a new
curriculum. In Proceedings of the 18th ACM
conference on Innovation and technology in
computer science education (pp. 267–272).
In today's world, the ability to think computationally is essential. The skillset expected of a computer scientist is no longer solely based on the old stereotype but also a crucial skill for adapting to the future. This perspective presents a new educational challenge for society. Everyone must have a positive attitude toward understanding and using these skills daily. One thousand two hundred seven documents about computational thinking (CT) may be found while searching the Scopus database from 1987 to 2023. Data from Scopus were analyzed using VOSviewer software. This study educates academics by delving into the fundamentals of what is known about the CT of visual and quantitative research skills. This approach allows for a more in-depth look at the literature and a better understanding of the research gap in CT. This bibliometrics analysis demonstrates that (1) research on CT is common to all sciences and will develop in the future; (2) the majority of articles on CT are published in journals in the fields of education, engineering, science and technology, computing and the social sciences; (3) the United States is the most dominant country in CT publications with a variety of collaborations; (4) keywords that often appear are CT, engineering, education, and mathematics, and (5) research on CT has developed significantly since 2013. Our investigation reveals the beginnings and progression of the academic field of research into CT. Furthermore, it offers a road map indicating how this study area will expand in the coming years.
... The BCTt is only one part of the set of three complementary assessment tools as researched by Román-González et al. (2017a), but it is sufficient to trial the direction of newly created learning materials. Given the relatively short amount of available time for the intervention higher levels of Bloom's (revised) taxonomy cannot be expected to be achieved. ...
This is a Frontiers e-book comprising all the articles featured in the Research Topic entitled 'Stem, steam, computational thinking and coding: Evidence based research and practice in children’s development'. It can also be found online at:
Background and Objective. Teacher assessment research suggests that teachers have good conceptual understanding of CT. However, to model CT based problem-solving in their classrooms, teachers need to develop the ability to recognize when and how to apply CT skills. Does existing professional development (PD) equip teachers to know when and how to apply CT skills? What factors should PD providers consider while developing trainings for CT application skills? Method. This retrospective observational study used a binomial regression model to determine what factors predict teachers’ probability of performing well on a CT application skills test. Participants. Participants of this study were 129 in-service K-12 teachers from a community of practice in India. Findings. Results show that teachers who have received at least one CT training, who have more teaching experience, and who are currently teaching CT have a higher probability of applying CT skills correctly to problems, irrespective of the subject they teach and their educational backgrounds. However, receiving a higher number of CT PD trainings was a negative predictor of teachers’ performance. Implications. Implications for school administrators, professional development providers, and researchers are discussed. Teachers need ample opportunity to teach CT in their teaching schedules. Continuous professional development does not necessarily result in improved CT application skills unless careful consideration is given to the pedagogies used and to the resolution of misconceptions that teachers may have developed in prior training. Mixing plugged and unplugged pedagogical approaches may be beneficial to encourage transfer of CT application skills across different types of problems. Lastly, there is a need to develop valid and reliable instruments that measure CT application skills of teachers.
Context: In many countries, it is now important to integrate learning-oriented views to foster computational thinking (CT) in the classroom. This has inspired ideas for new lesson plans, instructional strategies, teacher guidance, and, most importantly, new approaches to grading these skills. Aim: This article presents the results of a systematic review initially focused on identifying the various ways of assessing CT in school and their relationship to relevant CT skills. Method: We conducted a systematic review of the literature to assess CT in schools. This review applied a semi-automatic search for specific terms within the selected papers. These terms came from the analysis of several established definitions of CT. Results: We present a set of the most representative competencies and concepts developed in various experiences, in which the main topic is the assessment of CT, as well as some that have not been developed and that may be the subject of future works. Conclusions: The evaluation of CT in the school requires multiple approaches; it is a challenge to have a single method or strategy to evaluate everything that CT implies.
In this paper, we study the relationship between the content and goals of curriculum revisions toward the integration of computational thinking (CT) in compulsory schools in Denmark, Sweden, and England. Our analyses build on data consisting of a combination of official documents such as new curricula, white papers, and implementation strategies and interviews of experts who are either highly knowledgeable about or were involved in developing the curriculum revisions in these three countries. Our study found that there are strong connections between the CT content data practices and goal competitiveness in England. In Sweden, we found that the relationship between data practices and the goal of competitiveness is strongest. In Denmark, we found that the CT content codes related to data practices, modeling and simulation practices, and computational problem solving practices were all strongly represented, but all were weakly related to policy goals. Keywords: Computational thinking; Curriculum policy; Comparative educational research
Computer Literacy is a reality in current educational legislation. Within the STEAM competence approach, music education and the development of Computational Thinking (CT) are located in this discipline. In this work, unplugged musical activities are designed based on the Bebras challenges, and their effectiveness is evaluated in terms of CT development in students. A quasi-experimental study was carried out with pre-post test measures in a group of 220 Primary School students (experimental, N = 170; control, N = 50). The experimental group performed three blocks of unplugged musical activities. Computational Thinking Test using Bebras Problems (Lockwood and Moone, 2018) was used as an instrument. Variables of gender, course, environment and academic ability were taken into account. The results show a significant increase in CT in the experimental group at a general level and in the Bebras activities of medium and advanced levels of difficulty. Schoolchildren from the rural context showed higher scores in CT development compared to the urban one. No significant differences are observed in the rest of the variables. Finally, a higher level of correct answers was observed in "easy" activities, greater completion of "medium" activities and a decrease in both in "advanced" activities.
Computational thinking is a type of problem-solving ability using logical thinking that students do with regular steps. This cognitive ability is one of the important skills in supporting students with mathematical concepts. However, the advantages of computational thinking do not seem to be paid much attention to by education, especially in Indonesia. This is because the learning approach does not emphasize the positive aspects that can improve students' computational thinking. As a result, the average computational thinking ability of students is low. This type of research uses an experimental method of pretest-posttest control group design. The population involved was class XII students at MA Daruttauhid Malang, which consisted of 22 students in the experimental class, and 24 students in the control class. The research data is in the form of pre-test scores before being given realistic mathematics learning treatment, and post-test score data. The results obtained showed that the computational thinking ability of students in the experimental class was higher than in the control class. To be clear, this fact is measured by calculating the N-Gain scores of students in the experimental class with a value of 0.7 (high category), and the N-Gain scores of control class students with a value of 0.5 (medium category).
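The N-Gain values quoted in the abstract above can be illustrated with a minimal sketch. It assumes the commonly used normalized-gain definition, (post − pre) / (max − pre), and the usual 0.3 / 0.7 category cut-offs; the abstract does not state which formula or thresholds the authors applied. The function names and the sample scores are hypothetical.

```python
def n_gain(pre: float, post: float, max_score: float = 100.0) -> float:
    """Normalized gain: the share of the possible improvement actually achieved.

    Assumes the common definition (post - pre) / (max_score - pre).
    """
    if pre >= max_score:
        raise ValueError("pre-test score must be below the maximum score")
    return (post - pre) / (max_score - pre)


def gain_category(g: float) -> str:
    """Label a gain with the conventional cut-offs (assumed, not from the study)."""
    if g >= 0.7:
        return "high"
    if g >= 0.3:
        return "medium"
    return "low"


# Hypothetical class means, for illustration only.
experimental = n_gain(pre=20.0, post=80.0)   # 60 of 80 possible points gained
control = n_gain(pre=40.0, post=70.0)        # 30 of 60 possible points gained
print(gain_category(experimental), gain_category(control))
```

Because the gain is normalized by the room left for improvement, two classes with different pre-test means can be compared on the same scale.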
In the past decade, Computational Thinking (CT) and related concepts (e.g. coding, programming, algorithmic thinking) have received increasing attention in the educational field. This has given rise to a large amount of academic and grey literature, and also numerous public and private implementation initiatives. Despite this widespread interest, successful CT integration in compulsory education still faces unresolved issues and challenges. This report provides a comprehensive overview of CT skills for schoolchildren, encompassing recent research findings and initiatives at grassroots and policy levels. It also offers a better understanding of the core concepts and attributes of CT and its potential for compulsory education. The study adopts a mostly qualitative approach that comprises extensive desk research, a survey of Ministries of Education and semi-structured interviews, which provide insights from experts, practitioners and policy makers. The report discusses the most significant CT developments for compulsory education in Europe and provides a comprehensive synthesis of evidence, including implications for policy and practice.
Conference Paper
Computational thinking (CT) is being considered as a key set of problem-solving skills to be acquired by the new generations of digital citizens and workers in order to thrive in a computer-based world. However, from a psychometric point of view, CT is still a poorly defined psychological construct: there is no full consensus on a formal definition of CT or how to measure it; and its correlations with other psychological constructs, whether cognitive or non-cognitive, have not been completely established. In response to the latter, this paper aims to study specifically the correlations between CT and the several dimensions from the ‘Big Five’ model of human personality: Conscientiousness, Openness to Experience, Extraversion, Agreeableness, and Neuroticism. To do so, the Computational Thinking Test (CTt) and the Big Five Questionnaire-Children version (BFQ-C) are administered on a sample (n = 99) of Spanish students from 5th to 10th grade. Results show statistically significant correlations between CT and: Openness to Experience (r = 0.41), Extraversion (r = 0.30), and Conscientiousness (r = 0.27). These results are partially consistent with the literature about the links between cognitive and personality variables, and corroborate the existence of a non-cognitive side of CT. Hence, educational interventions aimed at fostering CT should take into account these non-cognitive issues in order to be comprehensive and successful.
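The r values reported in the abstract above are correlation coefficients between paired scores. As a minimal sketch, assuming they are Pearson product-moment correlations (the abstract does not name the coefficient), the computation looks as follows. The function name and the sample scores are hypothetical and are not data from the study.

```python
from math import sqrt
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between two equally long lists of paired scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical paired scores (e.g. a test score and a questionnaire score).
test_scores = [12.0, 15.0, 9.0, 20.0, 17.0]
trait_scores = [3.1, 3.8, 2.9, 4.0, 3.2]
print(round(pearson_r(test_scores, trait_scores), 2))
```

A value near 0.3 to 0.5, as in the abstract, indicates a positive but far from perfect association: the two instruments overlap without measuring the same thing.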
Computational Thinking (CT) has become popular in recent years and has been recognised as an essential skill for all, as members of the digital age. Many researchers have tried to define CT and have conducted studies about this topic. However, CT literature is at an early stage of maturity, and is far from either explaining what CT is, or how to teach and assess this skill. In the light of this state of affairs, the purpose of this study is to examine the purpose, target population, theoretical basis, definition, scope, type and employed research design of selected papers in the literature that have focused on computational thinking, and to provide a framework about the notion, scope and elements of CT. In order to reveal the literature and create the framework for computational thinking, an inductive qualitative content analysis was conducted on 125 papers about CT, selected according to pre-defined criteria from six different databases and digital libraries. According to the results, the main topics covered in the papers were activities (computerised or unplugged) that promote CT in the curriculum. The targeted population of the papers was mainly K-12. Game-based learning and constructivism were the main theories covered as the basis for CT papers. Most of the papers were written for academic conferences and mainly consisted of personal views about CT. The study also identified the most commonly used words in the definitions and scope of CT, which in turn formed the framework of CT. The findings obtained in this study may not only be useful in the exploration of research topics in CT and the identification of CT in the literature, but also support those who need guidance for developing tasks or programs about computational thinking and informatics.
The paper discusses the issue of supporting informatics (computer science) education through competitions for lower and upper secondary school students (8–19 years old). Competitions play an important role for learners as a source of inspiration, innovation, and attraction. Running contests in informatics for school students for many years, we have noticed that the students consider the contest experience very engaging and exciting as well as a learning experience. A contest is an excellent instrument to involve students in problem solving activities. An overview of infrastructure and development of an informatics contest from international level to the national one (the Bebras contest on informatics and computer fluency, originated in Lithuania) is presented. The performance of Bebras contests in 23 countries during the last 10 years showed an unexpected and unusually high acceptance by school students and teachers. Many thousands of students participated and got a valuable input in addition to their regular informatics lectures at school. In the paper, the main attention is paid to the developed tasks and analysis of students’ task solving results in Lithuania.
Conference Paper
We live in a society full of digital and programmable objects. In this context, being code-literate is an inescapable requirement for any citizen of the 21st century. We believe that a person is code-literate when the ability to read and write in the language of computers and to think computationally has been developed. So it is not surprising that computational thinking (CT) is being located at the focus of educational innovation as a set of problem solving skills to be acquired by new generations of students. However, we still lack international consensus on a definition of CT, as well as a clear idea of how to incorporate CT into our education systems at various levels. Similarly, there is a striking gap about how to measure and assess CT. In reply, this paper presents the design of a Computational Thinking Test aimed at Spanish students between 12 and 13 years old (grades K-7 & K-8): we describe the guidelines on which the whole test and each of its items have been designed, as well as the content validation process through an expert judgment procedure. Through this process, the initial version of the test (40 items long) was refined to a 28-item version, which is currently being applied to the target population. Finally, possible limitations of the test and its possible concurrence with other international evidence on computational thinking assessment are discussed.
Computational Thinking can be briefly defined as having the knowledge, skills and attitudes necessary to use computers in solving real-life problems for production purposes. In this study, a scale has been developed to determine the levels of computational thinking skills (CTS) of students. The CTS is a five-point Likert-type scale and consists of 29 items that could be collected under five factors. For the first application, the study group consisted of 726 students in associate degree and undergraduate degree programmes in formal education at Amasya University. For the second application, it consisted of 580 students who were in pedagogical formation education via distance education at Amasya University. The validity and reliability of the scale have been studied by conducting exploratory factor analysis, confirmatory factor analysis, item distinctiveness analyses, internal consistency coefficients and constancy analyses. As a result of these analyses, it has been concluded that the scale is a valid and reliable measurement tool that could measure the computational thinking skills of the students. In addition, individuals of the digital age are expected to have computational thinking skills, so it is necessary to reveal to what degree they have these skills and whether their levels are sufficient. Within this frame, it could be said that the scale could make significant contributions to the literature.
Conference Paper
Ninja Code Village is a comprehensive learning-support environment for the Scratch visual programming language. It provides more than 60 sample functions that are commonly used in Scratch projects, and automatically analyses which functions are used in a project in order to foster students' competencies in abstraction, modelling, and decomposition. It also provides automatic assessment of computational thinking concepts such as conditional statements, loops, data, and parallelism in order to develop students' programming skills.
Computational thinking (CT) is being located at the focus of educational innovation, as a set of problem-solving skills that must be acquired by the new generations of students to thrive in a digital world full of objects driven by software. However, there is still no consensus on a CT definition or how to measure it. In response, we attempt to address both issues from a psychometric approach. On the one hand, a Computational Thinking Test (CTt) is administered on a sample of 1,251 Spanish students from 5th to 10th grade, so its descriptive statistics and reliability are reported in this paper. On the other hand, the criterion validity of the CTt is studied with respect to other standardized psychological tests: the Primary Mental Abilities (PMA) battery, and the RP30 problem-solving test. Thus, it is intended to provide a new instrument for CT measurement and additionally give evidence of the nature of CT through its associations with key related psychological constructs. Results show statistically significant correlations at least moderately intense between CT and: spatial ability (r = 0.44), reasoning ability (r = 0.44), and problem-solving ability (r = 0.67). These results are consistent with recent theoretical proposals linking CT to some components of the Cattell-Horn-Carroll (CHC) model of intelligence, and corroborate the conceptualization of CT as a problem-solving ability. Available at:
Conference Paper
Authentic problem solving tasks in digital environments are often open-ended with ill-defined pathways to a goal state. Scaffolds and formative feedback during this process help learners develop the requisite skills and understanding, but require assessing the problem-solving process. This paper describes a hybrid approach to assessing process at scale in the context of the use of computational thinking practices during programming. Our approach combines hypothesis-driven analysis, using an evidence-centered design framework, with discovery-driven data analytics. We report on work-in-progress involving novices and expert programmers working on Blockly games.
Scratch is a visual programming environment that is widely used by young people. We investigated if Scratch can be used to teach concepts of computer science (CS). We developed learning materials for middle-school students that were designed according to the constructionist philosophy of Scratch and evaluated them in a few schools during two years. Tests were constructed based upon a novel combination of the revised Bloom taxonomy and the Structure of the Observed Learning Outcome taxonomy. These instruments were augmented with qualitative tools, such as observations and interviews. The results showed that students could successfully learn important concepts of CS, although there were problems with some concepts such as repeated execution, variables, and concurrency. We believe that these problems can be overcome by modifications to the teaching process that we suggest.