The Impact of Peer Assessment on Academic
Performance: A Meta-analysis of Control Group Studies
Kit S. Double & Joshua A. McGrane & Therese N. Hopfenbeck
© The Author(s) 2019
Peer assessment has been the subject of considerable research interest over the last three
decades, with numerous educational researchers advocating for the integration of peer
assessment into schools and instructional practice. Research synthesis in this area has,
however, largely relied on narrative reviews to evaluate the efficacy of peer assessment.
Here, we present a meta-analysis (54 studies, k = 141) of experimental and quasi-experimental
studies that evaluated the effect of peer assessment on academic performance
in primary, secondary, or tertiary students across subjects and domains. An overall small to
medium effect of peer assessment on academic performance was found (g = 0.31, p < .001).
The results suggest that peer assessment improves academic performance compared with no
assessment (g = 0.31, p = .004) and teacher assessment (g = 0.28, p = .007), but was not
significantly different in its effect from self-assessment (g = 0.23, p = .209). Additionally,
meta-regressions examined the moderating effects of several feedback and educational
characteristics (e.g., online vs offline, frequency, education level). Results suggested that
the effectiveness of peer assessment was remarkably robust across a wide range of contexts.
These findings provide support for peer assessment as a formative practice and suggest
several implications for the implementation of peer assessment into the classroom.
Keywords: Peer assessment · Meta-analysis · Experimental design · Effect size · Feedback · Formative assessment
Educational Psychology Review (2020) 32:481–509
Published online: 10 December 2019
Electronic supplementary material The online version of this article (
09510-3) contains supplementary material, which is available to authorized users.
* Kit S. Double
Department of Education, University of Oxford, Oxford, England
Content courtesy of Springer Nature, terms of use apply. Rights reserved.
Feedback is often regarded as a central component of educational practice and crucial to students'
learning and development (Fyfe & Rittle-Johnson, 2016; Hattie and Timperley 2007; Hays, Kornell,
& Bjork, 2010; Paulus, 1999). Peer assessment has been identified as one method for delivering
feedback efficiently and effectively to learners (Topping 1998; van Zundert et al. 2010). The use of
students to generate feedback about the performance of their peers is referred to in the literature using
various terms, including peer assessment, peer feedback, peer evaluation, and peer grading. In this
article, we adopt the term peer assessment, as it more generally refers to the method of peers
assessing or being assessed by each other, whereas the term feedback is used when we refer to the
actual content or quality of the information exchanged between peers. This feedback can be
delivered in a variety of forms including written comments, grading, or verbal feedback (Topping
1998). Importantly, by performing both the role of assessor and being assessed themselves, students'
learning can potentially benefit more than if they are just assessed (Reinholz 2016).
Peer assessments tend to be highly correlated with teacher assessments of the same students
(Falchikov and Goldfinch 2000; Li et al. 2016; Sanchez et al. 2017). However, in addition to
establishing comparability between teacher and peer assessment scores, it is important to determine
whether peer assessment also has a positive effect on future academic performance. Several
narrative reviews have argued for the positive formative effects of peer assessment (e.g., Black
and Wiliam 1998a; Topping 1998; van Zundert et al. 2010) and have additionally identified a
number of potentially important moderators for the effect of peer assessment. This meta-analysis
will build upon these reviews and provide quantitative evaluations for some of the instructional
features identified in these narrative reviews by utilising them as moderators within our analysis.
Evaluating the Evidence for Peer Assessment
Empirical Studies
Despite the optimism surrounding peer assessment as a formative practice, there are relatively few
control group studies that evaluate the effect of peer assessment on academic performance (Flórez and
Sammons 2013; Strijbos and Sluijsmans 2010). Most studies on peer assessment have tended to focus
on either students' or teachers' subjective perceptions of the practice rather than its effect on academic
performance (e.g., Brown et al. 2009; Young and Jackman 2014). Moreover, interventions involving
peer assessment often confound the effect of peer assessment with other assessment practices that are
theoretically related under the umbrella of formative assessment (Black and Wiliam 2009). For
instance, Wiliam et al. (2004) reported a mean effect size of .32 in favor of a formative assessment
intervention but they were unable to determine the unique contribution of peer assessment to students'
achievement, as it was one of more than 15 assessment practices included in the intervention.
However, as shown in Fig. 1, there has been a sharp increase in the number of studies related to
peer assessment, with over 75% of relevant studies published in the last decade. Although it is still
far from being the dominant outcome measure in research on formative practices, many of these
recent studies have examined the effect of peer assessment on objective measures of academic
performance (e.g., Gielen et al. 2010a; Liu et al. 2016; Wang et al. 2014a). The number of studies of
peer assessment using control group designs also appears to be increasing in frequency (e.g., van
Ginkel et al. 2017; Wang et al. 2017). These studies have typically compared the formative effect of
peer assessment with either teacher assessment (e.g., Chaney and Ingraham 2009; Sippel and
Jackson 2015; van Ginkel et al. 2017) or no assessment conditions (e.g., Kamp et al. 2014; L. Li
and Steckelberg 2004; Schonrock-Adema et al. 2007). Given the increase in peer assessment
research, and in particular experimental research, it seems pertinent to synthesise this new body of
research, as it provides a basis for critically evaluating the overall effectiveness of peer assessment
and its moderators.
Previous Reviews
Efforts to synthesise peer assessment research have largely been limited to narrative reviews, which
have made very strong claims regarding the efficacy of peer assessment. For example, in a review of
peer assessment with tertiary students, Topping (1998) argued that the effects of peer assessment are
"as good as or better than the effects of teacher assessment" (p. 249). Similarly, in a review on peer
and self-assessment with tertiary students, Dochy et al. (1999) concluded that peer assessment can
have a positive effect on learning but may be hampered by social factors such as friendships,
collusion, and perceived fairness. Reviews into peer assessment have also tended to focus on
determining the accuracy of peer assessments, which is typically established by the correlation
between peer and teacher assessments for the same performances. High correlations have been
observed between peer and teacher assessments in three meta-analyses to date (r = .69, .63, and .68
respectively; Falchikov and Goldfinch 2000; H. Li et al. 2016; Sanchez et al. 2017). Given that peer
assessment is often advocated as a formative practice (e.g., Black and Wiliam 1998a; Topping
1998), it is important to expand on these correlational meta-analyses to examine the formative effect
that peer assessment has on academic performance.
Fig. 1 Number of records returned by year (x-axis: year, 1995 to 2020). The following search terms were used: "peer assessment" or "peer grading" or "peer evaluation" or "peer feedback". Data were collated by searching Web of Science (www. for these keywords and categorising by year
In addition to examining the correlation between peer and teacher grading, Sanchez et al.
(2017) performed a meta-analysis on the formative effect of peer grading (i.e., a
numerical or letter grade was provided to a student by their peer) in intervention studies. They
found that there was a significant positive effect of peer grading on academic performance for
primary and secondary (grades 3 to 12) students (g = .29). However, it is unclear whether their
findings would generalise to other forms of peer feedback (e.g., written or verbal feedback)
and to tertiary students, both of which we will evaluate in the current meta-analysis.
Moderators of the Effectiveness of Peer Assessment
Theoretical frameworks of peer assessment propose that it is beneficial in at least two respects.
Firstly, peer assessment allows students to critically engage with the assessed material, to compare
and contrast performance with their peers, and to identify gaps or errors in their own knowledge
(Topping 1998). In addition, peer assessment may improve the communication of feedback, as peers
may use similar and more accessible language, as well as reduce negative feelings of being evaluated
by an authority figure (Liu et al. 2016). However, the efficacy of peer assessment, like traditional
feedback, is likely to be contingent on a range of factors including characteristics of the learning
environment, the student, and the assessment itself (Kluger and DeNisi 1996; Ossenberg et al.
2018). Some of the characteristics that have been proposed to moderate the efficacy of feedback
include anonymity (e.g., Rotsaert et al. 2018; Yu and Liu 2009), scaffolding (e.g., Panadero and
Jonsson 2013), quality and timing of the feedback (Diab 2011), and elaboration (e.g., Gielen et al.
2010b). Drawing on the previously mentioned narrative reviews and empirical evidence, we now
briefly outline the evidence for each of the included theoretical moderators.
Role
It is somewhat surprising that most studies that examine the effect of peer assessment tend to only
assess the impact on the assessee and not the assessor (van Popta et al. 2017). Assessing may confer
several distinct advantages such as drawing comparisons with peers' work and increased familiarity
with evaluative criteria. Several studies have compared the effect of assessing with being assessed.
Lundstrom and Baker (2009) found that assessing a peer's written work was more beneficial for
their own writing than being assessed by a peer. Meanwhile, Graner (1987) found that students who
were receiving feedback from a peer and acted as an assessor did not perform better than students
who acted as an assessor but did not receive peer feedback. Reviewing peers' work is also likely to
help students become better reviewers of their own work and to revise and improve their own work
(Rollinson 2005). While, in practice, students will most often act as both assessor and assessee
during peer assessment, it is useful to gain a greater insight into the relative impact of performing
each of these roles for both practical reasons and to help determine the mechanisms by which peer
assessment improves academic performance.
Peer Assessment Type
The characteristics of peer assessment vary greatly both in practice and within the research literature.
Because meta-analysis is unable to capture all of the nuanced dimensions that determine the type,
intensity, and quality of peer assessment, we focus on distinguishing between what we regard as the
most prevalent types of peer assessment in the literature: grading, peer dialogs, and written
assessment. Each of these peer assessment types is widely used in the classroom and often in
various combinations (e.g., written qualitative feedback in combination with a numerical grade).
While these assessment types differ substantially in terms of their cognitive complexity and
comprehensiveness, each has shown at least some evidence of impacting academic performance
(e.g., Sanchez et al. 2017; Smith et al. 2009; Topping 2009).
Scaffolding
Peer assessment is often implemented in conjunction with some form of scaffolding, for example,
rubrics and scoring scripts. Scaffolding has been shown to improve the quality of peer assessment
and to increase the amount of feedback assessors provide (Peters, Körndle & Narciss, 2018). Peer
assessment has also been shown to be more accurate when rubrics are utilised. For example,
Panadero, Romero, & Strijbos (2013) found that students were less likely to overscore their peers.
Online
Increasingly, peer assessment has been performed online due in part to the growth in online learning
activities as well as the ease by which peer assessment can be implemented online (van Popta et al.
2017). Conducting peer assessment online can significantly reduce the logistical burden of
implementing peer assessment (e.g., Tannacito and Tuzi 2002). Several studies have shown that peer
assessment can effectively be carried out online (e.g., Hsu 2016; Li and Gao 2016). Van Popta et al.
(2017) argue that the cognitive processes involved in peer assessment, such as evaluating, explaining,
and suggesting, similarly play out in online and offline environments. However, the social processes
involved in peer assessment are likely to substantively differ between online and offline peer
assessment (e.g., collaborating, discussing), and it is unclear whether this might limit the benefits
of peer assessment through one or the other medium. To the authors' knowledge, no prior studies
have compared the effects of online and offline peer assessment on academic performance.
Anonymity
Because peer assessment is fundamentally a collaborative assessment practice, interpersonal
variables play a substantial role in determining the type and quality of peer assessment
(Strijbos and Wichmann 2018). Some researchers have argued that anonymous peer assessment
is advantageous because assessors are more likely to be honest in their feedback, and
interpersonal processes cannot influence how assessees receive the assessment feedback
(Rotsaert et al. 2018). Qualitative evidence suggests that anonymous peer assessment results
in improved feedback quality and more positive perceptions towards peer assessment (Rotsaert
et al. 2018; Vanderhoven et al. 2015). A recent qualitative review by Panadero and Alqassab
(2019) found that three studies had compared anonymous peer assessment to a control group
(i.e., open peer assessment) and looked at academic performance as the outcome. Their review
found mixed evidence regarding the benefit of anonymity in peer assessment with one of the
included studies finding an advantage of anonymity, but the other two finding little benefit of
anonymity. Others have questioned whether anonymity impairs cognitive
and interpersonal development by limiting the collaborative nature of peer assessment (Strijbos
and Wichmann 2018).
Frequency
Peers are often novices at providing constructive assessment and inexperienced learners tend to
provide limited feedback (Hattie and Timperley 2007). Several studies have therefore suggested
that peer assessment becomes more effective as students' experience with peer assessment
increases. For example, with greater experience, peers tend to use scoring criteria to a greater
extent (Sluijsmans et al. 2004). Similarly, training in peer assessment over time can improve the
quality of feedback students provide, although the effects may be limited by the extent of a student's
relevant domain knowledge (Alqassab et al. 2018). Frequent peer assessment may also increase
positive learner perceptions of peer assessment (e.g., Sluijsmans et al. 2004). However, other
studies have found that learner perceptions of peer assessment are not necessarily positive
(Alqassab et al. 2018). This may suggest that learner perceptions of peer assessment vary
depending on its characteristics (e.g., quality, detail).
Current Study
Given the previous reliance on narrative reviews and the increasing research and teacher
interest in peer assessment, as well as the popularity of instructional theories advocating for
peer assessment and formative assessment practices in the classroom, we present a quantitative
meta-analytic review to develop and synthesise the evidence in relation to peer assessment.
This meta-analysis evaluates the effect of peer assessment on academic performance when
compared to no assessment as well as teacher assessment. To do this, the meta-analysis only
evaluates intervention studies that utilised experimental or quasi-experimental designs, i.e.,
only studies with control groups, so that the effects of maturation and other confounding
variables are mitigated. Control groups can be either passive (e.g., no feedback) or active (e.g.,
teacher feedback). We meta-analytically address two related research questions:
Q1 What effect do peer assessment interventions have on academic performance relative to
the observed control groups?
Q2 What characteristics moderate the effectiveness of peer assessment?
Working Definitions
The specific methods of peer assessment can vary considerably, but there are a number of
shared characteristics across most methods. Peers are defined as individuals at similar (i.e.,
within 1–2 grades) or identical education levels. Peer assessment must involve assessing or
being assessed by peers, or both. Peer assessment requires the communication (either written,
verbal, or online) of task-relevant feedback, although the style of feedback can differ markedly,
from elaborate written and verbal feedback to holistic ratings of performance.
We took a deliberately broad definition of academic performance for this meta-analysis
including traditional outcomes (e.g., test performance or essay writing) and also practical skills
(e.g., constructing a circuit in science class). Despite this broad interpretation of academic
performance, we did not include any studies that were carried out in a professional/
organisational setting other than professional skills (e.g., teacher training) that were being
taught in a traditional educational setting (e.g., a university).
Selection Criteria
To be included in this meta-analysis, studies had to meet several criteria. Firstly, a study needed
to examine the effect of peer assessment. Secondly, the assessment could be delivered in any
form (e.g., written, verbal, online), but needed to be distinguishable from peer-coaching/peer-
tutoring. Thirdly, a study needed to compare the effect of peer assessment with a control group.
Pre-post designs that did not include a control/comparison group were excluded because we
could not discount the effects of maturation or other confounding variables. Moreover, the
comparison group could take the form of either a passive control (e.g., a no assessment
condition) or an active control (e.g., teacher assessment). Fourthly, a study needed to examine
the effect of peer assessment on a non-self-reported measure of academic performance.
In addition to these criteria, a study needed to be carried out in an educational context or be
related to educational outcomes in some way. Any level of education (i.e., tertiary, secondary,
primary) was acceptable. A study also needed to provide sufficient data to calculate an effect
size. If insufficient data was available in the manuscript, the authors were contacted by email to
request the necessary data (additional information was provided for a single study). Studies
also needed to be written in English.
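The selection criteria above amount to a conjunction of independent checks. As a minimal sketch, assuming each criterion has already been judged by a human coder and recorded as a boolean, the screening decision could be expressed as follows. The class and field names are hypothetical and are not taken from the authors' materials.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CandidateStudy:
    # All field names are illustrative assumptions, one per stated criterion.
    examines_peer_assessment: bool
    is_peer_tutoring_or_coaching: bool
    has_control_group: bool
    outcome_is_self_reported: bool
    educational_context: bool
    effect_size_computable: bool
    written_in_english: bool


def exclusion_reasons(study: CandidateStudy) -> List[str]:
    """Return every criterion the study fails; an empty list means eligible."""
    reasons = []
    if not study.examines_peer_assessment:
        reasons.append("does not examine peer assessment")
    if study.is_peer_tutoring_or_coaching:
        reasons.append("not distinguishable from peer coaching/tutoring")
    if not study.has_control_group:
        reasons.append("no control or comparison group")
    if study.outcome_is_self_reported:
        reasons.append("self-reported performance measure")
    if not study.educational_context:
        reasons.append("not in an educational context")
    if not study.effect_size_computable:
        reasons.append("insufficient data for an effect size")
    if not study.written_in_english:
        reasons.append("not written in English")
    return reasons


def is_eligible(study: CandidateStudy) -> bool:
    return not exclusion_reasons(study)
```

Returning the list of failed criteria, rather than a bare yes/no, mirrors the practice of reporting full-text exclusions with reasons.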
Literature Search
The literature search was carried out on 8 June 2018 using PsycInfo, Google Scholar, and
ERIC. Google Scholar was used to check for additional references as it does not allow for the
exporting of entries. These three electronic databases were selected due to their relevance to
educational instruction and practice. Results were not filtered based on publication date, but
ERIC only holds records from 1966 to present. A deliberately wide selection of search terms
was used in the first instance to capture all relevant articles. The search terms included "peer
grading" or "peer assessment" or "peer evaluation" or "peer feedback", which were paired with
"learning" or "performance" or "academic achievement" or "academic performance" or "grades".
All peer assessment-related search terms were included with and without hyphenation. In
addition, an ancestry search (i.e., back-search) was performed on the reference lists of the
included articles. Conference programs for major educational conferences were searched.
Finally, unpublished results were sourced by emailing prominent authors in the field and
through social media. Although there is significant disagreement about the inclusion of
unpublished data and conference abstracts, i.e., "grey literature" (Cook et al. 1993), we opted
to include it in the first instance because including only published studies can result in a meta-
analysis over-estimating effect sizes due to publication bias (Hopewell et al. 2007). It should,
however, be noted that none of the substantive conclusions changed when the analyses were
re-run with the grey literature excluded.
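To illustrate how the search terms combine, the sketch below enumerates every pairing of a peer assessment term (with and without hyphenation) and an outcome term. Reading "paired with" as a boolean AND and quoting each phrase are assumptions; the exact query syntax accepted by each database is not described in the text and will differ in practice.

```python
from itertools import product

PEER_TERMS = ["peer grading", "peer assessment", "peer evaluation", "peer feedback"]
OUTCOME_TERMS = [
    "learning",
    "performance",
    "academic achievement",
    "academic performance",
    "grades",
]


def with_hyphen_variant(term):
    """Return the spaced and the hyphenated spelling of a search term."""
    return [term, term.replace(" ", "-")]


def build_queries():
    """Pair every peer assessment spelling with every outcome term."""
    peer_variants = [v for term in PEER_TERMS for v in with_hyphen_variant(term)]
    # The AND operator and the quoting style are assumptions for illustration.
    return ['"{}" AND "{}"'.format(p, o) for p, o in product(peer_variants, OUTCOME_TERMS)]
```

With four peer assessment terms in two spellings and five outcome terms, this yields 40 distinct pairings.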
The database search returned 4072 records. An ancestry search returned an additional 37
potentially relevant articles. No unpublished data could be found. After duplicates were
removed, two reviewers independently screened titles and abstracts for relevance. A kappa
statistic was calculated to assess inter-rater reliability between the two coders and was found to
be .78 (89.06% overall agreement, CI .63 to .94), which is above the recommended minimum
levels of inter-rater reliability (Fleiss 1971). Subsequently, the full text of articles that were
deemed relevant based on their abstracts was examined to ensure that they met the selection
criteria described previously. Disagreements between the coders were discussed and, when
necessary, resolved by a third coder. Ultimately, 55 articles with 143 effect sizes were found
that met the inclusion criteria and were included in the meta-analysis. The search process is depicted
in Fig. 2.
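For readers unfamiliar with the kappa statistic reported for the screening stage, the following is a minimal sketch of a two-rater kappa alongside raw percentage agreement. It assumes the two-rater (Cohen) form of the statistic; the text does not state which variant or software was used, and the function names are illustrative.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Share of items on which the two raters gave the same label."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)


def two_rater_kappa(rater_a, rater_b):
    """Chance-corrected agreement: (p_o - p_e) / (1 - p_e)."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    # Expected agreement if each rater labelled independently at their own base rates.
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)
```

Because kappa discounts the agreement expected by chance, it is always at or below the raw agreement rate, which is why a kappa of .78 can accompany overall agreement near 89%.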
Data Extraction
A research assistant and the first author extracted data from the included papers. We took an
iterative approach to the coding procedure whereby the coders refined the classification of each
variable as they progressed through the included studies to ensure that the classifications best
characterised the extant literature. Below, the coding strategy is reviewed along with the
classifications utilised. Frequency statistics and inter-rater reliability for the extracted data
for the different classifications are presented in Table 1. All extracted variables showed at least
moderate agreement except for whether the peer assessment was freeform or structured, which
showed fair agreement (Landis and Koch 1977).
Records idenfied through
database searching
(n = 4,072 )
Addional records idenfied
through other sources
(n = 7 )
Records aer duplicates removed
(n = 3,736 )
Records screened
(n = 3,736)
Records excluded
(n = 3,483 )
Full-text arcles assessed
for eligibility
(n = 253 )
Full-text arcles excluded,
with reasons
(n = 198 )
Studies included in
quantave synthesis
(n = 55 )
Fig. 2 Flow chart for the identification, screening protocol, and inclusion of publications in the meta-analyses
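The counts in Fig. 2 can be checked with simple arithmetic. The sketch below uses only the values shown in the flow chart; note that the running text reports 37 additional records from the ancestry search, whereas the figure lists 7 from other sources, and the figure values are the ones assumed here.

```python
def flow_counts(database=4072, other_sources=7, after_duplicates=3736,
                excluded_at_screening=3483, full_text_excluded=198):
    """Derive the intermediate counts implied by the flow chart."""
    duplicates_removed = database + other_sources - after_duplicates
    full_text_assessed = after_duplicates - excluded_at_screening
    included = full_text_assessed - full_text_excluded
    return {
        "duplicates_removed": duplicates_removed,
        "full_text_assessed": full_text_assessed,
        "included": included,
    }
```

Under these inputs the derived full-text and inclusion counts match the 253 and 55 shown in the figure.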
Table 1 Frequencies of extracted variables
                                 Studies                Effect sizes
Variable                         Count   Proportion     Count   Proportion
Publication type (kappa = 1)
  Conference                     1       1.85%          1       0.71%
  Dissertation                   8       14.81%         14      9.93%
  Journal                        43      79.63%         123     87.23%
  Report                         2       3.70%          3       2.13%
Education level (kappa = 1)
  Tertiary                       29      54.72%         83      59.29%
  Secondary                      13      24.53%         22      15.71%
  Primary                        11      20.75%         35      25.00%
Subject
  Accounting                     1       1.85%          12      8.51%
  Education                      4       7.41%          8       5.67%
  IT                             5       9.26%          8       5.67%
  Language                       3       5.56%          21      14.89%
  Medicine                       2       3.70%          7       4.96%
  Performing Arts                1       1.85%          1       0.71%
  Politics                       1       1.85%          1       0.71%
  Psychology                     2       3.70%          3       2.13%
  Reading                        1       1.85%          6       4.26%
  Research Methods               1       1.85%          3       2.13%
  Science                        8       14.81%         19      13.48%
  Statistics                     3       5.56%          4       2.84%
  Writing                        22      40.74%         48      34.04%
Role (kappa = .59)
  Both                           49      89.09%         109     78.42%
  Reviewee                       2       3.64%          10      7.19%
  Reviewer                       4       7.27%          20      14.39%
Comparison group (kappa = .62)
  No assessment                  23      35.95%         59      42.14%
  Self-assessment                10      15.62%         16      11.43%
  Teacher assessment             31      48.44%         65      46.43%
Written
  No                             20      35.71%         60      42.55%
  Yes                            36      64.29%         81      57.45%
Dialog (kappa = .57)
  No                             36      65.45%         92      65.25%
  Yes                            19      34.55%         49      34.75%
Grading (kappa = .52)
  No                             18      32.73%         46      32.62%
  Yes                            37      67.27%         95      67.38%
Freeform (kappa = .22)
  No                             45      83.33%         112     79.43%
  Yes                            9       16.67%         29      20.57%
Online (kappa = .92)
  No                             32      59.26%         102     72.34%
  Yes                            22      40.74%         39      27.66%
Anonymous (kappa = .40)
  No                             29      55.77%         77      57.04%
  Yes                            23      44.23%         58      42.96%
Frequency (kappa = .55)
  Multiple                       34      61.82%         98      69.50%
  Single                         21      38.18%         43      30.50%
Transfer (kappa = .43)
  Far                            18      28.12%         26      18.44%
  Near                           23      35.94%         64      45.39%
  None                           23      35.94%         51      36.17%
Allocation (kappa = .56)
  Classroom                      41      75.93%         107     75.89%
  Individual                     11      20.37%         31      21.99%
  Year/semester                  2       3.70%          3       2.13%
Note: different count totals for some variables are the result of missing data. Kappa correlation coefficients are
displayed for each category, which indicate the degree of inter-rater reliability for the data extraction stage
Publication Type
Publications were classified into journal articles, conference papers, dissertations, reports, or
unpublished records.
Education Level
Education level was coded as either graduate tertiary, undergraduate tertiary, secondary, or
primary. Given the small number of studies that utilised graduate samples (N = 2), we
subsequently combined this classification with undergraduate to form a general tertiary
category. In addition, we recorded the grade level of the students. Generally speaking, primary
education refers to the ages of 6–12, secondary education refers to education from 13–18, and
tertiary education is undertaken after the age of 18.
Age and Sex
The percentage of students in a study that were female was recorded. In addition, we recorded
the mean age from each study. Unfortunately, only 55.5% of studies recorded participants' sex
and only 18.5% of studies recorded mean age information.
Subject
The subject area associated with the academic performance measure was coded. We also
recorded the nature of the academic performance variable for descriptive purposes.
Assessment Role
Studies were coded as to whether the students acted as peer assessors, assessees, or both
assessors and assessees.
Comparison Group
Four types of comparison group were found in the included studies: no assessment, teacher
assessment, self-assessment, and reader-control. In many instances, a no assessment condition
could be characterised as typical instruction; that is, two versions of a course were run, one
with peer assessment and one without peer assessment. As such, while no specific teacher
assessment comparison condition is referenced in the article, participants would most likely
have received some form of teacher feedback as is typical in standard instructional practice.
Studies were classified as having teacher assessment on the basis of a specific reference to
teacher feedback being provided.
Studies were classified as self-assessment controls if there was an explicit reference to a
self-assessment activity, e.g., self-grading/rating. Studies that only included revision, e.g.,
working alone on revising an assignment, were classified as no assessment rather than self-
assessment because they did not necessarily involve explicit self-assessment. Studies
where both the comparison and intervention groups received teacher assessment (in
addition to peer assessment in the case of the intervention group) were coded as no
assessment to reflect the fact that the comparison group received no additional assessment
compared to the peer assessment condition. In addition, Philippakos and MacArthur
(2016) and Cho and MacArthur (2011) were notable in that they utilised a reader-
control condition whereby students read, but did not assess, peers' work. Due to the small
frequency of this control condition, we ultimately classified them as no assessment.
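The comparison group rules described above can be summarised as a small decision function. This is a sketch under assumptions: the argument names are hypothetical, and the order in which the rules are checked is an interpretation of the prose rather than a documented procedure.

```python
def classify_comparison_group(explicit_teacher_feedback,
                              explicit_self_assessment,
                              teacher_assessment_in_both_groups=False,
                              reader_control=False):
    """Map a study's control condition onto the three analysed categories."""
    # Reader-control conditions were folded into no assessment.
    if reader_control:
        return "no assessment"
    # Only an explicit self-assessment activity counts; revision alone does not.
    if explicit_self_assessment:
        return "self-assessment"
    # Teacher feedback given to both arms adds nothing to the control arm.
    if explicit_teacher_feedback and not teacher_assessment_in_both_groups:
        return "teacher assessment"
    # Includes "typical instruction" and revision-only controls.
    return "no assessment"
```

Treating typical instruction as no assessment is conservative, since such control groups probably still received some teacher feedback.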
Peer Assessment Type
Peer assessment was characterised using coding we believed best captured the theoretical
distinctions in the literature. Our typology of peer assessment used three distinct components,
which were combined for classification:
1. Did the peer feedback include a dialog between peers?
2. Did the peer feedback include written comments?
3. Did the peer feedback include grading?
Each study was classified using a dichotomous present/absent scoring system for each of the
three components.
Scaffolding
Studies were dichotomously classified as to whether a specific rubric, assessment script, or
scoring system was provided to students. Studies that only provided basic instructions to
students to conduct the peer feedback were coded as freeform.
Was the Assessment Online?
Studies were classified based on whether the peer assessment was online or offline.
Anonymity
Studies were classified based on whether the peer assessment was anonymous or identified.
Frequency of Assessment
Studies were coded dichotomously as to whether they involved only a single peer assessment
occasion or, alternatively, whether students provided/received peer feedback on multiple
occasions.
Transfer
The level of transfer between the peer assessment task and the academic performance measure
was coded into three categories:
1. No transfer: the peer-assessed task was the same as the academic performance measure.
For example, a student's assignment was assessed by peers and this feedback was utilised
to make revisions before it was graded by their teacher.
2. Near transfer: the peer-assessed task was in the same or very similar format as the
academic performance measure, e.g., an essay on a different, but similar topic.
3. Far transfer: the peer-assessed task was in a different form to the academic performance
task, although they may have overlapping content. For example, a student's assignment
was peer assessed, while the final course exam grade was the academic performance measure.
We recorded how participants were allocated to a condition. Three categories of allocation were
found in the included studies: random allocation at the class level, at the student level, or at the year/
semester level. As only two studies allocated students to conditions at the year/semester level, we
combined these studies with the studies allocated at the classroom level (i.e., as quasi-experiments).
Statistical Analyses of Effect Sizes
Effect Size Estimation and Heterogeneity
A random effects, multi-level meta-analysis was carried out using R version 3.4.3 (R Core
Team 2017). The primary outcome was standardised mean difference between peer assessment
and comparison (i.e., control) conditions. A common effect size metric, Hedges' g, was
calculated. A positive Hedges' g value indicates comparatively higher values in the dependent
variable in the peer assessment group (i.e., higher academic performance). Heterogeneity in the
effect sizes was estimated using the I² statistic. I² is equivalent to the percentage of variation
between studies that is due to heterogeneity (Schwarzer et al. 2015). Large values of the I²
statistic suggest higher heterogeneity between studies in the analysis.
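As an illustration of how an I² value of this kind can be obtained, the following minimal sketch uses the conventional definition based on Cochran's Q with inverse-variance weights. This is an assumption made for illustration: the analyses reported here were run with the robumeta package in R, and the function name `i_squared` and the toy inputs are hypothetical.

```python
def i_squared(effects, variances):
    """Return I^2 (in percent) from Cochran's Q with inverse-variance weights.

    effects   -- list of effect sizes (e.g., Hedges' g values)
    variances -- list of their sampling variances (same order)
    """
    weights = [1.0 / v for v in variances]
    # Inverse-variance weighted mean effect
    pooled = sum(w * g for w, g in zip(weights, effects)) / sum(weights)
    # Cochran's Q: weighted squared deviations from the pooled effect
    q = sum(w * (g - pooled) ** 2 for w, g in zip(weights, effects))
    df = len(effects) - 1
    if q <= 0:
        return 0.0
    # Share of the observed variation that exceeds what sampling error predicts
    return max(0.0, (q - df) / q) * 100.0


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the included studies
    print(i_squared([0.1, 0.3, 0.5], [0.01, 0.01, 0.01]))
```

Under this definition, I² is truncated at zero whenever Q falls below its degrees of freedom.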
Meta-regressions were performed to examine the moderating effects of the various factors
that differed across the studies. We report the results of these meta-regressions alongside
sub-groups analyses. While it was possible to determine whether sub-groups differed significantly
from each other by determining whether the confidence intervals around their effect sizes
overlapped, sub-groups analysis may also produce biased estimates when heteroscedasticity or
multicollinearity are present (Steel and Kammeyer-Mueller 2002). We performed meta-
regressions separately for each predictor to test the overall effect of a moderator.
Finally, as this meta-analysis included students from primary school to graduate school, which
are highly varied participant and educational contexts, we opted to analyse the data both in complete
form, as well as after controlling for each level of education. As such, we were able to look at the
effect of each moderator across education levels and for each education level separately.
Robust Variance Estimation
Often meta-analyses include multiple effect sizes from the same sample (e.g., the effect of peer
assessment on two different measures of academic performance). Including these dependent effect
sizes in a meta-analysis can be problematic, as this can potentially bias the results of the analysis in
favour of studies that have more effect sizes. Recently, Robust Variance Estimation (RVE) was
developed as a technique to address such concerns (Hedges et al. 2010). RVE allows for the
modelling of dependence between effect sizes even when the nature of the dependence is not
specifically known. Under such situations, RVE results in unbiased estimates of fixed effects when
dependent effect sizes are included in the analysis (Moeyaert et al. 2017). A correlated effects
structure was specified for the meta-analysis (i.e., the random errors in the effects from a single paper
were expected to be correlated due to similar participants and procedures). A rho value of .8 was
specified for the correlated effects (i.e., effects from the same study) as is standard practice when the
correlation is unknown (Hedges et al. 2010). A sensitivity analysis indicated that none of the results
varied as a function of the chosen rho. We utilised the robumeta package (Fisher et al. 2017) to
perform the meta-analyses. Our approach was to use only summative dependent variables when
they were provided (e.g., overall writing quality score rather than individual trait measures), but to
utilise individual measures when overall indicators were not available. When a pre-post design was
used in a study, we adjusted the effect size for pre-intervention differences in academic performance
as long as there was sufficient data to do so (e.g., t-tests for pre-post change).
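To make the logic of a correlated-effects RVE model more concrete, the sketch below estimates an intercept-only model (i.e., the overall mean effect) with study-level weights and a cluster-robust (sandwich) standard error, following the general form described by Hedges et al. (2010). It is a simplified illustration under stated assumptions and not the robumeta implementation: the between-study variance `tau_sq` is supplied by the user rather than estimated (in the original method, rho enters through that estimation step), the small-sample factor m/(m − 1) is the simple adjustment, and the function name and data are hypothetical.

```python
import math


def rve_intercept(studies, tau_sq):
    """Intercept-only correlated-effects RVE sketch.

    studies -- list of studies; each study is a list of (effect, variance) pairs
    tau_sq  -- assumed between-study variance (NOT estimated here)
    Returns (mean effect, robust standard error).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    per_study = []
    for effects in studies:
        k = len(effects)
        mean_var = sum(v for _, v in effects) / k
        # Every effect in a study gets the same weight, so a study's total
        # weight does not grow with the number of effects it contributes.
        w = 1.0 / (k * (mean_var + tau_sq))
        values = [g for g, _ in effects]
        per_study.append((w, values))
        weighted_sum += w * sum(values)
        weight_total += w * k
    estimate = weighted_sum / weight_total

    # Sandwich estimator: residuals are summed within each study (cluster)
    # before squaring, which allows arbitrary within-study dependence.
    meat = sum((w * sum(g - estimate for g in values)) ** 2
               for w, values in per_study)
    m = len(studies)
    variance = (m / (m - 1)) * meat / weight_total ** 2
    return estimate, math.sqrt(variance)


if __name__ == "__main__":
    # Hypothetical data: study 1 reports two effects, study 2 reports one
    toy = [[(0.2, 0.04), (0.4, 0.04)], [(0.6, 0.04)]]
    print(rve_intercept(toy, tau_sq=0.0))
```

In this toy example both studies receive the same total weight, so the study with two effect sizes does not dominate the estimate.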
Overall Meta-analysis of the Effect of Peer Assessment
Prior to conducting the analysis, two effect sizes (g = 2.06 and 1.91) were identified as outliers
and removed using the outlier labelling rule (Hoaglin and Iglewicz 1987). Descriptive
characteristics of the included studies are presented in Table 2. The meta-analysis indicated
that there was a significant positive effect of peer assessment on academic performance (g =
0.31, SE = .06, 95% CI = .18 to .44, p < .001). A density graph of the recorded effect sizes is
provided in Fig. 3. A sensitivity analysis indicated that the effect size estimates did not differ
with different values of rho. Heterogeneity between the studies' effect sizes was large, I² =
81.08%, supporting the use of a meta-regression/sub-groups analysis in order to explain the
observed heterogeneity in effect sizes.
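The outlier labelling rule mentioned above can be sketched as follows. The multiplier of 2.2 is the value proposed by Hoaglin and Iglewicz (1987); the text does not state which multiplier or quartile convention was applied, so both are assumptions here, and the function name and data are hypothetical.

```python
import statistics


def label_outliers(values, multiplier=2.2):
    """Return the values lying outside [Q1 - m*(Q3 - Q1), Q3 + m*(Q3 - Q1)]."""
    # Quartile convention is an assumption; other conventions shift the fences slightly
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    spread = q3 - q1
    lower = q1 - multiplier * spread
    upper = q3 + multiplier * spread
    return [x for x in values if x < lower or x > upper]


if __name__ == "__main__":
    # Hypothetical effect sizes, not the meta-analytic data set
    print(label_outliers([0.1, 0.2, 0.3, 0.4, 0.5, 3.0]))
```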
Meta-Regressions and Sub-Groups Analyses
Effect sizes for sub-groups are presented in Table 3. The results of the meta-regressions are presented in Table 4.
Education Level
A meta-regression with tertiary students as the reference category indicated that there
was no significant difference in effect size as a function of education level. The effect of
peer assessment was similar for secondary students (g = .44, p < .001) and primary
school students (g = .41, p = .006) and smaller for tertiary students (g = .21, p = .043).
There is, however, a strong theoretical basis for examining effects separately at different
education levels (primary, secondary, tertiary), because of the large degree of heterogeneity
across such a wide span of learning contexts (e.g., pedagogical practices, intellectual
and social development of the students). We therefore will proceed by reporting the
data both as a whole and separately for each of the education levels for all of the
moderators considered here. Education level is contrast coded such that tertiary is
compared to the average of secondary and primary and secondary and primary are
compared to each other.
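One way to express the contrast coding described above is with two Helmert-style contrast variables. The exact code values and the sign convention are not given in the text, so the scaling below (chosen so that each coefficient equals the corresponding difference in means) is an assumption, and all names are hypothetical.

```python
# Contrast 1: tertiary vs. the average of secondary and primary
# Contrast 2: secondary vs. primary
CONTRASTS = {
    "tertiary": (2 / 3, 0.0),
    "secondary": (-1 / 3, 0.5),
    "primary": (-1 / 3, -0.5),
}


def predicted_mean(level, intercept, b1, b2):
    """Predicted effect size for one education level under the coded model."""
    c1, c2 = CONTRASTS[level]
    return intercept + b1 * c1 + b2 * c2


if __name__ == "__main__":
    # Hypothetical coefficients, for illustration only
    means = {lvl: predicted_mean(lvl, 0.35, -0.2, 0.1) for lvl in CONTRASTS}
    # b1 is recovered as tertiary minus the average of the other two levels
    print(means["tertiary"] - (means["secondary"] + means["primary"]) / 2)
    # b2 is recovered as secondary minus primary
    print(means["secondary"] - means["primary"])
```

With this scaling, both contrast columns sum to zero and are orthogonal to each other.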
Comparison Group
A meta-regression indicated that the effect size was not significantly different when comparing peer
assessment with teacher assessment than when comparing peer assessment with no assessment (b =
Table 2 Descriptive characteristics of the included studies
Authors                   Year  Pub. type     Subject           Country      Ed. level
Hwang et al.              2018  Journal       Science           Taiwan       Primary
Gielen et al.             2010  Journal       Writing           Belgium      High school
Wang et al.               2017  Journal       IT                Taiwan       High school
Hwang et al.              2014  Journal       Science           Taiwan       Primary
Khonbi & Sadeghi          2013  Journal       Education         Iran         Undergraduate
Karegianes et al.         1980  Journal       Writing           USA          High school
Philippakos & MacArthur   2016  Journal       Writing           USA          Primary
Cho & MacArthur           2011  Journal       Science           USA          Undergraduate
Benson                    1979  Dissertation  Writing           USA          High school
Liu et al.                2016  Journal       Writing           Taiwan       Primary
Wang et al.               2014  Journal       Writing           Taiwan       Primary
Sippel & Jackson          2015  Journal       Language          USA          Undergraduate
Erfani & Nikbin           2015  Journal       Writing           Iran         Undergraduate
Crowe et al.              2015  Journal       Research methods  USA          Undergraduate
Anderson & Flash          2014  Journal       Science           USA          Undergraduate
Papadopoulos et al.       2012  Journal       IT                             Undergraduate
Hussein & Al Ashri        2013  Report        Writing           Egypt        High school
Demetriadis et al.        2011  Journal       IT                Germany      Undergraduate
Olson                     1990  Journal       Writing           USA          Primary
Diab                      2011  Journal       Writing           Lebanon      Undergraduate
Enders et al.             2010  Journal       Statistics        USA          Undergraduate
Rudd II et al.            2009  Journal       Science           USA          Undergraduate
Chaney & Ingraham         2009  Journal       Accounting        USA          Undergraduate
Xie et al.                2008  Journal       Politics          USA          Undergraduate
Schönrock-Adema           2007  Journal       Medicine          Netherlands  Undergraduate
Li & Steckelberg          2004  Conference    IT                USA          Undergraduate
McCurdy & Shapiro         1992  Journal       Reading           USA          Primary
van Ginkel et al.         2017  Journal       Science           Netherlands  Undergraduate
Kamp et al.               2014  Journal       Science           Netherlands  Undergraduate
Kurihara                  2017  Journal       Writing           Japan        High school
Ha & Storey               2006  Journal       Writing           China        Undergraduate
van den Boom              2007  Journal       Psychology        Netherlands  Undergraduate
Ozogul et al.             2008  Journal       Education         USA          Undergraduate
Sun et al.                2015  Journal       Statistics        USA          Undergraduate
Li & Gao                  2016  Journal       Education         USA          Undergraduate
Sadler & Good             2006  Journal       Science           USA          High school
Califano                  1987  Dissertation  Writing           USA          Primary
Farrell                   1977  Dissertation  Writing           USA          High school
AbuSeileek & Abualshar    2014  Journal       Writing                        Undergraduate
Bangert                   1996  Dissertation  Statistics        USA          Undergraduate
Birjandi & Tamjid         2012  Journal       Writing                        Undergraduate
Chang et al.              2012  Journal       Science           Taiwan       Undergraduate
English et al.            2006  Journal       Medicine          UK           Undergraduate
Hsia                      2016  Journal       Performing Arts   Taiwan       High school
Hsu                       2016  Journal       IT                             High school
Lin                       2009  Dissertation  Writing           Taiwan       Undergraduate
Montanero et al.          2014  Journal       Writing           Spain        Primary
Bhullar                   2014  Journal       Psychology        USA          Undergraduate
Prater & Bermudez         1993  Journal       Writing           USA          Primary
Rijlaarsdam & Schoonen    1988  Report        Writing           Netherlands  High school
Ruegg                     2018  Journal       Writing           Japan        Undergraduate
Sadeghi & Khonbi          2015  Journal       Education         Iran         Undergraduate
Horn                      2009  Dissertation  Writing           USA          Primary
Pierson                   1966  Dissertation  Writing           USA          High school
Wise                      1992  Dissertation  Writing           USA          High school
.02, 95% CI −.26 to .31, p = .865). The difference between peer assessment vs. no assessment and
peer assessment vs. self-assessment was also not significant (b = −.03, 95% CI −.44 to .38, p = .860); see
Table 4. An examination of sub-groups suggested that peer assessment had a moderate positive
effect compared to no assessment controls (g = .31, p = .004) and teacher assessment (g = .28, p =
.007) and was not significantly different compared with self-assessment (g = .23, p = .209). The
meta-regression was also re-run with education level as a covariate but the results were unchanged.
Assessment Role
Meta-regressions indicated that the participants' role was not a significant moderator of the
effect size; see Table 4. However, given the extremely small number of studies where
participants did not act as both assessees (n = 2) and assessors (n = 4), we did not perform a
sub-groups analysis, as such analyses are unreliable with small samples (Fisher et al. 2017).
Subject Area
Given that many subject areas had few studies (see Table 1) and the writing subject area made up
the majority of effect sizes (40.74%), we opted to perform a meta-regression comparing writing
with other subject areas. However, the effect of peer assessment did not differ between writing (g =
.30, p = .001) and other subject areas (g = .31, p = .002); b = −.003, 95% CI −.25 to .25, p = .979.
Similarly, the results did not substantially change when education level was entered into the model.
Fig. 3 A density plot of effect sizes (x-axis: effect size)
Peer Assessment Type
The effect of peer assessment did not differ significantly when peer assessment included a
written component (g = .35, p < .001) compared with when it did not (g = .20, p = .015), b = .144, 95%
CI −.10 to .39, p = .241. Including education as a variable in the model did not change the effect of
written feedback. Similarly, studies with a dialog component (g = .21, p = .033) did not differ
significantly from those that did not (g = .35, p < .001), b = −.137, 95% CI −.39 to .12, p = .279.
Studies where peer feedback included a grading component (g = .37, p < .001) did not
differ significantly from those that did not (g = .17, p = .138). However, when education level
Table 3 Results of the sub-groups analysis
                       N    k    g     SE    I²      p
Publication type
  Dissertation         8    14   0.21  0.13  64.65%  0.138
  Journal              43   123  0.31  0.07  83.23%  < .001
  Conference/report    2    3    0.82  0.22  9.08%   0.168
Education level
  Primary school       11   35   0.41  0.12  68.36%  0.006
  Secondary            13   22   0.44  0.1   69.70%  0.001
  Tertiary             29   83   0.21  0.10  85.17%  0.043
Comparison group
  Teacher assessment   31   65   0.27  0.09  83.82%  0.007
  No assessment        23   59   0.31  0.1   78.02%  0.004
  Self-assessment      10   16   0.23  0.17  74.57%  0.209
Written comments
  Yes                  36   81   0.35  0.08  84.04%  < .001
  No                   20   60   0.2   0.08  68.96%  0.014
Dialog
  Yes                  19   49   0.21  0.09  70.74%  0.034
  No                   36   92   0.35  0.08  84.12%  < .001
Grading
  Yes                  37   95   0.37  0.07  83.48%  < .001
  No                   18   46   0.17  0.11  72.60%  0.138
Freeform
  Yes                  9    29   0.42  0.16  68.68%  0.03
  No                   45   112  0.29  0.07  82.28%  < .001
Online
  Yes                  22   39   0.38  0.12  83.46%  0.003
  No                   33   102  0.24  0.08  80.18%  0.004
Anonymous
  Yes                  23   58   0.27  0.11  82.73%  0.019
  No                   29   77   0.25  0.08  70.97%  0.004
Frequency
  Multiple             34   98   0.37  0.07  81.28%  < .001
  Single               21   43   0.2   0.11  80.69%  0.103
Transfer
  Far                  18   26   0.2   0.13  89.45%  0.124
  Near                 23   64   0.42  0.08  72.93%  < .001
  None                 23   51   0.29  0.11  84.19%  0.017
Allocation
  Classroom            41   107  0.31  0.07  78.97%  < .001
  Individual           11   31   0.21  0.13  68.59%  0.14
N = number of studies, k = number of effects, g = Hedges' g, SE = standard error in the effect size, I² =
heterogeneity within the group, p = p value
was included in the model, the model indicated a significant interaction effect between grading
in tertiary students and the average effect of grading in primary and secondary students (b =
.395, 95% CI .06 to .73, p = .022). A follow-up sub-groups analysis showed that grading was
beneficial for academic performance in tertiary students (g = .55, p = .009), but not secondary
Table 4 Results of the meta-regressions
Variable             b       SE    CI low.  CI upp.  p
Publication type
  Intercept          0.3     0.12  0.02     0.57     0.038
  Published article  0.02    0.14  −0.29    0.32     0.911
Education level
  Intercept          0.21    0.1   0.01     0.41     0.043
  Primary            0.2     0.15  −0.12    0.53     0.198
  Secondary          0.24    0.14  −0.05    0.53     0.103
Subject area
  Intercept          0.31    0.09  0.13     0.5      0.002
  Writing            −.003   0.12  −0.25    0.25     0.979
Assessment role
  Intercept          0.31    0.07  0.17     0.45     < .001
  Reviewee           −0.25   0.12  −1.6     1.1      0.272
  Reviewer           0.06    0.29  −0.87    1        0.838
Comparison group
  Intercept          0.31    0.11  0.08     0.53     0.01
  Self-assessment    −0.03   0.19  −0.44    0.38     0.86
  Teacher            0.02    0.14  −0.26    0.31     0.864
Written comments
  Intercept          0.22    0.08  0.04     0.4      0.017
  Yes                0.14    0.12  −0.1     0.39     0.241
Dialog
  Intercept          0.36    0.08  0.19     0.52     < .001
  Yes                −0.14   0.12  −0.39    0.12     0.279
Grading
  Intercept          0.17    0.11  −0.07    0.41     0.161
  Yes                0.21    0.14  −0.07    0.48     0.145
Freeform
  Intercept          0.42    0.16  0.06     0.79     0.028
  Structured         −0.13   0.17  −0.51    0.25     0.455
Online
  Intercept          0.25    0.07  0.09     0.4      0.002
  Yes                0.16    0.13  −0.1     0.42     0.215
Anonymous
  Intercept          0.26    0.08  0.1      0.42     0.002
  Yes                0.03    0.12  −0.22    0.28     0.811
Frequency
  Intercept          0.37    0.07  0.22     0.52     < .001
  Single             −0.17   0.14  −0.45    0.11     0.223
Transfer
  Intercept          0.16    0.1   −0.05    0.37     0.116
  Near               0.27    0.13  0.01     0.52     0.042
  None               0.14    0.14  −0.15    0.43     0.334
Allocation
  Intercept          0.31    0.07  0.16     0.45     < .001
  Individual         −0.09   0.16  −0.43    0.24     0.566
  Year/Semester      0.51    0.3   −2.47    3.48     0.317
b = unstandardised regression estimate, SE = standard error, CI low./CI upp. = lower and upper bound of the
confidence interval respectively, p = p value.
school students (g = .002, p = .991) or primary school students (g = .08, p = .762). When the
three variables used to characterise peer assessment were entered simultaneously, the results
were unchanged.
The average effect size was not significantly different for studies where assessment was freeform,
i.e., where no specific script or rubric was given (g = .42, p = .030) compared to those where a
specific script or rubric was provided (g = .29, p < .001); b = −.13, 95% CI −.51 to .25, p = .455.
However, there were few studies where feedback was freeform (n = 9, k = 29). The results were
unchanged when education level was controlled for in the meta-regression.
Studies where peer assessment was online (g = .38, p = .003) did not differ from studies where
assessment was offline (g = .24, p = .004); b = .16, 95% CI −.10 to .42, p = .215. This result
was unchanged when education level was included in the meta-regression.
There was no significant difference in terms of effect size between studies where peer
assessment was anonymised (g = .27, p = .019) and those where it was not (g = .25, p =
.004); b = .03, 95% CI −.22 to .28, p = .811. Nor was the effect significant when education
level was controlled for.
Studies where peer assessment was performed just a single time (g = .19, p = .103) did not differ
significantly from those where it was performed multiple times (g = .37, p < .001); b = −.17, 95%
CI −.45 to .11, p = .223. It is worth noting, however, that the results of the sub-groups analysis
suggest that the effect of peer assessment was not significant when only considering studies that
applied it a single time. The result did not change when education was included in the model.
There was no significant difference in effect size between studies utilising far transfer (g = .21, p =
.124) and those with near (g = .42, p < .001) or no transfer (g = .29, p = .017). It is worth
noting, however, that the sub-groups analysis suggests that the effect of peer assessment was only significant
when there was no transfer to the criterion task. As shown in Table 4, this was also not significant
when analysed using meta-regressions either with or without education in the model.
Studies that allocated participants to experimental condition at the student level (g = .21, p =
.14) did not differ from those that allocated condition at the classroom/semester level (g = .31,
p < .001 and g = .79, p = .223, respectively); see Table 4 for meta-regressions.
Publication Bias
Risk of publication bias was assessed by inspecting the funnel plots (see Fig. 4) of the
relationship between observed effects and standard error for asymmetry (Schwarzer et al.
2015). Egger's test was also run by including standard error as a predictor in a meta-regression.
Based on the funnel plots and a non-significant Egger's test of asymmetry (b = .886, p = .226),
risk of publication bias was judged to be low.
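The idea behind Egger's test, regressing effect sizes on their standard errors and inspecting the slope, can be illustrated with a plain inverse-variance weighted least-squares fit. This sketch is an assumption-laden simplification: the analysis reported here included standard error as a predictor in a robust-variance meta-regression, whereas the code below ignores dependence between effect sizes and does not compute a p value. The function name and data are hypothetical.

```python
def egger_regression(effects, std_errors):
    """Weighted least-squares fit of effect size on standard error.

    Returns (intercept, slope). A slope far from zero indicates that
    less precise studies report systematically different effects,
    i.e., funnel plot asymmetry.
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    mean_x = sum(w * se for w, se in zip(weights, std_errors)) / total
    mean_y = sum(w * g for w, g in zip(weights, effects)) / total
    sxy = sum(w * (se - mean_x) * (g - mean_y)
              for w, se, g in zip(weights, std_errors, effects))
    sxx = sum(w * (se - mean_x) ** 2 for w, se in zip(weights, std_errors))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


if __name__ == "__main__":
    # Hypothetical data constructed so that effect = 0.1 + 2 * SE
    print(egger_regression([0.3, 0.5, 0.7], [0.1, 0.2, 0.3]))
```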
Proponents of peer assessment argue that it is an effective classroom technique for improving
academic performance (Topping 2009). While previous narrative reviews have argued for the
benefits of peer assessment, the current meta-analysis quantifies the effect of peer assessment
interventions on academic performance within educational contexts. Overall, the results
suggest that there is a positive effect of peer assessment on academic performance in primary,
secondary, and tertiary students. The magnitude of the overall effect size was within the small
to medium range for effect sizes (Sawilowsky 2009). These findings also suggest that the
benefits of peer assessment are robust across many contextual factors, including different
feedback and educational characteristics.
Recently, researchers have increasingly advocated for the role of assessment in promoting
learning in educational practice (Wiliam 2018). Peer assessment forms a core part of theories
of formative assessment because it is seen as providing new information about the learning
process to the teacher or student, which in turn facilitates later performance (Pellegrino et al.
2001). The current results provide support for the position that peer assessment can be an
effective classroom technique for improving academic performance. The results suggest that
peer assessment is effective compared to both no assessment (which often involved teaching
as usual) and teacher assessment, suggesting that peer assessment can play an important
Fig. 4 A funnel plot showing the relationship between standard error and observed effect size for the academic
performance meta-analysis
formative role in the classroom. The findings suggest that structuring classroom activities in a
way that utilises peer assessment may be an effective way to promote learning and optimise the
use of teaching resources by permitting the teacher to focus on assisting students with greater
difficulties or for more complex tasks. Importantly, the results indicate that peer assessment
can be effective across a wide range of subject areas, education levels, and assessment types.
Pragmatically, this suggests that classroom teachers can implement peer assessment in a
variety of ways and tailor the peer assessment design to the particular characteristics and
constraints of their classroom context.
Notably, the results of this quantitative meta-analysis align well with past narrative reviews
(e.g., Black and Wiliam 1998a; Topping 1998; van Zundert et al. 2010). The fact that both
quantitative and qualitative syntheses of the literature suggest that peer assessment can be
beneficial provides a stronger basis for recommending peer assessment as a practice. However,
several of the moderators of the effectiveness of peer feedback that have been argued for in the
available narrative reviews (e.g., rubrics; Panadero and Jonsson 2013) have received little
support from this quantitative meta-analysis. As detailed below, this may suggest that the
prominence of such feedback characteristics in narrative reviews is driven more by theoretical
considerations than by quantitative empirical evidence. However, many of these moderating
variables are complex (for example, rubrics can take many forms) and, due to this
complexity, may not lend themselves as well to quantitative synthesis/aggregation (for a
detailed discussion on combining qualitative and quantitative evidence, see Gorard 2002).
Mechanisms and Moderators
Indeed, the current findings suggest that the feedback characteristics deemed important by
current theories of peer assessment may not be as significant as first thought. Previously,
individual studies have argued for the importance of characteristics such as rubrics (Panadero
and Jonsson 2013), anonymity (Bloom and Hautaluoma 1987), and allowing students to
practice peer assessment (Smith et al. 2002). While these feedback characteristics
have been shown to affect the efficacy of peer assessment in individual studies, we
find little evidence that they moderate the effect of peer assessment when analysed across
studies. Many of the current models of peer assessment rely on qualitative evidence, theoretical
arguments, and pedagogical experience to formulate theories about what determines effective
peer assessment. While such evidence should not be discounted, the current findings also point
to the need for better quantitative and experimental studies to test some of the assumptions
embedded in these models. We suggest that the null findings observed in this meta-analysis
regarding the proposed moderators of peer assessment efficacy should be interpreted cautiously,
as more studies that experimentally manipulate these variables are needed to provide more
definitive insight into how to design better peer assessment procedures.
While the current findings are ambiguous regarding the mechanisms of peer assessment, it
is worth noting that without a solid understanding of the mechanisms underlying peer
assessment effects, it is difficult to identify important moderators or optimally use peer
assessment in the classroom. Often the research literature makes somewhat broad claims about
the possible benefits of peer assessment. For example, Topping (1998, p. 256) suggested that
peer assessment may "promote a sense of ownership, personal responsibility, and motivation
[and] might also increase variety and interest, activity and interactivity, identification
and bonding, self-confidence, and empathy for others". Others have argued that peer
assessment is beneficial because it is less personally evaluative, with evidence suggesting that
teacher assessment is often personally evaluative (e.g., "good boy, that is correct"), which may
have little or even negative effects on performance, particularly if the assessee has low self-efficacy
(Birney, Beckmann, Beckmann and Double 2017; Double and Birney 2017, 2018;
Hattie and Timperley 2007). However, more research is needed to distinguish between the many
proposed mechanisms for peer assessment's formative effects made within the extant literature,
particularly as claims about the mechanisms of the effectiveness of peer assessment are often
evidenced by student self-reports about the aspects of peer assessment they rate as useful. While
such self-reports may be informative, more experimental research that systematically manipulates
aspects of the design of peer assessment is likely to provide greater clarity about what aspects of
peer assessment drive the observed benefits.
Our findings did indicate an important role for grading in determining the effectiveness
of peer feedback. We found that peer grading was beneficial for tertiary students but not
beneficial for primary or secondary school students. This finding suggests that grading
appears to add little to the peer feedback process in non-tertiary students. In contrast, a
recent meta-analysis by Sanchez et al. (2017) on peer grading found a benefit for non-
tertiary students, albeit based on a relatively small number of studies compared with the
current meta-analysis. The present findings suggest that there may be significant
qualitative differences in the performance of peer grading as students develop. For
example, the criteria students use to assess ability may change as they age (Stipek and
Iver 1989). It is difficult to ascertain precisely why grading has positive additive effects in
only tertiary students, but there are substantial differences in pedagogy, curriculum,
motivation of learning, and grading systems that may account for these differences. One
possibility is that tertiary students are more grade orientated and therefore put more
weight on peer assessment which includes a specific grade. Further research is needed to
explore the effects of grading at different educational levels.
One of the more unexpected findings of this meta-analysis was the positive effect of peer
assessment compared to teacher assessment. This finding is somewhat counterintuitive given
the greater qualifications and pedagogical experience of the teacher. In addition, in many of the
studies, the teacher had privileged knowledge about, and often graded, the outcome assessment.
Thus, it seems reasonable to expect that teacher feedback would better align with assessment
objectives and therefore produce better outcomes. Despite all these advantages, teacher
assessment appeared to be less efficacious than peer assessment for academic performance.
It is possible that the pedagogical disadvantages of peer assessment are compensated for by
affective or motivational aspects of peer assessment, or by the substantial benefits of acting as
an assessor. However, more experimental research is needed to rule out the effects of potential
methodological issues discussed in detail below.
A major limitation of the current results is that they cannot adequately distinguish between
the effect of assessing versus being an assessee. Most of the current studies confound
giving and receiving peer assessment in their designs (i.e., the students in the peer
assessment group both provide assessment and receive it), and therefore, no substantive
conclusions can be drawn about whether the benefits of peer assessment extend from
giving feedback, receiving feedback, or both. This raises the possibility that the benefit of
peer assessment comes more from assessing, rather than being assessed (Usher 2018).
Consistent with this, Lundstrom and Baker (2009) directly compared the effects of giving
and receiving assessment on students' writing performance and found that assessing was
more beneficial than being assessed. Similarly, Graner (1987) found that assessing papers
without being assessed was as effective for improving writing performance as assessing
papers and receiving feedback.
Furthermore, more true experiments are needed, as there is evidence from these results that
they produce more conservative estimates of the effect of peer assessment. The studies
included in this meta-analysis were not only predominantly randomly allocated at the classroom
level (i.e., quasi-experiments), but in all but one case, were not analysed using appropriate
techniques for analysing clustered data (e.g., multi-level modelling). This is problematic
because it makes disentangling classroom-level effects (e.g., teacher quality) from the intervention
effect difficult, which may lead to biased statistical inferences (Hox 1998). While
experimental designs with individual allocation are often not pragmatic for classroom interventions,
online peer assessment interventions appear to be obvious candidates for increased
true experiments. In particular, carefully controlled experimental designs that examine the
effect of specific assessment characteristics, rather than "black-box" studies of the effectiveness
of peer assessment, are crucial for understanding when and how peer assessment is most likely
to be effective. For example, peer assessment may be counterproductive when learning novel
tasks due to students' inadequate domain knowledge (Könings et al. 2019).
While the current results provide an overall estimate of the efficacy of peer assessment in
improving academic performance when compared to teacher and no assessment, it should be
noted that these effects are averaged across a wide range of outcome measures, including
science project grades, essay writing ratings, and end-of-semester exam scores. Aggregating
across such disparate outcomes is always problematic in meta-analysis and is a particular
concern for meta-analyses in educational research, as some outcome measures are likely to be
more sensitive to interventions than others (Wiliam 2010). A further issue is that the effect of
moderators may differ between academic domains. For example, some assessment characteristics
may be important when teaching writing but not mathematics. Because there were too
few studies in the individual academic domains (with the exception of writing), we are unable
to account for these differential effects. The effects of the moderators reported here therefore
need to be considered as overall averages that provide information about the extent to which
the effect of a moderator generalises across domains.
Finally, the findings of the current meta-analysis are also somewhat limited by the fact that
few studies gave a complete profile of the participants and measures used. For example, few
studies indicated the ability of the peer reviewer relative to the reviewee, and the age difference
between the peers was not necessarily clear. Furthermore, it was not possible to classify the
academic performance measures in the current study further, such as based on novelty, or to
code for the quality of the measures, including their reliability and validity, because very few
studies provide comprehensive details about the outcome measure(s) they utilised. Moreover,
other important variables such as fidelity of treatment were almost never reported in the
included manuscripts. Indeed, many of the included variables needed to be coded based on
inferences from the included studies' text and were not explicitly stated, even when one would
reasonably expect that information to be made clear in a peer-reviewed manuscript. The
observed effect sizes reported here should therefore be taken as an indicator of average efficacy
based on the extant literature and not an indication of expected effects for specific
implementations of peer assessment.
Educational Psychology Review (2020) 32:481–509
Content courtesy of Springer Nature, terms of use apply. Rights reserved.
Overall, our findings provide support for the use of peer assessment as a formative practice for
improving academic performance. The results indicate that peer assessment is more effective
than no assessment and teacher assessment and not significantly different in its effect from
self-assessment. These findings are consistent with current theories of formative assessment
and instructional best practice and provide strong empirical support for the continued use of
peer assessment in the classroom and other educational contexts. Further experimental work is
needed to clarify the contextual and educational factors that moderate the effectiveness of peer
assessment, but the present findings are encouraging for those looking to utilise peer assess-
ment to enhance learning.
Acknowledgements The authors would like to thank Kristine Gorgen and Jessica Chan for their help coding the
studies included in the meta-analysis.
Effect Size Calculation
Standardised mean differences were calculated as a measure of effect size. Standardised mean
difference (d) was calculated using the following formula, which is typically used in meta-
analyses (e.g., Lipsey and Wilson 2001).
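$$d = \frac{\bar{X}_1 - \bar{X}_2}{S_{pooled}}, \qquad S_{pooled} = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$$
where $\bar{X}_1$ and $\bar{X}_2$ are the group means, $s_1$ and $s_2$ the group standard deviations, and $n_1$ and $n_2$ the group sample sizes.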
As the standardised mean difference (d) is known to have a slight positive bias (Hedges 1981), we
applied a correction to bias-correct estimates (resulting in what is often referred to as Hedges' g).
$$g = d\left(1 - \frac{3}{4df - 1}\right)$$
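The two steps described above can be sketched in Python as follows. This is a minimal illustration rather than the authors' code: the function names and the example numbers are hypothetical, the pooled standard deviation is assumed to take the usual Lipsey and Wilson (2001) form, and df is assumed to equal n1 + n2 - 2 for two independent groups.

```python
import math


def pooled_sd(n1, sd1, n2, sd2):
    """Pooled standard deviation of two independent groups (assumed form)."""
    return math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))


def cohens_d(m1, sd1, n1, m2, sd2, n2):
    """Standardised mean difference: group 1 mean minus group 2 mean, over the pooled SD."""
    return (m1 - m2) / pooled_sd(n1, sd1, n2, sd2)


def hedges_g(d, n1, n2):
    """Small-sample bias correction of d, with df assumed to be n1 + n2 - 2."""
    df = n1 + n2 - 2
    return d * (1 - 3 / (4 * df - 1))


if __name__ == "__main__":
    # Hypothetical summary statistics: peer assessment group vs control group.
    d = cohens_d(m1=75.0, sd1=10.0, n1=30, m2=70.0, sd2=10.0, n2=30)
    g = hedges_g(d, n1=30, n2=30)
    print(f"d = {d:.3f}, g = {g:.3f}")
```

Because the correction factor is below 1 whenever df exceeds 1, g is always slightly closer to zero than d, and the two converge as sample sizes grow.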
For studies where there was insufficient information to calculate Hedges' g using the above
method, we used the online effect size calculator developed by Lipsey and Wilson (2001).
For pre-post design studies where adjusted means were not provided, we used the critical
value relevant to the difference between peer feedback and control groups from the
reported pre-intervention adjusted analysis (e.g., Analysis
of Covariance) as suggested by Higgins and Green (2011). For pre-post design studies where
both pre- and post-intervention means and standard deviations were provided, we used an effect
size estimate based on the mean pre-post change in the peer feedback group minus the mean pre-
post change in the control group, divided by the pooled pre-intervention standard deviation, as
such an approach minimises bias and improves estimate precision (Morris 2008).
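The pre-post-control estimate described in the preceding sentence can be sketched as below. This is an illustrative reading of that description, not the authors' implementation: the function names and numbers are hypothetical, the pre-intervention standard deviations are assumed to be pooled in the usual degrees-of-freedom-weighted way, and any small-sample correction is left out.

```python
import math


def pooled_pre_sd(n_t, sd_pre_t, n_c, sd_pre_c):
    """Pooled pre-intervention SD across treatment and control (assumed form)."""
    return math.sqrt(
        ((n_t - 1) * sd_pre_t ** 2 + (n_c - 1) * sd_pre_c ** 2) / (n_t + n_c - 2)
    )


def prepost_control_d(pre_t, post_t, pre_c, post_c, sd_pre):
    """Mean change in the treatment group minus mean change in the control group,
    divided by the pooled pre-intervention SD."""
    return ((post_t - pre_t) - (post_c - pre_c)) / sd_pre


if __name__ == "__main__":
    # Hypothetical means and SDs for a peer feedback group and a control group.
    sd_pre = pooled_pre_sd(n_t=25, sd_pre_t=8.0, n_c=25, sd_pre_c=8.0)
    effect = prepost_control_d(pre_t=50.0, post_t=62.0, pre_c=50.0, post_c=54.0, sd_pre=sd_pre)
    print(f"pooled pre SD = {sd_pre:.2f}, effect size = {effect:.2f}")
```

Standardising by the pre-intervention standard deviation keeps the denominator free of any influence the intervention may have on score variability.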
Variance estimates for each effect size were calculated using the following formula:
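A widely used large-sample expression for the sampling variance of a standardised mean difference (e.g., Lipsey and Wilson 2001) is sketched here. Treating it as the form applied in this analysis is an assumption, and the symbols $n_1$ and $n_2$ (the sample sizes of the two groups) are introduced for illustration.

```latex
v_d = \frac{n_1 + n_2}{n_1 n_2} + \frac{d^2}{2(n_1 + n_2)}
```

The first term reflects sampling error in the group means and the second reflects uncertainty in the estimated standard deviation, so the variance shrinks as either group grows.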
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International
License, which permits unrestricted use, distribution, and reproduction in any medium, provided
you give appropriate credit to the original author(s) and the source, provide a
link to the Creative Commons license, and indicate if changes were made.
References marked with an * were included in the meta-analysis
*AbuSeileek, A. F., & Abualsha'r, A. (2014). Using peer computer-mediated corrective feedback to support EFL
learners'. Language Learning & Technology, 18(1), 76-95.
Alqassab, M., Strijbos, J. W., & Ufer, S. (2018). Training peer-feedback skills on geometric construction tasks:
Role of domain knowledge and peer-feedback levels. European Journal of Psychology of Education, 33(1),
*Anderson, N. O., & Flash, P. (2014). The power of peer reviewing to enhance writing in horticulture:
Greenhouse management. International Journal of Teaching and Learning in Higher Education, 26(3),
*Bangert, A. W. (1995). Peer assessment: an instructional strategy for effectively implementing performance-
based assessments. (Unpublished doctoral dissertation). University of South Dakota.
*Benson, N. L. (1979). The effects of peer feedback during the writing process on writing performance,
revision behavior, and attitude toward writing. (Unpublished doctoral dissertation). University of
Colorado, Boulder.
*Bhullar, N., Rose, K. C., Utell, J. M., & Healey, K. N. (2014). The impact of peer review on writing in
a psychology course: Lessons learned. Journal on Excellence in College Teaching, 25(2), 91-106.
*Birjandi, P., & Hadidi Tamjid, N. (2012). The role of self-, peer and teacher assessment in promoting Iranian
EFL learners' writing performance. Assessment & Evaluation in Higher Education, 37(5), 513–533.
Birney, D. P., Beckmann, J. F., Beckmann, N., & Double, K. S. (2017). Beyond the intellect: Complexity and
learning trajectories in Raven's Progressive Matrices depend on self-regulatory processes and conative
dispositions. Intelligence, 61, 63–77.
Black, P., & Wiliam, D. (1998a). Assessment and classroom learning. Assessment in Education: Principles,
Policy & Practice, 5(1), 7–74.
Black, P., & Wiliam, D. (2009). Developing the theory of formative assessment. Educational Assessment,
Evaluation and Accountability (formerly: Journal of Personnel Evaluation in Education), 21(1), 5.
Bloom, A. J., & Hautaluoma, J. E. (1987). Effects of message valence, communicator credibility, and source
anonymity on reactions to peer feedback. The Journal of Social Psychology, 127(4), 329–338.
Brown, G. T., Irving, S. E., Peterson, E. R., & Hirschfeld, G. H. (2009). Use of interactive–informal assessment
practices: New Zealand secondary students' conceptions of assessment. Learning and Instruction, 19(2), 97
*Califano, L. Z. (1987). Teacher and peer editing: Their effects on students' writing as measured by t-unit length,
holistic scoring, and the attitudes of fifth and sixth grade students (Unpublished doctoral dissertation),
Northern Arizona University.
*Chaney, B. A., & Ingraham, L. R. (2009). Using peer grading and proofreading to ratchet student expectations in
preparing accounting cases. American Journal of Business Education, 2(3), 39-48.
*Chang, S. H., Wu, T. C., Kuo, Y. K., & You, L. C. (2012). Project-based learning with an online peer assessment
system in a photonics instruction for enhancing led design skills. Turkish Online Journal of Educational
Technology-TOJET, 11(4), 236–246.
*Cho, K., & MacArthur, C. (2011). Learning by reviewing. Journal of Educational Psychology, 103(1), 73.
Cho, K., Schunn, C. D., & Charney, D. (2006). Commenting on writing: Typology and perceived helpfulness of
comments from novice peer reviewers and subject matter experts. Written Communication, 23(3), 260–294.
Cook, D. J., Guyatt, G. H., Ryan, G., Clifton, J., Buckingham, L., Willan, A., et al. (1993). Should unpublished
data be included in meta-analyses?: Current convictions and controversies. JAMA, 269(21), 2749–2753.
*Crowe, J. A., Silva, T., & Ceresola, R. (2015). The effect of peer review on student learning outcomes in a
research methods course. Teaching Sociology, 43(3), 201–213.
*Diab, N. M. (2011). Assessing the relationship between different types of student feedback and the quality of
revised writing. Assessing Writing, 16(4), 274-292.
Demetriadis, S., Egerter, T., Hanisch, F., & Fischer, F. (2011). Peer review-based scripted collaboration to support
domain-specific and domain-general knowledge acquisition in computer science. Computer Science
Education, 21(1), 29–56.
Dochy, F., Segers, M., & Sluijsmans, D. (1999). The use of self-, peer and co-assessment in higher education: A
review. Studies in Higher Education, 24(3), 331–350.
Double, K. S., & Birney, D. (2017). Are you sure about that? Eliciting confidence ratings may influence
performance on Raven's progressive matrices. Thinking & Reasoning, 23(2), 190–206.
Double, K. S., & Birney, D. P. (2018). Reactivity to confidence ratings in older individuals performing the Latin
square task. Metacognition and Learning, 13(3), 309–326.
*Enders, F. B., Jenkins, S., & Hoverman, V. (2010). Calibrated peer review for interpreting linear regression
parameters: Results from a graduate course. Journal of Statistics Education, 18(2).
*English, R., Brookes, S. T., Avery, K., Blazeby, J. M., & Ben-Shlomo, Y. (2006). The effectiveness and
reliability of peer-marking in first-year medical students. Medical Education, 40(10), 965-972.
*Erfani, S. S., & Nikbin, S. (2015). The effect of peer-assisted mediation vs. tutor-intervention within dynamic
assessment framework on writing development of Iranian Intermediate EFL Learners. English Language
Teaching, 8(4), 128–141.
Falchikov, N., & Goldfinch, J. (2000). Student peer assessment in higher education: A meta-analysis comparing
peer and teacher marks. Review of Educational Research, 70(3), 287–322.
*Farrell, K. J. (1977). A comparison of three instructional approaches for teaching written composition to high
school juniors: teacher lecture, peer evaluation, and group tutoring (Unpublished doctoral dissertation),
Boston University, Boston.
Fisher, Z., Tipton, E., & Zhipeng, Z. (2017). robumeta: Robust variance meta-regression (Version 2). Retrieved
from = robumeta
Fleiss, J. L. (1971). Measuring nominal scale agreement among many raters. Psychological Bulletin, 76(5), 378.
Flórez, M. T., & Sammons, P. (2013). Assessment for learning: Effects and impact: CfBT Education Trust.
England: Reading.
Fyfe, E. R., & Rittle-Johnson, B. (2016). Feedback both helps and hinders learning: The causal role of prior
knowledge. Journal of Educational Psychology, 108(1), 82.
Gielen, S., Peeters, E., Dochy, F., Onghena, P., & Struyven, K. (2010a). Improving the effectiveness of peer
feedback for learning. Learning and Instruction, 20(4), 304–315.
*Gielen, S., Tops, L., Dochy, F., Onghena, P., & Smeets, S. (2010b). A comparative study of peer and teacher
feedback and of various peer feedback forms in a secondary school writing curriculum. British Educational
Research Journal, 36(1), 143-162.
Gorard, S. (2002). Can we overcome the methodological schism? Four models for combining qualitative and
quantitative evidence. Research Papers in Education Policy and Practice, 17(4), 345–361.
Graner, M. H. (1987). Revision workshops: An alternative to peer editing groups. The English Journal, 76(3), 40–45.
Hattie, J., & Timperley, H. (2007). The power of feedback. Review of Educational Research, 77(1), 81–112.
Hays, M. J., Kornell, N., & Bjork, R. A. (2010). The costs and benefits of providing feedback during learning.
Psychonomic bulletin & review, 17(6), 797–801.
Hedges, L. V. (1981). Distribution theory for Glass's estimator of effect size and related estimators. Journal of
Educational Statistics, 6(2), 107–128.
Hedges, L. V., Tipton, E., & Johnson, M. C. (2010). Robust variance estimation in meta-regression with
dependent effect size estimates. Research Synthesis Methods, 1(1), 39–65.
Higgins, J. P., & Green, S. (2011). Cochrane handbook for systematic reviews of interventions. The Cochrane
Collaboration. Version 5.1.0,
Hoaglin, D. C., & Iglewicz, B. (1987). Fine-tuning some resistant rules for outlier labeling. Journal of the
American Statistical Association, 82(400), 1147–1149.
Hopewell, S., McDonald, S., Clarke, M. J., & Egger, M. (2007). Grey literature in meta-analyses of randomized
trials of health care interventions. Cochrane Database of Systematic Reviews.
*Horn, G. C. (2009). Rubrics and revision: What are the effects of 3rd graders using rubrics to self-assess or
peer-assess drafts of writing? (Unpublished doctoral thesis), Boise State University
Hox, J. J. (1998). Multilevel modeling: When and why. In I. Balderjahn, R. Mathar, & M. Schader (Eds.),
Classification, data analysis, and data highways (pp. 147–154). New York: Springer Verlag.
*Hsia, L. H., Huang, I., & Hwang, G. J. (2016). A web-based peer-assessment approach to improving junior high
school students' performance, self-efficacy and motivation in performing arts courses. British Journal of
Educational Technology, 47(4), 618–632.
*Hsu, T. C. (2016). Effects of a peer assessment system based on a grid-based knowledge classification approach
on computer skills training. Journal of Educational Technology & Society, 19(4), 100-111.
*Hussein, M. A. H., & Al Ashri, El Shirbini A. F. (2013). The effectiveness of writing conferences and peer
response groups strategies on the EFL secondary students' writing performance and their self efficacy (A
Comparative Study). Egypt: National Program Zero.
*Hwang, G. J., Hung, C. M., & Chen, N. S. (2014). Improving learning achievements, motivations and problem-
solving skills through a peer assessment-based game development approach. Educational Technology
Research and Development, 62(2), 129–145.
*Hwang, G. J., Tu, N. T., & Wang, X. M. (2018). Creating interactive E-books through learning by design: The
impacts of guided peer-feedback on students' learning achievements and project outcomes in science
courses. Journal of Educational Technology & Society, 21(1), 25–36.
*Kamp, R. J., van Berkel, H. J., Popeijus, H. E., Leppink, J., Schmidt, H. G., & Dolmans, D. H. (2014). Midterm
peer feedback in problem-based learning groups: The effect on individual contributions and achievement.
Advances in Health Sciences Education, 19(1), 53–69.
*Karegianes, M. J., Pascarella, E. T., & Pflaum, S. W. (1980). The effects of peer editing on the writing
proficiency of low-achieving tenth grade students. The Journal of Educational Research, 73(4), 203-207.
*Khonbi, Z. A., & Sadeghi, K. (2013). The effect of assessment type (self vs. peer) on Iranian university EFL
students' course achievement. Procedia-Social and Behavioral Sciences, 70, 1552-1564.
Kluger, A. N., & DeNisi, A. (1996). The effects of feedback interventions on performance: A historical review, a
meta-analysis, and a preliminary feedback intervention theory. Psychological Bulletin, 119(2), 254.
Könings, K. D., van Zundert, M., & van Merriënboer, J. J. G. (2019). Scaffolding peer-assessment skills: Risk of
interference with learning domain-specific skills? Learning and Instruction, 60, 85–94.
*Kurihara, N. (2017). Do peer reviews help improve student writing abilities in an EFL high school classroom?
TESOL Journal, 8(2), 450–470.
Landis, J. R., & Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics,
33(1), 159–174.
*Li, L., & Gao, F. (2016). The effect of peer assessment on project performance of students at different learning
levels. Assessment & Evaluation in Higher Education, 41(6), 885–900.
*Li, L., & Steckelberg, A. (2004). Using peer feedback to enhance student meaningful learning. Chicago:
Association for Educational Communications and Technology.
Li, H., Xiong, Y., Zang, X., Kornhaber, M. L., Lyu, Y., Chung, K. S., & Suen, K. H. (2016). Peer assessment in
the digital age: a meta-analysis comparing peer and teacher ratings. Assessment & Evaluation in Higher
Education, 41(2), 245–264.
*Lin, Y.-C. A. (2009). An examination of teacher feedback, face-to-face peer feedback, and google documents
peer feedback in Taiwanese EFL college students' writing. (Unpublished doctoral dissertation), Alliant
International University, San Diego, United States
Lipsey, M. W., & Wilson, D. B. (2001). Practical Meta-analysis. Thousand Oaks: SAGE publications.
*Liu, C.-C., Lu, K.-H., Wu, L. Y., & Tsai, C.-C. (2016). The impact of peer review on creative self-efficacy and learning
performance in Web 2.0 learning activities. Journal of Educational Technology & Society, 19(2), 286-297.
Lundstrom, K., & Baker, W. (2009). To give is better than to receive: The benefits of peer review to the
reviewer's own writing. Journal of Second Language Writing, 18(1), 30–43.
*McCurdy, B. L., & Shapiro, E. S. (1992). A comparison of teacher-, peer-, and self-monitoring with curriculum-
based measurement in reading among students with learning disabilities. The Journal of Special Education,
26(2), 162-180.
Moeyaert, M., Ugille, M., Natasha Beretvas, S., Ferron, J., Bunuan, R., & Van den Noortgate, W. (2017). Methods for
dealing with multiple outcomes in meta-analysis: a comparison between averaging effect sizes, robust variance
estimation and multilevel meta-analysis. International Journal of Social Research Methodology, 20(6), 559–572.
*Montanero, M., Lucero, M., & Fernandez, M.-J. (2014). Iterative co-evaluation with a rubric of narrative texts in
primary education. Journal for the Study of Education and Development, 37(1), 184-198.
Morris, S. B. (2008). Estimating effect sizes from pretest-posttest-control group designs. Organizational
Research Methods, 11(2), 364–386.
*Olson, V. L. B. (1990). The revising processes of sixth-grade writers with and without peer feedback. The
Journal of Educational Research, 84(1), 22–29.
Ossenberg, C., Henderson, A., & Mitchell, M. (2018). What attributes guide best practice for effective feedback?
A scoping review. Advances in Health Sciences Education, 1–19.
*Ozogul, G., Olina, Z., & Sullivan, H. (2008). Teacher, self and peer evaluation of lesson plans written by
preservice teachers. Educational Technology Research and Development, 56(2), 181.
Panadero, E., & Alqassab, M. (2019). An empirical review of anonymity effects in peer assessment, peer
feedback, peer review, peer evaluation and peer grading. Assessment & Evaluation in Higher Education,1
Panadero, E., & Jonsson, A. (2013). The use of scoring rubrics for formative assessment purposes revisited: A
review. Educational Research Review, 9, 129–144.
Panadero, E., Romero, M., & Strijbos, J. W. (2013). The impact of a rubric and friendship on peer assessment:
Effects on construct validity, performance, and perceptions of fairness and comfort. Studies in Educational
Evaluation, 39(4), 195–203.
*Papadopoulos, P. M., Lagkas, T. D., & Demetriadis, S. N. (2012). How to improve the peer review method:
Free-selection vs assigned-pair protocol evaluated in a computer networking course. Computers &
Education, 59(2), 182–195.
Paulus, T. M. (1999). The effect of peer and teacher feedback on student writing. Journal of second language
writing, 8(3), 265–289.
Pellegrino, J. W., Chudowsky, N., & Glaser, R. (2001). Knowing what students know: the science and design of
educational assessment. Washington: National Academy Press.
Peters, O., Körndle, H., & Narciss, S. (2018). Effects of a formative assessment script on how vocational students
generate formative feedback to a peer's or their own performance. European Journal of Psychology of
Education, 33(1), 117–143.
*Philippakos, Z. A., & MacArthur, C. A. (2016). The effects of giving feedback on the persuasive writing of
fourth-and fifth-grade students. Reading Research Quarterly, 51(4), 419-433.
*Pierson, H. (1967). Peer and teacher correction: A comparison of the effects of two methods of teaching
composition in grade nine English classes. (Unpublished doctoral dissertation), New York University.
*Prater, D., & Bermudez, A. (1993). Using peer response groups with limited English proficient writers.
Bilingual Research Journal, 17(1-2), 99-116.
Reinholz, D. (2016). The assessment cycle: A model for learning through peer assessment. Assessment &
Evaluation in Higher Education, 41(2), 301–315.
*Rijlaarsdam, G., & Schoonen, R. (1988). Effects of a teaching program based on peer evaluation on written
composition and some variables related to writing apprehension. (Unpublished doctoral dissertation),
Amsterdam University, Amsterdam
Rollinson, P. (2005). Using peer feedback in the ESL writing class. ELT Journal, 59(1), 23–30.
Rotsaert, T., Panadero, E., & Schellens, T. (2018). Anonymity as an instructional scaffold in peer assessment: its
effects on peer feedback quality and evolution in students' perceptions about peer assessment skills.
European Journal of Psychology of Education, 33(1), 75–99.
*Rudd II, J. A., Wang, V. Z., Cervato, C., & Ridky, R. W. (2009). Calibrated peer review assignments for the
Earth Sciences. Journal of Geoscience Education, 57(5), 328-334.
*Ruegg, R. (2015). The relative effects of peer and teacher feedback on improvement in EFL students' writing
ability. Linguistics and Education, 29, 73-82.
*Sadeghi, K., & Abolfazli Khonbi, Z. (2015). Iranian university students' experiences of and attitudes towards
alternatives in assessment. Assessment & Evaluation in Higher Education, 40(5), 641–665.
*Sadler, P. M., & Good, E. (2006). The impact of self- and peer-grading on student learning. Educational
Assessment, 11(1), 1-31.
Sanchez, C. E., Atkinson, K. M., Koenka, A. C., Moshontz, H., & Cooper, H. (2017). Self-grading and peer-
grading for formative and summative assessments in 3rd through 12th grade classrooms: A meta-analysis.
Journal of Educational Psychology, 109(8), 1049.
Sawilowsky, S. S. (2009). New effect size rules of thumb. Journal of Modern Applied Statistical Methods, 8(2),
*Schonrock-Adema, J., Heijne-Penninga, M., van Duijn, M. A., Geertsma, J., & Cohen-Schotanus, J. (2007).
Assessment of professional behaviour in undergraduate medical education: Peer assessment enhances
performance. Medical Education, 41(9), 836-842.
Schwarzer, G., Carpenter, J. R., & Rücker, G. (2015). Meta-analysis with R. Cham: Springer.
*Sippel, L., & Jackson, C. N. (2015). Teacher vs. peer oral corrective feedback in the German language
classroom. Foreign Language Annals, 48(4), 688-705.
Sluijsmans, D. M., Brand-Gruwel, S., van Merriënboer, J. J., & Martens, R. L. (2004). Training teachers in peer-
assessment skills: Effects on performance and perceptions. Innovations in Education and Teaching
International, 41(1), 59–78.
Smith, H., Cooper, A., & Lancaster, L. (2002). Improving the quality of undergraduate peer assessment: A case
for student and staff development. Innovations in education and teaching international, 39(1), 71–81.
Smith, M. K., Wood, W. B., Adams, W. K., Wieman, C., Knight, J. K., Guild, N., & Su, T. T. (2009). Why peer
discussion improves student performance on in-class concept questions. Science, 323(5910), 122–124.
Steel, P. D., & Kammeyer-Mueller, J. D. (2002). Comparing meta-analytic moderator estimation techniques
under realistic conditions. Journal of Applied Psychology, 87(1), 96.
Stipek, D., & Iver, D. M. (1989). Developmental change in children's assessment of intellectual competence.
Child Development, 521–538.
Strijbos, J. W., & Wichmann, A. (2018). Promoting learning by leveraging the collaborative nature of formative
peer assessment with instructional scaffolds. European Journal of Psychology of Education, 33(1), 1–9.
Strijbos, J.-W., Narciss, S., & Dünnebier, K. (2010). Peer feedback content and sender's competence level in
academic writing revision tasks: Are they critical for feedback perceptions and efficiency? Learning and
Instruction, 20(4), 291–303.
*Sun, D. L., Harris, N., Walther, G., & Baiocchi, M. (2015). Peer assessment enhances student learning: The
results of a matched randomized crossover experiment in a college statistics class. PLoS One 10(12),
Tannacito, T., & Tuzi, F. (2002). A comparison of e-response: Two experiences, one conclusion. Kairos, 7(3), 1
Team, R. (2017). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for
Statistical Computing; 2017: R Core Team.
Topping, K. (1998). Peer assessment between students in colleges and universities. Review of Educational
Research, 68(3), 249-276.
Topping, K. (2009). Peer assessment. Theory Into Practice, 48(1), 20–27.
Usher, N. (2018). Learning about academic writing through holistic peer assessment. (Unpublished doctoral
thesis), University of Oxford, Oxford, UK.
*van den Boom, G., Paas, F., & van Merriënboer, J. J. (2007). Effects of elicited reflections combined with tutor
or peer feedback on self-regulated learning and learning outcomes. Learning and Instruction, 17(5), 532-
*van Ginkel, S., Gulikers, J., Biemans, H., & Mulder, M. (2017). The impact of the feedback source on
developing oral presentation competence. Studies in Higher Education, 42(9), 1671-1685.
van Popta, E., Kral, M., Camp, G., Martens, R. L., & Simons, P. R. J. (2017). Exploring the value of peer
feedback in online learning for the provider. Educational Research Review, 20, 24–34.
van Zundert, M., Sluijsmans, D., & van Merriënboer, J. (2010). Effective peer assessment processes: Research
findings and future directions. Learning and Instruction, 20(4), 270–279.
Vanderhoven, E., Raes, A., Montrieux, H., Rotsaert, T., & Schellens, T. (2015). What if pupils can assess their
peers anonymously? A quasi-experimental study. Computers & Education, 81, 123–132.
Wang, J.-H., Hsu, S.-H., Chen, S. Y., Ko, H.-W., Ku, Y.-M., & Chan, T.-W. (2014a). Effects of a mixed-mode
peer response on student response behavior and writing performance. Journal of Educational Computing
Research, 51(2), 233–256.
*Wang, J. H., Hsu, S. H., Chen, S. Y., Ko, H. W., Ku, Y. M., & Chan, T. W. (2014b). Effects of a mixed-mode
peer response on student response behavior and writing performance. Journal of Educational Computing
Research, 51(2), 233-256.
*Wang, X.-M., Hwang, G.-J., Liang, Z.-Y., & Wang, H.-Y. (2017). Enhancing students' computer programming
performances, critical thinking awareness and attitudes towards programming: An online peer-assessment
attempt. Journal of Educational Technology & Society, 20(4), 58-68.
Wiliam, D. (2010). What counts as evidence of educational achievement? The role of constructs in the pursuit of
equity in assessment. Review of Research in Education, 34(1), 254–284.
Wiliam, D. (2018). How can assessment support learning? A response to Wilson and Shepard, Penuel, and
Pellegrino. Educational Measurement: Issues and Practice, 37(1), 42–44.
Wiliam, D., Lee, C., Harrison, C., & Black, P. (2004). Teachers developing assessment for learning: Impact on
student achievement. Assessment in Education: Principles, Policy & Practice, 11(1), 49–65.
*Wise, W. G. (1992). The effects of revision instruction on eighth graders' persuasive writing (Unpublished
doctoral dissertation), University of Maryland, Maryland
*Wong, H. M. H., & Storey, P. (2006). Knowing and doing in the ESL writing class. Language Awareness, 15(4),
*Xie, Y., Ke, F., & Sharma, P. (2008). The effect of peer feedback for blogging on college students' reflective
learning processes. The Internet and Higher Education, 11(1), 18-25.
Young, J. E., & Jackman, M. G.-A. (2014). Formative assessment in the Grenadian lower secondary school:
Teachers' perceptions, attitudes and practices. Assessment in Education: Principles, Policy & Practice,
21(4), 398–411.
Yu, F.-Y., & Liu, Y.-H. (2009). Creating a psychologically safe online space for a student-generated questions
learning activity via different identity revelation modes. British Journal of Educational Technology, 40(6),
1109–1123.
Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and
institutional affiliations.
Educational Psychology Review (2020) 32:481509 509
Content courtesy of Springer Nature, terms of use apply. Rights reserved.
Terms and Conditions
Springer Nature journal content, brought to you courtesy of Springer Nature Customer Service Center
GmbH (“Springer Nature”).
Springer Nature supports a reasonable amount of sharing of research papers by authors, subscribers
and authorised users (“Users”), for small-scale personal, non-commercial use provided that all
copyright, trade and service marks and other proprietary notices are maintained. By accessing,
sharing, receiving or otherwise using the Springer Nature journal content you agree to these terms of
use (“Terms”). For these purposes, Springer Nature considers academic use (by researchers and
students) to be non-commercial.
These Terms are supplementary and will apply in addition to any applicable website terms and
conditions, a relevant site licence or a personal subscription. These Terms will prevail over any
conflict or ambiguity with regards to the relevant terms, a site licence or a personal subscription (to
the extent of the conflict or ambiguity only). For Creative Commons-licensed articles, the terms of
the Creative Commons license used will apply.
We collect and use personal data to provide access to the Springer Nature journal content. We may
also use these personal data internally within ResearchGate and Springer Nature and as agreed share
it, in an anonymised way, for purposes of tracking, analysis and reporting. We will not otherwise
disclose your personal data outside the ResearchGate or the Springer Nature group of companies
unless we have your permission as detailed in the Privacy Policy.
While Users may use the Springer Nature journal content for small scale, personal non-commercial
use, it is important to note that Users may not:
use such content for the purpose of providing other users with access on a regular or large scale
basis or as a means to circumvent access control;
use such content where to do so would be considered a criminal or statutory offence in any
jurisdiction, or gives rise to civil liability, or is otherwise unlawful;
falsely or misleadingly imply or suggest endorsement, approval , sponsorship, or association
unless explicitly agreed to by Springer Nature in writing;
use bots or other automated methods to access the content or redirect messages
override any security feature or exclusionary protocol; or
share the content in order to create substitute for Springer Nature products or services or a
systematic database of Springer Nature journal content.
In line with the restriction against commercial use, Springer Nature does not permit the creation of a
product or service that creates revenue, royalties, rent or income from our content or its inclusion as
part of a paid for service or for other commercial gain. Springer Nature journal content cannot be
used for inter-library loans and librarians may not upload Springer Nature journal content on a large
scale into their, or any other, institutional repository.
These terms of use are reviewed regularly and may be amended at any time. Springer Nature is not
obligated to publish any information or content on this website and may remove it or features or
functionality at our sole discretion, at any time with or without notice. Springer Nature may revoke
this licence to you at any time and remove access to any copies of the Springer Nature journal content
which have been saved.
To the fullest extent permitted by law, Springer Nature makes no warranties, representations or
guarantees to Users, either express or implied with respect to the Springer nature journal content and
all parties disclaim and waive any implied warranties or warranties imposed by law, including
merchantability or fitness for any particular purpose.
Please note that these rights do not automatically extend to content, data or other material published
by Springer Nature that may be licensed from third parties.
If you would like to use or distribute our Springer Nature journal content to a wider audience or on a
regular basis or in any other manner not expressly permitted by these Terms, please contact Springer
Nature at
... Conversely, in a double-blind review, both authors and reviewers were unaware of each other's identity (6). The peer review mechanism, foundational to scholarly publishing, is both a sentinel and catalyst in academia (5,14). It not only ensures the quality and integrity of scholarly outputs, but also shapes and refines the trajectory of scientific discourse. ...
... Min (13) summarized the following six simple, yet important, tips to assist authors in responding appropriately to peer reviewers' comments: i) Letter to the editor and reviewers, ii) Be polite and respectful, iii) Respond point-by-point to each and every comment raised by all reviewers, iv) Make the response self-contained, v) Stay optimistic, and vi) Check repeatedly for any mistake. PRP offers manifold benefits to manuscript authors, serving as a bridge between the initial research conception and final publication (5,14). Peer review provides authors with a unique opportunity to view their work through the lens of external experts (4). ...
... Beyond their specific areas of expertise, reviewers benefit from exposure to a broad spectrum of academic perspectives (6,8,17). This expanded purview encourages interdisciplinary engagement, promotes collaborative research endeavours, and provides a more holistic approach to academic inquiries (5). For established academics, especially those in mentorship roles, insights derived from PRP are instrumental (6). ...
The integrity of the peer-review process (PRP) is paramount in academic publishing and serves as a critical filter for scholarly output. This mini-review centers on the introduction of comprehensive guidelines, presented in table format, aimed at streamlining the interactions between authors and reviewers during the PRP. These guidelines, derived from an in-depth exploration of the PRP, offer structured and practical advice to ensure constructive, transparent, and effective communication, especially related to the use of artificial intelligence. While this mini-review discusses the strengths and challenges of the current PRP, its primary focus is on providing tangible recommendations to enhance the quality and efficiency of the PRP. By providing explicit guidelines and emphasizing the cooperative essence of peer review, this mini-review aims to improve the PRP, ensuring that it remains a robust mechanism for upholding the highest standards of research and knowledge dissemination in an evolving academic setting.
... Students have the chance to learn collaboratively through peer assessment by supporting one another and sharing their experiences [16]. It enhances academic achievement [17], [18] and EFL students' independence and understanding of their own metacognition when completing writing assignments [19]. Peer assessment has the potential to effectively address the issue of the teacher-student ratio in Chinese higher education while also fostering the growth of a variety of learner abilities. ...
... Besides that, peer assessment can be a tool for monitoring students [11]. This finding also supports the view that peer assessment is more effective than self-assessment and teacher assessment because it offers an objective assessment similar to the teacher's assessment [18], [22], [32] and there is no friendship bias in peer assessment [13]. We can argue that collaborative peer assessment is the best choice among the assessment types in a genre-based writing class because of its effectiveness and objectiveness. ...
The existing peer assessment model for genre-based writing must be developed to gain the maximum quality of the assessment. It needs to be integrated with collaborative learning and problem-based learning, making peer assessment part of learning. Thus, this research aims to determine the effectiveness of the peer assessment model in the group genre-based writing class. The method used was a quantitative method with a comparative design. The model was developed in the seventh semester of the English Education Department, Universitas Islam Negeri Salatiga, Indonesia. Twenty-three pre-service English teachers (PSETs) took part in the implementation of the developed collaborative genre-based writing peer assessment model. The results revealed no difference between the scores from the peer assessment and the scores given by the lecturer. It can be concluded that the collaborative genre-based writing peer assessment conducted by students is similar to the assessment conducted by the teacher. This supports the view that the students' and teacher's assessments have the same quality.
... Peer feedback has been applied in a wide range of disciplines and levels due to its potential to enhance learning (Liu and Carless 2006). Importantly, peer feedback has been demonstrated to have a positive influence on cognitive and non-cognitive outcomes (Huisman et al. 2019; Double, McGrane, and Hopfenbeck 2020; Usher 2023). During peer feedback, assessors provide qualitative information about the assessee's performance, which assessees may use to improve their performance (Panadero, Jonsson, and Alqassab 2018). ...
Previous research has demonstrated the benefits of peer feedback for improving student work. Gender, as an individual characteristic, is now receiving increased attention due to its influence on the peer feedback process. This study examined the effects of gender and peer assessment training on the amount and content of peer feedback provided by assessors for poor, average and excellent writing samples, using a randomised controlled design. A total of 240 undergraduate psychology students participated in the study. Half of the participants received peer assessment training, while the other half received task instructions only. Participants were assigned to eight subgroups, providing peer feedback on writing samples attributed to a fictitious male or female assessee. Analysis of 3017 feedback segments revealed that women provided a greater amount of peer feedback than men. Women also offered more positive verifications and suggestive elaborations for average and poor writing samples. Male assessees received more suggestive elaborations, while trained assessors provided more positive verifications. These findings suggest the need for a multifaceted training programme to bridge the gap between gender-based differences in peer feedback characteristics.
... In other words, for a teacher to conduct formative assessment, they will need to know each student, their learning progress, and how to support them to achieve their learning goals. In traditional classrooms, formative assessment challenges teachers, as it requires them to find ways of following up a whole class or classes of students and provide individualized feedback to everyone, either through teacher-assessment, peer assessment, self-assessment, group-assessment, or by other means (Double et al., 2020). As we will discuss in the next section, research has shown that these practices are difficult to implement at scale and in ways that are sustainable over time (Hopfenbeck and Stobart, 2015). ...
The integration of artificial intelligence (AI) into educational contexts may give rise to both positive and negative ramifications for teachers' uses of formative assessment within their classrooms. Drawing on our diverse experiences as academics, researchers, psychometricians, teachers, and teacher educators specializing in formative assessment, we examine the pedagogical practices in which teachers provide feedback, facilitate peer- and self-assessments, and support students' learning, and discuss how existing challenges to each of these may be affected by applications of AI. First, we give an overview of the challenges in the practice of formative assessment independently of the influence of AI. Second, based on the authors' varied experience in formative assessment, we discuss the opportunities that AI brings to address the challenges in formative assessment as well as the new challenges introduced by the application of AI in formative assessment. Finally, we argue for the ongoing importance of self-regulated learning and a renewed emphasis on critical thinking for more effective implementation of formative assessment in this new AI-driven digital age.
... Recent meta-analyses have confirmed the beneficial effects of peer assessment on students' academic achievements and their lifelong learning skills (e.g., Li et al., 2020; Sanchez et al., 2017; Yan et al., 2022; Zheng et al., 2020). However, the meta-analyses that have compared peer assessment to self-assessment (Double et al., 2020; Huisman et al., 2019; Li et al., 2020; Yan et al., 2022) have only drawn on a small number of studies that directly compared peer assessment to self-assessment. This may account for the inconsistencies in the results as to the relative effects of peer assessment and self-assessment. ...
Research suggests that troubleshooting activities that require students to reflect on teacher-crafted erroneous examples; i.e., erroneous solutions to problems that correspond to widespread naïve ideas, are beneficial to learning. One possible explanation to these beneficial effects is that troubleshooting activities encourage students to test the quality of their own naïve ideas, not only the ones driving the erroneous examples, thereby improving learning. Few studies have addressed this claim, and the results are inconsistent. These studies, however, were not designed to examine the extent to which students with different naïve ideas benefit from troubleshooting activities. Here, ten 9th grade classes took part in a field experimental study that applied a pre-post-test design after finishing a unit on exponents. Students in each class were randomly assigned to a troubleshooting (114 students) or a self-diagnosis activity (112 students). Self-diagnosis activities are considered to directly nudge students to examine the quality of their own naïve ideas by requiring them to reflect on their solutions. The troubleshooting and self-diagnosis activities both capitalized on the pre-test problems. Both groups increased their proficiency in exponents to a comparable extent from the pre-test to the immediate and the delayed post-test. Troubleshooting students with different naïve ideas detected the errors in the erroneous examples equally well, and their error detection significantly and positively correlated with their self-repair of their own naïve ideas. These findings suggest that all the students benefitted from troubleshooting activities, regardless of whether their own naïve ideas resembled the ones driving the erroneous examples or not.
... With the dramatic development of intelligent systems, machine learning schemes have been involved in predicting student performance, which could affect the analysis of the overall outcome of students in their study field. Moreover, the early performance evaluation of students can assist in identifying their strengths and weaknesses and improve their exam results [1]. ...
Success in student learning is the primary aim of the educational system. Artificial intelligence utilizes data and machine learning to achieve excellence in student learning. In this paper, we exploit several machine learning techniques to estimate early student performance. Two main simulations are used for the evaluation. The first simulation applied Traditional Machine Learning Classifiers (TMLCs) to the House dataset, namely Gaussian Naïve Bayes (GNB), Support Vector Machine (SVM), Decision Tree (DT), Multi-Layer Perceptron (MLP), Random Forest (RF), Linear Discriminant Analysis (LDA), and Quadratic Discriminant Analysis (QDA). The best results were achieved with the MLP classifier with a division of 80% training and 20% testing, with an accuracy of 88.89%. The fusion of these seven classifiers was also applied, and its highest result was equal to that of the MLP. In the second simulation, the Convolutional Neural Network (CNN) was utilized and evaluated on five main datasets, namely, House, Western Ontario University (WOU), Experience Application Programming Interface (XAPI), University of California-Irvine (UCI), and Analytics Vidhya (AV). The UCI dataset was subdivided into three datasets, namely, UCI-Math, UCI-Por, and UCI-Fused. Moreover, the AV dataset has three targets, which are Math, Reading, and Writing. The best accuracy results were achieved at 97.5%, 99.55%, 98.57%, 99.28%, 99.40%, 99.67%, 92.93%, 96.99%, and 96.84% for the House, WOU, XAPI, UCI-Math, UCI-Por, UCI-Fused, AV-Math, AV-Reading, and AV-Writing datasets, respectively, under the same protocol of evaluation. The system demonstrates that the proposed CNN-based method surpasses all seven conventional methods and other state-of-the-art work.
Abstract: This study aims to determine the effectiveness of peer assessment of students' locomotor movement skills in the game of gobak sodor among eighth-grade junior high school students. The study used a non-experimental research design with a quantitative approach. The research procedure comprised: 1) developing the instrument; 2) selecting the sample, which consisted of two eighth-grade classes at SMP Negeri 2 Pakisaji with a total of 61 students, chosen using purposive sampling; 3) producing an instructional video on the procedure; 4) giving guidance and training on carrying out peer assessment (treatment); 5) assessment; 6) data collection; 7) data analysis, for which the technique used was a correlation test. Based on the data analysis, the correlation test between peer assessments and the mean peer assessment using Pearson correlation showed r (2-tailed) values of 0.92 to 0.94. It can therefore be concluded that there is a very strong level of consistency among peer assessments in the game of gobak sodor. The use of the peer assessment instrument for locomotor movement skills in the traditional game of gobak sodor was found to be effective.
In peer review, students provide formative assessment and feedback for their peers with the aim of improving writing. Peer review is mainly used in English language arts classrooms, but it has been shown to be effective across content areas when implemented correctly. Empirical and qualitative studies provide evidence that peer review has a positive effect on student writing outcomes and concurrent cognitive benefits for students, including increased writing self-efficacy, improved social skills, and higher self-regulation. Peer review is a collaborative classroom practice that promotes equity for learners across age groups, ability levels, and language proficiencies.
Today, many educational reforms emphasize the importance of formative assessment (FA) for effectiveness in teaching. Considering the importance of FA in education and a perceived lack of it, it is clear that much more research is needed on this subject. However, it is also very important for researchers to formulate new studies and not to repeat existing ones, and to increase the visibility of existing effective studies that can help in-service teachers understand the importance of FA. In this context, being able to review qualified academic research is very valuable. Therefore, we aimed to conduct a bibliometric analysis using VOSviewer to determine the focus of research on formative assessment in education (FAE). This bibliometric analysis included 447 studies on FA published in the Web of Science (WoS) from 2000 to 2021. We performed the citation analysis, created a co-authored network map, and performed an analysis of author keywords in FA publications. Results showed that Erin M. Furtak has been the most prolific author in FAE in terms of the number of publications. Moreover, David J. Nicol and Debra Macfarlane-Dick, who had a co-authored publication, were the most influential authors in terms of citations. Results indicated that Univ Colorado/USA with 13 publications was the most productive institution. However, the USA and England were the most productive countries in terms of both numbers of publications and citations. Of the 19 documents with over a hundred citations, Nicol and Macfarlane-Dick (2006) and Black and Wiliam (2009) were the most influential documents. According to the number of publications and citations, “Assessment & Evaluation in Higher Education” and “Computers & Education” came to the fore among the top five most productive sources. The results of the co-occurrence analysis showed that the terms “assessment,” “mathematics,” and “professional development” co-occurred most often. Moreover, FA, which was the focus of this bibliometric analysis, had a high degree of co-occurrence with feedback, followed by summative evaluation, self-regulation, and professional development. These results will contribute significantly to the efforts of the scientific community towards FA research.
Recent research has leveraged peer assessment as a grading system tool where learners are involved in learning and evaluation. However, there is limited knowledge regarding individual differences, such as personality, in peer assessment tasks. We analyze how personality factors affect the peer assessment dynamics of a semester-long remote learning course. Specifically, we investigate how psychological constructs shape how people perceive user-generated content, interact with it, and assess their peers. Our results show that personality traits can predict how effective the peer assessment process will be and the scores and feedback that students provide to their peers. In conclusion, we contribute design guidelines based on personality constructs as valuable factors to include in the design pipeline of peer assessment systems.
Confidence ratings (CR) are often used to evaluate the metacognitive processes that occur during reasoning and problem solving. Typically CR are elicited with the assumption that they do not affect participants’ underlying cognitive processes. However, recent evidence suggests that eliciting CR can cause changes in cognitive performance. What is not yet clear are the metacognitive pathways by which CR affect overall performance in older individuals. In order to better understand the mechanisms driving reactivity to CR, we evaluated the impact of eliciting CR in an older sample (N = 89) on two aspects of the metacognitive framework: monitoring and control. Participants first rated their prospective confidence before performing the Latin Square Task either with or without confidence ratings. Participants subsequently self-appraised their performance. We found evidence that eliciting CR leads to poorer metacognitive monitoring. In addition, we found that participants with high initial prospective self-confidence who perform CR adopt a more immediate performance-orientated control strategy, which improves short-term performance but has no effect on overall performance in a timed Latin Square Task.
There has been an observed increase in literature concerning feedback within the last decade, with the importance of feedback well documented. Current discourse promotes feedback as an interactive, dialogic process between the learner and the learning partner. While much has been written about effective feedback, less is known about key elements that support dialogic feedback. It is therefore important to investigate what is known about the elements that guide best practice for effective feedback. A scoping review of the extant literature following Arksey and O’Malley’s methodology was conducted. A search of literature published in English identified sixty-one publications eligible for this review. Publications were representative of the international literature from both empirical and non-empirical sources. Feedback elements were extracted from the included publications and categorised into 11 core attributes. The attributes identified feedback as: being a process; criteria-based; requiring multiple forms and sources of data/evidence; needs to be desired by the recipient (i.e. invited and welcomed); timely; responsive to the learner (i.e. tailored to developmental needs/learning preferences of the learner); frequent; future-focussed; reciprocal (i.e. two-way); involves skilful interaction; and is multidimensional (i.e. engages the learner in more than one way). Despite the rhetoric on feedback as a ‘dialogic process’, a gap remains in our understanding around what is required to engage the learner as an equal partner in the feedback process. Further research exploring the impact of specific aspects of the feedback process on practice is required.
Over the past two decades, formative peer assessment has become a popular instructional approach. Initially, it was more readily applied in higher education but has since expanded to other educational levels, including primary and secondary education. The popularity is understandable given the increased amount of feedback by multiple peers and enhanced awareness of performance criteria. Although it is increasingly acknowledged by the research community that formative peer assessment is inherently a social endeavour, the collaborative nature is simultaneously the least-explored mechanism. The contributions in this special issue address this gap conceptualising peer assessment and peer feedback as both an individual and a collaborative learning practice. Furthermore, we highlight core learning conditions: learner characteristics, domain and task characteristics, and, finally, instructional scaffolds. © 2017 Instituto Superior de Psicologia Aplicada, Lisboa, Portugal and Springer Science+Business Media B.V.
The purposes of this study are threefold: It investigates effects of a formative assessment script (FAS) that was designed to support vocational students in generating feedback on (1) a peer’s and (2) their own performance. Effects of the FAS are investigated with respect to quantitative and qualitative characteristics of the peer and internal feedback generated by the students. Furthermore, this study examines (3) whether generating peer feedback is beneficial for assessors’ generation of subsequent internal feedback. In a two-factorial quasi-experimental study, 75 vocational students first produced individual drafts for a typical technical planning task. Next, students either generated peer feedback (with vs. without FAS support) on a fictitious erroneous peer draft and subsequently on their own drafts or generated internal feedback only (with vs. without FAS support). Results yield beneficial effects of the FAS on generating peer feedback. Students who were supported by the FAS generated more comments on the peer draft, were more sensitive in detecting errors and missing information in the peer draft, and generated more suggestions for improvement. With respect to assessors’ internal feedback generation, this study revealed mixed results. On the one hand, FAS-supported students generated more comments and ignored fewer erroneous elements in their own drafts. On the other hand, they neither detected more missing information nor generated more suggestions on how to improve their own drafts than students without FAS support. Unexpectedly, generating peer feedback prior to generating internal feedback had no effects on the quality of assessors’ subsequent internal feedback.
Peer assessment has proven to have positive learning outcomes. Importantly, peer assessment is a social process and some claim that the use of anonymity might have advantages. However, the findings have not always been in the same direction. Our aims were: (a) to review the effects of using anonymity in peer assessment on performance, peer feedback content, peer grading accuracy, social effects and students’ perspective on peer assessment; and (b) to investigate the effects of four moderating variables (educational level, peer grading, assessment aids, direction of anonymity) in relation to anonymity. A literature search was conducted including five different terms related to peer assessment (e.g., peer feedback) and anonymity. Fourteen studies that used a control group or a within-group design were found. The narrative review revealed that anonymous peer assessment seems to provide advantages for students’ perceptions of the learning value of peer assessment, the delivery of more critical peer feedback, and self-perceived social effects, along with a slight tendency towards better performance, especially in higher education and with fewer peer assessment aids. Some conclusions are that: (a) when implementing anonymity in peer assessment the instructional context and goals need to be considered, (b) existent empirical research is still limited, and (c) future research should employ stronger and more complex research designs.
Giving students complex learning tasks combined with peer-assessment tasks can impose a high cognitive load. Scaffolding has proven to reduce cognitive load during learning and improve accuracy on domain-specific tasks. This study investigated whether scaffolding has a similar, positive effect on the learning of peer-assessment tasks. We hypothesised that: (1) domain-specific scaffolding improves domain-specific accuracy and reduces time on task and perceived mental effort, and (2) peer-assessment scaffolding improves peer-assessment accuracy and reduces time on task and perceived mental effort. Additionally, we explored whether there was an interaction between domain-specific and peer-assessment scaffolding. In a 2x2 experiment with the factors domain-specific scaffolding (present, absent) and peer-assessment scaffolding (present, absent), 236 secondary school students assessed the performance of fictitious peers in an electronic learning environment. We found that domain-specific accuracy indeed improved with domain-specific scaffolding, confirming our first hypothesis. Our tests of the second hypothesis, however, revealed surprising results: peer-assessment scaffolding significantly increased accuracy and mental effort during learning, but it had no effect on peer-assessment accuracy at the test and led to reduced domain-specific accuracy, even when combined with domain-specific scaffolding. These results suggest that scaffolding students' peer assessment before they have mastered the task at hand can have disturbing effects on students' ability to learn from the task.
With the rapid progress of technology, the popularity of tablet computers and the development of e-book applications have brought the use of e-books as a learning tool under the spotlight. In the meantime, the aim of school education lies not only in providing students with knowledge but also in encouraging them to construct knowledge actively. Consequently, in this study, an approach of integrating the guided peer-feedback strategy into e-book design was proposed. An experiment was conducted on an elementary school natural science course to explore its effectiveness in comparison with the conventional e-book development activity. It was expected that the guided peer-feedback approach could engage students in knowledge organizing and in-depth thinking while stimulating more innovative ideas. To assess the impacts of this approach, a quasi-experimental design method was adopted. The students were divided into two groups: the experimental group, in which the students learned with the guided peer-feedback strategy together with the e-book development approach, and the control group, in which the students learned with the conventional e-book development approach. The experimental results indicated that the integrated guided peer-feedback and e-book development strategy had significant impacts on the students' learning achievements and e-book project outcomes while reducing their cognitive load and increasing their innovative thinking tendency in the design process.
It has become an important and challenging issue to foster students' concepts and skills of computer programming. Scholars believe that programming training could promote students' higher order thinking performance; however, many school teachers have reported the difficulty of teaching programming courses. Although several previous studies have attempted to develop friendly user interfaces to ease students' loads, teaching programming courses remains a big challenge for most school teachers. In this study, an online peer assessment-based system was developed to cope with this problem. The students could use the peer-assessment function to provide comments to peers, and review the feedback and scores from peers during the learning activity. A quasi-experiment was conducted on four classes of 166 ninth graders of a junior high school located in southern Taiwan to examine the impacts of the developed system. Two classes of students were assigned to the experimental group, learning with an online peer assessment-based teaching strategy, while the other two classes were the control group, learning with the conventional teaching strategy. The experimental results showed that the students in the experimental group had better programming knowledge and skills as well as more positive learning attitudes and critical thinking awareness than those in the control group, revealing the benefits of the proposed approach.