RESEARCH ARTICLE
A Systematic Review of Tools that Support Peer Assessment
Andrew Luxton-Reilly
(Received 6th July 2009; final version received 20th September 2009)
Computer Science Department, The University of Auckland, Private Bag 92019, Auckland, New Zealand
Peer assessment is a powerful educational technique that provides significant benefits to both staff and students.
Traditionally, peer assessment has been conducted using pen-and-paper in small classes. More recently, online
tools have been developed to enable peer assessment to be applied in large classes. In this paper, the tools that
support peer assessment are reviewed and analysed, revealing the common features and significant differences.
Future directions for research on peer assessment tools are suggested.
1 Introduction
The massification of tertiary education has affected the quality and quantity of interactions
between instructors and students (Ballantyne, Hughes & Mylonas, 2002). Although the opportunities for instructors to provide detailed feedback on students' work have decreased, a high
degree of individualized feedback for students can be maintained by engaging them in tasks that
promote learning by interacting with each other. Hamer et al. (2008) report growing interest in
the use of contributing student pedagogies among Computer Science educators. They define a
contributing student pedagogy (CSP) as:
A pedagogy that encourages students to contribute to the learning of others and to value the
contributions of others (p. 195).
Contributing student pedagogies characteristically involve the use of new (web-based) technologies. They encompass a wide range of activities, including peer assessment, which has
been defined as:
. . . an arrangement in which individuals consider the amount, level, value, worth, quality or success
of the products or outcomes of learning of peers of similar status (Topping, 1998).
Peer assessment is not a new assessment strategy. It has been used in many
institutions for more than 50 years (Sluijsmans, Brand-Gruwel & van Merriënboer, 2002), in a
wide range of higher education contexts such as academic writing, science, engineering, business
and medicine (Falchikov, 1995; Freeman & McKenzie, 2002). Peer review has been used as a
learning process to improve the quality of computer programs for at least 30 years (Anderson &
Shneiderman, 1977).
In a review of the peer assessment literature, Topping (1998) concludes that peer assessment
has been used in a wide variety of contexts and that it can result in gains in the cognitive, social,
affective, transferable skill and systemic domains. The majority of the studies reviewed showed
an acceptably high level of validity and reliability. A subsequent review of peer assessment by
Dochy, Segers and Sluijsmans (1999) showed that peer assessment can be valuable as a formative
assessment method, and that students find the process sufficiently fair and accurate.
Email: andrew@cs.auckland.ac.nz
Ballantyne et al. (2002) report significant benefits of peer assessment, but at the cost of
significant administrative overheads. Online tools can alleviate this overhead by providing appropriate administrative support for the management of the peer review process. The use of
these tools enables peer assessment to be used in contexts such as large classes where it would be
infeasible without such support. Furthermore, a number of features such as anonymous online
discussion or automated weighting of reviews cannot be provided by traditional pen-and-paper
or face-to-face peer reviews.
The experience of using an online tool for reviewing assignments is qualitatively different
to both face-to-face reviewing and using pen-and-paper to mark. An early study by Price and
Petre (1997) of instructors using electronic marking reported numerous benefits over paper-based
marking, including: improved legibility, easy reuse of comments, faster turn-around time, lower
administrative overheads and fewer administrative errors.
Plimmer and Apperley (2007) note that the act of marking paper-based assignments often
involves scanning the assignments, physically reordering the scripts, making annotations on
the scripts as a reminder of the critical points and as feedback to students, and summarising grades.
The location of annotations on paper provides an easily identifiable reference point which is
clumsy to replicate with an online system. The authors advocate the use of digital ink as a
means to retain the traditional advantages of pen-and-paper marking while using electronic
systems to relieve the administrative burden imposed by paper.
Murphy and Wolff (2005) compared "Minute Papers" created electronically with those created
using pen-and-paper. They found that the response rate using pen-and-paper was higher, but
the length of the student responses was much greater in the electronic version.
McLuckie and Topping (2004) note that although both face-to-face and online peer assessment
activities involve many similar skills, there are important differences. Face-to-face activities involve socio-affective elements which are difficult to develop in online interactions. Other skills,
such as interactive process management, that are essential for online environments are less critical for face-to-face interaction. Figl, Bauer and Mangler (2006) noted qualitative differences
between reviews conducted between teams using traditional pen-and-paper, an online tool, and
face-to-face. Students reported that communication was easier face-to-face, and that it was significantly easier to give hints and helpful feedback using pen-and-paper compared to filling in
an online form.
The significant qualitative differences observed when different media are used to conduct
peer review highlight the importance of reviewing the research on tools that support peer assessment. This is of particular interest to Computer Science educators since the majority of the
tools described in this review have been developed by Computer Science instructors for use in
Computer Science classrooms.
Webster and Watson (2002) claim that the paucity of review articles published in the information systems field impedes research progress. Although the literature on peer review has
previously been reviewed, the literature on the tools that support peer assessment has not. In
this paper we review the currently available tools, compare and contrast the features provided
by the tools and analyse these features with respect to the findings from the literature. The
following research questions are addressed.
(1) What are the common features and important differences between online tools?
(2) How does the implementation of features relate to the main findings reported by reviews
of the peer assessment literature?
(3) What directions for future research are indicated?
2 Method
A systematic literature review is a process that seeks to aggregate empirical data using a formal
protocol. Kitchenham describes the process as:
"a means of evaluating and interpreting all available research relevant to a particular research
question, topic area or phenomenon of interest" (Kitchenham, 2004).
The procedures for practising evidence-based literature reviews have been well documented in
the medical domain (Sackett, Richardson, Rosenberg & Haynes, 1997). More recently, the steps
for conducting evidence-based reviews in software engineering have been identified and documented (Kitchenham, 2004). Brereton, Kitchenham, Budgen, Turner and Khalil (2007) report
that systematic literature reviews help researchers rigorously and systematically aggregate outcomes from relevant empirical research.
2.1 Data sources and study selection
Primary studies were identified by searching the IEEE Xplore, ACM Digital Library, Google
Scholar, CiteSeer, ScienceDirect and SpringerLink electronic databases. The Journal of Computer Assisted Learning, Computer Science Education and Computers and Education were also
searched. The title, abstract and keywords were searched for the phrases ("peer assessment" OR
"peer review" OR "peer evaluation"). As not all of the databases supported boolean phrases in
the same way, the search was adapted as required to obtain equivalent results.
The title and abstracts of the search results were assessed for relevance. Studies that mentioned
the use of peer assessment in large classes were scanned to determine if technology was used
to assist the assessment process. Studies that mentioned the use of software to support peer
assessment were scanned to determine the nature of the software and how it was used.
In order to be included in this review, the software must have been designed specifically for
the purpose of supporting peer assessment activities. This excluded a number of studies that
discussed the use of standard communication software such as email (Downing & Brown, 1997),
forums (Mann, 2005) and wikis (Xiao & Lucking, 2008; Lutteroth & Luxton-Reilly, 2008) for
peer assessment. Tools such as TeCTra (Raban & Litchfield, 2007) and SPARK (Freeman &
McKenzie, 2002) that are designed to be used in the context of group projects to support the
assessment of an individual's contribution within a team are explicitly excluded from this study.
Although these kinds of peer reviews have elements of commonality with the peer review of
artefacts, the review process is qualitatively different. In the review of teammates, students are
typically assessing in a competitive way (since a higher grade for a teammate normally results in
a lower personal score), and they are often required to evaluate personal qualities and impressions
built up over time, rather than assessing a distinctive artefact at a specific time.
Studies that describe a proposal for a tool that has not yet been built, such as RRAS (Trivedi,
Kar & Patterson-McNeil, 2003), or that describe a prototype such as PeerPigeon (Millard,
Sinclair & Newman, 2008), which has not been used in a classroom at the time of publication,
are excluded from this review.
In summary, studies that describe a software tool designed for peer review in an educational
setting and used in at least one course are included in this review. The following software tools
are considered outside the scope of this review:
tools used for the peer review of an individual contribution to a team;
standard technologies designed for another purpose and used for peer review (e.g. word processor change tracking, email, forums and wikis);
tools designed for peer review that have not been implemented and used in the classroom; and
conference management tools and other software designed to manage peer review in a professional rather than educational setting.
The reference lists of all primary research reports were searched for other candidate reports.
Table 1. Generic peer assessment tools

Name | Year | Rubric design | Rubric criteria | Discuss | Backward feedback | Flexible workflow | Evaluation
PeerGrader | 2000 | ? | b,d,n,t | shared page | student | no | student survey
Web-SPA | 2001 | flexible | d,n,t | public comments | none | fixed | validity, performance improvement
OPAS | 2004 | flexible | b,d,n,t | debrief | none | script | student survey
CeLS | 2005 | flexible | b,d,n,t | peers, instructor | ? | script | validity
PRAISE | 2005 | flexible | b,t | none | none | fixed | student survey, usage statistics
Aropä | 2007 | flexible | b,d,n,t | none | student | limited | student survey, staff interview
3 Results
The results of the review are organised into three subsections based on the kind of tools that
were identified during the review process. The first subsection summarizes generic tools that
have been designed to be flexible and support peer assessment in a variety of different disciplines
and contexts. The second subsection summarizes tools that have been designed to support peer
assessment in a specific domain, such as the review of a particular kind of artefact (e.g. written
reports or computer programs). The final subsection summarizes tools that have been purpose-built for a specific course, or which require manual modification to the software to adapt it for
use in other contexts.
Each subsection contains a table that summarizes the relevant tools. It lists the name of the
tool and the year of the first published report about it. Rubric designs are described as "flexible"
if the administrator has the ability to modify the rubric for a given assessment, and "fixed" if
the rubric cannot be modified. The rubric criteria are coded as "b" if the tool supports boolean
criteria (e.g. check boxes), "d" if the tool supports discrete choices (such as a drop-down list or
a forced choice between a finite number of specified criteria), "n" if the tool supports numeric
scales (e.g. rating a solution on a 1–10 scale), and "t" if the tool supports open-ended textual
comments (e.g. suggestions to improve the solution). If the quality of the reviews is assessed,
then the source of the feedback is noted (i.e. either a student or an instructor evaluates the
quality of the reviews). The opportunity for dialogue to occur between the reviewers and the
authors is coded. The way in which the tool allows workflow to be specified by the instructor is
listed. Finally, a summary of the kinds of evaluation performed with the tool is included.
3.1 Generic systems
A number of the systems reported in the literature are designed to be highly configurable and
support peer review activities in a wide range of disciplines and contexts. Although some systems
have only been used in a limited context at the time of publication, the design of those systems
indicates that they could be used in a variety of disciplines and contexts. This section describes
these very flexible systems.
Table 1 summarizes the generic peer assessment tools.
3.1.1 PeerGrader
The PeerGrader (PG) system reported by Gehringer (2000) allows students to submit an
arbitrary number of web pages for review, allowing students to include multimedia resources.
Reviewers and authors are able to communicate anonymously via a shared web page. After the
initial feedback phase, authors are given an opportunity to revise their work. At the end of the
revision period, the reviewers are required to allocate a grade. When the reviews are completed,
the students are required to grade the reviews on the basis of how helpful and careful the review
was.
An initial evaluation of PeerGrader in a standard data structures and algorithms course exposed some problems with review allocations. Since assignments were allocated to reviewers on
the basis of student enrollments, students who didn't submit assignments or reviews (due to
dropping the course, or simply choosing not to participate) caused other students to receive
too few assignments to review, or to receive too few reviews on the assignments they submitted. Since reviewing can only begin after an assignment is submitted, assignments that were
submitted late left little time for the reviewers to complete their reviews. Gehringer notes that
dynamically allocating assignments to reviewers may go some way towards alleviating these
problems.
3.1.2 Web-SPA
Web-SPA (Sung, Chang, Chiou & Hou, 2005) is designed to guide students through self and
peer assessment activities. Instructors have some flexibility to configure the type of activity
by setting parameters such as a group or individual assignment, and defining the
method of scoring used by the rubric (discrete scale, percentage or no scoring). An instructor
can define criteria which are scored according to the method chosen in the initial configuration.
The Web-SPA system uses a fixed workflow to progressively engage students in the peer
assessment activity. Initially, students assess themselves. Having completed an evaluation, they
compare their own evaluation with others in their group. The groups select the best and worst
examples. The system will randomly present each individual with exemplars of the best and
worst cases chosen by other groups to review. Once the reviews have been conducted, the system
presents the best and worst examples from the entire class. The act of re-reviewing exemplars
is designed to help students identify what is good and bad in a given assignment.
The authors conducted a study with 76 high school students in a Computer and Information
Science course. The study found considerable consistency between instructor and peer marks. It
also found that the quality of work improved after the peer review activities.
3.1.3 Online Peer Assessment System (OPAS)
The OPAS system (Trahasch, 2004) has been designed to support a wide range of peer assessment activities with flexible submission and marking criteria. Collaboration scripts are used to
formalise the structure and workflow of the peer assessment process. An artefact submitted for
review can be a single document or a zip file. Submissions can come from individual authors,
groups, or the instructor. Reviews can be assigned randomly, manually, or using a combination
of random and manual. The system supports the allocation of reviews within groups. The review rubrics are flexible and contain criteria that can be assessed using radio buttons, list boxes,
numeric scales or with open-ended feedback. Multiple review cycles are supported. An overview
of the rankings and criteria is displayed to students at the completion of the review and the best
example of each is displayed. A forum supports discussion after the completion of the reviews.
The system was evaluated with a class of 76 students enrolled in an Algorithms and Data
Structures course in Computer Science. A student satisfaction survey was completed in which
students were generally positive.
3.1.4 Collaborative e-Learning Structures (CeLS)
CeLS (Ronen, Kohen-Vacs & Raz-Fogel, 2006) is a system designed to support collaborative
learning activities, including peer review with flexible work processes using collaboration scripts.
An instructor can create new activities or use the structure of an existing activity. The assessment
activities can include all the standard elements of a web form, but may additionally include
activities that involve ranking or sorting a set of artefacts.
A prototype of the CeLS system was piloted in 2003–2004 in Israel by 9 universities, 5 schools
and 4 in-service teacher courses. In total, 1600 students used CeLS in 48 different courses,
although the nature of the collaborative activities was not reported.
Kali and Ronen (2005) report on the use of CeLS for peer review in three successive semesters
of an undergraduate Educational Philosophy course. Students were asked to use the system
to evaluate a group presentation on a scale of 1–7 and write feedback in text fields for three
grading criteria. After an initial evaluation of the system, a fourth criterion was introduced to
allow students to write their own opinion, which was not considered to be a grading criterion.
This was intended to explicitly distinguish between objective and subjective viewpoints. A third
design iteration introduced the idea of evaluating students as reviewers. Instead of assigning
grades according to the results of the peer review, the reviews themselves were evaluated by an
instructor and 15% of the students' grades were calculated based on the quality of the reviews.
3.1.5 PRAISE
PRAISE (de Raadt, Toleman & Watson, 2005) supports the peer review of documents according to a rubric defined by an instructor. The rubric consists of objective binary criteria,
and a holistic open-ended comment. The system waits until a specified number of reviews have
been received (e.g. 4 or 5), and thereafter immediately allocates an assignment to review when
a student submits. Assignment reviews can be flagged for moderation by the author if they feel
that the review is unfair.
PRAISE has been used in at least 5 different courses, across the subjects of Computing,
Accounting and Nursing. Student surveys, usage statistics, time management and moderation
required have all been analysed. Student attitudes and practices of novice programmers were
found to differ from those of non-programmers (de Raadt, Lai & Watson, 2007).
3.1.6 Aropä
Aropä (Hamer, Kell & Spence, 2007) is a generic web-based system that supports the administration and management of peer assessment activities in a variety of contexts. Authors submit
files directly to the Aropä system. Reviewers download the files for off-line viewing. Reviews are
conducted online by filling in a web form, which is customized by an instructor for the review
activity. After students receive their reviews, they may be required to provide feedback on the
quality of the reviews (according to a rubric defined by the instructor).
The allocation of authors to reviewers can be automatic or manual. If automatic, the instructor
can define a subset of authors, a subset of reviewers and the number of reviews to allocate to each
reviewer. This system of allocation can accommodate a wide range of peer assessment activities,
including intra- or inter-group reviews.
The authors report that Aropä has been used in over 20 disciplines with a diverse range of
classes, ranging in size from 12 to 850. It has been used for formative feedback on drafts, critical
reflection after an assignment and for summative assessment. Each of these varieties of peer
assessment differs in the timing, style of the rubric and degree of compulsion and awarding of
marks.
3.2 Domain-specific systems
Many of the systems are designed to support peer review activities in a specific domain, such as
reading and writing essays, or reviewing Java programming code. Systems that are designed for
use in a specific domain are described in this section, and summarized in Table 2.

Table 2. Domain-specific peer assessment tools

Name | Year | Domain | Rubric design | Rubric criteria | Discuss | Backward feedback | Flexible workflow | Evaluation
CPR | 1998 | essays | flexible | b,n | none | auto | none | validity, student survey, writing performance
C.A.P. | 2000 | essays | fixed | n,d,t | private author/reviewer | auto | none | student surveys, higher-order skills, comment frequency, use of review features, compare with self-assessment
Praktomat | 2000 | programs | fixed | d,t | none | none | none | student survey, usage correlation
Sitthiworachart | 2003 | programs | fixed | d,n,t | reviewers | student | none | student survey, validity
SWoRD | 2007 | essays | fixed | n,t | none | student | limited | validity
PeerWise | 2008 | MCQ | fixed | n,t | public feedback | student | none | usage, effect on exam performance, quality of questions, validity
peerScholar | 2008 | essays | fixed | n,t | none | student | none | validity

3.2.1 Calibrated Peer Review™ (CPR)
CPR (Chapman & Fiore, 2000) has been designed to help students develop writing skills
through peer assessment. Instructors use CPR to create assignments that include specifications,
guiding questions to focus the development of a good solution, and examples of solutions with
corresponding reviews. Students write and submit short essays on the specified topic. The review
process requires students to engage in a training phase where they must evaluate the quality of
three sample essays. Their reviews are compared to the samples provided by the instructor and
feedback is given to the students about their review performance. Students are not permitted to
participate in real reviews until they can perform adequately on the samples. Although widely
used, few evaluation studies have been published. A recent report suggests that the use of
CPR did not improve writing skills or scientific understanding (Walvoord, Hoefnagels, Gaffin,
Chumchal & Long, 2008).
3.2.2 C.A.P.
The C.A.P. system (Davies, 2000) (originally, Computerized Assessment including Plagiarism, and later Computerized Assessment by Peers) is designed for peer assessment of written
documents such as research reports and essays. It has evolved substantially from its initial implementation, and continues to be actively studied and improved.
C.A.P. includes a predefined list of comments. Each student can configure the list by adding
their own comments. In addition, comments can be assigned a rating to specify how important
they are to the reviewer. The review process requires students to summatively assess essays
by allocating a numeric value for four fixed criteria (such as "Readability"), and to provide
formative feedback by choosing comments from the configurable list. Students also provide a
holistic comment in an open-text area.
After an initial review period, students are given an opportunity to see the comments that other
reviewers selected, and may choose to modify their own review as a result (Davies, 2008). Once
the review stage is fully complete, the marks are used to calculate a compensated average peer
mark from the ratings submitted by the reviewers. The choice of comments and the ratings are
evaluated and used to generate an automated mark for the quality of reviewing (Davies, 2004).
Students are given the opportunity to anonymously discuss their marks with the reviewer. The
reviewer may choose to modify the marks on the basis of the discussion (Davies, 2003).
The C.A.P. system has been used in a number of studies of peer assessment, particularly
around the assessment of the review quality. Studies show that marks for reviewing are positively
correlated with both essay marks and marks in a multiple choice test (Davies, 2004). The upper
two quartiles of students are more critical with comments than the lower two quartiles (Davies,
2006).
3.2.3 Praktomat
The Praktomat system (Zeller, 2000) is designed to provide feedback on programming code
to students. Authors submit a program to Praktomat which automatically runs regression tests
to evaluate the correctness of the code. Authors who have submitted code to the system can
request a program to anonymously review. The reviews use a fixed rubric that focuses on a number
of specific style considerations that the instructor uses for final grading purposes. The code is
displayed in a text area that can be edited to allow annotations to be entered directly into the
code. The review feedback is purely formative and plays no part in the final grade, nor is it
required. Students can review as many programs as they wish.
Students reported that they found the system useful, both in terms of automated testing,
reviewing programs and having programs reviewed by others. The grades obtained by students
for program readability increased both with the number of sent reviews and the number of
received reviews, although no formal statistical analysis was performed.
3.2.4 Sitthiworachart
The system developed by Sitthiworachart and Joy (2004) was based on the OASYS system.
It was designed for the peer review of programming assignments. A fixed rubric is used to
assess program style and correctness using Likert-scale ratings. An asynchronous communication
tool is provided to allow reviewers to anonymously discuss the assignments they are reviewing
throughout the process. An evaluation study showed that the peer ratings correlate significantly
with instructor ratings, and that students are better able to make accurate objective judgements
than subjective ones.
3.2.5 SWoRD (Scaffolded Writing and Rewriting in the Discipline)
SWoRD (Cho & Schunn, 2007) is a tool designed specifically to support writing practice. At
the time of publication, it had been used in 20 courses in four different universities between 2002
and 2004.
An instructor using SWoRD defines a pool of topics, from which students select those they
want to write about and those they want to review. SWoRD balances the allocation of topics,
so some students may have a reduced set of choices. Students submit drafts and a self-assessed
estimate of grade.
The review structure is fixed, and uses pseudonyms to ensure that the identity of the authors
remains confidential. Reviewers evaluate the writing according to three dimensions: flow, logic
and insight. For each dimension, reviewers rate the work on a scale of 1–7 and provide a written
comment about the quality of the writing.
SWoRD assumes that the average grade given by a group of student reviewers is an accurate
assessment. It calculates the accuracy of an individual reviewer using three different metrics:
systematic difference, consistency and spread. These three metrics are calculated for each
of the three dimensions of flow, logic and insight, giving nine measures of accuracy. The nine
measures of accuracy obtained for each reviewer are normalized and combined to calculate
a weighted average grade for a submitted piece of writing. All the drafts are published with
their pseudonyms, ratings and associated comments. Authors revise the drafts and submit final
papers, along with feedback about the usefulness of the reviews they received. The review cycle
is repeated with the revised papers.
An evaluation of SWoRD was conducted with 28 students in a research methods course. A
controlled experiment comparing a single expert reviewer, a single peer reviewer and multiple peer
reviewers showed the greatest improvement between draft and final paper occurred when the
author received multiple peer reviews, and the least improvement occurred with a single expert
reviewer.
3.2.6 PeerWise
PeerWise (Denny, Luxton-Reilly & Hamer, 2008a) supports the development of an online
multiple-choice question (MCQ) database by students. The MCQs submitted using PeerWise
become available for other students to use for revision purposes. When students answer a question, they are required to review the question and enter a holistic rating (0–5) of the quality.
They are also encouraged to write a holistic comment in a text area. The author of a question
has the right to reply to any given comment, although there is no facility for a continuing discussion. Although the comments are visible to all users, the individual ratings are averaged and
only the aggregate rating is displayed. All interaction between students is anonymous.
Numerous studies have evaluated aspects of PeerWise, including the usage (Denny, Luxton-Reilly & Hamer, 2008b), effect on exam performance (Denny, Hamer, Luxton-Reilly & Purchase,
2008) and the quality of the questions (Denny, Luxton-Reilly & Simon, 2009). A study of the
validity of the peer assessed ratings (Denny et al., 2009) found that the correlations between
ratings of students and the ratings of instructors who taught the course were good (0.5 and
0.58). The authors conclude that students are reasonably effective at determining the quality of
the multiple choice questions created by their peers.
3.2.7 peerScholar
The peerScholar (Paré & Joordens, 2008) system was designed to improve the writing and
critical thinking skills of students in a large undergraduate Psychology class. In the first phase,
students are required to write two abstracts and two essays. The second phase requires students
to anonymously assess five abstracts and five essays by assigning a numeric grade (1–10) and
writing a positive constructive comment for each piece of work. Finally, in the third phase,
students receive the marks and comments as feedback. An accountability feature allows students
to submit a mark (1–3) for each of the reviews they received.
A study was conducted to compare expert marks with the marks generated through the peer
review process. The authors found that the correlation between expert and peer marks was good,
and that it improved when the accountability feature was applied to students.
3.3 Context-specific systems
Some of the systems reported in the literature have been written for use in a specific course and
must be rewritten if they are to accommodate other contexts. Although most of these systems
have the potential to be developed further in the future, at the time of publication they were
bound to the specific context in which they were developed. Table 3 summarizes the tools in
this category.

Table 3. Context-specific peer assessment tools

Name | Year | Context | Rubric design | Rubric criteria | Discuss | Backward feedback | Flexible workflow | Evaluation
Peers | 1995 | Comp. Sci. | flexible | n | none | none | none | student survey, validity
NetPeas | 1999 | Comp. Sci.; Science teachers | fixed | n,t | none | none | none | student survey, rubric comparison, thinking styles
OASYS | 2001 | Comp. Sci. | fixed | d,t | none | none | none | student survey, admin costs
Wolfe | 2004 | Comp. Sci.; Mathematics; Marketing; Psychology | fixed | n,t | none | none | none | usage
PEARS | 2005 | Comp. Sci. | fixed | n,t | none | none | none | rubric comparison
3.3.1 Peers
The Peers (Ngu & Shepherd, 1995) system was implemented in Ingres, a commercial database
management system. Students were able to anonymously suggest assessment criteria and alter
weightings on existing criteria before the submission of assignments. Assignments were allocated
to students who were able to anonymously review them and provide marks for the criteria
that were cooperatively developed. A short evaluation study found a good correlation between
instructor and student marks. However, the student survey that was conducted found that all the
students preferred to have instructor assessment in addition to the peer evaluation, suggesting
that students did not trust the outcomes of peer assessment.
3.3.2 NetPeas
NetPeas (Lin, Liu & Yuan, 2001), initially known as Web-based Peer Review or WPR (Liu,
Lin, Chiu & Yuan, 2001), requires students to submit documents in HTML format. Initially,
the system only supported a single holistic rating and an open-ended comment, but was later
revised to support numerous specific criteria involving both a rating (1–10 Likert scale) and an
open-ended comment for each criterion. The system supports the modification of assignments
by students, which allows drafts to be revised after an initial review period.
Evaluation studies have looked at correlations between review ability and examination scores,
different thinking styles, specific and holistic feedback and student attitude. The authors conclude that being a successful author, or a successful reviewer, alone may not be sufficient for
success in a peer review environment.
3.3.3 OASYS
OASYS (Bhalerao & Ward, 2001) is designed to support self-assessment and provide timely
formative feedback to students in large classes without increasing academic workload. It is a
hybrid system used to assess students using a combination of multiple choice questions and free-response questions. The system automatically marks the MCQ questions and uses peer review
to provide summative feedback to students about their answers to the free-response questions.
Although this system has been designed and used in the context of a programming course,
the authors note that it could easily be adapted for more widespread use in other disciplines.
An evaluation which compared the time taken to mark paper tests with the time required to
mark using the OASYS system was performed. Students using the system received feedback
more rapidly with less staff time required than paper-based tests.
3.3.4 Wolfe
Wolfe (2004) developed a system in which students posted their assignments on their own web
site and submitted the URL to the peer review system. Reviewers remained anonymous, but they
knew who they were reviewing. Reviewers were presented with the list of all the assignments
submitted and were expected to submit a score (1–10) and a holistic comment about each
assignment. Students were required to submit a minimum number of reviews, but no maximum
was set. The web site listed the number of reviews that had already been submitted for each
assignment and students were asked to ensure the numbers were roughly even, but the request
was not enforced.
The system was used in Computer Science, Mathematics, Marketing and Psychology courses,
but required manual recoding to adapt it to each new context. Wolfe notes that roughly 70% of
the reviews were superficial. He reports on the use of the system in a small software engineering
course (34 students). Students were required to submit a minimum of 10 reviews, but could
conduct additional reviews if desired. The majority of students received more than the minimum
10 reviews, and the majority of those reviews were submitted by students ranked in the top third
of the class.
3.3.5 PEARS
PEARS (Chalk & Adeboye, 2005) is designed to support the learning of programming skills.
Students submit Java files directly to the system, conduct peer reviews, respond to feedback
and may resubmit reviewed work. In the published study, students used two different rubrics
to review Java code. The first rubric contained sixteen specific binary criteria (yes/no, and not
applicable), while the second rubric used a text area to submit open-ended holistic feedback
about the strengths and weaknesses of the reviewed work and a single overall score out of 10.
The authors report that over two-thirds of the students preferred writing reviews using holistic
feedback, that they preferred receiving holistic feedback, and that the holistic feedback written
by students had a significant positive correlation with the marks allocated by a tutor.
4 Discussion
In this section, the common elements of the systems are discussed and unique approaches are
identified.
4.1 Anonymity
Ballantyne et al. (2002) suggest that students should remain anonymous to alleviate student
concerns over bias and unfair marking. The majority of systems use a double-blind peer review
process, ensuring that students remain anonymous throughout the entire process. Bhalerao and
Ward (2001) report that anonymity is a statutory requirement in their institution. Developers
of peer review software would be well advised to consider their own institutional regulations
regarding the privacy of student grades.
In some cases, student presentations are being assessed (Kali & Ronen, 2005), or students are
working in teams on different projects, in which case students performing a review would be
aware of the identity of the person they were reviewing. In such cases, there is no need to ensure
a double-blind review occurs. Flexible systems such as OPAS and Aropä may be configured to
have different levels of anonymity for a given activity (e.g. double-blind, single-blind, pseudonym,
or open reviewing).
Notably, the system developed by Wolfe (2004) ensured the anonymity of the reviews, but the
identity of the authors was known to the reviewers.
4.2 Allocation and distribution
A variety of methods are employed to distribute artefacts produced by an author to a reviewer
(or most commonly, multiple reviewers). The simplest approach is simply to allocate the reviews
randomly. A spreadsheet specifying the allocation of assignments from author to reviewer is accommodated by Aropä, PeerGrader and OPAS. Although some systems (such as Aropä) support
the allocation of assignments by groups (to allow inter- or intra-group reviews), many do not.
The PRAISE system waits until a minimum number of submissions are received before it begins to
allocate assignments to reviewers. After the threshold has been reached, an author that submits
an assignment is immediately allocated assignments to review. The major benefit of this approach
is a reduction in time between submission and review. However, no analysis of the consequences
of this strategy has yet been conducted. It is possible that better students (who complete the
assignment and submit early) will end up reviewing each other, while weaker students who submit
later will be allocated weaker assignments to review. Further investigation may be warranted to
explore the implications of this allocation strategy.
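To illustrate how an allocate-on-submission strategy of this kind might be organised, the following Python sketch buffers submissions until a threshold is reached and then, on each new submission, assigns the submitting author the least-reviewed earlier submissions. It is a minimal sketch only: the class name, the threshold and the number of reviews per student are illustrative assumptions, not details of PRAISE's published implementation, and the handling of the first few submitters in the initial pool is omitted.

    import random
    from collections import defaultdict

    class SubmissionPool:
        """Minimal sketch of allocate-on-submission (illustrative, not PRAISE's API)."""

        def __init__(self, threshold=5, reviews_per_student=3):
            self.threshold = threshold             # submissions held before allocation starts
            self.k = reviews_per_student
            self.submissions = []                  # authors in submission order
            self.review_counts = defaultdict(int)  # author -> reviews allocated so far

        def submit(self, author):
            """Record a submission; once the pool is large enough, hand the
            submitting author a set of earlier submissions to review."""
            self.submissions.append(author)
            if len(self.submissions) <= self.threshold:
                return []  # still filling the initial pool
            # Prefer artefacts with the fewest reviews so far; break ties randomly.
            candidates = [a for a in self.submissions if a != author]
            candidates.sort(key=lambda a: (self.review_counts[a], random.random()))
            allocation = candidates[:self.k]
            for a in allocation:
                self.review_counts[a] += 1
            return allocation

    pool = SubmissionPool(threshold=5, reviews_per_student=3)
    for student in ["s1", "s2", "s3", "s4", "s5", "s6", "s7"]:
        print(student, "->", pool.submit(student))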
The use of exemplars can help students to identify what is good or bad in a given assignment.
These exemplars can act as a 'yard-stick' by which students can measure their own performance
and that of others. In order to ensure that students see a diversity of assignments, OASYS
uses the marks for an MCQ test in the distribution algorithm to ensure that each reviewer
receives one script from authors in each of the good, intermediate and poor MCQ categories.
Web-SPA uses multiple review cycles to ensure that students are exposed to examples of the
best and worst assignments. SWoRD makes all the drafts, reviews and ratings publicly available
for students to peruse, providing students with the opportunity to compare the best and worst
submissions. At the completion of the review phase, OPAS displays a summary of the rankings for
each criterion assessed and the top-ranked assignment for each criterion is available for students to
view. Although Aropä does not systematically provide students with the best and worst reviews,
during the allocation phase it has been seeded with a sample solution provided by the instructor
to ensure all students see a good solution.
4.2.1 Unrestricted reviewing
The PeerWise system has no system of allocation. Instead, students can choose to answer
as many MCQs as they wish. Each time a question is answered, a review is required.
Students tend to choose the questions with the highest rating, therefore the better questions are
reviewed more frequently. Poor questions are infrequently reviewed.
Wolfe (2004) allowed students to choose whom they reviewed (and the identities of the authors
were known). The number of reviews that each artefact had received was displayed and reviewers
were asked to ensure that the numbers were approximately even, but this requirement was not enforced
by the system.
Since reviewing is optional in the Praktomat system, the process of review allocation uses a
non-random strategy to encourage students to participate and contribute high-quality reviews.
Praktomat uses a set of rules to determine which artefacts are reviewed next. The artefact that
has had the minimum number of reviews is selected. Programs whose authors have composed
a greater number of reviews are selected by preference. Praktomat tries to allocate reviews
mutually, so that a pair of authors review each other's programs.
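A rule-based selection of this kind can be expressed as a simple filtering pipeline, as in the Python sketch below. The sketch is illustrative only: the names are invented, Praktomat's actual data model is not reproduced, and the precedence given to the three rules here is an assumption.

    def choose_author_to_review(reviewer, authors, reviews_received,
                                reviews_written, assigned):
        """Sketch of a Praktomat-style selection rule (illustrative names).

        authors          : all authors who have submitted a program
        reviews_received : author -> number of reviews their program has received
        reviews_written  : author -> number of reviews that author has composed
        assigned         : reviewer -> set of authors already assigned to them
        """
        candidates = [a for a in authors
                      if a != reviewer and a not in assigned.get(reviewer, set())]
        if not candidates:
            return None

        # Rule 1: prefer the least-reviewed programs.
        fewest = min(reviews_received.get(a, 0) for a in candidates)
        candidates = [a for a in candidates if reviews_received.get(a, 0) == fewest]

        # Rule 2: prefer a mutual ("tit for tat") pairing, i.e. an author who has
        # already been assigned this reviewer's own program.
        mutual = [a for a in candidates if reviewer in assigned.get(a, set())]
        if mutual:
            candidates = mutual

        # Rule 3: among the remainder, prefer authors who have themselves
        # composed more reviews.
        return max(candidates, key=lambda a: reviews_written.get(a, 0))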
4.3 Marking criteria
A variety of different approaches to designing marking criteria are apparent. Students are rarely
invited to participate in the design of the marking criteria, although numerous authors report
that criteria are discussed with students prior to the review process. Some systems use very
specific criteria while others use a more holistic general rating.
Systems that are designed to be used in a wide variety of conditions (i.e. those classified
as "generic" systems) support instructor-designed marking forms. These forms are typically
constructed from the components that make up standard web forms, and support check boxes,
discrete lists, numeric scales and open responses in text areas. CeLS has a very flexible design
that can accommodate a range of assessment activities including selection, assigning a numeric
value and free-text comments, but also more complex assessments such as ranking and sorting.
Systems that are designed to operate in a more restricted domain frequently use a fixed
structure for the assessment process and may provide few options for the configuration of the
marking schema.
Falchikov and Goldfinch (2000) conducted a meta-analysis that investigated the validity of
peer-assigned marks by comparing peer marks with teacher marks. They recommend that it
is better to use an overall global mark rather than expecting students to rate many individual
dimensions. However, Miller (2003) found that more specific, detailed rubrics provided better
differentiation of performance at the cost of qualitative feedback. Rubrics that provided more
opportunities to comment elicited a greater number of qualitative responses and a larger number
of comments.
An evaluation study comparing holistic with specific feedback using the PEARS system found
that the majority of students preferred both writing and receiving the holistic feedback (Chalk &
Adeboye, 2005). They also found that there was no correlation between the students' scores and
the tutors' scores when using the rubric with specific criteria, but a significant positive correlation
was found between students and tutors when the holistic rubric was used.
PRAISE uses objective binary criteria to ensure consistency between reviewers. A holistic
comment is also supported.
Kali and Ronen (2005) report that an explicit distinction between objective and subjective criteria improves the quality of the review. Students like having the option to express their personal,
subjective opinion (which does not contribute to the grading process), and distinguishing their
subjective view from the objective grading criteria improves the correlation between student and
instructor marks.
C.A.P. requires students to use numeric scales to summatively assess an essay, but they are
also expected to provide formative feedback by selecting comments from a defined list. The
importance of each comment in the list is weighted by the reviewer, allowing the C.A.P. system to
automatically compare the comments applied by different reviewers in an attempt to estimate
the effectiveness of a given reviewer.
Open-ended feedback requires students to write prose that states their opinion in a critical, yet
constructive way. It is certainly possible that the formative feedback provided by this approach
is more useful to students than that obtained through check boxes or a simple numeric scale.
However, further research is required to identify the conditions under which specific feedback
is more valuable than holistic feedback for both the reviewers and the authors who receive the
review.
4.4 Calculating the peer mark
Many of the systems use a simple mean value, although a variety of other methods of calculating
the peer mark are employed.
The peerScholar system has a fixed workflow design in which each artefact is reviewed by five different
students. An average of the middle three values is used to calculate the final mark. This reduces
the impact of a single rogue reviewer on the calculation.
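The middle-three average is simply a trimmed mean; the short Python sketch below makes the calculation explicit (the function name is illustrative, not taken from peerScholar).

    def middle_three_mean(ratings):
        """Trimmed mean as described for peerScholar: with five ratings per
        artefact, drop the highest and lowest and average the middle three."""
        middle = sorted(ratings)[1:-1]  # discard one rating at each extreme
        return sum(middle) / len(middle)

    print(middle_three_mean([7, 8, 8, 9, 2]))  # approx. 7.67; the rogue mark of 2 is discarded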
Aropä uses an iterative weighting algorithm (Hamer, Ma & Kwong, 2005) to calculate the
grade. This algorithm is designed to eliminate the effects of rogue reviewers. The more that a
reviewer deviates from the weighted average, the less their review contributes to the average in
the next iteration. When the weighted averages have settled, the algorithm halts and the values
are assigned as grades.
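The following Python sketch conveys the general shape of such an iterative reweighting scheme. It is not the published algorithm: Hamer, Ma and Kwong (2005) use their own weighting function and a convergence test, whereas this sketch simply down-weights reviewers in proportion to their mean squared deviation from the current consensus and runs for a fixed number of iterations.

    def iterative_weighted_grades(reviews, iterations=20):
        """Sketch of iterative reweighting (illustrative, not the Aropä algorithm).

        reviews : dict artefact -> list of (reviewer, mark)
        Returns (grades, reviewer_weights).
        """
        weights = {r: 1.0 for marks in reviews.values() for r, _ in marks}
        grades = {}
        for _ in range(iterations):
            # 1. Weighted average mark for each artefact.
            for art, marks in reviews.items():
                total = sum(weights[r] for r, _ in marks)
                grades[art] = sum(weights[r] * m for r, m in marks) / total
            # 2. Down-weight reviewers who deviate from the consensus.
            deviations = {r: [] for r in weights}
            for art, marks in reviews.items():
                for r, m in marks:
                    deviations[r].append((m - grades[art]) ** 2)
            for r, devs in deviations.items():
                weights[r] = 1.0 / (1.0 + sum(devs) / len(devs))
        return grades, weights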
CPR requires students to go through a training stage where the grades assigned by students
are compared with the expected grades. Students receive feedback on their grading performance
and must be able to accurately apply the criteria before they are permitted to begin reviewing
work submitted by their peers. The degree to which a reviewer agrees with the "ideal" review
set by the instructor determines a "reviewer competency index" which is later used to weight
the reviews when a weighted average is calculated.
SWoRD calculates a weighted grade based on three accuracy measures: systematic difference,
consistency and spread. The system assumes that the average of all the reviewers of a given
artefact is an accurate measure. The "systematic" metric determines the degree to which a
given reviewer is overly generous or overly harsh (a variation of a t-test between the reviewer and
the average marks across all the reviews). The "consistency" metric determines the correlation
between the reviewer marks and the average marks (i.e. can the reviewer distinguish between
good and poor papers). Finally, the "spread" metric determines the degree to which the reviewer
allocates marks too narrowly or too widely. These metrics are combined to form an accuracy
measure which is factored into the weighting for reviewer marks.
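The three components can be approximated for a single rubric dimension as follows. This is a rough sketch under the assumption that the consensus is the plain average of peer marks; the published system normalizes and combines the nine resulting values differently.

    from math import isclose
    from statistics import mean, pstdev

    def reviewer_accuracy(reviewer_marks, consensus_marks):
        """Sketch of SWoRD-style accuracy components for one reviewer on one
        dimension; both lists have one entry per artefact, in the same order."""
        # Systematic difference: is the reviewer consistently harsh or generous?
        systematic = mean(r - c for r, c in zip(reviewer_marks, consensus_marks))

        # Consistency: does the reviewer rank artefacts as the consensus does?
        # (Pearson correlation between reviewer marks and consensus marks.)
        mr, mc = mean(reviewer_marks), mean(consensus_marks)
        cov = mean((r - mr) * (c - mc)
                   for r, c in zip(reviewer_marks, consensus_marks))
        sr, sc = pstdev(reviewer_marks), pstdev(consensus_marks)
        consistency = cov / (sr * sc) if not isclose(sr * sc, 0.0) else 0.0

        # Spread: does the reviewer use a narrower or wider range of marks
        # than the consensus?
        spread = sr / sc if not isclose(sc, 0.0) else 0.0

        return systematic, consistency, spread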
C.A.P. initially used a median value (Davies, 2000) to eliminate the effect of "off the wall"
reviewers, but was subsequently modified to calculate a compensated peer mark (Davies, 2004).
The compensated peer mark is a weighted mark that takes into account whether a given reviewer
typically over-estimates or under-estimates the grade (compared to the average given
by peers). Although the overall effects of the compensation are minor, students feel more comfortable knowing that they will not be disadvantaged by a "tough" marker.
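A bias correction of this general kind can be sketched in a few lines of Python. The formula below is illustrative and simpler than the published compensation, which weights marks rather than shifting them; the function and parameter names are invented for this example.

    def compensated_peer_mark(marks_by_reviewer, reviewer_bias):
        """Illustrative bias compensation in the spirit of a compensated peer mark.

        marks_by_reviewer : dict reviewer -> mark given to this essay
        reviewer_bias     : dict reviewer -> average amount by which that reviewer's
                            marks exceed the peer average (negative = tough marker)
        """
        adjusted = [mark - reviewer_bias.get(rev, 0.0)
                    for rev, mark in marks_by_reviewer.items()]
        return sum(adjusted) / len(adjusted)

    # A habitually tough marker (bias -5) has their mark of 58 adjusted up to 63.
    print(compensated_peer_mark({"r1": 70, "r2": 58}, {"r1": 0.0, "r2": -5.0}))  # 66.5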
4.5 Quality of reviews
The quality of the reviews created by students is of significant concern to both instructors and
students. A number of systems offer the opportunity to provide feedback to the reviewer about
the quality of their reviews. However, few studies have investigated the quality of
the reviews, the value of the feedback to the students, or how the rubric format or
quality assurance methods have affected the quality of the feedback.
4.5.1 Validity of reviews
One aspect of quality is the ability of peers to mark fairly and consistently. The metric most
commonly used to determine if students can mark effectively is the correlation with the marks
assigned by an instructor.
Falchikov and Goldfinch (2000) conducted a meta-analysis comparing peer marks with teacher-assigned marks. They found a mean correlation of 0.69 between teacher and peer marks over
all the studies they considered. Paré and Joordens (2008) found a small but significant difference between expert and peer marks in psychology courses using the peerScholar system. The
correlation between the expert and peer marks was low, but increased after they introduced the
facility for students to grade the reviews they received. They conclude that the averaged peer
marks are similar to the averaged expert marks in terms of level and ranking of assignments.
Sitthiworachart and Joy (2008) conducted a study that compared tutors' and peers' marks for
a number of detailed marking criteria for assignments in a first-year programming course. They
found high correlations between tutors' and students' marks for objective criteria, but lower
correlations between tutor and student marks for subjective criteria.
Wolfe (2004) reports that sufficiently large numbers of reviews result in reliable averages,
although this was an anecdotal observation by the author rather than the result of a formal
study. It is worth noting that the system used by Wolfe resulted in a larger number of reviews
being contributed by the better students than the poorer students.
4.5.2 Quality of formative feedback
There are few studies that have investigated the nature of formative feedback provided by
students in holistic comments, and compared the value of those comments with those provided
by instructors. A study using SWoRD revealed that formative feedback from multiple peer
reviews was more useful for improving a draft than feedback from a single expert.
4.5.3 Backwards feedback
The term backwards feedback is used to describe the feedback that an author provides to
a reviewer about the quality of the review. This feedback can be formative, in the form of a
comment, or summative, in the form of a numeric value.
Ballantyne et al. (2002) suggest that teachers award marks for the feedback provided by peers
in order to boost student engagement and commitment to the task. The system created by Wolfe
(2004) did not contain any assessment of review quality, and he estimates that approximately
70% of the reviews were superficial. Many of the more recently developed tools require students
to assess the quality of the reviews they have received, either summatively or with formative
feedback.
The "tit for tat" approach used in Praktomat allocates reviews on a paired basis where possible,
so a reviewer knows that they are reviewing the work of the person that will be reviewing them
in turn. This encourages students to produce high quality reviews in the hope that the recipient
will be doing the same. Although this is a feasible strategy for formative assessment, it is not
appropriate for summative assessment where it would be likely to encourage grade inflation.
Kali and Ronen (2005) decided not to grade assignments on the basis of the peer assessments,
but instead to grade the quality of the reviews. They report that grading students on the quality
of the reviews rather than the peer assessed marks for their assignments reduced tensions and
produced higher correlations between the marks assigned by students and instructors. This
grading was performed by instructors.
PEARS allows authors to respond to their reviewers, giving feedback on the usefulness of the
reviews they received. However, this feedback is purely formative and is not used in assessment
criteria.
SWoRD requires authors to provide feedback to the reviewers about the quality and usefulness
of the review. This feedback is purely formative and plays no part in the final grade.
Aropä can be configured to require students to formally review a number of reviews using
an instructor-defined rubric. The instructor can specify that students assess the reviews they
have received, or the reviews can be considered to be artefacts in their own right and allocated
anonymously and randomly to be reviewed by a student that has no vested interest.
4.6 Dialogue
The systems considered here vary substantially when it comes to supporting discussion within the
peer assessment framework. PeerGrader allows authors and reviewers to access and contribute to
a shared web page where discussions can occur. The instructor can configure the system to make
the comments posted to the shared web page visible to the other students allocated to review
the same author's work. This allows either private discussions between authors and reviewers,
or a group discussion between the author and all the reviewers of their work. Web-SPA uses a
similar approach in which students can post short messages to a public page.
OPAS includes a discussion forum which is available for students to post to after the completion
of the review. This encourages reflection on the criteria and quality of the work produced. The
highest degree of discussion is provided by the Sitthiworachart system, which provides reviewers
with the capacity to communicate with both the author and all the other reviewers assigned to
review the given assignment. A chat system allows them to communicate in real time, or leave
messages for each other if they are not available.
4.7 Workflow
SWoRD is designed for students to progressively improve essay drafts using formative peer
feedback. It uses a fixed process for a given cycle, but the instructor can define the number of
review cycles that occurs before the final submission.
PeerGrader allows the author to revise their work at any time through the reviewing process.
When the author submits a revised version, an email message is sent to all the reviewers. The
older version is archived and a new discussion page is created for the revised version. The collaboration scripts used by OPAS support multiple review cycles where students can progressively
improve drafts on the basis of feedback.
Miao and Koper (2007) show how collaboration scripts can be used to describe the structure
of interactions that occur in the process of peer review. Using a script to describe the peer
assessment process, a tool can automatically generate documents adhering to the IMS Learning
Design specification (IMS LD) and the IMS Question and Test Interoperability specification (IMS
QTI) that can be viewed using an appropriate player. However, the authoring tools used to
create the scripts are complex and require a significant degree of technical expertise.
CeLS is extremely flexible and allows instructors to create a wide range of peer assessment
activities with varying workflow. The authors report that this flexibility resulted in a large number of
variants of basic structures, which could be confusing. The authors suggest that further work is
required to categorize the structures to ensure that the variety of options is not overwhelming.
There appears to be a significant trade-off between flexibility and ease-of-use. Systems that
have more flexible workflow have us ed collaboration scripts or domain specific languages to
express the complex processes, bu t this fl exibility makes th em too difficult to use for a n on -
technical person.
5 Conclusion
This review makes a significant contribution by summarizing the available tools that support online peer assessment. These tools have been classified as generic, domain-specific and context-specific, and their major features have been compared and discussed. Although a variety of different tools have been reported in the literature, few of them have been thoroughly evaluated. There is a clear need for more usability studies and for further evaluation studies that investigate the differences between the approaches taken.
Aropä, SWoRD and C.A.P. have the most sophisticated processes for identifying “good” reviewers and weighting student-assigned grades accordingly. A variety of different algorithms are applied to weight the peer marks in an attempt to establish a more accurate measure of the “true” quality of an assignment. Comparative studies that investigate the benefits of these different approaches are required.
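As a minimal sketch of one possible weighting scheme (not the specific algorithms used by Aropä, SWoRD or C.A.P.), a system might down-weight reviewers whose marks consistently deviate from the consensus on the assignments they mark:

from statistics import median

def weighted_peer_marks(marks, floor=0.1):
    # `marks` maps assignment -> {reviewer: mark}.  A reviewer's weight is
    # inversely related to their average deviation from the per-assignment
    # median, so reviewers who consistently disagree with their peers have
    # less influence.  This is only one of many possible schemes.
    deviations = {}
    for assignment, reviews in marks.items():
        mid = median(reviews.values())
        for reviewer, mark in reviews.items():
            deviations.setdefault(reviewer, []).append(abs(mark - mid))
    weights = {r: 1.0 / (floor + sum(d) / len(d)) for r, d in deviations.items()}

    # Weighted average of the peer marks for each assignment.
    results = {}
    for assignment, reviews in marks.items():
        total_weight = sum(weights[r] for r in reviews)
        results[assignment] = sum(weights[r] * m for r, m in reviews.items()) / total_weight
    return results

marks = {
    "essay_1": {"alice": 8, "bob": 7, "carol": 3},   # carol is an outlier
    "essay_2": {"alice": 6, "bob": 6, "carol": 10},
}
print(weighted_peer_marks(marks))

Schemes of this general kind (for example, the calibration method of Hamer, Ma and Kwong, 2005) differ in how reviewer reliability is estimated and in how strongly outliers are penalized, which is precisely why comparative studies are needed.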
Since the peer assessment process uses the output from one student as the input to another student, online tools need to provide a mechanism to deal with late or missing submissions. Many of the systems support both manual and automatic allocation of reviews, but PRAISE is the only system that dynamically allocates reviews during the submission process. Some systems, such as PeerWise and that of Wolfe, do not limit the number of reviews that a student can perform. In such systems, students with higher grades tend to contribute more than weaker students, resulting in a greater amount of higher-quality feedback being produced. This approach looks promising, and future tools should support unlimited reviewing where possible, although further research is required to investigate this approach more carefully.
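A simple dynamic allocation strategy might, for example, assign each newly submitting student the least-reviewed earlier submissions. The Python sketch below is purely illustrative and is not the algorithm used by PRAISE.

def allocate_on_submission(new_author, submissions, review_counts, k=3):
    # When `new_author` submits, allocate them up to `k` previously
    # submitted pieces of work, preferring those with the fewest reviews
    # so far and never their own.  `submissions` is an ordered list of
    # author names; `review_counts` maps author -> reviews allocated.
    candidates = [a for a in submissions if a != new_author]
    candidates.sort(key=lambda a: review_counts.get(a, 0))
    allocated = candidates[:k]
    for a in allocated:
        review_counts[a] = review_counts.get(a, 0) + 1
    submissions.append(new_author)
    return allocated

# Hypothetical usage: students submit one after another.
submissions, counts = [], {}
for student in ["alice", "bob", "carol", "dave"]:
    print(student, "reviews", allocate_on_submission(student, submissions, counts))

Even with such a strategy, a real system still needs a fallback for work that never arrives, since early submitters otherwise receive fewer reviews than late ones.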
All of the systems considered in this study are web-based and use standard web forms for the entry of the review. Only one of the systems (Praktomat) supports direct annotation on the product under review, something that has always been possible in paper-based reviews. None of the tools currently support the use of digital ink to provide annotations during the peer review process.
Although some tools supported instructor-designed marking criteria, others specified a fixed schedule. The marking criteria ranged from binary criteria to a holistic overall rating to open-ended text, and there is no clear indication of the impact of each approach. Future work is required to evaluate the effectiveness of different forms of rubrics for both the reviewer and the recipient of the review. Although numerous studies have considered the correlation between instructor-assigned grades and student-assigned grades, no studies have thoroughly investigated the quality of the formative feedback (comments) provided by students.
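The three forms of criteria observed across the tools could, for example, be represented within a single rubric structure. The following sketch uses illustrative field names only and is not drawn from any of the reviewed systems.

# A hypothetical rubric covering the three observed forms of criteria:
# binary checks, a holistic scale, and open-ended comments.
rubric = [
    {"type": "binary", "prompt": "Does the program compile?"},
    {"type": "scale", "prompt": "Overall quality of the solution", "range": (1, 5)},
    {"type": "comment", "prompt": "What is the most important improvement to make?"},
]

def validate_review(rubric, responses):
    # Check that a submitted review answers every rubric item in the expected form.
    for item, answer in zip(rubric, responses):
        if item["type"] == "binary" and not isinstance(answer, bool):
            return False
        if item["type"] == "scale" and not (item["range"][0] <= answer <= item["range"][1]):
            return False
        if item["type"] == "comment" and not str(answer).strip():
            return False
    return len(responses) == len(rubric)

print(validate_review(rubric, [True, 4, "Add comments explaining the main loop."]))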
Many of the tools support some form of feedback between reviewer and author, but few support
full discussion. The impact of discussion at different stages of the peer assessment process has
not been investigated. The support of discussion between reviewers and between reviewers and
authors warrants further study.
Instructors in Computer Science have the expertise to develop online tools that support peer assessment, and the opportunity to evaluate those tools in the classroom. The majority of online tools described in this paper (13 of 18) have been used in Computer Science courses, but most are unavailable for use outside the context in which they were developed, and none of them have been widely adopted. It is likely that peer assessment tools in the immediate future will continue to be developed by Computer Science educators for use in their own classrooms, informed by reports of the current tools. However, it would contribute significantly to the Computer Science community if future peer assessment tools were designed for use in multiple institutions.
References
Anderson, N., & Shneiderman, B. (1977). Use of peer ratings in evaluating computer program
quality. In Proceedings of the fifteenth annual SIGCPR conference, Arlington, Virginia,
United States (pp. 218–226). New York, NY, USA: ACM.
Ballantyne, R., Hughes, K., & Mylonas, A. (2002). Developing Procedures for Implementing Peer Assessment in Large Classes Using an Action Research Process. Assessment & Evaluation in Higher Education, 27(5), 427–441.
Bhalerao, A., & Ward, A. (2001). Towards electronically assisted peer assessment: a case study.
Association for Learning Technology Journal, 9(1), 26–37.
Brereton, P., Kitchenham, B., Budgen, D., Turner, M., & Khalil, M. (2007). Lessons from applying the systematic literature review process within the software engineering domain. The Journal of Systems and Software, 80, 571–583.
Chalk, B., & Adeboye, K. (2005). Peer Assessment Of Program Code: a comparison of two feedback instruments. In 6th HEA-ICS Annual Conference, University of York, UK (pp. 106–110).
Chapman, O., & Fiore, M. (2000). Calibrated Peer Review™. Journal of Interactive Instruction Development, 12(3), 11–15.
Cho, K., & Schunn, C.D. (2007). Scaffolded writing and rewriting in the discipline: A web-based reciprocal peer review system. Computers & Education, 48(3), 409–426.
Davies, P. (2000). Computerized Peer Assessment. Innovations In Education & Training Inter-
national, 37(4), 346–355.
Davies, P. (2003). Closing the communications loop on the computerized peer-assessment of
essays. ALT-J, 11(1), 41–54.
Davies, P. (2004). Don’t write, just mark: the validity of assessing student ability via their
computerized peer-marking of an essay rather than their creation of an essay. ALT-J, 12(3),
261–277.
Davies, P. (2006). Peer assessment: judging the quality of the students’ work by comments rather than marks. Innovations In Education & Training International, 43(1), 69–82.
Davies, P. (2008). Review and reward within the computerised peer-assessment of essays. Assessment & Evaluation in Higher Education, (pp. 1–12).
de Raadt, M., Lai, D., & Watson, R. (2007). An evaluation of electronic individual peer assessment in an introductory programming course. In R. Lister & Simon (Eds.), Seventh Baltic Sea Conference on Computing Education Research (Koli Calling 2007), Vol. 88 of CRPIT (pp. 53–64). Koli National Park, Finland: ACS.
de Raadt, M., Toleman, M., & Watson, R. (2005). Electronic peer review: A large cohort teaching themselves? In Proceedings of the 22nd Annual Conference of the Australasian Society for Computers in Learning in Tertiary Education (ASCILITE’05), Brisbane, Australia.
Denny, P., Hamer, J., Luxton-Reilly, A., & Purchase, H. (2008). PeerWise: students sharing their multiple choice questions. In ICER ’08: Proceedings of the fourth international workshop on Computing education research, Sydney, Australia (pp. 51–58). New York, NY, USA: ACM.
Denny, P., Luxton-Reilly, A., & Hamer, J. (2008a). The PeerWise system of student contributed assessment questions. In Simon & M. Hamilton (Eds.), Tenth Australasian Computing Education Conference (ACE 2008), Vol. 78 of CRPIT (pp. 69–74). Wollongong, NSW, Australia: ACS.
Denny, P., Luxton-Reilly, A., & Hamer, J. (2008b). Student use of the PeerWise system. In ITiCSE ’08: Proceedings of the 13th annual SIGCSE conference on Innovation and technology in computer science education (pp. 73–77). Madrid, Spain: ACM.
Denny, P., Luxton-Reilly, A., & Simon, B. (2009). Quality of student contributed questions using PeerWise. In M. Hamilton & T. Clear (Eds.), Eleventh Australasian Computing Education Conference (ACE 2009), Vol. 95 of CRPIT, Wellington, New Zealand, January (pp. 55–64). Wellington, New Zealand: Australian Computer Society.
Dochy, F., Segers, M., & Sluijsmans, D. (1999). The use of self-, peer and co-assessment in higher education: A review. Studies in Higher Education, 24(3), 331–350.
Downing, T., & Brown, I. (1997). Learning by cooperative publishing on the World-Wide Web.
Active Learning, 7, 14–16.
Falchikov, N. (1995). Peer Feedback Marking: Developing Peer Assessment. Innovations in Education and Teaching International, 32(2), 175–187.
Falchikov, N., & Goldfinch, J. (2000). Student Peer Assessment in Higher Education: A Meta-Analysis Comparing Peer and Teacher Marks. Review of Educational Research, 70(3), 287–322.
Figl, K., Bauer, C., & Mangler, J. (2006). Online versus Face-to-Face Peer Team Reviews. In
36th ASEE/IEEE Frontiers in Education Conference, Oct. (pp. 7–12).
Freeman, M., & McKenzie, J. (2002). SPARK, a confidential web-based template for self and peer assessment of student teamwork: benefits of evaluating across different subjects. British Journal of Educational Technology, 33(5), 551–569.
Gehringer, E. (2000). Strategies and mechanisms for electronic peer review. Frontiers in Educa-
tion Conference, 2000. FIE 2000. 30th Annual, 1, F1B/2–F1B/7 vol.1.
Hamer, J., Cutts, Q., Jackova, J., Luxton-Reilly, A., McCartney, R., Purchase, H., et al. (2008). Contributing student pedagogy. SIGCSE Bull., 40(4), 194–212.
Hamer, J., Kell, C., & Spence, F. (2007). Peer assessment using Aropä. In ACE ’07: Proceedings of the ninth Australasian conference on Computing education, Ballarat, Victoria, Australia (pp. 43–54). Darlinghurst, Australia: Australian Computer Society, Inc.
Hamer, J., Ma, K.T.K., & Kwong, H.H.F. (2005). A method of automatic grade calibration in peer assessment. In ACE ’05: Proceedings of the 7th Australasian conference on Computing education, Newcastle, New South Wales, Australia (pp. 67–72). Darlinghurst, Australia: Australian Computer Society, Inc.
Kali, Y., & Ronen, M. (2005). Design principles for online peer-evaluation: Fostering objectivity. In T. Koschmann, D.D. Suthers & Chan (Eds.), Computer support for collaborative learning: The Next 10 Years! Proceedings of CSCL 2005 (Taipei, Taiwan). Mahwah, NJ: Lawrence Erlbaum Associates.
Kitchenham, B. (2004). Procedures for Performing Systematic Reviews. Technical report TR/SE-0401, Keele University.
Lin, S., Liu, E., & Yuan, S. (2001). Web-based peer assessment: feedback for students with various thinking-styles. Journal of Computer Assisted Learning, 17(4), 420–432.
Liu, E.Z.F., Lin, S., Chiu, C.H., & Yuan, S.M. (2001). Web-based peer review: the learner as both adapter and reviewer. IEEE Transactions on Education, 44(3), 246–251.
Lutteroth, C., & Luxton-Reilly, A. (2008). Flexible learning in CS2: A case study. In Proceedings of the 21st Annual Conference of the National Advisory Committee on Computing Qualifications, Auckland, New Zealand.
Mann, B. (2005). The Post and Vote Model of Web-Based Peer Assessment. In P. Kommers & G. Richards (Eds.), Proceedings of World Conference on Educational Multimedia, Hypermedia and Telecommunications 2005 (pp. 2067–2074). Chesapeake, VA: AACE.
McLuckie, J., & Topping, K.J. (2004). Transferable skills for online peer learning. Assessment
& Evaluation in Higher Education, 29(5), 563–584.
Miao, Y., & Koper, R. (2007). An Efficient and Flexible Technical Approach to Develop and Deliver Online Peer Assessment. In C.A. Chinn, G. Erkens & S. Puntambekar (Eds.), Proceedings of the 7th Computer Supported Collaborative Learning (CSCL 2007) conference ’Mice, Minds, and Society’, July (pp. 502–511). New Jersey, USA.
Millard, D., Sinclair, P., & Newman, D. (2008). PeerPigeon: A Web Application to Support Generalised Peer Review. In E-Learn 2008 - World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education, November.
Miller, P.J. (2003). The Effect of Scoring Criteria Specificity on Peer and Self-assessment. Assessment & Evaluation in Higher Education, 28(4), 383–394.
Murphy, L., & Wolff, D. (2005). Take a minute to complete the loop: using electronic Classroom Assessment Techniques in computer science labs. J. Comput. Small Coll., 21(1), 150–159.
Ngu, A.H.H., & Shepherd, J. (1995). Engineering the ‘Peers’ system: the development of a computer-assisted approach to peer assessment. Research and Development in Higher Education, 18, 582–587.
Paré, D., & Joordens, S. (2008). Peering into large lectures: examining peer and expert mark agreement using peerScholar, an online peer assessment tool. Journal of Computer Assisted Learning, 24(6), 526–540.
Plimmer, B., & Apperley, M. (2007). Making paperless work. In CHINZ ’07: Proceedings of the 7th ACM SIGCHI New Zealand chapter’s international conference on Computer-human interaction, Hamilton, New Zealand (pp. 1–8). New York, NY, USA: ACM.
Price, B., & Petre, M. (1997). Teaching programming through paperless assignments: an empirical evaluation of instructor feedback. In ITiCSE ’97: Proceedings of the 2nd conference on Integrating technology into computer science education, Uppsala, Sweden (pp. 94–99). New York, NY, USA: ACM.
Raban, R., & Litchfield, A. (2007). Supporting peer assessment of individual contributions in
groupwork. Australasian Journal of Educational Technology, 23(1), 34–47.
Ronen, M., Kohen-Vacs, D., & Raz-Fogel, N. (2006). Adopt & adapt: structuring, sharing and
reusing asynchronous collaborative pedagogy. In ICLS ’06: Proceedings of the 7th interna-
tional conference on Learning sciences, Bloomington, Indiana (pp. 599–605). International
Society of the Learning Sciences.
Sackett, D.L., Richardson, W.S., Rosenberg, W., & Haynes, R.B. (1997). Evidence-based medicine: how to practice and teach EBM. London (UK): Churchill Livingstone.
Sitthiworachart, J., & Joy, M. (2008). Computer support of effective peer assessment in an undergraduate programming class. Journal of Computer Assisted Learning, 24, 217–231.
Sitthiworachart, J., & Joy, M. (2004). Effective peer assessment for learning computer program-
ming. In ITiCSE ’04: Proceedings of the 9th annual SIGCSE conference on Innovation and
technology in computer science education, Leeds, United Kingdom (pp. 122–126). New York,
NY, USA: ACM.
Sluijsmans, D.M.A., Brand-Gruwel, S., & van Merriënboer, J.J.G. (2002). Peer Assessment Training in Teacher Education: effects on performance and perceptions. Assessment & Evaluation in Higher Education, 27(5), 443–454.
Sung, Y.T., Chang, K.E., Chiou, S.K., & Hou, H.T. (2005). The design and application of a web-based self- and peer-assessment system. Computers & Education, 45(2), 187–202.
Topping, K. (1998). Peer Assessment Between Students in Colleges and Universities. Review of
Educational Research, 68(3), 249–276.
Trahasch, S. (2004). From peer assessment towards collaborative learning. Frontiers in Educa-
tion, 2004. FIE 2004. 34th Annual, (pp. F3F–16–20 Vol. 2).
Trivedi, A., Kar, D.C., & Patterson-McNeill, H. (2003). Automatic assignment management and
peer evaluation. J. Comput. Small Coll., 18(4), 30–37.
Walvoord, M.E., Hoefnagels, M.H., Gaffin, D.D., Chumchal, M.M., & Long, D.A. (2008). An
analysis of Calibrated Peer Review (CPR) in a science lecture classroom. Journal of College
Science Teaching, 37(4), 66–73.
Webster, J., & Watson, R.T. (2002). Analyzing the Past to Prepare for the Future: Writing a
Literature Review. MIS Quarterly, 26(2), xii–xxiii.
Wolfe, W.J. (2004). Online student peer reviews. In CITC5 ’04: Proceedings of the 5th conference
on Information technology education, Salt Lake City, UT, USA (pp. 33–37). New York, NY,
USA: ACM.
Xiao, Y., & Lucking, R. (2008). The impact of two types of peer assessment on students’ performance and satisfaction within a Wiki environment. The Internet and Higher Education, 11(3-4), 186–193. Special Section of the AERA Education and World Wide Web Special Interest Group (EdWeb/SIG).
Zeller, A. (2000). Making students read and review code. In ITiCSE ’00: Proceedings of the 5th annual SIGCSE/SIGCUE ITiCSE conference on Innovation and technology in computer science education, Helsinki, Finland (pp. 89–92). New York, NY, USA: ACM.