Grading Grades as a Measure of Student Learning



In this article, we discuss the limitations of grades as a measure of student learning. We also consider more robust measures that may be preferable to grades alone for assessing college students’ academic achievement. We begin with a brief history of grades, followed by a discussion of the pros and cons of educators’ widespread reliance on grades to represent student learning, capabilities, and overall potential. We then discuss supplemental assessment strategies that yield a richer reporting of student products, processes, and progress in their learning. Finally, we conclude by reiterating the inadequacy of employing a uni-dimensional measure to represent the multi-dimensionality of the human potential.


... The commonly-held beliefs that teacher judgments and classroom assessments do not appear to be as accurate measures of academic achievement as standardized testing and that they are subject to frequent error and bias have been circulating among parents, students, and even some professionals, although not explicitly stated (Allen, 2005; Brophy, 1983; Egan & Archer, 1985; Hoge, 1983, 1984; Hoge & Cudmore, 1986; Schwab, Moseley, & Dustin, 2018). In line with such beliefs, Egan and Archer (1985) noted that "It is commonly argued that commercial tests provide teachers with valuable information about the abilities and deficiencies of their students, from which it follows that teachers who rate their students without such information will often be in error" (p. ...
... Additionally, as it appears that there are no independent and psychometrically-sound AREL measures of validating teacher grading because such grading practices are often restricted to the boundaries of the classroom and not applicable on a larger scale, the probable invalidity of such grading practices might pose concerns for teachers and other stakeholders (Allen, 2005; Schwab, Moseley, & Dustin, 2018). Disputes among the major stakeholders including parents, teachers, and students are more likely when teachers have no means of supporting the validity of their grading decision-making (McMillan, 2013). ...
Grades represent one of the most common sources of evidence of student achievement in classrooms, though their relationship with test scores has remained understudied, particularly in settings such as in Iran, where English is taught as a foreign language. The purpose of this study was to investigate the relationship between graded and tested achievement with respect to gender and proficiency level differences. Teacher-assigned grades and standardized achievement test scores of 693 Iranian learners of English taught by 15 teachers were examined. Primary analyses focused on the validity of teacher grades and the subsequent Pearson correlation coefficients revealed that grades associated positively with externally-validated test scores obtained from reliable tests, an indication of the validity of teacher grading. Additionally, the results of independent-samples t-tests showed that female students outperformed male students on achievement tests, but with fluctuations across proficiency levels. Higher proficiency levels gave male participants an advantage over female participants in achievement tests. Moreover, male teachers were found to grade female participants more accurately than their female counterparts. Implications are discussed for informing teachers about the validation of their grading practices, as well as for teacher education programs and teachers' professional development.
... Despite the greater ecological validity of this indicator (Marsh et al., 2007) and its importance for students' subsequent schooling (e.g., admission to enriched academic tracks), it is a relatively limited source for judging the different types of learning that students more specifically achieve. Moreover, because report card grades stem from assessment practices that differ across teachers, the reliability of this indicator for capturing learning is uncertain (Schwab et al., 2018). To better understand the influence of classroom management practices on the learning achieved, a uniform measure that captures the forms and durability of learning would therefore be desirable. ...
This study examines the hypothesis that teachers' practices, as perceived by students, relating to classroom management (positive and coercive behaviors) and classroom climate (academic and emotional support, mutual respect, task-related interactions) indirectly influence student achievement through students' academic motivation. First-year secondary school students reported their teacher's practices and their own academic motivation (expectancies for success and value) in French (n = 1417) or mathematics (n = 1420), and the schools provided the students' achievement data. Path analyses revealed that, once prior achievement was taken into account, the targeted practices generally modulate at least one motivational indicator, but that only the value attributed to learning predicts subsequent achievement. These findings generalize to boys and girls alike and, in the vast majority of cases, to both low- and high-achieving students. Such results suggest that teachers, through a combination of practices used to establish their classroom management and classroom climate, can ease their students' transition to secondary school, notably by influencing their motivation.
... In addition to assessing a student's performance, grades can serve as a means of sharing information about a student's accomplishments in college courses with the student and others, such as graduate schools, professional schools, and future employers; as a measure of the potential for further success; as an incentive for students to continue learning and for future academic improvement; and as a way to structure a lesson, a module, or a semester, in that grades indicate transitions in a course and provide closure (Walvoord & Anderson, 1998). Additionally, a student's grade point average (GPA) has acquired wide acceptance as a major sorting mechanism for differentiating between excellent and deficient students; and for deciding who advances and who does not in elementary and secondary schools; who gets accepted to colleges and universities; who gets accepted to graduate school, medical school, and law school; and frequently who attains employment (Schwab et al., 2018). Grades can also be seen as measures of how well a student lives up to the instructor's expectations of what a good student is instead of measuring the student's academic performance in the subject matter (Allen, 2005). ...
The purpose of this study was to find out more about what hospitality and tourism educators think about grading to add to the dialog about this subject that is relevant to both instructors and their students. A survey was completed by 118 educators from around the world to learn about instructors’ perceptions of grading practices. Those sampled used predominantly absolute grading based on criteria and the letter system grading in the format of A, B, C, and D. Respondents viewed education as a continuing process of improvement, perceived that grading primarily provides information about student performance, and placed importance on papers and other written assignments when assigning grades in courses. Themes identified in the study included: assigning and grading a variety of items to measure student performance, using portfolios of student work, and providing meaningful feedback regarding academic performance. Based on the findings, suggestions for improving grading practices are provided.
... Course grades, and the assessment scores from which they are derived, are useful but imperfect assessments of student learning (Guskey, 2007; Kohn, 2011; Pulfrey et al., 2011; Schwab et al., 2018). Many traditional courses stress efficiency in the time and effort taken up with assessment by having a small number of high-stakes exams (sometimes only one or two) that cover large portions of course material and hence count toward large proportions of overall grades. ...
Introduction: Psychology instructors face decisions about adopting new approaches to lectures, readings, and assessment in their courses. Statement of the Problem: These choices about course structure can be both intimidating and confusing in terms of the costs and benefits for different options. Literature Review: As framed by anecdotal and empirical evidence from personal experience in teaching introductory psychology, this article reviews research on impactful pedagogy. The goal is to provide useful encouragement and cautionary notes on these different options. Teaching Implications: Based on 18 classes, taught over a dozen years, the authors provide concrete tips for how to navigate decision points and implement these teaching changes, as supported by student performance and course evaluation data. Conclusion: Though unable to make causal conclusions from this limited data, it is worthwhile to discuss the tradeoffs of choices surrounding lectures, readings, and assessments in introductory psychology.
... Grades are important to students. However, as instructors, Schwab et al. [29] note that we are more interested in student learning. While we already knew that student grades in our flipped classes were not impacted by classroom type, we did want to find out whether classroom type impacted student engagement, which is "highly correlated with many desirable learning and personal developmental outcomes of college", according to Axelson and Flick [30]. ...
Problem-based learning is the latest name for a teaching philosophy that is as old as Ancient Greece. Whether you call it Socratic Inquiry, case-based teaching, problem-based learning, interactive group learning, or “flipped” learning, the essential concept is to encourage the student to collaborate in applying their gained knowledge to solve a problem. As traditional lecture-based teaching has been challenged, the design of classrooms has been called into question. A flat or tiered room is not seen as an ideal setting for collaborative work. In our own College of Business, several traditional classrooms were converted to problem-based learning classrooms at considerable expense. This paper evaluates, using measures based on Michael G. Moore’s theory of transactional distance, whether moving flipped classes into these high-tech classrooms improves the collaborative learning experience. Transactional distance can be defined as the barriers that exist to a student’s engagement with their learning experience. These barriers arise due to the interaction between students and the teacher, other students, the subject matter content, and instructional technology being used. Our results suggest that, from a student engagement and outcome standpoint, the investment in costly high-tech classrooms is not warranted—a welcome result in times when university budgets are stretched to the limit.
... Failure is a common part of System 2 thought processes. How to know whether System 2 thinking is working or not thus presents another barrier to its engagement, especially in an educational environment commonly governed by standardized learning expectations and easily quantifiable and measurable learning outcomes (Schwab et al., 2018). ...
In Thinking, Fast and Slow (2011) Kahneman describes two modes of thinking: System 1 and 2. System 1 operates quickly, automatically, and unconsciously, drawing on our vast reservoir of stored knowledge to decide what should or should not be done in any situation. System 2 is a slower, more deliberate process, requiring us to step back from our immediate circumstances to analyze them in more depth. Kahneman praises System 1 for its efficiency in dealing with life’s ordinary problems, but cautions against relying on System 1 when faced with more complex problems. In this paper, we reason that the essence of effective college teaching is moving students from System 1 to System 2 thinking. We describe both systems in detail, illustrate how System 1 thinking applied to a System 2 problem can be troublesome, and then propose an educational strategy to elevate students to a System 2 frame of mind.
Course grades are commonly used as an evaluation metric within institutions and as part of education research. However, using grades to compare across course sections may implicitly assume that grades are awarded similarly and consistently. This article details how different sections of the same course offered differing amounts of extra credit and adjusted letter grades to different extents at the end of the term (post hoc). In one section, extra credit altered the letter grades of 26.6% of students, and post hoc adjustments altered the letter grades of 35.4% of students. In contrast, in a concurrently-offered section, 1.7% of student grades changed due to extra credit, and 4.3% due to post hoc adjustments. This may complicate some grade-based assessments of instructors, curricula, pedagogical practices, or students. We hope this catalyzes further study into how widespread this phenomenon is, what mechanisms influence it, and what the implications are. Meanwhile, we suggest that education researchers might consider explicitly discussing any available evidence that grades are consistently awarded and/or the possible repercussions of any inconsistency. When not possible, this might be discussed as a study limitation.
Providing feedback to students is a critically important step in the learning process, and yet in many classrooms, feedback only occurs at the end of assignments, almost serving as a postmortem in justifying a student’s final grade (Percell, 2017: 111). This chapter discusses the importance of grading as a means of formative and summative assessment. The chapter is in search of ways to substitute graded activities with non-graded portfolios, quizzes, and exams for the sake of providing healthy feedback aimed at developing effective writing skills. The chapter also reports on the ways to treat students’ attendance and participation, claiming that both of them should be implicitly assessed, yet should not constitute a part of a final grade. Such a traditional approach is juxtaposed with a digitalized one, where the role of technology is to facilitate communication between a teacher and students, making it possible to deliver oral and written support to students online and on campus.
Existing literature discusses various issues associated with the teaching-learning process: the types of examinations to be conducted, methods of assessing students' performance, the advantages and disadvantages of each method, and psychological issues of teachers, students, and parents. It has been observed, however, that there is no mathematical tool for predicting early the unintended consequences of relative grading and for avoiding this problem. Without touching on the ethical issues of the teaching-learning process or on tools for setting proper question papers and evaluation, this article introduces formulae through which one can easily identify the courses that are bound to produce unintended consequences under relative grading. The formulae are based on the vital parameters of the normal distribution, namely the mean and standard deviation of the scores obtained by the students in a course examination.
The purpose of this study is to conduct empirical research into the use of an interactive e-book in a predominantly international MSc Business Management cohort, evaluating its impact on engagement, and academic achievement. There is a lacuna in the current state of research into interactive e-books, and a lack of attention on international students studying in the UK. Quantitative data was obtained and analysed using t-tests and correlational analysis. The key findings were that those students who used the interactive e-book scored significantly higher coursework and examination marks than those students who did not. Student engagement was critical to the effectiveness of the interactive e-book, with students performing better when it was adopted as summative as opposed to formative assessment. The paper contributes to the literature on blended learning and the use of digital technology, and strongly highlights the benefits of its use in an international student cohort.
The authors explore a history of grading and review the literature regarding the purposes and impacts of grading. They then suggest strategies for making grading more supportive of learning, including balancing accuracy-based and effort-based grading, using self/peer evaluation, curtailing curved grading, and exercising skepticism about the meaning of grades.
Education leaders must recognize obstacles to grading reform that are rooted in tradition, and then meet them head on. Education improvement efforts over the past two decades have focused primarily on articulating standards for student learning, refining the way we assess students' proficiency on those standards, and tying results to accountability. The one element still unaligned with these reforms is grading and reporting. Student report cards today look much like they looked a century ago, listing a single grade for each subject area or course. Educators seeking to reform grading must combat five long-held traditions that stand as formidable obstacles to change. Although these traditions stem largely from misunderstandings about the goals of education and the purposes of grading, they remain ingrained in the social fabric of our society. Obstacle 1: Grades should provide the basis for differentiating students. This is one of our oldest traditions in grading. It comes from the belief that grades should serve to differentiate students on the basis of demonstrated talent. Students who show superior talent receive high grades, whereas those who display lesser talent receive lower grades. Although seemingly innocent, the implications of this belief are significant and troubling. Those who enter the profession of education must answer one basic, philosophical question: Is my purpose to select talent or develop it? The answer must be one or the other because there's no in-between. If your purpose as an educator is to select talent, then you must work to maximize the differences among students. In other words, on any measure of learning, you must try to achieve the greatest possible variation in students' scores. If students' scores on any measure of learning are clustered closely together, discriminating among them becomes difficult, perhaps even impossible. Unfortunately for students, the best means of maximizing differences in learning is poor teaching. Nothing does it better.
Evaluation is an inescapable feature of academic life with regular grading and performance appraisals at school and at university. Although previous research has indicated that evaluation and grading in particular are likely to have a substantial impact on motivational processes, little attention has been paid to the relationship between grading and approach versus avoidance achievement goals, 2 fundamental concerns whenever evaluation is at stake. Three experiments, carried out in professional schools, revealed that expectation of a grade for a task, compared with no grade, consistently induced greater adoption of performance-avoidance, but not performance-approach, goals. Experiments 2 and 3 revealed that expectation of a grade, compared with no grade, consistently induced greater adoption of performance-avoidance goals even when grading was accompanied by a formative comment. Furthermore, Experiment 3 showed that reduced autonomous motivation measured after having completed a task for a grade versus no grade mediated the relationship between grading and adoption of performance-avoidance goals in a subsequent task. Results are discussed in the light of achievement goal and self-determination theory.
The research on formative assessment and feedback is reinterpreted to show how these processes can help students take control of their own learning, i.e. become self-regulated learners. This reformulation is used to identify seven principles of good feedback practice that support self-regulation. A key argument is that students are already assessing their own work and generating their own feedback, and that higher education should build on this ability. The research underpinning each feedback principle is presented, and some examples of easy-to-implement feedback strategies are briefly described. This shift in focus, whereby students are seen as having a proactive rather than a reactive role in generating and using feedback, has profound implications for the way in which teachers organise assessments and support learning.
Grades matter and the future lives of students are in many ways dependent on teacher grading practices. After all, so many decisions that affect students’ lives, including student ranking, matriculation, retention, college admission, and scholarships, depend on grades (Guskey, 2015; Marzano, 2000). This is troubling because the grading practices used in high schools across the country are generally considered to be highly variable and invalid measures of learning, often consisting of a hodgepodge of factors including achievement, behavior, and effort (Brimi, 2011; Brookhart, 1991; Randall & Engelhard, 2009, 2010). In other words, the data that many parties use to make important decisions regarding the lives of students is invalid. In fact, Marzano (2000), a leading grading researcher, declared, “Grades are so imprecise that they are almost meaningless” (p. 1). While most parties assume that grades represent student learning (Brookhart, 2004), this is rarely the case. As a result, the misinterpretation of grade data often leads to poor decisions regarding students. To add insult to injury, over 100 years of research has documented these problems (Brookhart et al., 2016), but educators’ seeming indifference to the research has resulted in no significant changes (Dueck, 2014; Reeves, 2011).
Based on Fullan’s work with school districts and large systems in the United States, United Kingdom, and Canada, this resource lays out a comprehensive action plan for achieving whole system reform.
Background/Context: College grades can influence a student's graduation prospects, academic motivation, postgraduate job choice, professional and graduate school selection, and access to loans and scholarships. Despite the importance of grades, national trends in grading practices have not been examined in over a decade, and there has been a limited effort to examine the historical evolution of college grading. Purpose/Objective/Research Question/Focus of Study: Here we look at the evolution of grading over time and space at American colleges and universities over the last 70 years. Our data provide a means to examine how instructors’ assessments of excellence, mediocrity, and failure have changed in higher education. Data Collection and Analysis: We have collected historical and contemporary data on A–F letter grades awarded from over 200 four-year colleges and universities. Our contemporary data on grades come from 135 schools, with a total enrollment of 1.5 million students. Research Design: Through the use of averages over time and space as well as regression models, we examine how grading has changed temporally and how grading is a function of school selectivity, school type, and geographic region. Findings/Results: Contemporary data indicate that, on average across a wide range of schools, A's represent 43% of all letter grades, an increase of 28 percentage points since 1960 and 12 percentage points since 1988. D's and F's total typically less than 10% of all letter grades. Private colleges and universities give, on average, significantly more A's and B's combined than public institutions with equal student selectivity. Southern schools grade more harshly than those in other regions, and science and engineering-focused schools grade more stringently than those emphasizing the liberal arts. At schools with modest selectivity, grading is as generous as it was in the mid-1980s at highly selective schools. These prestigious schools have, in turn, continued to ramp up their grades. It is likely that at many selective and highly selective schools, undergraduate GPAs are now so saturated at the high end that they have little use as a motivator of students and as an evaluation tool for graduate and professional schools and employers. Conclusions/Recommendations: As a result of instructors gradually lowering their standards, A has become the most common grade on American college campuses. Without regulation, or at least strong grading guidelines, grades at American institutions of higher learning likely will continue to have less and less meaning.
Acknowledged or not, students are curriculum theorists and critics of schooling. Useful lessons can be learned by drawing them into the dialogue about the purposes and practices of education. The addition of children's perspectives can help education become an adventure in which teachers, researchers, and children together learn new questions as well as answers. This book relates the story of a year's observation of, and conversations with, a second grade class. It illustrates what happens when teachers go beyond listening and provoke children to become collaborators in their education; gives new substance to the educational ideas of John Dewey; and suggests how students can become more wholeheartedly involved in education. The book portrays young students as people with coherent notions about what knowledge is of most worth, and offers unique insights of interest to college students, classroom teachers, professors, administrators, and parents. Contains over 130 references. (TJQ)
This article examines multiple measures of performance in school accountability systems from two perspectives: laterally (different indicators of different domains) and vertically (indicators that are at different levels of depth of the same domain). Organizational responsibility and instructional sensitivity are examined. In particular, alternative procedures are explored for integrating into the multiple measures concept external, uniform top-down measures and responsive, locally adaptive bottom-up measures.
This study compared different stakeholders' perceived validity of various indicators of student learning used to judge the quality of students' academic performance. Data were gathered from the questionnaire responses of 314 educators in three states that have implemented comprehensive state-wide assessment programs with high-stakes consequences both for educators and for students. MANOVA results showed that while educators generally hold similar perceptions, significant differences exist between school administrators and teachers. Administrators perceived the results from nationally normed standardized assessments, state assessments, and district assessments to be more valid indicators of student achievement than did teachers. In contrast, teachers granted more validity to classroom observations and homework completion and quality than did administrators. The implications of these differences for reform initiatives are discussed, particularly with regard to teachers' motivation to improve results.
Ruff, C. (2016). Why do colleges still use grades? The Chronicle of Higher Education, March.
Anders, G. (2011). The rare find: How great talent stands out. New York: Penguin.
Antioch University Seattle. (2016). Retrieved on September 2, 2016. Retrieved from http://www.
Rose, T. (2016). The end of average: How we succeed in a world that values sameness. New York: HarperCollins Publishers.
Tomar, D. (2017). Eliminating the grading system in college: The pros and cons. Retrieved from (accessed 02 February 2018).
Zubizarreta, J. (2008). The learning portfolio: A powerful idea for significant learning. Manhattan, KS: The IDEA Center. Retrieved from
Guskey, T., & Bailey, J. (2010). Developing standards-based report cards. Thousand Oaks, CA: Corwin.
Nicol, D. J., & Macfarlane-Dick, D. (2006). Formative assessment and self-regulated learning: A model and seven principles of good feedback practice. Studies in Higher Education, 31(2), 199-218.
Vatterott, C. (2015). Rethinking grading: Meaningful assessment for standards-based learning. Alexandria, VA: ASCD.
Sackstein, S. (2014). Students react to a classroom without grades. Education Week.
De Zouche, D. (1945). 'The wound is mortal': Marks, honors, unsound activities. The Clearing House, 19, 339-344. doi:10.1080/00098655.1945.11474021
Jaschik, S. (2009). Imagining college without grades. Retrieved from news/2009/01/22/grades (accessed 02 February 2018)
Marshall, M. (1968). Teaching without grades. Corvallis, OR: Oregon State University Press.
Marzano, R. (2009). Formative assessment and standards-based grading: The classroom strategies series. Bloomington, IN: Marzano Research.
Mondale, S. (2001). School: The story of American public education. Boston: Beacon Press.
Strauss, V. (2013). Harvard College's median grade is an A-, dean admits. The Washington Post, December 4.
Thorndike, E. Individuality. Boston: Houghton Mifflin.
Wiske, M. (1998). Teaching for understanding: Linking research with practice. San Francisco: Jossey-Bass.
Clark, J. (1972). The general ecology of knowledge in curriculums of the future. In E. Laszlo (Ed.) The relevance of general systems theory (pp. 163-180). New York, NY: George Braziller.
Herman, J., Baker, E., & Linn, R. (2004). Accountability systems in support of student learning: Moving to the next generation (CRESST Line). Los Angeles, CA: Center for Research on Evaluation, Standards, and Student Testing.
O'Connor, K. (2007). The last frontier: Tackling the grading dilemma. In D. Reeves (Ed.), Ahead of the curve: The power of assessment to transform teaching and learning. Bloomington, IN: Solution Tree.
O'Brien, P. (2013). How good do your college grades actually have to be? USA Today, October 29.