Alexis Hoh Sheng Jia’s research while affiliated with Monash University Malaysia and other places


Publications (2)


Evaluating large language models for criterion-based grading from agreement to consistency
Article · Full-text available · December 2024 · npj Science of Learning

Melissa Boey · Yan Yu Tan · Alexis Hoh Sheng Jia

This study evaluates the ability of large language models (LLMs) to deliver criterion-based grading and examines how prompt engineering with detailed criteria affects grading. Using well-established human benchmarks and quantitative analyses, we found that even free LLMs can achieve criterion-based grading when given a detailed understanding of the criteria, underscoring the importance of domain-specific understanding over model complexity. These findings highlight the potential of LLMs to deliver scalable educational feedback.
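To make the prompt-engineering idea in this abstract concrete, the sketch below shows one way detailed grading criteria might be embedded in a prompt sent to a chat LLM. The rubric text, weights, and function name are hypothetical illustrations, not the criteria or code used in the paper:

```python
# Hypothetical sketch: embedding detailed criteria in a grading prompt.
# The rubric below is illustrative only; the paper's actual criteria differ.

RUBRIC = {
    "Thesis clarity": "States a specific, arguable thesis in the introduction.",
    "Use of evidence": "Supports each claim with relevant, cited evidence.",
    "Organization": "Paragraphs follow a logical order with clear transitions.",
}

def build_grading_prompt(essay: str, max_score: int = 5) -> str:
    """Format a criterion-based grading prompt for a chat LLM."""
    criteria = "\n".join(
        f"- {name} (0-{max_score}): {description}"
        for name, description in RUBRIC.items()
    )
    return (
        "You are grading a student essay against the rubric below.\n"
        f"Score each criterion from 0 to {max_score} and justify each score.\n\n"
        f"Rubric:\n{criteria}\n\n"
        f"Essay:\n{essay}"
    )

print(build_grading_prompt("Sample essay text..."))
```

The point of spelling out each criterion, rather than asking for a single holistic score, is to give the model the domain-specific understanding the study found to matter more than model complexity.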


Is ChatGPT Effective in Providing Educational Feedback? A Quantitative Analysis of Summative Feedback

January 2024 · 1 Citation

Providing effective feedback is one of the most promising ways in which large language models (LLMs) can transform education, particularly by offering scalable and personalized support to learners. We present a systematic evaluation of an LLM, ChatGPT, in delivering both summative and formative feedback across two sequential studies. Study 1 employed a quantitative approach to examine the quality of ChatGPT's summative feedback by comparing it to well-established human ratings, using Intraclass Correlation Coefficients (ICCs) to evaluate interrater agreement and consistency. Results indicated moderate absolute agreement but high consistency, particularly when prompts were tailored with domain-specific information, suggesting that LLMs can reliably deliver feedback when aligned with specific assessment criteria. Study 2 adopted a single-blind randomized controlled design to investigate the impact of formative feedback generated by ChatGPT compared to a well-established automated writing evaluation (AWE) tool. Multiple measures, including writing quality, perceived feedback quality, and writing motivation, were used to assess the efficacy of the feedback. Both tools effectively improved writing quality; however, only ChatGPT's feedback significantly enhanced student motivation, underscoring the potential of LLMs to foster greater engagement in learning beyond performance improvement. These findings highlight the transformative potential of LLMs like ChatGPT to provide scalable, personalized, and consistent educational support, which could play a critical role in reducing educational inequalities. Further refinement is needed to improve the accuracy of LLM-generated feedback and to address ethical and pedagogical considerations for real-world implementation.
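The agreement-versus-consistency distinction in this abstract maps onto two standard two-way ICC forms: ICC(2,1) for absolute agreement and ICC(3,1) for consistency (Shrout & Fleiss, 1979). Below is a minimal NumPy sketch of how these could be computed for a targets-by-raters score matrix; the scores are hypothetical and this is not the paper's code or data:

```python
import numpy as np

def icc_agreement_consistency(ratings: np.ndarray) -> tuple[float, float]:
    """Two-way ICCs for an (n targets x k raters) score matrix.

    Returns (ICC(2,1) absolute agreement, ICC(3,1) consistency),
    following Shrout & Fleiss (1979).
    """
    n, k = ratings.shape
    grand = ratings.mean()
    row_means = ratings.mean(axis=1)   # per-target (e.g., per-essay) means
    col_means = ratings.mean(axis=0)   # per-rater means

    # Mean squares from the two-way ANOVA decomposition.
    ms_rows = k * ((row_means - grand) ** 2).sum() / (n - 1)
    ms_cols = n * ((col_means - grand) ** 2).sum() / (k - 1)
    resid = ratings - row_means[:, None] - col_means[None, :] + grand
    ms_err = (resid ** 2).sum() / ((n - 1) * (k - 1))

    icc2 = (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )
    icc3 = (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)
    return icc2, icc3

# Hypothetical example: 5 essays scored by a human rater and an LLM.
# The LLM scores track the human ranking but run systematically higher,
# so consistency is high while absolute agreement is lower.
scores = np.array([[4, 5], [3, 4], [5, 5], [2, 3], [4, 5]], dtype=float)
agreement, consistency = icc_agreement_consistency(scores)
print(f"ICC(2,1) absolute agreement: {agreement:.2f}")  # ~0.70
print(f"ICC(3,1) consistency:        {consistency:.2f}")  # ~0.90
```

The example reproduces the qualitative pattern the abstract reports: a rater that is systematically offset from the human benchmark yields lower absolute agreement, ICC(2,1), than consistency, ICC(3,1), because only the former penalizes mean differences between raters.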