July 2024 · International Journal of Artificial Intelligence in Education
Large language models (LLMs) offer an opportunity to make large-scale changes to educational content that would otherwise be too costly to implement. The work here highlights how LLMs (in particular GPT-4) can be prompted to revise educational math content for large-scale deployment in real-world learning environments. We tested the ability of LLMs to improve the readability of math word problems and then examined how these readability improvements affected learners, especially those identified as emerging readers. Working with math word problems in the context of an intelligent tutoring system (i.e., MATHia by Carnegie Learning, Inc.), we developed an automated process that can rewrite thousands of problems in a fraction of the time required for manual revision. GPT-4 produced revisions with improved scores on common readability metrics. However, when we examined student learning outcomes, the problems revised by GPT-4 showed mixed results. In general, students were more likely to achieve mastery of the concepts when working with GPT-4-revised problems than with the original, non-revised problems, but this benefit was not consistent across all content areas. Further complicating this finding, students had higher error rates on GPT-4-revised problems in some content areas and lower error rates in others. These findings highlight the potential of LLMs for making large-scale improvements to math word problems, but also the need for further nuanced study of how the readability of math word problems affects learning.
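To make the described pipeline concrete, below is a minimal sketch of an LLM-based readability-revision loop. It assumes the OpenAI chat completions API and the `textstat` package for scoring; the prompt wording, model configuration, and sample problem are illustrative assumptions, not the prompts or data actually used in the study.

```python
# Sketch: revise a math word problem with GPT-4, then compare readability
# scores before and after. Prompt and parameters are hypothetical.
from openai import OpenAI
import textstat

client = OpenAI()  # reads OPENAI_API_KEY from the environment

REVISION_PROMPT = (
    "Rewrite the following math word problem so it is easier to read for "
    "emerging readers. Use shorter sentences and simpler vocabulary, but do "
    "not change the numbers, the mathematical structure, or the answer.\n\n"
    "Problem: {problem}"
)

def revise_problem(problem: str, model: str = "gpt-4") -> str:
    """Ask the LLM for a readability-focused revision of one word problem."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user",
                   "content": REVISION_PROMPT.format(problem=problem)}],
        temperature=0,  # reduce variability across reruns
    )
    return response.choices[0].message.content.strip()

def readability_report(text: str) -> dict:
    """Score a text on two common readability metrics."""
    return {
        "flesch_reading_ease": textstat.flesch_reading_ease(text),    # higher = easier
        "flesch_kincaid_grade": textstat.flesch_kincaid_grade(text),  # lower = easier
    }

if __name__ == "__main__":
    original = (
        "A mercantile establishment offers a commodity at a price reduced by "
        "15% from its initial valuation of $80. What is the reduced price?"
    )
    revised = revise_problem(original)
    print("original:", readability_report(original))
    print("revised: ", readability_report(revised))
```

Batch-processing thousands of problems would amount to mapping `revise_problem` over a problem bank and flagging any revision whose readability scores fail to improve for manual review.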