Instructional Judgment, Not Just Correctness: Evaluating AI-Generated Feedback on Student Mathematical Thinking

Published in Society for Information Technology and Teacher Education, 2026

bibtex

@inproceedings{ReinYovaMong2026zu,
	author = { Amanda Reinsburrow and Cindy Yovanov and Bill Mongan and Michael Cummins and Matthew Latta and Wesley Shumar and Valerie Klein and Jason Silverman and Steve Weimar },
	title = { Instructional Judgment, Not Just Correctness: Evaluating AI-Generated Feedback on Student Mathematical Thinking },
	booktitle = { Proceedings of Society for Information Technology \& Teacher Education International Conference 2026 },
	year = { 2026 },
	month = { March },
	pages = { 2542--2547 },
	address = { Philadelphia, PA },
	publisher = { Association for the Advancement of Computing in Education (AACE) },
	url = { https://www.learntechlib.org/p/2129322 },
	abstract = { Use of AI to generate student feedback has the potential to increase teacher capacity to focus on student mathematical reasoning over the correct answer, by annotating student work to promote sustained engagement and development of student mathematical thinking skills. We leverage the Mathforum data corpus, a rich dataset of highly trained mentor feedback to student mathematical work and reasoning, to generate a Retrieval-Augmented Generation (RAG) model that supplements an AI system to generate engaging feedback from teacher annotations using the noticing and wondering framework. We evaluate the performance of this AI system using the Mathforum dataset against a similar AI system without this added training data, and against an AI system fine-tuned with the Mathforum training data, in order to determine the viability of a RAG approach to generating more meaningful student feedback. It is well known that text-based generative AI systems struggle with mathematical reasoning in the absence of code generation, but we found that the RAG model tended to focus responses on procedural correctness, while the fine-tuned model attempted to generate responses centered on student reasoning and process (albeit with flawed mathematics). Our findings suggest that a fine-tuned approach may better direct AI generations using large exemplar data such as the Mathforum data corpus, but should be coupled with a mathematical reasoning model or code generator, and with expert human supervision. }
}

Recommended citation: Reinsburrow, A., Yovanov, C., Mongan, B., Cummins, M., Latta, M., Shumar, W., Klein, V., Silverman, J., & Weimar, S. (2026). Instructional Judgment, Not Just Correctness: Evaluating AI-Generated Feedback on Student Mathematical Thinking. In Proceedings of Society for Information Technology & Teacher Education International Conference 2026 (pp. 2542–2547).
