FURI | Summer 2023

An Independent Evaluation of ChatGPT on Math Word Problems


This research examines how well the large language model GPT-3.5 solves math word problems. The research team identifies which aspects of the model's responses are strong indicators that an answer is correct. This analysis offers insight into GPT-3.5's capabilities on mathematical problem-solving tasks and helps the researchers better understand how large language models reason.
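To make the idea of a response feature that "indicates correctness" concrete, here is a minimal Python sketch. It assumes each response has already been graded right or wrong and tagged with binary features. The names `GradedResponse`, `indicator_strength`, and the feature `shows_equation` are hypothetical illustrations, not the study's actual features or method. The sketch compares accuracy when a feature is present against accuracy when it is absent.

```python
from dataclasses import dataclass, field


@dataclass
class GradedResponse:
    """One model answer to a word problem, graded against a reference answer."""
    correct: bool
    # Hypothetical binary features extracted from the response text.
    features: dict[str, bool] = field(default_factory=dict)


def indicator_strength(responses: list[GradedResponse], feature: str) -> tuple[float, float]:
    """Return (accuracy when feature is present, accuracy when it is absent).

    A large gap between the two suggests the feature is a useful signal of
    correctness. A group with no responses yields float("nan").
    """
    present = [r.correct for r in responses if r.features.get(feature, False)]
    absent = [r.correct for r in responses if not r.features.get(feature, False)]

    def rate(outcomes: list[bool]) -> float:
        # Fraction of correct answers in the group (nan if the group is empty).
        return sum(outcomes) / len(outcomes) if outcomes else float("nan")

    return rate(present), rate(absent)


# Made-up toy records to show the call; these are NOT results from the study.
sample = [
    GradedResponse(True, {"shows_equation": True}),
    GradedResponse(True, {"shows_equation": True}),
    GradedResponse(False, {"shows_equation": False}),
    GradedResponse(True, {"shows_equation": False}),
]
print(indicator_strength(sample, "shows_equation"))  # (1.0, 0.5)
```

Comparing conditional accuracies is only one simple way to gauge whether a feature signals correctness; a real analysis would also need enough graded responses for the difference to be meaningful.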
