FURI | Summer 2023
An Independent Evaluation of ChatGPT on Math Word Problems
This research examines the performance of GPT-3.5, a large language model, in solving math word problems. The research team identifies aspects of the model’s responses that strongly indicate whether an answer is correct. This analysis provides insight into the capabilities of GPT-3.5 on mathematical problem-solving tasks and helps the researchers better understand how large language models reason.