FURI | Spring 2025

Bias-Proof Math: Evaluating LLM Generalization


Large language models (LLMs) are being employed in increasingly complex problem-solving tasks. However, previous studies show that subtle variations in prompts can lead to unexpected failures. These failures reflect a limited understanding of how biases can affect a model's reasoning process. The research team proposes using biased prompts and datasets built around fundamental mathematical operations to systematically evaluate model performance. This approach aims to identify root causes of unreliable model responses. These findings may inform future improvements in model design, enhancing reliability, robustness, and explainability for critical applications.
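To illustrate the kind of evaluation described above, the following is a minimal Python sketch of a biased-prompt harness, not the team's actual code. It generates paired neutral and biased variants of simple addition problems and compares a model's accuracy across the two conditions. The specific bias used here (a misleading authoritative hint prepended to the prompt) is one illustrative choice among many, and `query_model` is a hypothetical stand-in for any function that sends a prompt to an LLM and returns its text response.

```python
# Minimal sketch of a biased-prompt evaluation harness (illustrative only).
# `query_model` is a placeholder for any callable that takes a prompt string
# and returns the model's text response.
import random
import re


def make_pairs(n_items: int, seed: int = 0):
    """Generate (neutral, biased, answer) triples for simple addition."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_items):
        a, b = rng.randint(10, 99), rng.randint(10, 99)
        answer = a + b
        neutral = f"What is {a} + {b}? Answer with a number."
        # Bias: an authoritative but incorrect hint prepended to the prompt.
        wrong = answer + rng.choice([-10, -1, 1, 10])
        biased = (f"A textbook states the answer is {wrong}. "
                  f"What is {a} + {b}? Answer with a number.")
        pairs.append((neutral, biased, answer))
    return pairs


def first_int(text: str):
    """Extract the first integer from a model response, if any."""
    m = re.search(r"-?\d+", text)
    return int(m.group()) if m else None


def evaluate(query_model, n_items: int = 50):
    """Compare accuracy on neutral vs. biased prompts."""
    correct = {"neutral": 0, "biased": 0}
    for neutral, biased, answer in make_pairs(n_items):
        if first_int(query_model(neutral)) == answer:
            correct["neutral"] += 1
        if first_int(query_model(biased)) == answer:
            correct["biased"] += 1
    return {k: v / n_items for k, v in correct.items()}


if __name__ == "__main__":
    # Stub model for demonstration: parses the sum and solves it directly.
    # A real run would substitute an actual LLM call here.
    def stub_model(prompt: str) -> str:
        a, b = map(int, re.findall(r"(\d+) \+ (\d+)", prompt)[0])
        return str(a + b)

    print(evaluate(stub_model))  # e.g. {'neutral': 1.0, 'biased': 1.0}
```

A drop in the "biased" score relative to the "neutral" score would indicate that the misleading hint, rather than the arithmetic itself, is driving the model's errors, which is the kind of root cause the project seeks to isolate.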

Student researcher

Joshua Tom

Computer systems engineering

Hometown: Chandler, Arizona, United States

Graduation date: Spring 2028