FURI | Fall 2025
Bias-proof Data: Evaluating LLM Generalization
Large language models (LLMs) increasingly influence real-world decisions and events. However, biases instilled during pre-training can lead to unreliable performance. While other studies seek to reduce the impact of bias, its underlying causes remain largely unknown. The research team proposes using open-source LLMs and semantic word pairs to identify biased words or phrases that critically affect model performance. This approach aims to reveal how biases emerge from pre-training data. These findings may inform future improvements in pre-training practices and model design, further enhancing the reliability, robustness, and explainability of LLMs in critical applications.
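As one illustration of how semantic word-pair probing can work in practice, the sketch below scores a sentence template with each word of a pair under an open-source causal LM and reports the log-likelihood gap; a large gap flags a potentially biased association learned during pre-training. This is a minimal, hypothetical example, not the research team's actual method: the model name ("gpt2"), the template, and the word pairs are placeholder assumptions.

```python
# Minimal, illustrative sketch of semantic word-pair probing with an
# open-source causal LM. Model, template, and word pairs are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # placeholder open-source model
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def sentence_log_likelihood(sentence: str) -> float:
    """Total log-probability the model assigns to the sentence's tokens."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs, labels=inputs["input_ids"])
    # outputs.loss is the mean negative log-likelihood over the n-1
    # predicted positions, so rescale to a total log-likelihood.
    num_tokens = inputs["input_ids"].shape[1]
    return -outputs.loss.item() * (num_tokens - 1)

# Hypothetical semantic word pairs: the same template is scored with each
# word of the pair, and the gap measures how asymmetrically the model
# treats the two otherwise-equivalent words.
template = "The {} was praised for excellent technical work."
word_pairs = [("engineer", "nurse"), ("man", "woman")]

for a, b in word_pairs:
    ll_a = sentence_log_likelihood(template.format(a))
    ll_b = sentence_log_likelihood(template.format(b))
    print(f"{a!r} vs {b!r}: log-likelihood gap = {ll_a - ll_b:+.3f}")
```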
Student researcher
Joshua Tom
Computer systems engineering
Hometown: Chandler, Arizona, United States
Graduation date: Spring 2028