FURI | Spring 2025

Merging Large Language Models: Threats and Opportunities


While merging generative AI models offers a promising way to create domain-specific experts with significantly less compute, the safety and detectability of merged large language models have not been formally evaluated. The resulting risk, and merging as a whole, is especially relevant to open-source models (such as DeepSeek and Meta’s Llama), since they are publicly accessible and can be modified by anyone on the internet. Using real-world datasets, safety benchmarks, and diverse attack scenarios, the researchers assess the impact of model merging techniques and quantify the trade-offs between performance and safety. The findings are expected to contribute to the design of safer and more reliable model merging techniques.

Student researcher

Aryan Vinod Keluskar

Computer science

Hometown: Mumbai and Hyderabad, India

Graduation date: Spring 2026