FURI | Summer 2025

Text-to-Image Models Do NOT “Know” How to Interpret “No”


Modern text-to-image (T2I) models often misinterpret negative instructions, generating the very objects, attributes, or relationships that users ask them to omit. This inability to interpret negation limits the precision and reliability of these creative tools. To address this failure, the study establishes a benchmark dataset of negation-inclusive prompts and uses it to systematically evaluate state-of-the-art (SOTA) T2I models. The study also assesses current automated T2I evaluation metrics and investigates the underlying causes of the negation deficiency. By quantifying how poorly current T2I models handle negation, the benchmark ranks SOTA T2I models and provides a standard against which future efforts to solve this problem can be measured.
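
As an illustration only, the ranking step described above could be sketched as follows. The abstract does not specify the scoring procedure, so everything here is an assumption: the `Result` record, the `excluded_present` judgment (which might come from a human rater or an automated detector), the "failure rate" score, and the model names are hypothetical, and the sample data is made up rather than taken from the study.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    """One judged image (hypothetical record layout, not the study's schema)."""
    model: str              # placeholder model name
    prompt: str             # negation-inclusive prompt
    excluded: str           # concept the prompt asks to omit
    excluded_present: bool  # assumed judgment: the omitted concept appears anyway


def negation_failure_rates(results):
    """Return {model: fraction of prompts where the excluded concept appeared}."""
    failures = defaultdict(int)
    totals = defaultdict(int)
    for r in results:
        totals[r.model] += 1
        failures[r.model] += int(r.excluded_present)
    return {model: failures[model] / totals[model] for model in totals}


def rank_models(results):
    """Sort models from lowest to highest failure rate (best first)."""
    rates = negation_failure_rates(results)
    return sorted(rates.items(), key=lambda item: item[1])


# Made-up placeholder judgments, not results from the study.
example = [
    Result("model_a", "a street with no cars", "car", True),
    Result("model_a", "a beach without people", "person", False),
    Result("model_b", "a street with no cars", "car", True),
    Result("model_b", "a beach without people", "person", True),
]
ranking = rank_models(example)
```

In this sketch a lower failure rate means better handling of negation, so the first entry in `ranking` is the model that most often left out what the prompt asked it to omit.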