FURI | Summer 2025

LLMs as Guardian Angels: Real-World Task Planning with Safety-Critical Physical Systems

This research investigates the feasibility of using large language models (LLMs) as “guardian angels” to plan daily real-world tasks involving physical devices and evolving user contexts. The study introduces a novel framework that enables LLMs to manage multi-task planning, adhere to safety and physical constraints, and adapt to dynamic environments. A custom benchmark dataset is curated, representing complex scenarios in domains such as autonomous driving and healthcare. The research further develops an LLM-based evaluation method to assess plan quality and reduce human oversight. This framework highlights both the strengths and limitations of LLMs in safety-critical, human-centered planning tasks.
