MORE | Fall 2025
Predicting LLM Planning Performance with Logistic Regression
Large language models show promise in high-level planning but often fail unpredictably across problem instances and model sizes. We study whether the planning success of an LLM can be predicted in advance from compact text features. Using a 501-instance Blocksworld benchmark and 500 PlanBench instances, we construct a natural-language prompt for each instance that includes the domain description, initial state, and goal state, and obtain plans from two LLM variants (Llama 8B and Llama 70B). Each instance has a ground-truth validity label indicating whether the produced plan is executable and achieves the goal. Each prompt, and optionally the generated plan, is converted to a fixed-length 384-dimensional embedding using Sentence-BERT (SBERT). A regularized logistic regression classifier is trained on the SBERT embeddings to predict success versus failure. At inference time, before running the planner, the classifier outputs the probability that a given prompt will yield a valid plan for a specified LLM; this probability is converted to a final prediction using a fixed or validation-tuned threshold. Our evaluation reports accuracy, F1, precision, recall, and ROC-AUC on stratified splits, and includes threshold calibration to avoid degenerate “always valid” predictions. We also examine cross-model transfer, training on one LLM and testing on the other, to probe how general the signals are. The analysis relates prediction quality to prompt length, goal complexity, and plan length, and compares prompt-only with prompt+plan embeddings to quantify the value of output features. The expected outcome is a practical gatekeeper that flags likely failures early, saving compute and enabling adaptive strategies such as model switching or prompt refinement. We do not report findings at this stage; the contribution is the dataset protocol, predictive framework, and planned evaluations.
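The classifier and threshold-tuning steps described above can be illustrated with a minimal sketch. It is not the project's implementation: random 8-dimensional vectors stand in for the 384-dimensional SBERT embeddings, the labels come from a hidden linear rule rather than real plan validity, the split is a plain (not stratified) slice, and every function name and hyperparameter (train_logreg, tune_threshold, l2, lr, epochs) is a hypothetical choice made for illustration. F1 on the "valid" class is used as one possible criterion for selecting the threshold.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(X, w, b):
    """Probability of the 'valid plan' class for each feature vector."""
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) for x in X]

def train_logreg(X, y, l2=0.01, lr=0.5, epochs=200):
    """L2-regularized logistic regression fit by batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        errs = [p - yi for p, yi in zip(predict_proba(X, w, b), y)]
        grad_w = [sum(e * x[j] for e, x in zip(errs, X)) / n for j in range(d)]
        w = [wj - lr * (gj + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * sum(errs) / n
    return w, b

def f1(y_true, y_pred):
    """F1 score for the positive ('valid') class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def tune_threshold(probs, y_true):
    """Pick the grid threshold (0.05 .. 0.95) with the best validation F1."""
    grid = [i / 20 for i in range(1, 20)]
    return max(grid, key=lambda t: f1(y_true, [int(p >= t) for p in probs]))

def make_synthetic(n, dim, seed):
    """Assumption: random vectors stand in for SBERT prompt embeddings."""
    rng = random.Random(seed)
    hidden = [rng.gauss(0, 1) for _ in range(dim)]
    X = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n)]
    y = [int(sum(h * v for h, v in zip(hidden, x)) > 0) for x in X]
    return X, y

# Train / validation / test slices of the synthetic data.
X, y = make_synthetic(n=200, dim=8, seed=0)
X_tr, y_tr = X[:120], y[:120]
X_val, y_val = X[120:160], y[120:160]
X_te, y_te = X[160:], y[160:]

w, b = train_logreg(X_tr, y_tr)
threshold = tune_threshold(predict_proba(X_val, w, b), y_val)
test_pred = [int(p >= threshold) for p in predict_proba(X_te, w, b)]
test_acc = sum(int(p == t) for p, t in zip(test_pred, y_te)) / len(y_te)
print(f"threshold={threshold:.2f}  test accuracy={test_acc:.2f}")
```

In a real run, the synthetic vectors would be replaced by embeddings of the actual prompts, and the labels by the plan-validity outcomes for the chosen LLM. Tuning the threshold on a held-out validation slice, rather than on the test data, keeps the reported metrics from being optimistically biased.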
Student researcher
Sanjay Chezhian
Robotics and autonomous systems
Hometown: Chennai, Tamil Nadu, India
Graduation date: Spring 2026