From Prompts to Paths: Large Language Models for Zero-Shot Planning in Unmanned Ground Vehicle Simulation

  • Kelvin Olaiya
  • Giovanni Delnevo
  • Chan Tong Lam
  • Giovanni Pau
  • Paola Salomoni

Research output: Contribution to journal › Article › peer-review

Abstract

Highlights

What are the main findings?

  • A modular LLM-based zero-shot planning framework was developed for unmanned ground vehicles (UGVs), integrating visual and LiDAR inputs for multimodal reasoning and adaptive navigation.
  • Benchmarking across foundational LLMs (Gemini 2.0 Flash, Gemini 2.0 Flash-Lite) demonstrates emergent goal-directed behavior but highlights significant variability and limitations in spatial consistency.

What are the implications of the main findings?

  • The results indicate that multimodal LLMs can perform basic spatial reasoning for autonomous planning, paving the way for more explainable and flexible navigation strategies in unmanned systems.
  • The framework provides a foundation for hybrid autonomy, where high-level semantic reasoning from LLMs complements traditional low-level control and SLAM algorithms in future real-world drone and UGV applications.

This paper explores the capability of Large Language Models (LLMs) to perform zero-shot planning through multimodal reasoning, with particular emphasis on Unmanned Ground Vehicles (UGVs) and unmanned platforms in general. We present a modular system architecture that integrates a general-purpose LLM with visual and spatial inputs, iteratively guiding UGV behavior through adaptive planning. Although the framework is demonstrated in a ground-based setting, it extends directly to other unmanned systems, where semantic reasoning and adaptive planning are increasingly critical for autonomous mission execution. To assess performance, we employ a continuous evaluation metric that jointly considers distance and orientation, offering a more informative and fine-grained alternative to binary success measures. We evaluate a foundational LLM (i.e., Gemini 2.0 Flash, Google DeepMind) on a suite of zero-shot navigation and exploration tasks in simulated environments. Unlike prior LLM-robot systems that rely on fine-tuning or learned waypoint policies, we evaluate a purely zero-shot, stepwise LLM planner that receives no task demonstrations and reasons only from the sensed data. Our findings show that LLMs exhibit encouraging signs of goal-directed spatial planning and partial task completion, even in a zero-shot setting. However, inconsistencies in plan generation across models underscore the need for task-specific adaptation or fine-tuning. These findings highlight the potential of LLM-based multimodal reasoning to enhance autonomy in UGV and drone navigation, bridging high-level semantic understanding with robust spatial planning.
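As a concrete illustration of the stepwise, zero-shot planning loop the abstract describes, the Python sketch below shows one plausible realization. It is not the paper's actual interface: the identifiers (build_prompt, plan_step, run_episode, the llm and robot objects) and the action vocabulary are illustrative assumptions about how a sense-prompt-act loop with no task demonstrations could be wired together.

    # Minimal sketch of a stepwise zero-shot LLM planning loop for a UGV.
    # All identifiers and the action set are illustrative assumptions; the
    # paper's actual prompt format, sensors, and action space may differ.

    ACTIONS = ["forward", "backward", "turn_left", "turn_right", "stop"]

    def build_prompt(goal, camera_image_desc, lidar_ranges):
        """Fuse multimodal observations into a single planning prompt."""
        return (
            f"You control a ground robot. Goal: {goal}.\n"
            f"Camera view: {camera_image_desc}\n"
            f"LiDAR ranges (m), front/left/right/rear: {lidar_ranges}\n"
            f"Choose exactly one action from {ACTIONS} and explain briefly."
        )

    def plan_step(llm, goal, observation):
        """One iteration: sense -> prompt -> LLM decision -> parsed action."""
        prompt = build_prompt(goal, observation["image_desc"], observation["lidar"])
        reply = llm.generate(prompt)        # zero-shot: no demonstrations given
        for action in ACTIONS:              # naive parse: first action mentioned
            if action in reply.lower():
                return action, reply
        return "stop", reply                # fail safe on unparseable output

    def run_episode(llm, robot, goal, max_steps=50):
        """Iteratively query the LLM until the goal is reached or steps run out."""
        for _ in range(max_steps):
            obs = robot.sense()             # camera + LiDAR snapshot
            action, rationale = plan_step(llm, goal, obs)
            robot.execute(action)
            if robot.at_goal():
                return True
        return False

The loop keeps the LLM in charge of only the high-level decision at each step, which matches the abstract's framing of semantic reasoning complementing, rather than replacing, low-level control.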
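The continuous evaluation metric is described here only as jointly considering distance and orientation; the exact formula is not given on this page. The following is a minimal sketch of one reasonable form, assuming an exponentially decaying distance term and a cosine-shaped orientation term combined with tunable weights; all parameter names and values are hypothetical.

    import math

    def navigation_score(dist_to_goal, heading_error_rad,
                         dist_scale=5.0, w_dist=0.5, w_orient=0.5):
        """Continuous score in [0, 1] combining distance and orientation.

        An assumed form, not the paper's published metric:
        - distance term decays exponentially (dist_scale in meters),
        - orientation term maps heading error to [0, 1] via a cosine.
        """
        dist_term = math.exp(-dist_to_goal / dist_scale)
        orient_term = 0.5 * (1.0 + math.cos(heading_error_rad))
        return w_dist * dist_term + w_orient * orient_term

    # Example: 2 m from the goal, facing 30 degrees off-target.
    print(navigation_score(2.0, math.radians(30)))  # ~0.80

Any metric of this shape degrades smoothly as the vehicle drifts away from the goal or misaligns with it, which is what makes it a more fine-grained alternative to a binary success flag.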

Original language: English
Article number: 875
Journal: Drones
Volume: 9
Issue number: 12
Publication status: Published - Dec 2025

Keywords

  • drone autonomy
  • Large Language Models
  • path planning
  • unmanned ground vehicles
