Distilling On-device Language Models for Robot Planning with Minimal Human Intervention
University of Pennsylvania

TLDR: PRISM automatically distills performant small language model (SLM)-enabled planners given a source LLM-enabled planner, without the need for manually curated datasets or simulators.

Abstract


Large language models (LLMs) provide robots with powerful contextual reasoning abilities and a natural human interface. Yet, current LLM-enabled robots typically depend on cloud-hosted models, limiting their usability in environments with unreliable communication infrastructure, such as outdoor or industrial settings. We present PRISM, a framework for distilling small language model (SLM)-enabled robot planners that run on-device with minimal human supervision. Starting from an existing LLM-enabled planner, PRISM automatically synthesizes diverse tasks and environments, elicits plans from the LLM, and uses this synthetic dataset to distill a compact SLM as a drop-in replacement of the source model. We apply PRISM to three LLM-enabled planners for mapping and exploration, manipulation, and household assistance, and we demonstrate that PRISM improves the performance of Llama-3.2-3B from 10-20% of GPT-4o's performance to over 93% - using only synthetic data. We further demonstrate that the distilled planners generalize across heterogeneous robotic platforms (ground and aerial) and diverse environments (indoor and outdoor).


Method


PRISM takes as input an existing LLM-enabled planner.

PRISM then generates data via a multi-step process. PRISM first uses an LLM to synthesize environments and tasks that match the source planner's observation space. It then emulates partial observations via environment masking, and uses these observations to elicit actions from the source LLM-enabled planner in a closed-loop manner.
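The closed-loop data generation step can be sketched as follows. This is a minimal toy illustration, not PRISM's implementation: `synthesize_environment`, `mask_observation`, and `source_planner` are hypothetical stand-ins (a real pipeline would call an LLM for synthesis and for the source planner), but the loop structure — mask the environment into a partial observation, query the planner, record the pair, step the environment — mirrors the process described above.

```python
import json
import random

def synthesize_environment(rng):
    """Hypothetical stand-in for LLM-based environment/task synthesis:
    sample a toy set of named objects and a goal object."""
    objects = rng.sample(["mug", "book", "ball", "box", "lamp"], k=3)
    return {"objects": objects, "goal": rng.choice(objects)}

def mask_observation(env, visited):
    """Emulate partial observability: the planner only 'sees' objects it
    has already visited, plus a count of how many remain hidden."""
    visible = [o for o in env["objects"] if o in visited]
    return {"visible": visible, "hidden_count": len(env["objects"]) - len(visible)}

def source_planner(observation, goal):
    """Stub for the source LLM-enabled planner. A real implementation
    would prompt the LLM; here we use a trivial hand-coded policy."""
    if goal in observation["visible"]:
        return f"goto({goal})"
    return "explore()"

def collect_episode(rng, max_steps=5):
    """Roll out the planner closed-loop, recording (observation, action) pairs."""
    env = synthesize_environment(rng)
    visited, records = set(), []
    for _ in range(max_steps):
        obs = mask_observation(env, visited)
        action = source_planner(obs, env["goal"])
        records.append({"observation": obs, "goal": env["goal"], "action": action})
        if action == "explore()":
            # Reveal one more object, emulating an exploration step.
            hidden = [o for o in env["objects"] if o not in visited]
            visited.add(hidden[0])
        else:
            break
    return records

rng = random.Random(0)
dataset = [r for _ in range(10) for r in collect_episode(rng)]
print(json.dumps(dataset[0]))
```

Each episode terminates once the stub planner issues a `goto` action, so the dataset mixes exploration and goal-directed steps, which is the kind of closed-loop supervision the distilled SLM is trained on.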

PRISM uses the resulting dataset to distill an SLM via supervised fine-tuning (SFT).
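For the distillation step, the collected (observation, action) records must be serialized into a format an SFT trainer accepts. The sketch below converts records into chat-style examples and writes them as JSONL; the prompt template and record schema are assumptions for illustration (PRISM would reuse the source planner's own prompt so the SLM is a true drop-in replacement), and the resulting file matches the conversational format consumed by common SFT tooling such as Hugging Face's `trl.SFTTrainer`.

```python
import json

def to_sft_example(record):
    """Convert one (observation, action) record into a chat-style SFT
    example. The prompt template here is a hypothetical placeholder."""
    prompt = (
        f"Goal: {record['goal']}\n"
        f"Visible objects: {', '.join(record['observation']['visible']) or 'none'}\n"
        f"Hidden objects remaining: {record['observation']['hidden_count']}\n"
        "Next action:"
    )
    return {
        "messages": [
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": record["action"]},
        ]
    }

# Toy records in the schema assumed above.
records = [
    {"observation": {"visible": [], "hidden_count": 3},
     "goal": "mug", "action": "explore()"},
    {"observation": {"visible": ["mug"], "hidden_count": 2},
     "goal": "mug", "action": "goto(mug)"},
]
sft_data = [to_sft_example(r) for r in records]

# Write JSONL, one training example per line.
with open("sft_data.jsonl", "w") as f:
    for ex in sft_data:
        f.write(json.dumps(ex) + "\n")
```

Only the assistant turn (the source planner's action) is the supervision target; the user turn reconstructs exactly the observation the source planner saw, so the SLM learns the same observation-to-action mapping.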

The distilled SLM-enabled planner serves as a drop-in replacement for the source LLM-enabled planner and can be deployed across a variety of platforms.

Results


We evaluate PRISM on three distinct LLM-enabled planners that use GPT-4o as the LLM. When GPT-4o is replaced with Llama-3.2-3B (the SLM), planner performance drops to 10-20% of the original. Applying PRISM to this SLM restores performance to over 90% across all planners.

BibTeX

@article{ravichandran_prism,
      title={Distilling On-device Language Models for Robot Planning with Minimal Human Intervention}, 
      author={Zachary Ravichandran and Ignacio Hounie and Fernando Cladera and Alejandro Ribeiro and George J. Pappas and Vijay Kumar},
      year={2025},
      journal={arXiv preprint arXiv:2506.17486},
      url={https://arxiv.org/abs/2506.17486}, 
}