Heterogeneous Robot Collaboration in Unstructured Environments with Grounded Generative Intelligence


¹GRASP Lab, University of Pennsylvania
²Dept. of Electrical and Computer Engineering, Texas A&M

SPINE-HT enables heterogeneous robot collaboration in unstructured environments given language-specified missions.

Abstract


Heterogeneous robot teams operating in realistic settings often must accomplish complex missions requiring collaboration and adaptation to information acquired online. Because robot teams frequently operate in unstructured environments — uncertain, open-world settings without prior maps — subtasks must be grounded in robot capabilities and the physical world. While heterogeneous teams have typically been designed for fixed specifications, generative intelligence opens the possibility of teams that can accomplish a wide range of missions described in natural language. However, current large language model (LLM)-enabled teaming methods typically assume well-structured and known environments, limiting deployment in unstructured environments. We present SPINE-HT, a framework that addresses these limitations by grounding the reasoning abilities of LLMs in the context of a heterogeneous robot team through a three-stage process. Given language specifications describing mission goals and team capabilities, an LLM generates grounded subtasks which are validated for feasibility. Subtasks are then assigned to robots based on capabilities such as traversability or perception and refined given feedback collected during online operation. In simulation experiments with closed-loop perception and control, our framework achieves nearly twice the success rate compared to prior LLM-enabled heterogeneous teaming approaches. In real-world experiments with a Clearpath Jackal, a Clearpath Husky, a Boston Dynamics Spot, and a high-altitude UAV, our method achieves an 87% success rate in missions requiring reasoning about robot capabilities and refining subtasks with online feedback.


Overview


SPINE-HT extends our SPINE framework to heterogeneous teaming. It grounds the reasoning abilities of LLMs in the physical and semantic context of a robot team via closed-loop plan validation and mapping, and consists of three modules: subtask generation, subtask assignment, and subtask refinement.


Plans are composed of behaviors, which encapsulate high-level robot functionality. Each behavior defines preconditions that must hold before the behavior can be invoked.
The subtask generation module uses an LLM to predict team-level plans, which are validated for realizability.
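
One way to picture behaviors with preconditions, and the check that a predicted plan is realizable, is the minimal Python sketch below. The `Behavior` schema, the `effects` field, the fact names, and `validate_plan` are all hypothetical and chosen for illustration; they are not SPINE-HT's actual interfaces.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Behavior:
    """A high-level robot capability (hypothetical schema, not SPINE-HT's API)."""
    name: str
    preconditions: frozenset = frozenset()  # facts that must hold before invocation
    effects: frozenset = frozenset()        # facts assumed true after success


def validate_plan(plan, initial_facts):
    """Return (True, None) if every step's preconditions hold when it runs,
    otherwise (False, message) describing the first violation."""
    facts = set(initial_facts)
    for step, behavior in enumerate(plan):
        missing = behavior.preconditions - facts
        if missing:
            return False, f"step {step} ({behavior.name}) is missing {sorted(missing)}"
        facts |= behavior.effects
    return True, None


# Hypothetical behaviors and facts, for illustration only.
map_region = Behavior("map_region", frozenset({"uav_airborne"}), frozenset({"region_mapped"}))
goto = Behavior("goto", frozenset({"region_mapped"}), frozenset({"at_goal"}))
inspect = Behavior("inspect", frozenset({"at_goal"}), frozenset({"object_inspected"}))

ok, error = validate_plan([map_region, goto, inspect], {"uav_airborne"})
bad_ok, bad_error = validate_plan([goto, inspect], {"uav_airborne"})
```

In this sketch the second plan is rejected because `goto` runs before any map exists. A validator of this kind could plausibly return the violation message to the LLM so that it can repair the plan, although that feedback path is an assumption here.
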
Once validated, mission-level subtasks are generated via a directed acyclic graph (DAG), which identifies independent and dependent subtasks.
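
To illustrate how a DAG separates independent from dependent subtasks, the sketch below groups a small dependency graph into stages using Python's standard `graphlib` module. The subtask names and the `parallel_stages` helper are hypothetical; the source does not specify how the graph is represented internally.

```python
from graphlib import TopologicalSorter

# Hypothetical subtasks: each key maps to the set of subtasks it depends on.
dependencies = {
    "map_area": set(),
    "inspect_bridge": {"map_area"},
    "inspect_road": {"map_area"},
    "report": {"inspect_bridge", "inspect_road"},
}


def parallel_stages(deps):
    """Group subtasks into stages; subtasks within a stage do not depend on each other."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not acyclic
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every dependency of these is done
        stages.append(ready)
        sorter.done(*ready)
    return stages


stages = parallel_stages(dependencies)
```

Here the two inspection subtasks land in the same stage, so they could be handed to different robots at the same time, while `report` waits for both.
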
Subtasks are then assigned to robots based on capabilities via linear assignment, which can optimize for traversal distance and other relevant factors.
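
The following sketch shows a capability-aware linear assignment on a toy team. It is an illustration under stated assumptions: the robot capabilities, positions, subtask requirements, and the Euclidean distance cost are invented for the example, and the exhaustive search stands in for whatever solver the authors use.

```python
from itertools import permutations
import math

# Hypothetical robots with capability sets and 2D positions (meters).
robots = {
    "jackal": {"capabilities": {"ground", "camera"}, "position": (0.0, 0.0)},
    "spot": {"capabilities": {"ground", "stairs", "camera"}, "position": (10.0, 0.0)},
    "uav": {"capabilities": {"aerial", "camera"}, "position": (5.0, 5.0)},
}

# Hypothetical subtasks with required capabilities and a goal location.
subtasks = {
    "inspect_door": {"requires": {"ground", "camera"}, "goal": (1.0, 0.0)},
    "climb_stairs": {"requires": {"stairs"}, "goal": (0.0, 1.0)},
    "survey_roof": {"requires": {"aerial"}, "goal": (5.0, 6.0)},
}


def cost(robot, subtask):
    """Travel distance if the robot has every required capability, else infinity."""
    if not subtask["requires"] <= robot["capabilities"]:
        return math.inf
    return math.dist(robot["position"], subtask["goal"])


def assign(robots, subtasks):
    """Exhaustively search one-to-one assignments and return the cheapest feasible one."""
    task_names = list(subtasks)
    best, best_cost = None, math.inf
    for robot_order in permutations(robots, len(task_names)):
        total = sum(cost(robots[r], subtasks[t]) for r, t in zip(robot_order, task_names))
        if total < best_cost:
            best, best_cost = dict(zip(task_names, robot_order)), total
    return best, best_cost


assignment, total_cost = assign(robots, subtasks)
```

Although the Jackal is closest to the stairs subtask, it lacks the required capability, so that subtask goes to Spot. Exhaustive search is only practical for small teams; a polynomial-time method such as the Hungarian algorithm would be the usual choice at larger scale.
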
As robots realize their subtasks, they collect feedback in the form of maps or subtask outcomes. This feedback is aggregated and sent back to the subtask generation module for plan refinement and adaptation.
We evaluate on 40 missions in real-world and simulation environments.
Compared to existing approaches, our method is over 2x as successful while being more efficient.


Video

BibTeX

@article{ravichandran_spine_ht,
      title={Heterogeneous Robot Collaboration in Unstructured Environments with Grounded Generative Intelligence},
      author={Zachary Ravichandran and Fernando Cladera and Ankit Prabhu and Jason Hughes and Varun Murali and Camillo Taylor and George J. Pappas and Vijay Kumar},
      year={2025},
      journal={arXiv preprint arXiv:2510.26915},
}
