Seeing, Saying, Solving: An LLM-to-TL Framework for Cooperative Robots

Increased robot deployment, such as in warehousing, has revealed a need for seamless collaboration among heterogeneous robot teams to resolve unforeseen conflicts. To address this challenge, we propose a novel, decentralized framework for robots to request and provide help. In this framework, a robot first detects a conflict using a Vision Language Model (VLM) and reasons over whether help is needed. If so, it crafts and broadcasts a natural language (NL) help request using a Large Language Model (LLM). Potential helper robots reason over the request and, if able, offer help along with information about the impact on their current tasks. Helper reasoning is implemented via an LLM grounded in Signal Temporal Logic (STL), using a Backus-Naur Form (BNF) grammar to guarantee syntactically valid NL-to-STL translations, which are then solved as a Mixed Integer Linear Program (MILP). Finally, the requester robot chooses a helper by reasoning over the impact on the overall system. We evaluate our framework in experiments that consider different strategies for choosing a helper, and find that a requester robot can minimize the overall time impact on the system by weighing multiple help offers rather than relying on simple heuristics (e.g., selecting the nearest robot to help).
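As a rough illustration of the final step described above, the following is a minimal Python sketch (not from the paper) of how a requester robot might compare help offers. The `HelpOffer` structure, its fields, and the function names are illustrative assumptions: each offer is assumed to carry the helper's estimated time to resolve the conflict and the estimated delay to its own tasks, and the requester either minimizes total time impact on the system or falls back to the nearest-robot heuristic used as a baseline.

```python
# Hypothetical sketch of requester-side helper selection; names and fields
# are illustrative assumptions, not the paper's actual interfaces.
from dataclasses import dataclass


@dataclass
class HelpOffer:
    helper_id: str
    distance_m: float          # distance from the helper to the conflict site
    help_duration_s: float     # estimated time to resolve the requester's conflict
    own_task_delay_s: float    # estimated delay imposed on the helper's own tasks


def total_time_impact(offer: HelpOffer) -> float:
    """System-level cost of accepting this offer: time spent helping plus
    the delay to the helper's existing tasks."""
    return offer.help_duration_s + offer.own_task_delay_s


def choose_by_impact(offers: list[HelpOffer]) -> HelpOffer:
    """Strategy from the abstract: weigh all offers and pick the one that
    minimizes overall time impact on the system."""
    return min(offers, key=total_time_impact)


def choose_nearest(offers: list[HelpOffer]) -> HelpOffer:
    """Baseline heuristic: simply pick the closest robot."""
    return min(offers, key=lambda o: o.distance_m)


if __name__ == "__main__":
    offers = [
        HelpOffer("amr_1", distance_m=3.0, help_duration_s=40.0, own_task_delay_s=90.0),
        HelpOffer("amr_2", distance_m=12.0, help_duration_s=55.0, own_task_delay_s=10.0),
    ]
    # The nearest robot (amr_1) is heavily committed elsewhere; weighing
    # offers by total impact selects amr_2 instead.
    print(choose_nearest(offers).helper_id)    # -> amr_1
    print(choose_by_impact(offers).helper_id)  # -> amr_2
```

In this toy setting the nearest-robot heuristic picks a helper whose own tasks would be badly delayed, whereas reasoning over the full set of offers yields a lower total cost, mirroring the comparison reported in the abstract.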
@article{choe2025_2505.13376,
  title   = {Seeing, Saying, Solving: An LLM-to-TL Framework for Cooperative Robots},
  author  = {Dan BW Choe and Sundhar Vinodh Sangeetha and Steven Emanuel and Chih-Yuan Chiu and Samuel Coogan and Shreyas Kousik},
  journal = {arXiv preprint arXiv:2505.13376},
  year    = {2025}
}