GRS: Generating Robotic Simulation Tasks from Real-World Images

We introduce GRS (Generating Robotic Simulation tasks), a system that addresses the real-to-sim problem for robotic simulation. GRS creates digital twin simulations from single RGB-D observations, paired with solvable tasks for virtual agent training. Using vision-language models (VLMs), our pipeline operates in three stages: 1) scene comprehension, using SAM2 for segmentation and object description, 2) matching the identified objects with simulation-ready assets, and 3) generating appropriate tasks. We ensure alignment between the simulation and the task through generated test suites, and introduce a router that iteratively refines both simulation and test code. Experiments demonstrate our system's effectiveness in object correspondence and task environment generation via this novel router mechanism.
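The three-stage pipeline and the router's refine-until-tests-pass loop can be sketched as a minimal skeleton. This is a hypothetical illustration, not the GRS implementation: the stage functions are stubs (the real system calls SAM2 and a VLM), the asset catalog and the `code_version` refinement counter are invented for the sketch, and `run_tests` stands in for the generated test suite.

```python
from dataclasses import dataclass

@dataclass
class Simulation:
    """Hypothetical container for a generated simulation environment."""
    assets: list
    task: str
    code_version: int = 0  # stand-in for the evolving simulation/test code

def comprehend_scene(rgbd_observation):
    # Stage 1 (stubbed): segment the observation and describe each object.
    # GRS uses SAM2 plus a VLM here; we return fixed labels for illustration.
    return ["mug", "table"]

def match_assets(object_labels):
    # Stage 2 (stubbed): map each described object to a simulation-ready
    # asset. The catalog paths below are invented placeholders.
    catalog = {"mug": "assets/mug.usd", "table": "assets/table.usd"}
    return [catalog[label] for label in object_labels]

def generate_task(assets):
    # Stage 3 (stubbed): propose a task grounded in the matched assets.
    first_object = assets[0].split("/")[-1].split(".")[0]
    return f"pick up the {first_object}"

def run_tests(sim):
    # Stand-in for the generated test suite: here, tests pass once the
    # code has been refined twice.
    return sim.code_version >= 2

def router(sim, max_iters=5):
    # The router iteratively refines simulation/test code until the
    # generated tests pass (or an iteration budget is exhausted).
    for _ in range(max_iters):
        if run_tests(sim):
            break
        sim.code_version += 1  # stand-in for a VLM-driven code revision
    return sim

observation = object()  # placeholder for a single RGB-D observation
labels = comprehend_scene(observation)
assets = match_assets(labels)
sim = router(Simulation(assets=assets, task=generate_task(assets)))
print(sim.task)          # pick up the mug
print(sim.code_version)  # 2
```

The key design point the sketch mirrors is that task generation and verification are decoupled: the router only accepts a simulation once the (generated) tests pass, rather than trusting the first VLM output.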
@article{zook2025_2410.15536,
  title={GRS: Generating Robotic Simulation Tasks from Real-World Images},
  author={Alex Zook and Fan-Yun Sun and Josef Spjut and Valts Blukis and Stan Birchfield and Jonathan Tremblay},
  journal={arXiv preprint arXiv:2410.15536},
  year={2025}
}