SIMA 2: A Generalist Embodied Agent for Virtual Worlds

SIMA team
Adrian Bolton
Alexander Lerchner
Alexandra Cordell
Alexandre Moufarek
Andrew Bolt
Andrew Lampinen
Anna Mitenkova
Arne Olav Hallingstad
Bojan Vujatovic
Bonnie Li
Cong Lu
Daan Wierstra
Daniel P. Sawyer
Daniel Slater
David Reichert
Davide Vercelli
Demis Hassabis
Drew A. Hudson
Duncan Williams
Ed Hirst
Fabio Pardo
Felix Hill
Frederic Besse
Hannah Openshaw
Harris Chan
Hubert Soyer
Jane X. Wang
Jeff Clune
John Agapiou
John Reid
Joseph Marino
Junkyung Kim
Karol Gregor
Kaustubh Sridhar
Kay McKinney
Laura Kampis
Lei M. Zhang
Loic Matthey
Luyu Wang
Maria Abi Raad
Maria Loks-Thompson
Martin Engelcke
Matija Kecman
Matthew Jackson
Maxime Gazeau
Ollie Purkiss
Oscar Knagg
Peter Stys
Piermaria Mendolicchio
Raia Hadsell
Rosemary Ke
Ryan Faulkner
Sarah Chakera
Satinder Singh Baveja
Shane Legg
Sheleem Kashem
Tayfun Terzi
Thomas Keck
Tim Harley
Tim Scholtes
Tyson Roberts
Volodymyr Mnih
Yulan Liu
Zhengdong Wang
Zoubin Ghahramani
Main: 27 Pages
19 Figures
Bibliography: 9 Pages
1 Table
Appendix: 4 Pages
Abstract

We introduce SIMA 2, a generalist embodied agent that understands and acts in a wide variety of 3D virtual worlds. Built upon a Gemini foundation model, SIMA 2 represents a significant step toward active, goal-directed interaction within an embodied environment. Unlike prior work (e.g., SIMA 1) limited to simple language commands, SIMA 2 acts as an interactive partner, capable of reasoning about high-level goals, conversing with the user, and handling complex instructions given through language and images. Across a diverse portfolio of games, SIMA 2 substantially closes the gap with human performance and demonstrates robust generalization to previously unseen environments, all while retaining the base model's core reasoning capabilities. Furthermore, we demonstrate a capacity for open-ended self-improvement: by leveraging Gemini to generate tasks and provide rewards, SIMA 2 can autonomously learn new skills from scratch in a new environment. This work validates a path toward creating versatile and continuously learning agents for both virtual and, eventually, physical worlds.
