A machine running the AI model Gemini Robotics places a basketball in a hoop. Credit: Google DeepMind
Artificial-intelligence company Google DeepMind has put a version of its most advanced large language model (LLM), Gemini, into robots. Using the model, machines can perform some tasks, such as "slam dunking" a miniature basketball through a desktop hoop, despite never having watched another robot do the action, says the firm.
The hope is to create machines that are intuitive to operate and can tackle a range of physical tasks, without relying on human supervision or being preprogrammed. By connecting to Gemini's robotic models, a developer could enhance their robot so that it comprehends "natural language and now understands the physical world in a lot more detail than before," says Carolina Parada, who leads the Google DeepMind robotics team and is based in Boulder, Colorado.
The model, known as Gemini Robotics and announced on 12 March in a blog post and technical paper, is "a small but tangible step" towards that goal, says Alexander Khazatsky, an AI researcher and co-founder of CollectedAI in Berkeley, California, which is focused on creating data sets to develop AI-powered robots.
Spatial awareness
A team at Google DeepMind, which is headquartered in London, started with Gemini 2.0, the firm's most advanced vision and language model, trained by analysing patterns in huge volumes of data.
They created a specialized version of the model designed to excel at reasoning tasks involving 3D physical and spatial understanding: for example, predicting an object's trajectory or identifying the same part of an object in images taken from different angles.
Finally, they further trained the model on data from thousands of hours of real, remote-operated robot demonstrations. This allowed the robotic "brain" to generate real actions, much as LLMs use their learned associations to produce the next word in a sentence.
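That next-word analogy can be made concrete. The sketch below is a hypothetical illustration, not Google DeepMind's implementation: it shows a policy loop that picks one action at a time, conditioned on the current observation, the language instruction and the actions taken so far, just as an LLM conditions on the words generated so far. Every name, the action vocabulary and the random scoring function are invented for illustration.

```python
# Minimal sketch of autoregressive action generation (hypothetical, not
# DeepMind's code): a policy repeatedly picks the highest-scoring next
# action, the way an LLM picks the next word.
import random

# Invented stand-in for a learned action vocabulary.
ACTION_VOCAB = ["move_left", "move_right", "lower_arm",
                "raise_arm", "close_gripper", "open_gripper"]

def next_action(observation: str, instruction: str, history: list[str]) -> str:
    """Stand-in for the learned policy: score each candidate action given
    the camera observation, the instruction and the action history, then
    return the highest-scoring one. A real model would compute the scores;
    here they are random placeholders."""
    scores = {action: random.random() for action in ACTION_VOCAB}
    return max(scores, key=scores.get)

def run_episode(instruction: str, steps: int = 5) -> list[str]:
    """Roll the policy forward one action at a time, feeding each chosen
    action back in as context for the next decision."""
    history: list[str] = []
    for _ in range(steps):
        observation = "camera_frame"  # placeholder for the robot's current view
        history.append(next_action(observation, instruction, history))
    return history

print(run_episode("put the ball through the hoop"))
```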
The team tested Gemini Robotics on humanoid robots and robotic arms, on tasks that came up in training and on unfamiliar activities. According to the team, robots using the model consistently outperformed state-of-the-art rivals when tested on new tasks and familiar ones in which details had been changed.