DeepMind Unveils Gemini Robotics-ER 1.6 for Enhanced Physical AI

Google DeepMind has launched Gemini Robotics-ER 1.6, a new foundation model for robotics. The release improves spatial reasoning and multi-view understanding, with the aim of giving robots greater autonomy.

The model provides high-level reasoning for task planning and can call tools natively, including Google Search, vision-language-action (VLA) models, and user-defined functions. Key improvements target precise object detection and categorization, which are vital for tasks such as parcel sorting or domestic cleaning.

Gemini Robotics-ER 1.6 is strong at relational logic, allowing robots to make comparisons, pick out specific objects, and follow directional commands. It also refines trajectory mapping for grasping objects.

Researchers have also focused on the model's ability to interpret complex visual information, such as reading gauges and instruments. This capability is crucial for operating autonomously in diverse environments like factories, warehouses, and homes.

According to Marco da Silva, VP at Boston Dynamics, enhanced instrument reading and task reasoning will enable robots like Spot to perceive, comprehend, and autonomously respond to real-world challenges.

The model reaches this level of accuracy in reading instruments through agentic vision, which combines visual reasoning with code execution. It inspects images for fine details, runs code to estimate proportions, and then applies its reasoning engine to interpret the result.
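DeepMind has not published the code the model writes, so the following is only an illustration of the kind of proportion arithmetic involved in reading an analog dial. It assumes that a vision step has already located the gauge's center and needle tip in pixel coordinates, and that the scale is linear. The names GaugeSpec, needle_angle, and read_gauge are hypothetical and are not part of any DeepMind or Gemini API.

```python
import math
from dataclasses import dataclass


@dataclass
class GaugeSpec:
    """Hypothetical calibration for a circular gauge with a linear scale.

    Angles are in degrees, measured clockwise from 12 o'clock.
    """
    min_angle: float  # needle angle at the lowest scale mark
    max_angle: float  # needle angle at the highest scale mark
    min_value: float  # value printed at the lowest mark
    max_value: float  # value printed at the highest mark


def needle_angle(center: tuple[float, float], tip: tuple[float, float]) -> float:
    """Needle angle clockwise from 12 o'clock, in [0, 360).

    Image coordinates are assumed, so y grows downward and "up" is -dy.
    """
    dx = tip[0] - center[0]
    dy = tip[1] - center[1]
    # atan2(dx, -dy) gives 0 deg at 12 o'clock and 90 deg at 3 o'clock.
    return math.degrees(math.atan2(dx, -dy)) % 360.0


def read_gauge(spec: GaugeSpec, center: tuple[float, float], tip: tuple[float, float]) -> float:
    """Convert a needle position into a reading by linear proportion."""
    angle = needle_angle(center, tip)
    # Clockwise sweep of the printed scale, e.g. 270 deg on many pressure gauges.
    sweep = (spec.max_angle - spec.min_angle) % 360.0
    # How far clockwise the needle sits past the lowest mark.
    offset = (angle - spec.min_angle) % 360.0
    if offset > sweep:
        # Needle is in the dead zone outside the scale: snap to the nearer end.
        fraction = 1.0 if (offset - sweep) < (360.0 - offset) else 0.0
    else:
        fraction = offset / sweep
    return spec.min_value + fraction * (spec.max_value - spec.min_value)


# Usage: a 0-10 bar gauge whose scale runs clockwise from 7:30 (225 deg) to 4:30 (135 deg).
spec = GaugeSpec(min_angle=225.0, max_angle=135.0, min_value=0.0, max_value=10.0)
print(read_gauge(spec, center=(100, 100), tip=(100, 40)))  # → 5.0
```

Measuring the angle clockwise from 12 o'clock keeps the arithmetic in the same terms a person uses when reading a dial, and taking every difference modulo 360 lets the scale wrap past 12 o'clock without special cases.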