Google LLC is reportedly developing an advanced artificial intelligence system designed to autonomously operate web browsers, and it could make its debut in December, according to The Information.
The new AI, internally known as “Project Jarvis,” is expected to enhance user productivity by automating routine tasks such as online shopping, research and booking flights.
Project Jarvis is reportedly powered by Google’s Gemini 2.0 large language model, which promises substantial improvements in understanding and generating human-like text. The AI is said by The Information to be specifically engineered for Google Chrome and includes capabilities to interpret screenshots, click buttons and input text, simulating user interactions within the browser to complete various web-based actions.
However, it is claimed that the AI takes "a few seconds" between actions. Whether the final release will have similar delays remains to be seen.
The news comes less than a week after Anthropic PBC introduced new models alongside a new capability, released in public beta, that allows models to interact with computers: computer use. Anthropic's Claude 3.5 Sonnet model can interact with computers by moving the mouse, typing text and clicking buttons to operate the user interface.
Anthropic's take differs from what Google is reportedly working on in that its AI can control an entire computer, while Project Jarvis can only access webpages within Google Chrome.
The move toward AIs that can interact with or see what's on a computer is a growing trend within the AI space, with other companies working on similar systems, such as Microsoft with Copilot Vision. First revealed by Microsoft on Oct. 1 but not yet available, Copilot Vision can analyze the images on a webpage and answer questions about them.
Apple Inc. is also working on similar AI-driven interactions through its upcoming Apple Intelligence platform. Unlike Project Jarvis, which operates primarily through Chrome to handle tasks across the web, Apple’s approach integrates AI directly into device features like Siri, enabling contextual responses and actions based on on-screen content.
While companies differ in their approaches to AI that can interact with or analyze what's on a screen, what is clear is that AI agents capable of interacting with computers and undertaking tasks are quickly becoming the next wave of AI development.