Google’s Gemini AI Gains Advanced Web Browser Control for Agentic Tasks

Google is rolling out a significant enhancement to its Gemini AI platform, introducing a new model designed to navigate and interact with the web directly through a browser. This capability allows AI agents to operate within interfaces traditionally built for human users, marking a leap forward in automated task execution.
The model, dubbed Gemini 2.5 Computer Use, leverages advanced visual understanding and reasoning to interpret user requests and perform complex tasks, such as accurately filling out and submitting forms online. This functionality is particularly valuable for UI testing and for interacting with systems that lack direct API access.
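To make the workflow concrete, the sketch below shows roughly how a developer might ask the model to propose form-filling actions from a screenshot and a natural-language goal. It assumes the google-genai Python SDK and a preview model name along the lines of "gemini-2.5-computer-use-preview"; the exact tool-configuration fields and enum names are assumptions and may differ from the released API.

```python
# Hedged sketch: request a form-filling action from the Computer Use model.
# Assumptions: the google-genai SDK ("pip install google-genai") is installed,
# GOOGLE_API_KEY is set, the model identifier and the computer-use tool config
# below are illustrative and may not match the shipped API exactly.
from google import genai
from google.genai import types

client = genai.Client()  # reads GOOGLE_API_KEY from the environment

# A screenshot of the page the agent should act on (captured separately,
# e.g. with a browser automation tool) plus the natural-language goal.
screenshot_bytes = open("signup_form.png", "rb").read()

response = client.models.generate_content(
    model="gemini-2.5-computer-use-preview",  # assumed model identifier
    contents=[
        types.Part.from_bytes(data=screenshot_bytes, mime_type="image/png"),
        "Fill in the sign-up form with the name 'Ada Lovelace' and submit it.",
    ],
    config=types.GenerateContentConfig(
        tools=[
            types.Tool(
                computer_use=types.ComputerUse(          # assumed tool type
                    environment=types.Environment.ENVIRONMENT_BROWSER  # assumed enum
                )
            )
        ],
    ),
)

# The model is expected to reply with a proposed UI action (e.g. click at
# coordinates, type text into a field) for the client to execute.
print(response.candidates[0].content.parts)
```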
Google highlights that this browser-only approach distinguishes Gemini 2.5 Computer Use from some competitors, whose agents are given control of an entire computer environment rather than just a browser. Despite this narrower scope, Google claims its new model outperforms leading alternatives on various web and mobile control benchmarks. Currently, the model supports 13 core actions, including opening a web browser, typing text, and dragging and dropping elements; a sketch of how a client might execute such actions follows below.
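The actions themselves are executed client-side: the model proposes one step at a time, the developer's code performs it in a real browser, and a fresh screenshot is sent back until the task completes. The loop below illustrates that pattern using Playwright. The action names ("click_at", "type_text_at", "drag_and_drop") and the propose_next_action() callback are illustrative stand-ins, not the actual Gemini API surface; only the Playwright calls are real library APIs.

```python
# Hedged sketch of the client-side agent loop described in the article:
# screenshot -> model proposes an action -> client executes it -> repeat.
from playwright.sync_api import sync_playwright


def execute_action(page, action: dict) -> None:
    """Dispatch one model-proposed action onto the live page.

    The action names and argument keys here are hypothetical examples of the
    kinds of browser actions the model supports, not official identifiers.
    """
    name, args = action["name"], action["args"]
    if name == "click_at":
        page.mouse.click(args["x"], args["y"])
    elif name == "type_text_at":
        page.mouse.click(args["x"], args["y"])        # focus the field first
        page.keyboard.type(args["text"])
    elif name == "drag_and_drop":
        page.mouse.move(args["x"], args["y"])
        page.mouse.down()
        page.mouse.move(args["dest_x"], args["dest_y"])
        page.mouse.up()
    else:
        raise ValueError(f"unsupported action: {name}")


def run_agent(goal: str, start_url: str, propose_next_action, max_steps: int = 20):
    """Run the screenshot/propose/execute loop until the model signals completion."""
    with sync_playwright() as p:
        page = p.chromium.launch(headless=True).new_page()
        page.goto(start_url)
        for _ in range(max_steps):
            screenshot = page.screenshot()                   # PNG bytes of the page
            action = propose_next_action(goal, screenshot)   # hypothetical model call
            if action is None:                               # model says the task is done
                break
            execute_action(page, action)
            page.wait_for_load_state("networkidle")          # let the UI settle
```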
Developers can access Gemini 2.5 Computer Use via Google AI Studio and Vertex AI, with a public demo also available on Browserbase showcasing its ability to complete tasks like playing games or browsing news sites.