The Allen Institute for AI (Ai2) has unveiled MolmoWeb, an open-source visual AI agent that can autonomously operate web browsers by interpreting live screenshots and executing actions like clicking, typing, and scrolling.

Unlike other agents that rely on HTML or structured page data, MolmoWeb uses pixel coordinates to interact with websites, making it resilient to code obfuscation, dynamic JavaScript, and anti-bot measures. This human-like visual approach also simplifies debugging.
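The screenshot-in, action-out design can be illustrated with a minimal Python sketch of the loop such an agent runs. It is not Ai2's code: the article does not describe MolmoWeb's output format or API. The text commands (`click(x, y)`, `type("...")`, `scroll(dy)`, `done`), the `FakeBrowser` class and the scripted policy that stands in for the model are all hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List, Union

# Hypothetical action vocabulary for illustration only; MolmoWeb's real
# output format is not described in the article.
@dataclass
class Click:
    x: int  # pixel column in the screenshot
    y: int  # pixel row in the screenshot

@dataclass
class TypeText:
    text: str

@dataclass
class Scroll:
    dy: int  # pixels to scroll; positive means down

Action = Union[Click, TypeText, Scroll]

# Each pattern maps one hypothetical text command to an action object.
_PATTERNS = [
    (re.compile(r"click\((\d+),\s*(\d+)\)"), lambda m: Click(int(m[1]), int(m[2]))),
    (re.compile(r'type\("(.*)"\)'), lambda m: TypeText(m[1])),
    (re.compile(r"scroll\((-?\d+)\)"), lambda m: Scroll(int(m[1]))),
]

def parse_action(output: str) -> Action:
    """Convert model text such as 'click(412, 230)' into an action object."""
    text = output.strip()
    for pattern, build in _PATTERNS:
        match = pattern.fullmatch(text)
        if match:
            return build(match)
    raise ValueError(f"unrecognized action: {output!r}")

@dataclass
class FakeBrowser:
    """Stand-in for a real browser: a fixed-size screen that logs actions."""
    width: int = 1280
    height: int = 720
    log: List[Action] = field(default_factory=list)

    def screenshot(self) -> bytes:
        return b""  # a real browser would return image bytes here

    def execute(self, action: Action) -> None:
        # Reject clicks that fall outside the visible screenshot.
        if isinstance(action, Click) and not (
            0 <= action.x < self.width and 0 <= action.y < self.height
        ):
            raise ValueError(f"click outside screen: {action}")
        self.log.append(action)

def run_episode(policy: Callable[[bytes], str], browser: FakeBrowser,
                max_steps: int = 10) -> bool:
    """Screenshot -> model -> action loop; True once the policy says 'done'."""
    for _ in range(max_steps):
        output = policy(browser.screenshot())
        if output.strip() == "done":
            return True
        browser.execute(parse_action(output))
    return False

# Usage: a scripted "policy" replays fixed outputs in place of a real model.
script = iter(['click(640, 80)', 'type("open-source web agents")', 'scroll(600)', 'done'])
browser = FakeBrowser()
finished = run_episode(lambda _shot: next(script), browser)
print(finished, browser.log)
```

The agent never reads the page's HTML. It only sees the screenshot and emits coordinates, which is why this style of agent does not depend on how the page's markup or scripts are written. A production agent would replace `FakeBrowser` with a real browser driver and the scripted policy with calls to the model.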

Built on Ai2’s Molmo 2 multimodal foundation model, MolmoWeb comes in 4B- and 8B-parameter versions. Despite its compact size, the 8B model outperforms larger open-weight rivals, and even some GPT-4-based agents, on benchmarks such as WebVoyager (78.2%) and TailBench (49.5%).

All weights, training data, code, and evaluation tools will be freely available for local or cloud self-hosting, empowering researchers and developers to build custom web automations.

Closed-source competitors like OpenAI’s ChatGPT Atlas and Perplexity’s Comet already offer similar capabilities, but Ai2’s release marks a major leap for open, transparent browser automation.