venturebeat
Ai2 releases MolmoWeb, an open-weight visual web agent with 30K human task trajectories and a full training stack

Engineers building browser agents today face a choice between closed APIs they cannot inspect and open-weight frameworks with no trained model underneath them. Ai2 is now offering a third option.

The Seattle-based nonprofit behind the open-source OLMo language models and Molmo vision-language family is today releasing MolmoWeb, an open-weight visual web agent available in 4 billion and 8 billion parameter sizes.

Until now, no open-weight visual web agent shipped with the training data and pipeline needed to audit or reproduce it. MolmoWeb does.

MolmoWebMix, the accompanying dataset, includes 30,000 human task trajectories across more than 1,100 websites, 590,000 individual subtask demonstrations and 2.2 million screenshot question-answer pairs, which Ai2 describes as the largest publicly released collection of human web-task execution ever assembled.

"Can you go from just passively understanding images, describing them and captioning them, to actually making them take action in some environment?" Tanmay Gupta, senior research scientist at Ai2, told VentureBeat. "That is exactly what MolmoWeb is."

How it works: It sees what you see

MolmoWeb operates entirely from browser screenshots. It does not parse HTML or rely on accessibility-tree representations of a page. At each step it receives a task instruction, the current screenshot, a text log of previous actions, and the current URL and page title. It produces a natural-language thought describing its reasoning, then executes the next browser action: clicking at screen coordinates, typing text, scrolling, navigating to a URL or switching tabs.

The model is browser-agnostic. Because it requires only a screenshot, it can run against local Chrome, Safari or a hosted browser service. The hosted demo uses Browserbase, a cloud browser infrastructure startup.

The dataset that makes it work

The model weights are only part of what Ai2 is releasing. MolmoWebMix, the accompanying training dataset, is the core differentiator from every other open-weight agent available today.

"The data basically looks like a sequence of screenshots and actions paired with instructions for what the intent behind that sequence of screenshots was," Gupta said.

MolmoWebMix combines three components.

Human demonstrations. Human annotators completed browsing tasks using a custom Chrome extension that recorded actions and screenshots across more than 1,100 websites. The result is 30,000 task trajectories spanning more than 590,000 individual subtask demonstrations.

Synthetic trajectories. To scale beyond what human annotation alone can provide, Ai2 generated additional trajectories using text-based accessibility-tree agents: single-agent runs filtered for task success, multi-agent pipelines that decompose tasks into subgoals, and deterministic navigation paths across hundreds of websites. Critically, no proprietary vision agents were used. The synthetic data came from text-only systems, not from OpenAI Operator or Anthropic's computer use API.

GUI perception data. A third component trains the model to read and reason about page content directly from images. It includes more than 2.2 million screenshot question-answer pairs drawn from nearly 400 websites, covering element grounding and screenshot-based reasoning tasks.

"If you are able to perform a task and you're able to record a trajectory from that, you should be able to train the web agent on that trajectory to do the exact same task," Gupta said.

How MolmoWeb stacks up against the competition

In Gupta's view, the browser-agent market splits into two categories.

The first is API-only systems: capable but closed, with no visibility into training or architecture. OpenAI Operator, Anthropic's computer use API and Google's Gemini computer use fall into this group.

The second is open-weight models, a significantly smaller category. Browser-use, the most widely adopted open alternative, is a framework rather than a trained model; it requires developers to supply their own LLM and build the agent layer on top.

MolmoWeb sits in the second category as a fully trained, open-weight vision model. Ai2 reports it leads that group across four live-website benchmarks: WebVoyager, Online-Mind2Web, DeepShop and WebTailBench. According to Ai2, it also outperforms older API-based agents built on GPT-4o with accessibility-tree-plus-screenshot input.

Ai2 documents several current limitations in the release. The model makes occasional errors reading text from screenshots, drag-and-drop interactions remain unreliable, and performance degrades on ambiguous or heavily constrained instructions. The model was also not trained on tasks requiring logins or financial transactions.

Enterprise teams evaluating browser agents are not just choosing a model. They are deciding whether they can audit what they are running, fine-tune it on internal workflows and avoid a per-call API dependency.
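The screenshot-only loop described under "How it works" (task instruction, current screenshot, action log, URL and page title in; a natural-language thought plus one browser action out) can be sketched roughly as follows. All class, method and field names here are illustrative assumptions for the sake of the sketch, not Ai2's actual API; the `model` and `browser` objects stand in for an inference endpoint and any screenshot-capable browser driver.

```python
from dataclasses import dataclass, field

# Illustrative sketch of a screenshot-only web agent loop, as the article
# describes it. Names and structure are assumptions, not Ai2's published API.

@dataclass
class Observation:
    instruction: str       # the task the agent is asked to complete
    screenshot: bytes      # raw pixels of the current page; no HTML, no a11y tree
    action_log: list[str]  # text log of previous actions
    url: str
    page_title: str

@dataclass
class Step:
    thought: str           # natural-language reasoning emitted before acting
    action: str            # e.g. "click", "type", "scroll", "goto", "switch_tab", "stop"
    args: dict = field(default_factory=dict)

def run_episode(model, browser, instruction: str, max_steps: int = 30) -> list[Step]:
    """Drive a browser purely from screenshots until the model stops."""
    trajectory: list[Step] = []
    log: list[str] = []
    for _ in range(max_steps):
        obs = Observation(instruction, browser.screenshot(), log,
                          browser.url(), browser.title())
        step = model.predict(obs)          # one (thought, action) per screenshot
        trajectory.append(step)
        if step.action == "stop":
            break
        browser.execute(step.action, step.args)
        log.append(f"{step.action} {step.args}")
    return trajectory
```

The returned trajectory mirrors the shape Gupta describes for MolmoWebMix records: a sequence of screenshots and actions paired with the instruction that motivated them, which is what makes recorded runs reusable as training data.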

