Anthropic introduces "dreaming," a system that lets AI agents learn from their own mistakes
Anthropic on Tuesday unveiled a suite of updates to its Claude Managed Agents platform at its second annual Code with Claude developer conference in San Francisco. The centerpiece is a new capability called "dreaming," which lets AI agents learn from their own past sessions and improve over time, a step toward the kind of self-correcting, self-improving AI systems that enterprises say they need before trusting agents with production workloads.
The company also moved two previously experimental features — outcomes and multi-agent orchestration — from research preview into public beta, making them broadly available to developers building on the Claude platform. Together, the three features address what Anthropic says are the hardest problems in running AI agents at scale: keeping them accurate, helping them learn, and preventing them from becoming bottlenecks on complex, multi-step work.
Early adopters are already reporting significant results. Legal AI company Harvey saw task completion rates increase roughly 6x after implementing dreaming. Medical document review company Wisedocs cut its document review time by 50% using outcomes. And Netflix is now processing logs from hundreds of builds simultaneously using multi-agent orchestration.
The announcements come amid extraordinary momentum for Anthropic. CEO Dario Amodei disclosed during a fireside chat at the conference that the company's growth has outpaced even its own aggressive internal projections.
In the first quarter of 2026, Anthropic saw what Amodei described as 80x annualized growth in revenue and usage — far exceeding the 10x annual growth the company had planned for. API volume on the Claude platform is up nearly 70x year over year, and the average developer using Claude Code now spends 20 hours per week working with the tool.
"We tried to plan very well for a world of 10x growth per year," Amodei said. "And yet we saw 80x. And so that is the reason we have had difficulties with compute."
How Anthropic's dreaming feature teaches AI agents to learn from their own history
Dreaming is the most novel of the three features and the one Anthropic is most eager to distinguish from conventional memory systems. While the company launched agent memory earlier this year — allowing Claude to retain preferences and context within and across individual sessions — dreaming works at a higher level of abstraction. It is a scheduled process that reviews an agent's past sessions and memory stores, extracts patterns across them, and curates those memories so agents improve over time. It surfaces insights that no single agent session could see on its own: recurring mistakes, workflows that multiple agents converge on independently, and preferences shared across a team of agents.
Alex Albert, who leads research product management at Anthropic, explained the concept in an interview at the conference. He described dreaming as analogous to how people within organizations create skills after working through a task. "They might do a workflow with Claude, and at the end of that workflow, after they've iterated and zigzagged a little bit, they want to record that path from A to B," Albert said. "A very similar thing is happening with dreaming — instead of you manually creating the skill from your experience working with Claude, the model is doing it, so it has that same context for a future session."
Crucially, dreaming does not modify the underlying model weights. "We're not changing the model itself through dreaming — it's not doing updates to the weights or anything like that," Albert said. Instead, the agent writes learnings as plain-text notes and structured "playbooks" that future sessions can reference, making the entire process observable and auditable by humans. When asked about the trust implications of agents consolidating their own knowledge, Albert acknowledged that "there is a level of trust that you need to place" but noted that all memories are inspectable and that smarter models are getting progressively better at managing this process. "They're learning to write better notes for their future self," he said.
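To make the mechanism concrete, here is a minimal sketch of a dreaming-style consolidation pass. All names are hypothetical, and the pattern-extraction step — which Anthropic describes as model-driven — is stood in for by simple frequency counting: a lesson survives into the playbook only if it recurs across multiple sessions, which is something no single session could detect on its own. The playbook is plain text, so humans can inspect and audit it, matching the article's description.

```python
from collections import Counter

def dream(session_logs, min_occurrences=2):
    """Consolidate lessons that recur across past sessions into a playbook.

    session_logs: a list of sessions, each a list of lesson strings the
    agent recorded. A real system would use a model to extract patterns;
    here recurrence counting stands in for that step.
    """
    counts = Counter()
    for log in session_logs:
        # Deduplicate within a session so one noisy run can't dominate.
        counts.update(set(log))
    lessons = [l for l, n in counts.items() if n >= min_occurrences]
    # Plain-text output: observable and auditable by humans.
    return "PLAYBOOK\n" + "\n".join(f"- {l}" for l in sorted(lessons))

sessions = [
    ["retry flaky network calls", "cache the auth token"],
    ["cache the auth token", "log tool errors verbatim"],
    ["retry flaky network calls", "cache the auth token"],
]
print(dream(sessions))
# The one-off lesson ("log tool errors verbatim") is filtered out;
# only cross-session patterns make it into the playbook.
```

Note that nothing here touches model weights: the "learning" lives entirely in the text artifact that future sessions load as context.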
A live demo showed AI agents improving overnight without human guidance
During the keynote, the Anthropic team demonstrated all three features live on stage using a fictional aerospace startup called "Lumara" that needed to autonomously land drones on the moon for resource mining. The team configured a multi-agent system with three specialists — a commander agent responsible for overall mission success, a detector agent that identified high-quality landing sites, and a navigator agent that handled safe drone flight and landing — and defined a success rubric requiring soft landings, clear ground, and enough fuel reserves for a return trip to Earth.
An initial simulation across six hypothetical landing sites produced strong but imperfect results. To improve, the presenters triggered a dreaming session directly from the Claude Developer Console. Overnight, the dreaming agent reviewed all past simulation sessions and wrote a detailed descent playbook — a comprehensive set of heuristics drawn from patterns across multiple mission runs. When the team ran a new simulation the following morning with the dreaming-derived playbook in memory, the results improved meaningfully on the sites that had previously underperformed.
"All we had to do was just have Caitlin press a button," said Angela Jiang, Head of Product for the Claude Platform, referring to her colleague on stage. "All dreaming."
The demo illustrated how the three features compose together in practice. Multi-agent orchestration split the complex task across specialists with independent context windows. Outcomes provided the rubric against which a separate grader agent evaluated each run. And dreaming extracted lessons across those runs to improve future performance — forming what Anthropic describes as a continuous improvement loop that requires no human intervention between iterations.
Why Anthropic built a separate 'grader' agent to check Claude's own work
The outcomes feature, now in public beta, gives developers a way to define what success looks like using a rubric — a structural framework, a presentation standard, a brand voice, or any other set of criteria — and then lets the agent iterate toward that standard autonomously. What makes outcomes architecturally distinctive is its separation of concerns. When an agent completes its work, a separate grader agent evaluates the output against the developer-defined rubric in its own independent context window. Because the grader operates in a fresh context, it is not influenced by the working agent's reasoning or accumulated biases from the session.
When the grader identifies gaps between the output and the rubric, it pinpoints specifically what needs to change, and the working agent takes another pass. This loop continues until the rubric criteria are met — without a human needing to review each attempt.
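The grade-and-revise loop can be sketched in a few lines. This is an illustration of the pattern, not Anthropic's implementation: the rubric here is a dictionary of named checks, the grader sees only the output and the rubric (mimicking the fresh, independent context), and a hypothetical `revise` callable plays the role of the working agent taking another pass.

```python
def grade(output, rubric):
    """Fresh-context grader: sees only the output and the rubric,
    never the worker's reasoning. Returns (passed, list_of_gaps)."""
    gaps = [name for name, check in rubric.items() if not check(output)]
    return (not gaps, gaps)

def iterate_to_outcome(draft, rubric, revise, max_passes=5):
    """Loop until the rubric is met, with no human reviewing each attempt."""
    for _ in range(max_passes):
        passed, gaps = grade(draft, rubric)
        if passed:
            return draft
        # The grader pinpoints what to change; the worker takes another pass.
        draft = revise(draft, gaps)
    return draft

# Toy rubric: a blurb must name the product and end with a call to action.
rubric = {
    "mentions product": lambda s: "Claude" in s,
    "has call to action": lambda s: s.rstrip().endswith("Try it today."),
}
result = iterate_to_outcome(
    "Our new agent platform is fast.",
    rubric,
    revise=lambda s, gaps: s + " Claude powers it. Try it today.",
)
print(result)
```

The separation matters because the grader carries none of the worker's accumulated context, so it cannot inherit the session's blind spots.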
Albert described Anthropic's broader verification strategy as employing "more test time compute, more models thinking about a problem for longer, to check over the work of another." He acknowledged that having a model check its own work raises reasonable questions, but said a fresh context window reviewing completed work consistently outperforms asking the same long-running thread to identify its own bugs. "You will get higher success if you give that output to a fresh Claude and say, 'what bugs do you see?'" he said. "There is still something to the attention" that degrades over very long sessions — a limitation he said Anthropic is actively working to fix in future models.
The approach mirrors strategies already in use at GitHub. Mario Rodriguez, Chief Product Officer at GitHub, described during a separate talk at the conference how Copilot uses a similar advisor pattern with Claude models — pairing a smaller, cheaper model as an executor with a larger model as a mentor. When the smaller model encounters a problem beyond its capability, it calls the larger model for guidance, then continues executing on its own. Rodriguez said the approach delivers near-Opus-level intelligence at significantly lower cost, and that GitHub inserts critique models at three specific points in the coding workflow: after drafting a plan, after a complex implementation, and after writing tests but before running them.
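The advisor pattern Rodriguez describes can be sketched as follows. Everything here is hypothetical scaffolding (a `difficulty` field stands in for the executor's real-world judgment of when it is out of its depth); the point is the shape of the escalation: the cheap model does the work, the expensive model is called once for guidance, and the cheap model resumes.

```python
def executor(task):
    """Small, cheap model: handles routine work, returns None when stuck."""
    if task["difficulty"] <= 2:
        return f"done: {task['name']}"
    return None  # beyond the executor's capability

def mentor(task):
    """Larger model, consulted only on escalation to keep average cost low."""
    return f"break {task['name']} into smaller steps"

def run(task):
    result = executor(task)
    if result is None:
        guidance = mentor(task)  # one call to the expensive model
        # The executor resumes on its own, now following the guidance.
        result = f"done: {task['name']} (guided: {guidance})"
    return result

print(run({"name": "rename variable", "difficulty": 1}))
print(run({"name": "refactor auth module", "difficulty": 5}))
```

The economics follow from the branch: most tasks never trigger the mentor call, so average cost stays close to the small model's while hard cases still get near-Opus-level guidance.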
Parallel AI agents can now tackle tasks too complex for a single model thread
Multi-agent orchestration, the third feature moving to public beta, allows a lead agent to decompose a large task into subtasks and delegate each one to a specialist agent — each with its own model, system prompt, tools, and independent context window. Every step in the process is traceable in the Claude Console, showing which agent did what, in what order, and why.
The design gives each sub-agent an isolated context, which Anthropic says produces better results than having a single agent attempt to hold all the complexity in one thread. "Each sub-agent has its own independent thread and context window," the keynote presenters explained. "This is very intentional — we found that by splitting the work and then merging the results, we get better outcomes."
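The split-and-merge shape described above can be sketched with ordinary thread-based fan-out. This is illustrative only, not the Claude platform's API: each sub-agent function receives nothing but its own subtask (mimicking the isolated context window), runs in parallel, and returns only its result to the lead agent, which merges them.

```python
from concurrent.futures import ThreadPoolExecutor

def sub_agent(subtask):
    """Each sub-agent works in an isolated context: it sees only its
    subtask, never the lead agent's full thread."""
    return f"findings for {subtask}"

def lead_agent(task, subtasks):
    """Lead agent decomposes the task, delegates in parallel, merges."""
    with ThreadPoolExecutor() as pool:
        # map preserves subtask order, so the merge is deterministic.
        reports = list(pool.map(sub_agent, subtasks))
    # Only results come back; each sub-agent's working context is discarded.
    return f"{task}: " + "; ".join(reports)

print(lead_agent(
    "moon landing survey",
    ["site detection", "fuel budgeting", "descent planning"],
))
```

The key design choice mirrors the quote: intermediate exploration stays inside each sub-agent's thread and is thrown away, so the lead agent's context holds only the merged conclusions.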
Albert offered his own heuristic for when multi-agent architectures make sense versus sticking with a single thread. "Parallel agents are better for investigation," he said — situations where there is a lot of context that will ultimately be discarded. "If you're trying to answer a specific question, you don't need all the search results from the areas where it didn't find the answer. You just need the answer." He described spinning up disposable sub-agents for specific retrieval tasks and bringing only the result back to the main thread. Increasingly, he said, the model itself will decide when to parallelize. "In the future, you won't really care if it's one agent or multi-agent or whatever's happening. You just have a Claude that you're talking to, and it will deploy the right architecture automatically."
Anthropic's bigger bet: closing the gap between AI capabilities and real-world adoption
The three features arrive as part of a broader platform push that Anthropic framed throughout the conference as closing "the gap between what AI can do and what it's actually doing for people." Ami Vora, Anthropic's Chief Product Officer, set the theme in her opening keynote, noting that while model capabilities are advancing on an exponential curve, most organizations are still adopting AI on a linear path.
Dianne Penn, who leads product for Anthropic's research team, described the company's measure of progress as "task horizon" — how long an AI agent can work autonomously while improving the quality of its deliverables. "This time last year, models could work for minutes," she said. "Now, most of us have agents running for hours on end. Tomorrow, we'll have agents that are proactive, always on, and know what to work on without losing the frame."
The event also included several infrastructure announcements designed to help developers keep pace. Anthropic said it is doubling its five-hour rate limits for Pro, Max, Team, and Enterprise plans, and raising API rate limits considerably. The company announced a partnership with SpaceX to use the full capacity of its Colossus data center to expand compute availability — a direct response to the demand crunch Amodei described.
All three features are built into Claude Managed Agents, which launched in public beta on April 8 as an opinionated harness that bundles best practices including memory, tool integration, and action handling. Anthropic says teams using Managed Agents have shipped 10x faster than those building their own agent infrastructure from scratch. Albert described the platform using an operating system analogy: "With managed agents, you don't need to think about all the technicalities of how you set up the surrounding system," he said. "You're building an application for Macs — you don't want to go have to re-implement every detail of macOS."
What dreaming, outcomes, and multi-agent orchestration mean for the future of enterprise AI
The competitive implications are significant. As AI agent platforms from OpenAI, Google, and others compete for developer adoption, Anthropic is betting that production reliability — not just raw model intelligence — will determine which platform wins enterprise budgets. The dreaming feature in particular stakes out new territory: while other platforms offer memory and tool use, the idea of agents systematically reviewing their own histories to extract reusable knowledge goes further toward the kind of continuously improving systems that enterprises need before delegating high-stakes work.
The conference showcased companies already operating at that scale. Mercado Libre, Latin America's largest e-commerce platform, has 23,000 engineers running Claude Code, has reviewed more than 500,000 pull requests with human oversight, and is aiming for 90% autonomous coding by the third quarter of this year. Shopify has deployed Claude Code across not just engineering but design, product, and data science teams.
But it was Dario Amodei who articulated the most expansive vision for where all of this leads. He described a progression from single agents to multiple agents to whole organizational intelligence — from "a team of smart people in a room" to what he called "a country of geniuses in the data center." And he reiterated a prediction he made roughly a year ago: that 2026 would see the first billion-dollar company run by a single person. "Hasn't quite happened yet," he said. "But we've got seven more months."
Dreaming is available now in research preview. Outcomes and multi-agent orchestration are in public beta and available to all developers on the Claude platform. Whether seven months is enough time for a solo founder to build a billion-dollar business remains an open question — but after Tuesday, they have a few more tools to try.