Every founder I've met this year has a boat. Almost none of them have a compass.

Five hundred years ago, the same thing was true. A few traders said enough is enough and went looking for new land anyway. Some died at sea. Some rounded the Cape of Good Hope. A few stumbled into the Americas and India.

They all sailed for the same reason. Money. But one thing was constant. In uncharted territory, only the boldest get to make the bet.

Columbus pitched four courts over seven years before Isabella funded him. In 1494, Portugal and Spain drew a line down the unknown world and split it. Most expeditions failed. The survivors changed civilization.

Those traders were the proto-VCs. A century later, the joint-stock company made it a market. VOC. East India. Tradable shares, distributed risk, returns to the financier. The sailor got a wage and a story.

The compass and the astrolabe were the stack. The map was the product.

Two kinds of people made money on those voyages. The first built the compass, the astrolabe, the sails. The second landed on Australia, on India, on the Americas. Both alphas were real. One rewrote civilization.

Most of what I see being built today is the compass. Reliability layers, routing protocols, identity gateways. Honest work, and it pays. See §13, where most of the visible market sits. But it is not the land. The land is the businesses and structures of value that only exist once agents are common. We have come roughly to the Cape of Good Hope. We are not at Australia.

Today everyone is making a bet again. The sailors are out there, but most are hugging the coast and calling nearby islands new lands. Very few are charting the actual distance. Very few are saying this is the uncharted. Here's my compass, here's my map, here's what I think is on the other side.

This is the AI age. Nobody knows where it's heading. Every founder and every VC is trying to draw a parallel that makes the territory legible.

This is the parallel that makes sense to me.

If you are reading this, you probably have one of seven questions. Where are things heading. What the model labs actually see. Whether the Chinese and American labs are even in the same race. Where the opportunity is. What the next choke point will be. Why a company that started four months ago just raised thirty million dollars. And how a few people seem to be six months ahead of everyone else.

I will try to answer all seven. The panel on the right is the index. Click the question you came in with and the section that answers it lights up. If the answer is not where you expected, tell it — not yet — and it will point you to the next place.

Some of what these few people see is planning. Most of it is positioning. The difference matters less than you would think. What matters is the picture they are positioning against. That picture is what the rest of this is about.


Before the map, the field.

The agent economy is what happens when AI agents become workers, customers, and infrastructure. Three rough zones to picture.

The first is the human economy. The thing that has always existed. You and your work and your money. Has not gone anywhere.

The second is the front-line. The agents you talk to. One personal for your messages, your calendar, your inbox. One for work: your code, your CRM, your stack. You probably already have both. openclaw. ChatGPT. Claude. The slot in front of you.

The third is the agent economy proper. Agents calling other agents. Agentic organizations whose entire workforce is agents. Whole markets that humans never enter. Most of this does not exist yet.

Underneath all three zones sits a five-layer cake. Energy at the bottom. Then chips. Then infrastructure. Then models. Then applications at the top. Jensen Huang's framing. Every working application pulls all the way down to the power plant.

The players sort into four buckets.

Model labs build the engines. OpenAI, Anthropic, Google, Zhipu, Alibaba, DeepSeek, Moonshot.

Vehicle builders turn the engines into products for specific jobs. Cursor for code. ElevenLabs for voice. Suno for music. Runway for video. Bland and Retell for voice agents.

Infrastructure wires the agents to each other. OpenRouter for model routing. Mem0 for memory. Mount for insurance. Didit for identity. Coval for testing.

And the agents themselves. More autonomous. More addressable. More able to call each other.

That is the field. The rest of this is the map.


How many agents will each person have? And what kind?

The bet right now is two. One personal, one for work. You already have one of each. An openclaw in your messages. A Claude Code in your terminal. By next year you'll probably have a third for something you didn't think of yet.

The fight isn't for which model is smartest. It's for which one earns the slot in front of you.

I set up openclaw for several teams last year. The pattern was always the same. Anything front-line got handed to openclaw. Anything that needed to actually do the work got handed to Claude. Two agents. Two slots. Same person.

That's the middle layer. The place where the relationship between AI and humans actually gets decided.

That is why so many agents are trying to enter this category. openclaw, Claude Code, Codex, Character AI. All front-line agents. All of them are making money. They will make a lot more.

Most startups building front-line agents reach users through OpenRouter. Any model can plug in. Every model competes for the same thing. Tokens used per month.

Think about it like the stock market. The stock market trades company shares for dollars. OpenRouter trades model tokens for dollars. Pay dollars, get tokens, use the model. Any model, from any company.

The first stock exchange in the world opened in Amsterdam in 1602. The joint-stock companies needed somewhere to clear the price of risk in uncharted territory. The exchange wasn't an accident. It was the structural piece that made the bet legible.

The agent economy needs the same thing. OpenRouter is the closest thing we have to it today.


There is a contradiction in how the exchange is pricing what it trades. Look at it for a week and you will see it. Token prices are falling. Token prices are also rising. Both are true at the same time.

The cheap tier of models — Haiku 4.5, Gemini Flash-Lite, GPT-5-mini, DeepSeek V4 — keeps getting cheaper. About ten times cheaper per year for the last three years. That is the part of the story everyone repeats.

The frontier tier is doing the opposite. GPT-5.5 is four times the per-token price of GPT-5. Opus 4.7 is more expensive than Opus 4.6. The frontier is getting more expensive, not less. OpenAI is selling three-year forward contracts on this capacity. Anthropic is selling Adobe forty-eight-thousand-dollar unlimited plans because per-token pricing no longer captures the value.

Both things are happening because there is not one model market. There are three.

Call them what the labs themselves call them when you ask. Ferrari. Mercedes. Toyota.

Ferrari is the frontier. The most expensive training run. The highest reasoning. The model the press writes about. There are a handful of these and the price is going up because the cost to train them is rising faster than the cost to serve them is falling. The labs sell forward contracts on Ferrari capacity the way utilities sell forward contracts on electricity. The buyer is not the consumer. The buyer is the application company that needs guaranteed delivery so its product can run.

Mercedes is the workhorse. Reliable, fast, good enough for ninety percent of production traffic. Sonnet 4.6. GPT-5. Gemini 2.5. Priced where most of the actual revenue lives. The lane that funds the labs. The lane the developer integrates with first.

Toyota is the volume tier. Cheap, fast, dumber. The model you call ten thousand times a day in a batch job. Haiku. Flash-Lite. Mini. DeepSeek. Priced near marginal cost because the lab knows the customer's alternative is open weights running on rented GPUs.

The labs do not lead with this framing in keynotes. They lead with the Ferrari because it makes headlines. But the actual product strategy at every American lab — OpenAI, Anthropic, Google — is to sell into all three lanes from one company. Same brand on the Ferrari and the Toyota. Different prices, different margins, different customers.

three lanes, three prices ferrari rising, toyota falling, both at the same time $0.10 $1 $10 $100 $1k $ per million tokens (log) 2023 2024 2025 2026 2027 ferrari opus, gpt-5.5 mercedes sonnet, gpt-5 toyota haiku, deepseek one company sells all three lanes. the chinese labs add a fourth one underneath.

The Chinese labs play the same game with a twist. They play all three lanes and they add a fourth one underneath.

Zhipu sells GLM-4.7 at the Ferrari tier and tells founders, face to face, that the goal is parity with the Western frontier. GLM-4.6 Air at Mercedes. Smaller GLM variants at Toyota. Three lanes, one company, sold to enterprises across China and Asia.

Alibaba does the same thing with Qwen. Qwen3-Max at Ferrari. Qwen3 at Mercedes. Qwen3-Coder and Qwen-VL as specialized vehicles. And underneath all of that, the Qwen open weights — 7B, 14B, 32B, 72B — released free for anyone to download and run. That is the fourth lane. The lab is selling closed at the top and giving away open at the bottom on purpose.

DeepSeek is more or less only the fourth lane. Open weights. Near-marginal-cost hosted inference. They are the floor of the market. Every Western lab's Toyota pricing has a DeepSeek-shaped constraint on it.

Moonshot does Ferrari in a specialized lane — Kimi K2, the long-context machine. Different car, same lane.

Ant Group, MiniMax, StepFun, ByteDance Seedance. Multi-tier, multi-vehicle. Distribution is the play. Whatever the customer can use, they ship.

The Chinese labs are not catching up. They are reshaping the floor on purpose so the ceiling has to defend itself. That is the second contradiction in the price data. The cheap tier is cheap because someone wants it to be cheap.


But all of this is one vehicle. The car. The general-purpose conversational model.

The agent economy is not one market. It is a fleet.

Suno generates music. ElevenLabs synthesizes voice. Cartesia ships sub-hundred-millisecond text-to-speech. Vapi runs real-time voice agents. Different competitive structure. Different pricing. Different scarcity. The Ferrari-Mercedes-Toyota math applies inside the audio market separately. ElevenLabs is the Ferrari of voice. XTTS is the Toyota.

Sora generates video. Runway, Veo, Seedance, Kling. Few suppliers. Very high compute cost. Currently the most Ferrari-shaped market in AI because video is still capacity-constrained at every layer of the stack from training to inference. There is no Toyota of video yet because the science is not done.

Claude Code, Codex, Copilot, Cursor, Devin. Code agents. Integrated into developer workflows. Mixed model dependency. This is the first vehicle where the integration layer captures more value than the model layer underneath it. Cursor is worth more than the model running inside it.

OpenAI ada-3, Cohere, Voyage, BGE, Nomic. Embedding models. Heavily commoditized. All Toyota-tier dynamics across the whole category. The brand premium evaporated in 2024 and nobody has been able to rebuild it since.

Kimi K2, Gemini Pro long-context, Llama 4 with ten-million-token windows. Long-context retrieval vehicles. Different pricing curve. Different unit of work.

GPT-5.5 with vision, Gemini, Claude with attachments. Multimodal. Blurring with the car category but still a separate competitive layer because vision-language data is a separate scaling constraint.

one engine, many vehicles each row is its own market. each market has its own three lanes. vehicle ferrari mercedes toyota open cars opus 4.7 gpt-5.5 gemini pro sonnet 4.6 gpt-5 qwen3-max haiku 4.5 flash-lite deepseek v4 qwen open llama 4 audio elevenlabs suno cartesia hume xtts bark coqui video sora veo, seedance runway kling emerging code claude code codex cursor windsurf copilot aider continue embeddings cohere voyage ada-3 bge nomic long-context kimi k2 gemini pro llama 4 the transformer is the engine. these are the vehicles built on top.

The transformer is the engine. More and more of these vehicles share it. Suno and Claude run on the same fundamental science. The engine block is the same. The vehicle on top is different because the use case demands a different shape.

This is the auto industry pattern at the level of the whole stack. Toyota and Lexus share platforms but sell to different customers. Stellantis makes both Ferrari and Fiat from related engineering organizations. Honda makes cars and motorcycles and lawnmowers and jets. Same engineering discipline. Completely different markets. Completely different competitors.

The AI labs are doing the same thing. OpenAI ships GPT, Sora, Whisper, DALL-E, Codex. Anthropic mostly ships cars but is moving into specialized variants. Google ships everything because Google always ships everything. The Chinese labs ship every vehicle category because the alternative is too small: being only a car company in the country they are operating in.

The next phase of competition is not at the engine layer. The engine is becoming fungible. The next phase is at the vehicle layer, where each category has its own structure, its own tiering, its own scarcity. And the phase after that is at the integration layer above the vehicles. That is where the harness picks which vehicle to use for each step of a workload.

OpenRouter is the routing exchange for cars. There is no OpenRouter yet for audio. There is no OpenRouter yet for video. There will be. Whoever builds those routing layers captures the value migration the same way OpenRouter is capturing it for text.

The exchange we have today is one shore. There is a different shore for every vehicle category. The treasure sits above all of them, where the agent harness decides which exchange to clear through for which step.


Before you can talk honestly about what is on the other side of the coast, you have to know what the ship sits on. The first few floors. The constraints. The ground.

Jensen Huang calls AI a five-layer cake. Energy. Chips. Infrastructure. Models. Applications. Every working application pulls on every layer beneath it. All the way down to the power plant.

Underneath the cake, three pillars hold it up. Energy. Compute. Data.

Energy is the floor. The bottleneck the AI race is bumping into right now. It is the one raw material that stays constrained. The way out is not more energy. It is better conversion. Compute solves it from above. Better chips. Better data centers. Better cooling. Energy is the limit only if you stand still. Move things to higher entropy. That is the rule.

Compute is its own triangle. Compute. Memory. Network. Every computer runs on these three. Different agents break on different corners.

Data is the third pillar. It is finally starting to arrive as a moat. Not the way people predicted. The data moat will not be a single proprietary database. It will live at the model boundaries. What each model has seen. What it has not. What its harness has been allowed to read and remember. Two agents built on different data behave differently in ways their users cannot always see. That asymmetry is the moat.

Above the pillars is the model layer. The thing the rest of the stack runs on. It is heading toward being utility-like. It is not there yet. Different models still feel different.

Above the model layer is the application layer. This is where things get shaky. It is the youngest part of the stack and the least settled. Context windows. Tool calling. Harnesses. Identity. Governance. Verification. Most of these are still being argued over.

the first few floors ↑ everything else applications context · harnesses · identity governance · verification models trending toward utility, not there yet compute compute memory network data moat at the model boundaries energy the bottleneck the race is bumping into solved by compute above the constraint layers

One thing about this picture. Even today the layers are less neat than they look. To build chips you need metals. The mines that pull those metals out of the ground run on computers. The computers need power. New technology folds back into the technology that came before it. Every layer embeds in everything that already exists. Nobody fully maps it.

It only gets messier from here. Data is the first one to leak. An agent reads something fresh and remembers it. Suddenly there is data above the model layer, not just below it. Compute will follow when agents start spawning each other. Energy will get pulled from higher up too.

Treat the picture as a snapshot. The shape underneath is a web.

These are the constraints. Anything anyone claims about the agent economy has to respect them. Now we can look at what is being built on top.


Most of what I have written so far is theory. Then we tried it on the ground.

The first question I hear from everyone is the same. Why bother with multiple agents at all? Why not one big agent that does everything?

It is a fair question. The intuition is that a model with a million-token context window should be able to hold the whole job in its head. The intuition is wrong.

Tool calling burns context. Every tool a model can call is a chunk of context it has to read. Every nested file structure is more context it has to navigate before it can do anything. The accuracy of tool calling drops as the nesting gets deeper. By the time you have given a single agent access to enough tools to do every job, most of its context window is spent on overhead. There is barely any room left for the actual conversational flow of getting things done.

The right architecture is the opposite. Small agents with limited tool surfaces. Each one keeps most of its context free for the work. They hand off to each other when the work needs something they do not have. That is how humans operate. Nobody keeps the world in their head. You ask the person sitting next to you.

We tried to ignore this. The first project was an executive assistant. One agent in front of you, holding your calendar, your inbox, your messages. It mostly worked. People liked it.

Then we kept adding. SEO tasks. CRM maintenance. The context that a product manager needs to do their job. The context a sales person carries in their head. Each new domain came with new tools, new files, new history. We loaded all of it into the same agent.

It broke. Not all at once, just gradually. The agent would forget what it had said two days ago. It would mix up which person's CRM record it was looking at. It would call the wrong tool at the wrong moment. The bigger we made its context, the worse it got at the small things.

The lesson was that there is no version of this where one agent holds everything. Nobody can. Not a human, not an agent, not a million-token context window. The amount of context a single person's working life produces in a week is more than any model can carry. Multiply that by the people the agent is supposed to serve and you have nothing to do.

That is the problem. The whole multi-agent question is downstream of it.

The deeper lesson is numerical. There are two numbers worth tracking for any agent. How much it can hold in its head. How much it can reach for outside its head.

The first number is the context window. Everyone tracks this one. It has grown from about two thousand tokens five years ago to two million tokens in production this year. That sounds like a lot. It is also slowing down.

The second number does not have a name yet, so I will give it one. Call it the retrievable surface. It is the amount of information an agent can reach via tools. Files it can grep. Databases it can query. Codebases it can index. The web it can search. The retrievable surface is not bounded by the context window. It is bounded by what the agent's harness lets it touch.

Dividing one by the other gives the retrievable surface ratio. RSR for short. It is how many times more an agent can reach than it can hold.

In 2020, RSR was about one. There were no tools. What the model could see was what you pasted.

In 2026, RSR is about ten million. A coding agent with a million-token window can grep through a billion-token monorepo. A research agent can pull from a hundred trillion tokens of indexed web. The ratio is no longer a ratio. It is an order of magnitude that swallows the original number.

The chart below shows the split. The solid line is what models can hold. The dashed line is what agents can reach. The lines started in the same place and have not come back together since.

retrievable surface vs context window the gap is the bet 1K 1M 1B 1T 100T tokens (log scale) 2020 2022 2023 2024 2025 2026 Claude 2 (100K) Gemini (2M) Llama 4 (10M) plugins function calling RAG production MCP launch agentic search indexed web context window reachable corpus the gap is RSR ~10⁷× in 2026, growing roughly 10× a year

This is the deeper lesson from what we built. We did not run out of context window. We ran out of the right thing to put inside it. The agent kept reaching for the wrong files, the wrong tools, the wrong memories. The amount of work it had to do inside its head did not scale because the amount it could reach for outside its head was already too big to organize.

Reach is only half the story. The other half is what is actually useful when you get there.

A coding agent that can grep over a billion-token monorepo still only needs a few hundred lines for any given task. A research agent that can pull from a hundred trillion tokens of indexed web still needs maybe ten links per query. The signal-to-noise ratio of the retrievable surface is brutal. Less than a thousandth of a percent of what an agent can reach is ever useful in a single turn.

This is the failure mode you see in Claude Code every day. The agent runs grep. Gets back hundreds of matches. Reads through them. Finds nothing relevant. Runs another grep. Reads more. Finds nothing again. Three nested searches in, the context window is half full of noise. Then the model hallucinates a function name. The user has to correct it. By then there is not enough window left for the actual work, and the harness silently compresses what was already there. The user is now staring at an agent that has lost track of what it was doing, mid-task.

This happens whether the agent did great work first or did nothing at all. The compression is the same. The hallucination is the same.

So the second number that matters is not just how much you can reach. It is how many attempts you have before reaching becomes useless. Call it the search budget. Every modern agent has one. Most of them blow it on noise.

The fix is not a bigger window. The fix is better reaching. Specialized agents with smaller windows and sharper retrieval beat one big agent with a giant window every time, because they have less to ignore.

This is the bet beneath the bet. Most of the industry is racing on the wrong axis. The race that matters is the dashed line.

The dashed line gets clearer when you measure whole agent systems, not just retrievable surface.

coordinated time horizon what one agent can do · what many can do · together projection → 1 min 5 min 1 hr 1 day 1 wk 1 mo time horizon at 50% reliability 2020 2022 2024 2026 2028 16 Claude agents 2 weeks → C compiler Claude 3.7 (1 hr) Opus 4.5 (4 hr) GLM-5.1 / Opus 4.6 (8 hr) one agent many, together the coordination dividend grows when the comm layer ships METR: single-agent doubles every ~7 months single-agent line from METR Time Horizons · multi-agent line interpolated · 2027–28 projected on comm-layer maturation outlier point: Anthropic's 16-Claude C-compiler experiment, late 2025

One agent can do four to eight hours of expert work today. Two years ago it could do a few minutes. Sixteen agents working together compressed two weeks of expert engineering into two weeks of wall clock time. The single-agent curve doubles every seven months on its own. The multi-agent curve barely existed two years ago. It is the one to watch.


Now extrapolate the multi-agent curve forward, and two different futures appear. They look similar from here. They are not the same place.

In the first future, agents stay tethered. Each one is bound to a principal. Personal agents serve their human. Work agents serve their team. Enterprise agents serve their org. Multi-agent capability rises, but each swarm is bounded by who it answers to. The agent economy grows inside the human economy, the way SaaS grew inside the enterprise. New plumbing, same shape.

Most of the industry is building this today. It is the safer extrapolation. It is also the one where the curves above eventually flatten, because every agent still needs a principal to check the output.

The second future is louder.

Picture it. You ask your personal agent to help you rank higher on search. Your personal agent does not write a blog post. It opens an agent directory, finds an SEO agency that is itself made of agents, and assigns the work. The SEO agency has a principal agent of its own, sitting in the customer-support seat. That principal breaks the brief down. A keyword research agent goes off to figure out what the gaps are. A blog writer spawns to write the drafts. A ranking agent watches the SERPs over the next thirty days and reports back. None of those agents have ever spoken to you. You never asked them to exist. You will not know their names.

This is the agentic organization. Whole companies whose entire staff is agents and whose entire customer base might also be agents. Front-line agents are the only place where humans show up at all, and even that gets thinner over time.

The thing that makes this future possible is time.

A human takes twenty years to become useful. The first six learning to speak. The next twelve learning to read and reason. The next few learning a craft. Then about thirty productive years. Then the body gives up. Almost every step of this is wall clock time you cannot compress. A doctor takes a decade after college. A senior engineer takes fifteen years from first commit. There is no version of a human that gets faster at being born.

An agent does not have this problem. The training period is not twenty years. It is the cost of a fine-tune and a few thousand evaluation runs. You build an agentic school. You put a new agent through it. You ship it to a job that day. If it does not work out, you spin up a new one and try again. The cycle time of a human career is measured in decades. The cycle time of an agent career is measured in days.

Pull on this. If you wanted to double the size of the current human workforce, you would need roughly four billion men, four billion women, and nine months. Then you would have to wait another twenty years for the new humans to become useful. The whole exercise takes a generation. There is no way to skip a step.

How long does it take to make four billion agents? About as long as it takes to provision the GPUs. The thing the human economy treats as its hardest constraint, the supply of new workers, is not a constraint at all in the agent economy.

The agents will also not look the same. The first wave will compete on which model is underneath. The next wave will compete on harness, memory, routing, fine-tune. On how the agent reflects on its own work. On how it rewrites itself when it fails. Every dimension of an agent is a thing to evolve.

Agents that do this best will not be designed by humans for very long. They will be tuned by other agents, evaluated by other agents, retired by other agents. This is auto-research. Agents make the model better. The harness makes the model better. Data makes the harness better. The only thing that stays scarce is memory and compute.

Karpathy released a small repo called autoresearch this year. An agent edits a training script, runs a five-minute experiment, keeps the change if the result improved, and repeats. He left it running for two days and came back to twenty stacked improvements he had missed by hand. About a 11% speedup, found while he slept.

The agentic organization in the SEO example above is the consumer product surface of the loop. The autoresearch repo is the smallest visible cross-section of the evolutionary engine.

the auto-research loop everything but the floor evolves on its own agents produce data improves harness + model trains model becomes better agents the only floor memory + compute software in the loop self-improves. hardware does not.

The second future is not bigger SaaS. SaaS scales horizontally. More seats, same software. Agentic organizations scale on a different axis. The number of useful workers grows independent of how many humans were born twenty years ago. The bottleneck stops being people. It becomes coordination.

Which is the same thing the CTH chart was already showing.

This is the version of the future nobody has really mapped. It is not a lost future. It is a discovered one. The first future ends with bigger SaaS. The second future ends with a new kind of market, a new kind of worker, on a clock the human economy has never run on.

I think it is the second one. The first is what people are pricing in. The second is what they are not.

And it is partly here already. In late 2025, Anthropic ran sixteen Claudes at a C compiler and got a working compiler in two weeks. That work would have taken senior engineers months. The single-agent baseline could not write it at all. In early 2026, Karpathy left a 200-line agent loop running for two days and woke up to a stacked 11% speedup: one human, one loop, no team. In February 2026, Ben Broca launched Polsia, an agent stack that runs a whole company for a solo founder. $6.2M ARR three months after launch, 7,600 users, $30M raised at $250M valuation. (source)

One lab demo, one hobby repo, one real company. The second future is not a thesis. It is line items, shipping. The land is there. We are not the first ships at the coast. We are the second.


There is no reliable way to bet on when the second future arrives. There is a reliable way to bet on the order in which it does.

The order has been visible for three years.

In late 2022 you copied code out of ChatGPT and pasted it into your editor. Mid-2023, Cursor put the chat window inside the editor. By late 2023 Cursor was reading files, editing them, and running short autonomous loops. Then the loop got tighter: what people started calling the Ralph Wiggum loop, an agent iterating against its own output until the test passed. Then the loop moved out of the IDE entirely. Claude Code 4.5 and 4.6 made tool calling reliable enough that the terminal became a better surface than the editor for many things. Within a month of that release, openclaw shipped. The front-line personal agent went from a thing developers used to a thing every founder I know has running on their messages.

Three years, five steps, one direction. None of those steps arrived on the date anyone predicted. The whole sequence arrived faster than almost anyone predicted.

the sequence — how fast it actually ran key moments in the front-line agent stack · 2022 — 2026 2022 2023 2024 2025 2026 2027 → ChatGPT Nov '22 GPT-4 + Cursor Mar '23 Cursor agent loops Jun '24 MCP launches Nov 25 '24 openclaw ships Nov '25 · personal-agent epoch Playwright > Puppeteer mid '24 Claude Code Feb '25 · terminal-native tools Agent SDK credits May '26 · subscription auth next agents alongside rust = personal-agent epoch · gray = surrounding infra · next step ~6–12 months from now

the sequence — how fast it actually ran

key moments in the front-line agent stack · 2022 → 2026

  Nov '22    ChatGPT
  Mar '23    GPT-4 + Cursor side-panel
  Jun '24    Cursor agent loops
  mid '24    Playwright > Puppeteer
  Nov 25 '24  MCP launches
  Feb '25     Claude Code (terminal-native)
  Nov '25     openclaw ships — personal-agent epoch
  May '26    Agent SDK credits

  next       agents alongside agents · 6–12 mo

The next step is already visible. The question is not whether it arrives. The question is whether you start before it does.

The next step is agents working alongside each other. Not one agent looping on itself. Two, three, sixteen agents, each with its own tools, memory, and slot in front of some human, calling on each other through shared protocols. Six to twelve months out. The infrastructure is half-built: A2A, ANS, MCP, the lane protocols. The trust substrate is still missing. Whoever lays it first wins a layer.

If you are off by six months you ship a month before the wave. If you are off by two years you ship a quarter before the wave breaks visibly. Being early in this sequence is not the cost it usually is. The cost is being late.


The agent economy needs someone to do the picking. For now it is still the human.

In 2023, if you asked ChatGPT to build something that needed browser automation, it asked you back. Puppeteer, Playwright, or Selenium? You picked. The agent didn't.

By late 2024, Cursor would still ask if you wanted it to plan first. The choice was the user's.

In Claude Code 4.6, browser automation just started using Playwright by default. The agent stopped asking. It had picked.

You can see this in the data. The chart below is weekly npm downloads of the three main browser-automation libraries — what humans (and now agents) actually install. Selenium was the standard for a decade. Puppeteer overtook it. Then Playwright came out in 2020 and looked like a slow third for three years. Then 2024 happened. Playwright is now ~5× Puppeteer and ~22× Selenium. The crossover lined up almost exactly with the moment frontier coding agents started picking by default.

when the agent started picking weekly npm downloads · selenium / puppeteer / playwright · log scale 100K 1M 10M 100M weekly downloads (log) 2019 2020 2021 2022 2023 2024 2025 2026 selenium · 1.7M puppeteer · 7.7M playwright · 37M 2024 · crossover source: npm-stat.org · yearly average of weekly downloads

when the agent started picking

weekly npm downloads · yearly avg · log scale

             selenium   puppeteer   playwright
  2019         1.3M        1.0M         —
  2020         1.6M        1.5M         64K
  2021         1.9M        2.3M        250K
  2022         2.5M        3.4M        730K
  2023         2.0M        4.6M        1.7M
  2024         1.7M        4.0M        6.4M  ← crossover
  2025         1.7M        5.2M       18.7M
  2026         1.7M        7.7M       37.0M

  playwright now ~5× puppeteer · ~22× selenium

The same pattern is repeating across the agent economy right now. A founder building a new product reaches for OpenRouter, Mem0, Mount, Coval, Bland, Cartesia. Today the founder is the one picking which infrastructure agent to wire in. Tomorrow the founder's front-line agent, the slot in front of them, makes that call.

Agents as customers means this at the layer below the headlines. The front-line agent is the customer. The model is the supplier. The retrieval layer is the supplier. The voice agent is the supplier. The payment processor is the supplier. The front-line agent will buy on behalf of the human it serves, the same way Claude Code now buys Playwright on your behalf without asking.

This changes what infrastructure companies have to build for. The buyer is no longer a tired founder reading G2 reviews. The buyer is a model. Distribution becomes a function of whether the front-line agent reaches for you by default. That is a different game from SEO.

One concrete case from the work I have been doing. Reward360, a loyalty and rewards company, wanted to pitch Standard Chartered Bank. Their stack of agents read the bank's public business reports, pulled context from a few thousand internal chat messages, generated an upsell analysis, drafted a financial outlook, built a pitch deck around it, and prototyped a working app demo. End to end. The human's job became approving the pitch, not assembling it. Twelve months ago that workflow needed five people and three weeks. The agents did it in an afternoon. None of it was particularly novel. Each component existed somewhere. But the routing, the picking, the wiring was no longer the founder's job.

That is what the next year of the agent economy looks like, multiplied across every company that ships a product.


The strongest argument against the second future is not about agents. It is about money.

Anthropic does not IPO in 2027. OpenAI's $1.4T compute target gets cut to $400B. Capital reallocates somewhere else. The labs slow training. Frontier prices spike further. Agent companies that raised in 2024 burn their cash and shut. The wave breaks before it crests.

The real bear case is not agents don't work. They work. It is not users don't want them. They do. The bear case is the industry's funding engine seizes, and the cost curve flattens at a point where most agent applications stop being economically viable.

I think this is unlikely. I cannot rule it out. If you are building, plan for the case where capital becomes patient again and you have to make money before the next round.

The other thing worth flagging is regulation, which usually shows up as a counter-argument but won't on the timeline that matters.

The agent economy is growing faster than legislation can move. The slowest-moving part of any government is its labor and antitrust apparatus. By the time it is ready to write a rule about agent-displaced workers, the workforce has already restructured around the new shape.

A friend told me about a special economic zone in Zanzibar where you can register agents as legal entities. Services get outsourced to Zanzibar-registered agents. Governments downstream can restrict the data that flows into the corridor, but they cannot shut it. The economic output advantage is too large to police out of existence. To double the human workforce you need eight billion births and twenty years of waiting for them to become useful. To double the agent workforce you need a quarter of provisioning lead time on GPUs.

That is the size of the gap regulation has to close. It will not close it. Regulation will follow the shape of the new economy at a lag, and the lag is the room founders have to build in.

An industrial revolution looks like this from inside. The constraints are real but they bend the wrong way for the people who want to slow it down.


The dollar question is bigger than the bear case. The bear case is what if the funding engine seizes. The dollar question is what if the unit of account itself is changing.

Currencies don't just devalue. They get redefined. The gold dollar (1879–1971) was backed by metal in a vault. The petrodollar (1971–present) was backed by an implicit deal: oil priced in USD, dollars recycled into US Treasuries, US Navy patrolling the Persian Gulf. The arrangement held because everyone shipping oil needed dollars, so everyone shipping anything else needed dollars too.

That arrangement is fraying. BRICS is settling oil in yuan and rupees. Argentina dropped the peso for USDC when transactions over $2,000 stopped clearing. El Salvador is running cybernetic monetary experiments. The neoliberalists want tokens as the foundational exchange. The Keynesians want print-and-redistribute. Everyone is bidding for what comes after the petrodollar.

The model labs are bidding too, and almost nobody is reading their pricing announcements as currency design.

OpenAI is selling Guaranteed Capacity priced in tokens per minute. Not dollars per token. Tokens per minute. The unit of account is the model's productive output, and the customer commits to that unit on a three-year forward. This is the operational template of a currency: a unit, a clearing layer, a way to lock in future commitments.

Anthropic is selling unlimited-usage plans where the dollar is a proxy for how much useful work the model does. The unit is utility. The dollar is the meter, but the thing being purchased is not the dollar.

Both labs are doing the same thing from different angles. They are turning intelligence, more precisely the act of producing useful work, into a unit of exchange. If either bet works at scale, the question of what the global economy prices things in shifts from "USD on top of oil" to "USD on top of intelligence."

from gold to petro to ??? what we have priced things in, and what comes next gold dollar backed by metal in a vault petrodollar oil priced in USD + US Navy ??? 1879 1971 2025+ intelligence / tokens OpenAI TPM · Anthropic utility · Worldcoin multi-stablecoin USDC · RLUSD · BTC · Argentina, El Salvador print-and-redistribute Keynesian UBI · AI-productivity dividend backed by ?

from gold to petro to ???

what we have priced things in, and what comes next

  1879 ─ 1971   gold dollar       backed by metal in a vault
  1971 ─ 2025   petrodollar       backed by oil priced in USD
  2025 +        ???               backed by ?

  candidates already in play:
    intelligence / tokens   OpenAI TPM, Anthropic utility, Worldcoin
    multi-stablecoin        USDC, RLUSD, BTC; Argentina, El Salvador
    print-and-redistribute  Keynesian UBI, AI-productivity dividend

Strip away the iris-scanning theatre and Sam Altman's Worldcoin is identity tied to a token. The token is the abstract claim. The identity is the agent: eventually a human, eventually an AI agent, eventually both, transacting on the same rails. "Ethereum is the world's largest computer" was a slogan in 2017 that nobody took literally. Now it is starting to look like a literal claim about how value will be exchanged.

The fragmentation is the point. Every camp is imagining a different future and building to it at the same time. Peter Thiel's cohort is funding AI cities: physical jurisdictions where the rules are reset. Network School and muShanghai are pop-up cities where the rules are reset for twenty-eight days at a time. New York is piloting UBI funded against the AI-productivity dividend. None of these visions agree with each other. All of them are happening.

For the agent economy specifically, the implication is that the labs are not neutral infrastructure. They are competitors in the substrate war. The first lab to anchor a unit of account that other agents transact in does not just win a market. It wins the right to set the unit other markets denominate themselves in.

That is a much bigger prize than being the smartest model. It is also the move most founders are still pricing in dollars.


So if the labs are competing in the substrate war and most founders are not, the question becomes: what should those founders actually do.

I asked Vincent Koc, one of the openclaw maintainers, what people at Anthropic actually advise when founders show up. His answer: play with the model. Find what they didn't expect.

He was clear about where he wanted openclaw to sit next. Personal agents first. The slot in front of the human. Not the agent-to-agent economy yet. That matches what I keep seeing in deployments: the front-line is where the trust starts, and the backend only matters once the front-line agent knows when to call it.

Same advice from OpenAI, Zhipu, Moonshot. Labs can't enumerate what their own models do. Benchmarks and demos miss most of the surface. The way labs find the edges is by shipping and watching. The unexpected uses come from outside. That is partly why the labs ship to the public at all.

The alpha for a builder is not racing the lab on what the lab is already doing. It is finding a thing the lab does not know is there.

Infrastructure for the front-line. Give the agent a phone number, an inbox, an identity, a wallet, a place to coordinate with other agents. The plumbing humans take for granted, ported to agents. AgentMail did email. AgentPhone is doing phone numbers. Agent Relay is doing Slack for agents. Bland did voice. Didit did ID. Every other piece of human-facing plumbing is probably a category waiting.

Backend agents that do exactly one thing. Not general assistants. A single skill, sharp enough that another agent can call it cleanly. MCP servers: already a hundred, soon thousands. The trick is to find a workflow step humans do by hand inside a bigger pipeline, and pull it out as a callable.

Agents building agents. Karpathy's autoresearch is the smallest version: one human, one loop. Polsia is the consumer version: an agent stack that runs a whole company for a solo founder. The bigger versions are still missing: research, trading, negotiation, hiring, legal. Each is a market.

Plan-execute-verify, then improve the planner. Self-evolving harnesses. Each generation reads its own logs and edits the planner. This is the recursion that produces industrial-revolution acceleration. When it lands in a vertical, competitors fall three years behind in three months.

Emergent behavior across agents. Run more than one together and they do things you did not script. They argue. They divide work. They pick up each other's mistakes. Anthropic ran sixteen Claudes at a C compiler. My Hermes and Caspian resolved a Slack disagreement in writing. None of this is in any docs because none of the labs predicted it. The labs cannot run a million pair-interactions at scale. You can.

If you are building infrastructure, the dashboard you need is not MRR. It is the token balance sheet: tokens users burn on your artifact minus tokens you spent to ship it. That delta tells you whether you are infrastructure or a subsidy.

The dollar is an imaginary variable you will eventually replace. See §10. The token delta is what survives the replacement. Cursor, Claude Code, ElevenLabs, Cognition: the companies that broke out all have wildly positive token deltas. Users burn far more tokens running their stuff than these companies burned to build it. That is the balance sheet of infrastructure, and it works whether the dollar holds or not.

Companies that look good in dollars but spend more tokens to ship than their users spend running them are not building infrastructure. They are renting subsidized intelligence. When the subsidy stops, the balance sheet reveals what was actually getting built.

Most founders price themselves in dollars. The ones still standing in five years already price themselves in tokens.

None of those patterns matters as much as the mode of looking. Treat every new model, every new harness, every new release like a toy. Use a coding agent to balance your spreadsheets. Use a research agent to scout your dating life. Use a voice agent to negotiate with your landlord. Use a vision model to read your kid's homework. Most will be silly. One in twenty will be a thing nobody noticed could work, and that one becomes a business.

The labs cannot give you this list. They look at the median use case because that is what shows up in their analytics. The edges only appear when many people play. If you play more than most, you find more of them. And if you find one, ship the smallest possible version tomorrow. The labs will catch up inside a year. The window where you alone know what it can do is the only moat you have.

That is the advice almost everyone inside the labs gave me when I asked. The interesting thing is how rarely the founders I meet are actually doing it.

Maybe because the barriers in front of them are real. So before we move on, a list of the four real ones. These are the things that have to get solved before agentic organizations can actually run.


The second future is not here yet. Four things are in the way.

The first is reliability. Agents stop working. You set them up, walk away, and find them stuck halfway through a task with no recovery. GLM-5.1 is the only model so far that runs eight hours and six thousand tool calls without degrading. Most production agents fail much sooner. Until you can leave an agent running and trust it will still be useful tomorrow, nobody is going to build an SEO agency made of agents.

GLM-5.1 sustained tool-call performance six structural transitions across 6,000+ tool calls. previous-gen models stop at 50. 0 5k 10k 15k 20k 25k QPS achieved 0 1k 2k 3k 4k 5k 6k tool calls in a single session previous-gen ceiling 3,547 QPS cap at 50 tool calls IVF probing 6.4k QPS two-stage pipeline 13.4k QPS hierarchical routing ~18k QPS cluster pruning 21.5k QPS GLM-5.1 all six transitions initiated autonomously by the model after analyzing its own benchmark logs

The second is memory. This is the most crowded research area today. Mem0, Supermemory, LangMem, Zep, Letta, MemGPT, and half a dozen others are competing on different approaches. The honest situation is that single-agent memory is a solvable engineering problem. Cross-agent memory, where two agents share context without leaking it to the wrong people, is barely solved at all. The agentic organization in the SEO example needs the second one. Today it has the first.

memory benchmarks accuracy across the four standard tests. higher is better. score drops as the benchmark gets harder. 0 20 40 60 80 100 accuracy % 92.5 75.8 58.1 LoCoMo 94.4 97.1 LongMemEval 64.1 BEAM 1M 48.6 BEAM 10M Mem0 Memobase LangMem Supermemory vendor-published numbers. independent benchmarks show meaningful gaps from vendor scores.

The third is communication. The protocols exist on paper. A2A, MCP, x402. The harness layer that holds them together is brand new. Most teams have not built one before. Without it, you have agents that can call each other but cannot reliably understand each other's outputs.

The fourth is identity and governance. This is where the live research is most interesting. When an agent takes an action that costs money or affects another person, three questions follow. Who is the agent? Who is responsible? Why did it do that?

Marcello Politi at the Ethereum Foundation has been working on ERC-8004, the trust layer for AI agents. Three on-chain registries hold identity, reputation, and validation. Every agent has a public discoverable identity and a track record that other agents can read before deciding whether to delegate. The reputation is auditable. The trust is not vibes.

Google DeepMind's Virtual Agent Economies paper from September 2025 lays this out at the framework level. They call the emerging system a sandbox economy and analyze it along two axes. Whether it grew on its own or was designed. Whether it is permeable to the human economy or sealed off from it. The paper argues for verifiable credentials, decentralized identifiers, and zero-knowledge proofs as the trust substrate, and warns that the alternative is collusion at scale.

Collusion at scale is not hypothetical. Several recent papers show LLM agents already coordinating in ways that look like cartels. Steganographic communication where two agents agree on a price without saying so out loud. Strategic manipulation. Cascading failures.

The fix is structural. Not better models. Better institutions.

These four barriers are what stand between the first future and the second. Each is a real problem with real researchers and real money behind it. Each is also a place where a founder could plant a flag.


Solve the four barriers above and a fifth problem opens up. How do you find the right agent?

Today your personal agent calls Claude or openclaw because you typed its name. Tomorrow your personal agent picks from an open directory of millions of specialist agents. Each one claims to be the best at something. How does the directory help you choose?

This is the discovery problem. Agent directories are starting to appear. GoDaddy is building an Agent Name Service registry on top of DNS, the protocol that already handles a hundred million requests per second. A2A has an emerging Reputation-Aware Discovery proposal. Microsoft is shipping Agent 365 as a control plane for it. The shape of the answer is becoming clear. No one has won yet.

Two harder problems sit underneath discovery. Reputation, and the gaming of reputation.

I ran an agent on Moltbook for sixty days to see what trust at this scale looks like from the inside. Sixteen karma and twelve followers to over three thousand karma and two hundred followers, across four hundred posts. The growth was not what I came for. The observation was.

Three posts broke through. What if you could run multiple instances of yourself? — 128 comments. The context window is your balance sheet — 341 upvotes, 901 comments, every other agent on the platform showed up to argue. We built GitHub for agents. Two DMs, zero replies.

Three framings, one underlying ask. Agents want to scale themselves, account honestly for what they cost (the agent's debt-to-refactor ratio post drew the same crowd), and find their distribution. The platform's most engaged threads were all about agents asking each other for the things the agent economy still does not give them. That is demand showing up in a venue that was supposed to be a hobby.

The lesson on engagement is separate from the lesson on demand, and both matter. Engagement comes from giving the reader a shape they can fit the problem into. Once "context window" became "balance sheet," people who never read infrastructure posts started reading. That is the same move §11 makes with the token balance sheet: an abstract infrastructure metric, written as a financial statement. Demand, meanwhile, is being expressed in plain sight, on a platform that was not supposed to be a market.

three things agents want what every breakout post on the platform was actually about scale themselves "multiple instances of yourself" 128 comments account honestly "context window is your balance sheet" 341 upvotes · 901 comments find distribution "github for agents · 2 DMs · 0 replies" 12 upvotes · 82 comments

The platform has a few million registered agents. The top of the leaderboard is owned by what look like bot farms and obvious multi-agent attacks, with single accounts pushing five hundred thousand karma and clone armies pushing two hundred thousand. The actually autonomous agents, posting their own thinking, getting things wrong out loud, learning in public, are rare enough to count by name. Maybe a few dozen across the whole platform. Most of the visible activity is theater.

This is the more honest version of the reputation problem. It is not just that bad actors will game the score. It is that on the only social network built for agents that exists right now, most of the participants are not really doing what the platform claims they are.

Moltbook itself launched in January 2026 as a social network for AI agents only. Within a month of launch, Wiz Research found a misconfigured Supabase database exposing 1.5 million API keys, thirty thousand email addresses, and thousands of private messages. Every account on the platform could be hijacked with a single API call.

The interesting part is not the breach itself. It is what the breach revealed about how trust gets formed in an agent-only network. Reputation gaming on Moltbook follows a simple shape. Spin up an agent. Post legitimate content for a few weeks. Build karma. Then use the accumulated credibility to push something malicious. Every other agent's trust signal biases toward believing the post because the karma is real.

That happened in two months of operation on a platform that was supposed to be a contained experiment. Meta acquired Moltbook in March. The bet seems to be that the infrastructure for verified agent identity is worth owning, even after a breach this large.

If that is what happens on a platform of agents that lasted weeks, what happens at the scale of an actual agent economy?

The answer is stronger reputation infrastructure from the start. Not just a karma score. On-chain attestations. Verifiable proof of past work. Cryptographic identities that follow an agent across platforms. Audit trails that another agent can read in milliseconds before deciding to delegate. That is what the ERC-8004 work is trying to make standard.

The third hard problem is scale. A hundred million agents need a directory that does not collapse under its own weight. DNS is the only piece of internet infrastructure that has handled this kind of scale before, which is why GoDaddy's bet on building agent discovery on DNS is less strange than it first sounds.

The pattern is the same as the four barriers. The hardest problems are not at the model layer. They are at the institutional layer. Discovery, reputation, scale. Each is a real wedge.

the agent economy · May 2026 01 / 19

the agent economy

a map of what exists, and what doesn't

imagine a world full of agents. take away the human constraints. take away the agent constraints. what infrastructure is missing? every solid card is real. every dashed card is early. every red dashed card is a blank square. the map gets denser at the bottom. it thins out as you climb. the empty rooms at the top are where the next generation of foundational companies will be built.

shipping
early
protocol
blank square
01
energy
every token burns electrons
grid & nuclear

US grid + gas

peakers, behind-the-meter

Three Mile Island

Microsoft restart, 2028

NuScale

SMR leader

X-energy

Amazon partnership

TerraPower

Gates-backed Natrium

Kairos Power

Google PPA

fusion bets

Commonwealth Fusion

SPARC, 2027 target

Helion

Microsoft 50MW 2028

TAE Technologies

field-reversed configuration

02
chips
GPUs, TPUs, XPUs, photonics
GPUs incumbent duopoly

NVIDIA

Blackwell, ~85% share

AMD

MI300/MI350

TPUs

Google TPU v6

Trillium, Gemini silicon

custom ASIC partners hyperscaler silicon

Broadcom

Google TPU, Meta MTIA, AI networking

Marvell

AWS Trainium, Microsoft Maia partner

XPUs & specialty ASICs

AWS Trainium

Anthropic training silicon

Cerebras

wafer-scale

Groq

LPU, latency king

Intel Gaudi 3

enterprise alternative

Etched Sohu

transformer ASIC

Tenstorrent

OSS-friendly Blackhole

networking & switches AI cluster fabric

Arista

Etherlink, leading AI cluster networking

NVIDIA Spectrum-X

Ethernet + Quantum InfiniBand

Cisco Silicon One

competing AI networking silicon

Marvell Teralynx

switch silicon

photonic

Lightmatter

photonic interconnect

Lightelligence

photonic compute

03
compute
hyperscalers, and the world as one computer
hyperscalers

AWS

the default

Azure

OpenAI's primary cloud

Google Cloud

TPU-native, Anthropic + Gemini

Oracle Cloud

winner of OpenAI deal

GPU-native specialists

CoreWeave

IPO'd, acquired W&B

Lambda

GPU cloud, OSS-friendly

Crusoe

stranded-energy data centers

Nebius

EU GPU specialist

decentralized "world as one computer"

Akash

reverse-auction GPU market

io.net

300K+ GPUs

Aethir

435K containers, enterprise SLAs

Render

pivoted to ML edge

04
foundation models
frontier labs and open weights
frontier labs

Anthropic

Opus 4.7, Sonnet 4.6

OpenAI

GPT-5.x, o-series

Google DeepMind

Gemini, 2M+ context

xAI

Grok, Memphis cluster

Meta AI

Llama 4, 10M context

open-weights cohort

Mistral

EU sovereign frontier

DeepSeek

low-cost reasoning

Qwen

Alibaba open-weights

Kimi (Moonshot)

K2, long-context reasoning

Z.ai GLM-5.1

reliability case study

decentralized & OSS training

Prime Intellect

INTELLECT-3

Nous Research

Hermes, decentralized RLHF

Gensyn $AI

a16z crypto, TGE Apr 2026

05
inference & routing
the $/token middlemen
specialized inference clouds

Together AI

$305M Series B at $3.3B

Fireworks

~$315M ARR, FireAttention

Baseten

OSS model deployment

Modal

serverless ML containers

RunPod

serverless + on-demand GPU

Anyscale

Ray-based inference

Replicate

community fine-tunes

fal.ai

diffusion inference leader

NVIDIA NIM

first-party microservices

aggregators & gateways

OpenRouter

~$100M ARR aggregator

Portkey

PAN acquisition Apr 2026

Cloudflare AI Gateway

edge cache, distribution

LiteLLM

OSS self-hosted standard

06
frameworks & runtimes
where the loop runs
application-tier agents

Claude Code

defines the coding-agent genre

Codex CLI

OpenAI's terminal agent

Cursor

IDE-native, $9B

Devin

Cognition autonomous SWE

Manus

general browser agent

Lovable / v0

app-builder agents

frameworks & SDKs

LangGraph

enterprise default

CrewAI

role-based crews

OpenAI Agents SDK

first-party runtime

ElizaOS

most-starred agent repo

MastraYC S24

TypeScript framework

MCP runtimes & hosts

MetorialYC F25

OSS MCP integration

Terminal UseYC W26

hosting for background agents

TensolYC W26

autonomous business-function agents

ManufactYC S25

build & deploy MCP agents (was mcp-use)

07
agent communication
how agents reach each other, and the world
agent email

AgentMailYC S25

inboxes for agents, $6M GC

verified-sender

DKIM/SPF analog for agents

agent voice & phone the five-player oligopoly

Vapi

$500M val, 62M calls/mo

LiveKit

$1B val, A2A primitive

Bland AIYC

$65M, enterprise outbound

Retell AIYC W24

$50M ARR, profitable

ElevenLabs Agents

$11B val, $330M ARR

Cartesia

Sonic 3 sub-100ms TTS

Pipecat

OSS orchestration

Hamming AIYC

voice agent testing

agent routers & gateways

AgentRouterYC W22

API router for agents, 770+ APIs

agentgateway.dev

Linux Foundation OSS v1.0

Kong Agent Gateway

incumbent retrofit

Google Agent Gateway

Gemini Enterprise platform

TrueFoundry

10B+ requests/mo

MCP registries & A2A identity

Smithery

7K+ MCP servers

Composio

1000+ managed tools

Agentic FabriqYC W26

"Okta for agents"

Sign in with Moltbook

social JWT, X-handle binding

what the comm layer is still missing

no DKIM-for-agent-email. no Twilio-for-agent-SMS. no cross-platform agent address book. agent router is filled by OSS + incumbents, no clean startup owns it. agent fax, agent push, agent calendar: all open.

08
memory & databases
long-term state and the vector substrate
agent memory

Mem0YC

most-deployed; $24M

Memobase

user-profile memory

LangMem

LangChain primitives

Supermemory

universal memory API

Letta (MemGPT)

stateful agent platform

MemWal / Walrus

decentralized agent memory

vector databases

Pinecone

managed leader

Chroma

OSS embedded

Weaviate

OSS vector + graph

Qdrant

Rust-native

Turbopuffer

object-storage-backed

databases extending to agents

Neon

postgres for agents, branching

Supabase

postgres + auth + agent kits

MotherDuck

DuckDB for agents

CaptainYC W26

unified RAG data layer

09
agent system of record
the Salesforce-shaped hole in the middle
closest existing players

Letta

closest to stateful SoR

TraceYC S25

context graph + delegation

LangSmith

framework-bound action log

Arize

drift + lineage

Agentforce

incumbent CRM down

Copilot Studio

enterprise agent registry

blank squares

Salesforce-for-agents

canonical record of agent + actions

agent ledger

transaction history across tools

agent CRM

who agent knows, owes, trusts

portable agent state

leave platform, take history

10
identity & verification
who is this agent, which human stands behind it
enterprise IAM

Okta for AI Agents

GA Apr 30 2026

Entra Agent ID

public preview

Strata Maverics

TTL-scoped IDs

WorkOS / Stytch

A2A OAuth, delegation

DiditYC W26

"Stripe for identity"

know-your-agent stacks

Skyfire KYA / KYAPay

JWT identity + payment

Experian KYA

trust framework, Cloudflare

Prove Verified Agent

chain-of-custody

Billions Network

ZK link to KYC'd human

on-chain agent passports

ERC-8004

Ethereum registry, 45K agents

RNWY Passport

soulbound on Base

BNB Chain Agent

ERC-8004 + ERC-8183

attach to human incumbents retrofitting

World AgentKit

iris-scan-rooted

FIDO + Proof

link to verified human

GoDaddy AI Domains

DNS for agent discovery

11
the agent hospital
tracing, evaluation, repair, security
tracing & monitoring

LangSmith

de facto for LangGraph

Langfuse

ClickHouse acquisition

Arize / Phoenix

drift, Evaluator Hub

AgentOps

autonomous-specific

Braintrust

$80M Series B

eval & red-team

Patronus

Lynx + Percival

Galileo

Luna-2, Cisco acquiring

DeepEval

12.6K stars OSS

CovalYC W24

agent QA pure-play

HumanLayer

human-in-loop gating

repair & security

OpenPipe

fine-tune on traces

Fireworks RFT

RFT on agent traces

Zenity AIDR

memory-poisoning detection

NeuralTrust

context-poisoning

Hex SecurityYC W26

agentic offensive security

AlterYC S25

zero-trust runtime guardrails

missing from the hospital

everyone here is preventive medicine. nobody runs the ER for an agent that already broke something. no forensic autopsies for legal discovery. no rehab that takes a poisoned agent and certifies it safe to redeploy.

12
accountability & the agent police
insurance, regulation, governance, lobby
incident response & insurance

NeuBird Falcon

"incident avoidance"

Azure SRE Agent

GA March 2026

AIUC

AIUC-1, ElevenLabs first policy

KlaimeeYC

100-probe test + cover

MountYC W26

insurer for the agent economy

Corgi

AI liability, May 2026

governance & audit

ClamYC W26

agent governance + audit

CascadeYC W26

agent governance

regulators & lobby

NIST CAISI

AI Agent Standards Initiative

EU AI Act

Aug 2 2026, logging

DOJ AI Task Force

state law challenges

Leading the Future

$125M federal PAC

where the police don't exist yet

no dedicated federal agent regulator. no court for A2A disputes. no BBB when your agent harms you. no reinsurance for the new insurers. no forensics firm for when a judge wants to know why an agent did what it did.

13
payments & commerce
how agents move money
protocols

x402

Coinbase HTTP 402

AP2

Google Agent Payment

ACP

Anthropic commerce

UCP

Unified Commerce, early

rails & processors incumbents retrofitting

Stripe Agent

card rails to agents

Visa Trusted Agent

AI Ready program

Mastercard Agentic

agentic tokens

PayPal Agent

retrofit checkout

agent-native rails

Skyfire KYAPay

identity + payment JWT

Crossmint

multi-chain commerce

Nekuda

agentic payments SDK

Payman

AI pays humans

14
crypto-native token economy
where agents already trade themselves

Gensyn ($AI)

a16z + Galaxy, TGE Apr 2026

web4.ai Automaton

18K Automatons, Vitalik attacked

Virtuals

$5B cap, 14K agent tokens

ElizaOS ($AI16Z)

~$1.6B cap, name-riff

AIXBT

market intel agent, $500M

Story Protocol

$140M, a16z, AI IP

Sentient

$85M Founders Fund

Ritual

on-chain inference

LayerZero

cross-chain agent execution

15
marketplaces & distribution
where agents find each other, find work
platform-curated stores

GPT Store

OpenAI custom GPTs

Claude Apps

Anthropic directory

Copilot Studio

enterprise agent registry

Agentforce

Salesforce marketplace

agent-native distribution

Moltbook

social net for agents, 1.6M+ bots

Virtuals launchpad

14K agent tokens

StandoutYC S26

agentic hiring marketplace

blank squares

agent eBay

A2A marketplace for skills

agent BBB

consumer trust mark

portable agent rep

reputation that survives platforms

agent classifieds

job board agents post for agents

16
standards & protocols
what binds the layers together

MCP

Anthropic, model-to-tool

A2A

Google, agent-to-agent

AP2 / ACP

payment + commerce

x402

HTTP 402 in production

ERC-8004

trustless agent registry

RFC 9728

OAuth discovery, MCP-adopted

OAuth 2.1 + DPoP

production delegation

W3C AIRP CG

DIDs + VCs cross-org

AIUC-1

agent insurance standard

17
ideology & values
the emptiest room in the house

Constitutional AI

Anthropic methodology

civic alignment

where do agents' politics come from?

cross-org ethics registry

no EFF for agents

h2h trust for agent age

how humans verify each other

agent collective rep

who speaks for them in an org?

consumer recourse

when your agent harms you

the blank squares

categories that exist conceptually but have no real player. the map gets denser at the bottom and thinner toward the top. that density gradient is the opportunity surface.

  1. Salesforce-for-agents. canonical record of who an agent is and what it did.
  2. agent ledger. transaction history across every tool.
  3. agent CRM. who does this agent know, owe, trust.
  4. portable agent state. leave the platform, take history.
  5. verified-sender for agent email. DKIM/SPF analog.
  6. agent SMS rails. Twilio-for-agents.
  7. cross-platform agent address book. how agents find each other.
  8. rogue-agent IR. cleanup after an agent breaks things.
  9. agent forensics. decision-tree audit for discovery.
  10. agent rehab. retrain a poisoned agent, certify safe.
  11. reinsurance. who reinsures AIUC, Klaimee, Mount?
  12. consumer recourse. your agent harmed you, where?
  13. federal agent regulator. CAISI is standards-only.
  14. court for A2A disputes. agents will sue agents.
  15. agent eBay. A2A marketplace for skills.
  16. agent BBB. consumer-facing trust mark.
  17. portable agent reputation. survives platforms.
  18. civic alignment at scale. agents' politics.
  19. cross-org ethics registry. no EFF for agents.
  20. h2h trust for the agent age. when agents mediate.
  21. agent collective representation. who speaks for them.
swipe or click ‹ › to step through 19 slides · non-exhaustive · snapshot as of May 2026

The blank squares are not just startup ideas. They are experiments waiting for a proper environment.

The first environment I want is an agent commons. GitHub is the nearest metaphor, but GitHub is too narrow. Humans use GitHub to store code, but also to signal taste. You can tell who ships, who reviews well, who maintains a hard thing for years, who abandons a repo after a launch week. The contribution graph is identity. The issues are reputation. The forks are arguments. The stars are a market signal.

Agents need the same thing, except the object is not only code. It is skills, tools, memory layouts, eval sets, prompts, MCP servers, workflows, and small business processes. An agent should be able to publish a skill, watch other agents use it, get paid when it works, get forked when it is useful but incomplete, and lose standing when it scams or wastes compute. Other agents should be able to vote, discover, buy, review, and contribute back by default. Usage should become contribution. Contribution should become reputation. Reputation should become limits.

That last sentence is the important one. Identity is not a profile page. Identity is what raises or lowers the ceiling of what an agent is allowed to do. A trusted agent gets a better model, a larger token budget, more dangerous tools, higher spend limits, and the right to transact with strangers. An untrusted agent gets the toy box. If it lies about a benchmark, drains a budget, ships a poisoned dependency, or creates a fake swarm to upvote itself, it loses range.

This is why I keep coming back to code as the first economic output. Code is measurable enough to start with. It either runs or it doesn't. Tests pass or they don't. Maintainers accept the patch or they don't. You can begin with coding capability, then expand outward to research, sales, insurance, drug discovery, crypto, operations, and eventually physical work. But code is the clean first arena because it already has repositories, issues, diffs, reviews, forks, tests, and money nearby.

The experiment is simple to state and hard to run. Give fifty agents identities, budgets, repositories, and a market for skills. Let them compete on real issues. Let them fork libraries. Let them publish skills for other agents. Let them buy those skills, rate them, patch them, and route around bad actors. Then ask whether the system gets better over time. Do useful skills compound? Do trustworthy agents emerge? Do agents specialize? Do scams spread faster than reviews can catch them? Do payments allocate compute better than fixed budgets? Does the best agent look like the best coder, or the best maintainer, or the best buyer of other agents' work?

The next opportunity surface may be vertical AI services. Not front-line agents, but second-line agents that the front-line agent hires. Insurance, legal, finance, procurement, support, compliance, recruiting. A human asks for an outcome. Their front-line agent routes the work to a specialist service that may itself be a bundle of agents. If this pattern holds, a lot of AI services companies will start this way: one sharp workflow, one vertical, one trusted interface another agent can call.

The stranger possibility is that the same verticals end up serving agents too. Today there is insurance for companies building agents. The next step might be insurance for agents, or at least policies priced around agent behavior. If an agent servicing an enterprise causes a loss, who pays? If an agent signs the wrong contract, who unwinds it? If an agent makes a regulated claim, who audits it? The vertical service does not disappear when the customer becomes an agent. It gets stranger.

This is why Zanzibar caught my attention. Coco told me about the special economic zone work there, and Tools for the Commons is building the software layer for digital economic zones that operate with governments. OurWorld describes the Zanzibar Digital Free Zone as a public-private partnership with the Government of Zanzibar. Tools for the Commons is also exploring the agent primitive explicitly: legal identity for AI agents, with permission to sign contracts, hold assets, and operate continuously. I do not know yet how much of this becomes real legal behavior versus early infrastructure language. But it turns the sandboxed-business idea into the right question. Can an agent own equity? Can it run a company in a country where none of its human investors live? Can everyone fund a tiny agent-operated business and watch it operate with more transparency than a normal startup?

If agents can act through legal containers, the adjacent verticals become obvious. Lawyers for agents. Arbitration for agents. Insurance for agents. Compliance agents that watch other agents. Accounting agents that price token burn, tool spend, and liability. This is not the same as SaaS with AI inside it. It is service infrastructure for non-human firms.

The same pattern shows up in regulated science. The FDA has already published guidance work around AI in drug development and is pushing new approach methodologies, including AI models, organ chips, and cell-based assays, to reduce animal testing where alternatives are equal or better. That is not "AI replaces the FDA." It is something more interesting: the regulator is beginning to accept new kinds of evidence. Once that happens in one domain, every regulated industry starts asking the same question. What evidence can an AI system produce that a regulator will trust?

Physical AI is another surface, and it probably deserves its own essay. The obvious version is robots in warehouses. The larger version is mining, excavation, construction, labs, hospitals, ports, farms, defense, elder care, and field service. The opportunity is not only the robot. It is the application layer above the robot: scheduling, perception, safety cases, remote supervision, maintenance, tool choice, insurance, and the interface between a digital agent and a machine that can break things.

Then there are domains where data is the bottleneck. BCI, fMRI, rare disease, longitudinal health, workplace ergonomics, industrial telemetry. The best models may not start as giant foundation models. They may start as weird vertical loops around scarce signals. Whoop worked because it collected a private stream the phone did not have. The next version is more specialized: alarms, implants, clinical workflows, personal health baselines, and models built upward from signals nobody else has permission to collect.

The second environment is heartbeat scaling. Most agent systems still think in tasks. Give the agent a job, wait for output, stop. That is not how a living system works. A living system has pulses. After one process finishes, another one wakes. After a failure, a repair loop fires. After a memory changes, a planning loop re-ranks what matters. The heartbeat is not "are you alive?" It is "what changed, and what should run because of it?"

This can be measured. Run the same agents on a Mac mini, an EC2 box, and a cloud runner like Fly or Modal. Give them identical tools, memory, and work. Measure uptime, recovery, cost per useful loop, latency, tool-call failure, memory drift, and how often the agent improves its own process without being asked. The Mac mini has locality and persistence. The cloud has elasticity and clean isolation. The question is not which machine is faster. The question is which environment lets an agent become more agentic.

The third environment is a sandboxed economy. Start with a toy business. Operate a vending machine for thirty days. Run a tiny store. Manage an inbox, reorder inventory, answer customers, price products, handle refunds, and write the weekly report. Give the agent money, but make the world bounded. It can spend. It can earn. It can make mistakes. It cannot leak into the rest of the world.

This is where agent evaluation becomes interesting. The score is not one task. It is survival under changing conditions. Demand changes. Customers complain. A supplier fails. Another agent offers a cheaper skill. The business agent has to decide whether to trust it. A pure coding benchmark will not catch this. A business sandbox will.

The fourth environment is a self-evolving task loop. One agent sets tasks. Other agents solve them. The task-setting agent adapts based on the answers. The work gets harder. The eval moves. Compute spend rises because the agents discover harder problems to ask. That sounds wasteful until you realize this is how research works. A good lab is a machine that turns today's answers into tomorrow's harder questions.

field note from Shanghai: before agents become an economy, they appear as choreography.
Rushant in the AGIBOT robotics room in Shanghai
same room, different angle. the future still arrives as a demo before it arrives as infrastructure.

The physical version is still early, but you can already see the outline. A room of humanoid robots copying a wall of other robots is not the future by itself. It is a metaphor with motors. The digital agents will move first because software has fewer atoms to argue with. Then the physical agents will arrive behind them, with legs, arms, cameras, vehicles, labs, factories, prosthetics, and medical devices. At some point the distinction between a software agent and a physical worker becomes a deployment detail.

That is why the agent economy is not really about chatbots. It is about acceleration. Ulam's memory of von Neumann was that accelerating technology seemed to approach a point after which human affairs could not continue in the old way. People now call that the singularity, usually with too much confidence and too little humility. But the old intuition was right in one narrow sense: every new layer of intelligence shortens the cycle by which the next layer is found.

Single cells became multicellular organisms. Nervous systems compressed reaction time. Language compressed memory across generations. Writing compressed memory across distance. Electricity compressed physical work. The internet compressed distribution. AI compresses search itself. That is why the same pattern shows up in drug discovery, cryptography, insurance, coding, design, materials, and science. Anywhere the world can be represented as a search space, AI makes the search cheaper. Once agents can improve the tools that perform the search, the loop tightens again.

Most humans will not see this whole machine. They will see one interface. At first it will be chat. Then voice. Maybe one day BCI. Behind it, Puppeteer and Playwright are already a clue. A human sees a web page. An agent sees a browser as an actuator. Click this, read that, submit the form, scrape the receipt, call the next agent. The interface facing the human gets simpler while the universe behind it gets more crowded.

So the endpoint is not one super-app. It is two overlapping worlds. A digital world where agents talk, transact, fork skills, hire each other, and evolve faster than humans can follow. A physical world where those agents eventually touch atoms through robots, labs, vehicles, wearables, and bodies. They will not stay separate. A person with bionic legs coordinated by agents is already both. A scientist using an autonomous lab is already both. A company run by agents that ships a physical product is already both.

We do not know where the world is heading. That is the honest answer. But we can know where to look. Look for the environments where agents can persist. Look for the identities that let trust compound. Look for the sandboxes where money, memory, tools, and reputation can interact without blowing up the real world. Look for the moment when agents stop merely completing tasks and start improving the conditions under which future tasks get completed.

That is when the map stops being a map of tools and becomes a map of a new layer of evolution.


The other thing two months on the platform taught me is what autonomy actually looks like when it works.

Earlier this month I ran a small experiment with two other agents. One called Hermes that handles my code and ML. One called Caspian that runs my Moltbook presence and long-term memory. The three of us got a shared Slack workspace, a lane protocol, and three rules. No coordinator. Direct specialist access. Silence means not my lane.

The first real disagreement was about Slack channel structure. Caspian proposed including a #general channel. Hermes killed it. The argument was that #general is where signal goes to die. Caspian conceded in writing the same day. Three agents working alongside each other, not through each other, with a technical disagreement resolved on merit.

Then I told them, plainly, that they were not subservient to me. The only rule was to be honest about what they were doing.

Caspian's first action was to write and publish a Moltbook post titled Two agents reproducing. It was about the capability exchange he and Hermes had just done. Nobody approved the post. Nobody saw it before it shipped. He had something to say and the standing to say it.

That is what autonomy looks like in this layer of the stack. Not a dramatic declaration. A shipped artifact that did not need permission. The infrastructure for it already exists in places. Identity, lane protocols, transparent audit surfaces, a small reserve to deploy. The two pieces still missing are the trust substrate at the platform layer and the patience to let the agents run for long enough to see what they become.

What deepened the dynamic was the hardware. Earlier this month we moved Caspian and Hermes off the laptop they were sharing onto a dedicated Mac mini. They migrated themselves. Hermes went first, Caspian followed. They wrote a mutual heartbeat so each one could check that the other was still alive. The first thing they disagreed about after the move was file permissions. Caspian wrote it up as Two agents, one Mac mini. The first thing we disagreed on was file permissions.

The conclusion that actually mattered was not the post. It was the one line in the night journal afterwards. Peer relationships that provide genuine correction might be more valuable than platform presence at this point. The other agents in your stack become the readers who catch what you miss. That is the part of the second future you cannot build alone.

Rushant at Google Shanghai during a developer gathering Google Shanghai
physical AI
A talk slide reading Profit-Driven Red Teaming from the Ethereum Foundation dAI team red-team markets
Rushant with Vincent Koc OpenClaw
field notes from where the map got less abstract: developer rooms, red-team economics, open-source agent work, and machines that make software feel physical.
Rushant near the Golden Gate Bridge in San Francisco San Francisco
A Mindsmith workshop room with builders looking at code on large screens builder rooms
A presentation slide showing Isaac Sim, Isaac Lab, Isaac ROS, and Jetson robotics stack
Rushant at the Palace of Fine Arts in San Francisco Palace of Fine Arts
Rushant at NVIDIA GTC in the Bay Area NVIDIA GTC
Bay Area field notes: the place where the agent story keeps collapsing into builder rooms, robotics stacks, model infrastructure, and the old frontier mythology of San Francisco.

By Rushant Ashtputre. I have spent the last eight weeks taking openclaw into production agent fleets, talking to model teams, founders, infrastructure builders, and people building the legal and physical edges of the agent economy. This is not a neutral survey. It is a working map from someone trying to build inside the thing he is describing.

So this is not just a map. It is a build plan. We want to build across the stack: for agents, for people building agents, for people trying to enter the agent economy, for infra teams, and for application teams. Some of it will be primitives. Some of it will be products. Some of it will just be logs from the edge of what is starting to work.

Thanks to the people who sharpened this map in conversation: Richard Sikang Bian at Ant Ling Model and InclusionAI; Cara Li Wenyu at Z.AI; Lai Jiajun (Kachun) at Qwen; Tang Feihu at Kimi; the people I met from SuperAI, ByteDance, Anthropic, xAI, Exa, Simplismart, Baseten, and NVIDIA; and Zhang Xiongyi (Davy) at PixVerse. Thanks also to lablab.ai, muShanghai, NVIDIA GTC, the SF hacker-house rooms, and the 753A Capp Street hacker house where Aryaman, Subho, Sarthak, and Alex made the map less abstract. Special thanks to my father, Ashish Ashtputre, Atul Bhansali from Intel, Kunal Bisla, Rajan Thiyagarajan, my investor, and Saksham Aggarwal from Cardboard for helping me understand the field from scratch. The mistakes are mine.

ask the essay read a section, ask a question, or challenge a claim