The Day Google Stopped Selling Software

By Ashwin Krishnamoorthy | A FutureLab dispatch on Google I/O 2026

May 20, 2026

A note on method

I watched Google I/O 2026 with my Gemini agent.

The agent parsed the full livestream from YouTube as it streamed, then I interviewed it about each announcement in turn. Every number cited below was extracted directly from the keynote by Gemini reading the source. The structure of the argument is mine. The source material is, recursively, the same stack the article is about.

That detail matters more than it sounds. Most I/O 2026 recaps were written by humans skimming press releases. This one was written by a human reading an agent reading the keynote. If you want to understand what changed yesterday, the way you read the news yesterday is already part of the answer.

The Shift Hiding Behind Thirty Product Launches

Yesterday Google held its annual developer keynote. Two hours, roughly thirty announcements, the usual mix of demos and stage choreography. The live blogs treated it as a product launch. It wasn’t.

What Google actually shipped is a thesis: the era of reactive software is ending, and the era of autonomous, always-on agents has begun. Every announcement was scaffolding for that single idea. The TPUs, the model, the harness, the search redesign, Spark, Omni: these aren’t six product lines. They’re one stack, assembled end-to-end, designed to do one thing: take work off humans and put it onto agents that run continuously in the background.

The interesting question isn’t what got announced. It’s whether anyone else on Earth can ship a comparable stack in the next eighteen months. I think the answer is no, and the reason it’s no tells you what to do on Monday morning.

If you want three numbers from the keynote that capture the whole story, here they are. Each one proves something different.

3.2 quadrillion tokens processed monthly across Google’s AI surfaces.

This proves Google has successfully transitioned its existing user base from traditional search into generative AI consumption at a scale no competitor can match.

$180 to $190 billion in 2026 capex, roughly six times what they spent in 2022.

This proves the infrastructure barrier to entry for frontier AI is now structurally insurmountable for all but two or three companies on Earth.

Under $1,000 to build a working operating system using Antigravity’s swarm of 93 subagents.

This proves that the economic cost of complex, multi-day engineering work has effectively collapsed for anyone willing to architect for agents.

The rest of this article walks the stack that produced these numbers.

The Stack, Built Bottom-Up

Layer 1: Silicon

Start here, because everything else follows from it.

Google announced its eighth-generation TPUs and, for the first time, split the architecture in two. TPU 8t is optimised for large-scale pretraining. TPU 8i is optimised strictly for ultra-low latency inference, clocking close to 1,500 tokens per second. The two chips are stitched together by a system called Pathways, which lets a single training run span multiple geographic data centres and over a million TPUs as one virtual cluster. Pichai claimed this lets them train frontier models in weeks instead of months.

The capex behind this is the part that ends the conversation about who can compete: roughly $180 to $190 billion this year, about six times what they spent in 2022.

OpenAI and Anthropic are renting Nvidia silicon at Nvidia’s margins, on Nvidia’s roadmap, with Nvidia’s supply constraints. Google designs the chips, owns the data centres, runs them on its own power contracts, and tunes its models directly to the hardware. The cost curve underneath every other layer of the stack is structurally lower than anything a competitor can match. That isn’t a marketing claim. It’s an accounting one.

The capex does have a ceiling, and it’s not financial. It’s the physical world: local power grid capacity, advanced cooling, and municipal permitting for data centres the size of small towns. Google can outspend everyone. They cannot outspend the planet.

Layer 2: The model

On top of that silicon sits Gemini 3.5 Flash, which is the announcement most people will under-read.

The model launched yesterday at $1.50 per million input tokens and $9.00 per million output tokens, generally available immediately, everywhere.

Pichai’s framing: 3.5 Flash performs at roughly 90% of frontier quality, runs four times faster than comparable models, and costs one-third to one-half as much.

Most strikingly, on coding and agentic benchmarks (Terminal-Bench 2.1, MCP Atlas, CharXiv Reasoning) it beats Gemini 3.1 Pro, which was Google’s flagship four months ago.

Read that sentence again. The cheap, fast model now outperforms what was the flagship four months ago.

That is what a working data flywheel produces. Google’s own internal usage on its Antigravity platform went from half a trillion tokens per day in March to over three trillion by mid-May. The model isn’t getting better in a lab. It’s getting better in production, because Google’s own engineers are using it at a scale no third party can replicate.
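The flywheel claim rests on a growth rate worth making explicit. Assuming roughly two months between the March and mid-May readings (the figures as relayed here give no exact dates), the implied month-over-month multiplier for internal token usage is:

```latex
g \approx \left(\frac{3 \times 10^{12}}{0.5 \times 10^{12}}\right)^{1/2} = \sqrt{6} \approx 2.45
```

If the gap were closer to two and a half months, the multiplier would be 6^{0.4} ≈ 2.05. Either way, and because the May figure is "over" three trillion, internal usage roughly doubled or better every month.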

The Pro versus Flash distinction is collapsing for a reason. Flash is being positioned as the workhorse for the agentic loop, where you need thousands of fast, cheap iterations and self-correction. Pro is being reserved for the genuinely hard reasoning problems where you want one careful answer at a time. That’s not a product taxonomy. It’s an admission about what AI work actually looks like in 2026: mostly swarm execution, occasionally deep thought.

A quick price comparison, since this is where most engineering leaders will run the math. At $1.50 input, 3.5 Flash drastically undercuts both GPT-4o and Claude Sonnet on the read-and-plan side of agentic work, which is the majority of token spend in long-horizon tasks. At $9 output, it’s still premium enough that high-volume code generation will accumulate cost. The undercut is on reading and reasoning. The margins are preserved on writing and acting. That pricing structure is itself an argument about where Google thinks the agentic economy is going.
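The pricing argument is easy to check with a few lines of arithmetic. The sketch below prices a workload at the 3.5 Flash list rates quoted above; the function names are hypothetical, and the 20:1 input-to-output mix in the example is an assumption standing in for a read-heavy agentic task, not a keynote figure. Competitors' prices are left as parameters rather than reproduced here, so plug in current list prices before comparing.

```python
# Gemini 3.5 Flash list prices as quoted from the keynote (USD per million tokens).
FLASH_INPUT_PER_M = 1.50
FLASH_OUTPUT_PER_M = 9.00


def workload_cost(input_tokens: int, output_tokens: int,
                  input_per_m: float = FLASH_INPUT_PER_M,
                  output_per_m: float = FLASH_OUTPUT_PER_M) -> float:
    """USD cost of a workload at per-million-token prices."""
    return (input_tokens * input_per_m + output_tokens * output_per_m) / 1_000_000


def input_share(input_tokens: int, output_tokens: int,
                input_per_m: float = FLASH_INPUT_PER_M,
                output_per_m: float = FLASH_OUTPUT_PER_M) -> float:
    """Fraction of total spend that goes to reading (input) tokens."""
    read = input_tokens * input_per_m
    write = output_tokens * output_per_m
    return read / (read + write)


if __name__ == "__main__":
    # Hypothetical read-heavy task: 20M tokens read, 1M tokens written.
    cost = workload_cost(20_000_000, 1_000_000)
    share = input_share(20_000_000, 1_000_000)
    print(f"cost ${cost:.2f}, input share {share:.0%}")  # prints: cost $39.00, input share 77%
```

At these rates an output token costs six times an input token, yet in this read-heavy mix roughly three-quarters of the bill is reading, which is exactly where the pricing undercuts.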

Layer 3: The developer harness

The third layer is Antigravity 2.0. Desktop app, CLI, SDK, all shipping simultaneously, all co-optimised with 3.5 Flash.

The interesting architectural choice is that Antigravity is unabashedly multi-agent. Claude Code and Codex still feel like pair programmers: single-threaded conversations between one human and one model. Antigravity is built as an orchestration hub. You fire off parallel subagents (one writing code, one generating assets, one planning architecture), each running in its own containerised environment, each able to compile, test, and fail without contaminating the others. The terminal stops being a chat window and becomes a project manager’s dashboard.
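The orchestration pattern, parallel subagents whose failures stay contained, can be sketched in a few lines. This is not Antigravity's SDK: `run_subagent` and `orchestrate` are hypothetical, and plain asyncio tasks stand in for the separate containers the article describes, so the sketch models failure isolation at the orchestrator only, not process or filesystem isolation.

```python
import asyncio


async def run_subagent(role: str, fail: bool = False) -> str:
    """Hypothetical stand-in for one subagent working in its own sandbox."""
    await asyncio.sleep(0.01)  # simulate work: compile, test, and so on
    if fail:
        raise RuntimeError(f"{role}: build failed")
    return f"{role}: ok"


async def orchestrate(tasks: dict) -> dict:
    """Fan out subagents concurrently; one failure does not cancel the others."""
    roles = list(tasks)
    results = await asyncio.gather(
        *(run_subagent(role, fail) for role, fail in tasks.items()),
        return_exceptions=True,  # collect exceptions instead of raising the first one
    )
    report = {}
    for role, result in zip(roles, results):
        if isinstance(result, BaseException):
            report[role] = f"FAILED ({result})"
        else:
            report[role] = result
    return report


if __name__ == "__main__":
    report = asyncio.run(orchestrate({"code": False, "assets": True, "architecture": False}))
    for role, status in report.items():
        print(role, "->", status)
```

The key choice is `return_exceptions=True`: the orchestrator receives every subagent's outcome and decides what to retry, rather than losing the whole run to the first failure.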

The demo claim that 93 subagents built a working operating system for under $1,000 in inference cost is the kind of number that should make every engineering leader sit forward. It also deserves the skepticism I'll return to below.
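To put the $1,000 figure in token terms: assuming (the keynote as relayed here does not say so) that the demo ran on 3.5 Flash at its list prices, the budget per subagent and the most tokens it could buy are:

```latex
\frac{\$1{,}000}{93} \approx \$10.75 \text{ per subagent}, \qquad
\frac{\$10.75}{\$9.00 / 10^{6}} \approx 1.19\text{M output tokens}
\quad\text{or}\quad
\frac{\$10.75}{\$1.50 / 10^{6}} \approx 7.17\text{M input tokens}
```

A real run mixes the two, so the actual ceiling per subagent lies between these bounds.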

There’s a legitimate concern with the underlying flywheel. Google’s internal codebase is famously idiosyncratic: the monorepo, Blaze build system, internal RPC frameworks, in-house everything. A model fine-tuned on three trillion tokens per day of Google-flavoured engineering may underperform on standard open-source stacks (Next.js, Vercel, Docker, the things real startups actually run). We will find out within weeks. The direction of travel, though, is unmistakable.

Layer 4: Distribution

This is where the conversation about whether Anthropic or OpenAI can “catch up” stops being a conversation.

AI Mode in Google Search, the surface that serves billions of queries every day, is now powered by 3.5 Flash by default. From yesterday. Globally. Free.

It does more than answer questions now. Search can use Antigravity to write, compile, and serve bespoke interactive applications inside the result page itself. Ask Search to plan a family weekend and it doesn’t return a bullet list; it builds you a stateful planner with map routing, calendar availability, and a shareable link. The wedding planning and fitness tracking examples were the safe demos. The implication is broader: every Search query is now a potential one-off application.

This is the announcement that should keep Vercel, Lovable, Bolt, and v0 awake at night. Those platforms build UI for developers to embed in their products. Google has skipped the developer entirely and is shipping disposable applications directly to consumers, at the moment of intent, inside the surface where intent already lives.

If a user can ask Search to build them the tool they need, the demand for traditional consumer SaaS shrinks for an entire category of lightweight workflow apps.

The distribution math here is brutal. Google has 13 products with over a billion users and 5 products with over three billion. Pure API providers and standalone agent apps have to convince a user to download something, form a new habit, and connect their data. Google injected an agent into the Search bar, Gmail, and Android for half the planet in a single afternoon. The customer acquisition cost is zero. The default behaviour for billions of people just changed.

Layer 5: The proactive agent

The fifth layer is Gemini Spark, the always-on personal agent.

Spark runs 24/7 in its own dedicated virtual machine in Google Cloud. It integrates natively with Workspace and, importantly, with third-party tools through the Model Context Protocol. You can tell it to monitor flight prices and book when they cross a threshold, close your laptop, go to sleep, and let it work. It doesn’t wait for prompts. It runs.
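Mechanically, the flight-price example is a monitor-and-act loop. A minimal sketch, assuming hypothetical `get_price` and `book` callables (Spark's actual interface was not shown): poll, act once when the price falls to the threshold, and give up after a bounded number of checks rather than looping forever.

```python
import time
from typing import Callable, Optional


def monitor_and_book(
    get_price: Callable[[], float],
    book: Callable[[float], str],
    threshold: float,
    poll_seconds: float = 0.0,
    max_polls: int = 100,
) -> Optional[str]:
    """Poll a price source and act once, when the price drops to or below threshold."""
    for _ in range(max_polls):
        price = get_price()
        if price <= threshold:
            return book(price)  # act without waiting for a user prompt
        time.sleep(poll_seconds)
    return None  # stopping rule hit; a real agent would report back to the user


if __name__ == "__main__":
    # Hypothetical price feed standing in for a live flight-search source.
    prices = iter([420.0, 399.0, 310.0, 289.0])
    print(monitor_and_book(lambda: next(prices), lambda p: f"booked at {p}", threshold=300.0))
```

The `max_polls` cap is the design choice that matters: an always-on agent needs an explicit stopping rule and a way to report back when the trigger never fires.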

This is the layer that breaks the assistant metaphor. Daily Brief summarises what already happened. Gemini Live responds in real time. Spark does work while you’re not there.

The closest comparable products are OpenAI’s Operator and Anthropic’s Claude agent capabilities, but neither has the OAuth-free native integration with Gmail, Drive, Calendar, and Photos that comes from owning the underlying surface. Spark’s weakness is cross-platform desktop control outside Chrome and the Google ecosystem. Spark’s strength is that it lives inside the products three billion people already use.

The MCP support deserves a separate note. Google knows it cannot build connectors for every SaaS tool on Earth. By adopting MCP as the third-party interoperability layer, Google is choosing ecosystem over enclosure. That’s a more open posture than they’ve taken with any previous platform. It’s also a quiet bet: if MCP becomes the universal agent-to-tool protocol, Google’s distribution advantage cascades into the entire enterprise software stack.
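For readers who haven't looked at MCP on the wire: it is built on JSON-RPC 2.0, and invoking a third-party tool is a `tools/call` request naming the tool and its arguments. The sketch below builds one such message; the `search_flights` tool and its arguments are hypothetical, and the transport layer (for example stdio) and the initialization handshake are omitted.

```python
import json


def tools_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,  # lets the client match the server's response
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    })


if __name__ == "__main__":
    # Hypothetical tool; a real server advertises its tools in reply to `tools/list`.
    print(tools_call(1, "search_flights", {"origin": "BLR", "destination": "SFO"}))
```

Because every connector speaks this same envelope, an agent that can emit it can reach any MCP server, which is why the protocol choice carries so much strategic weight.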

Layer 6: The media layer

Finally, sitting alongside everything else, there’s Gemini Omni.

The demos suggested something genuinely novel: not a better video generator, but a world model with physics-aware understanding. When the demo shifted a scene from day to night, the headlights recalibrated and the shadows redrew themselves consistently. In the protein folding and black hole demonstrations, the model showed an understanding of kinetic energy, gravity, and 3D spatial consistency. It didn’t regenerate a new video. It re-simulated the state of the environment based on altered variables.

That’s the difference between Veo 3 and Omni. Veo is a camera. Omni is a physics engine you can talk to.

For most enterprises this is a curiosity. For robotics simulation, interactive education, and game asset generation where physical rules must be maintained and manipulated, it’s foundational. The decision tree is clean: use Veo 3 if you need static cinematic shots or B-roll for a final cut. Use Omni if you need an interactive simulation where altering one variable should produce a correctly recalculated state. Omni will burn through tokens rapidly. It replaces a physics engine, not just a camera.

Why This Changes Things

Six layers, all owned end-to-end, all co-designed, all shipping together. No competitor has fewer than two of these layers outsourced. That’s the actual story.

Look at who can plausibly assemble a comparable stack. Microsoft and OpenAI are the only real candidates within eighteen months. They have silicon ambitions in Azure Maia, frontier models in GPT-next, a coding harness in Copilot and Codex, and distribution through Windows, Office, and Teams. The gap is silicon maturity: Microsoft is still primarily an Nvidia customer, and Google has been designing TPUs for eight generations.

Meta has frontier models and distribution but no enterprise developer harness and no silicon independence. Apple has world-class silicon and unmatched device distribution but is years behind on frontier models and has no enterprise developer story. Anthropic builds the best model for serious engineering work and an excellent harness in Claude Code and Cowork, but is a renter on every other layer of the stack. Each of these companies has two or three of the six layers. Google has all six.

The second-order effects start to compound when you sit with this. Three are worth naming.

The cost curve for software engineering work has structurally shifted.

If a near-frontier model runs at one-third to one-half the price of competitors, and the harness around it is built for parallel subagent execution, then the economic floor for "what's worth building" drops dramatically. Internal tools that were never worth a developer-week become worth an agent-hour. Disposable software (built for one workflow, used for a month, discarded) becomes a coherent strategy rather than a thought experiment. Bloated SaaS subscriptions start to look like cable TV in 2014.

The distribution moat for agentic features is now permanent.

OpenAI and Anthropic will continue to win at the high end with developers and serious knowledge workers. But the mainstream agentic experience for billions of users on Earth will be a Google product by default, not by choice. That’s not a market position. That’s gravity.

The unit of “what software is” is shifting.

The agent isn’t a feature inside an app. The agent is the surface, and the app is what it generates when you ask. This isn’t a UI change. It’s a category change. Companies that sell consumer apps are about to discover that the consumer doesn’t necessarily want an app: they want the outcome the app produced. Once the outcome can be generated on demand at near-zero cost, the app itself becomes optional.

The five companies most exposed to this shift in the near term, in order: OpenAI (attacked by Antigravity and Spark on agentic ground they thought they owned), Vercel and the broader frontend SaaS layer (attacked by generative UI in Search), Anthropic (attacked on price for swarm workloads), Apple (attacked through Android XR and Spark intercepting user intent before the iPhone is unlocked), and Perplexity (whose entire utility now ships free inside AI Mode, with 24/7 monitoring agents on top).

The Honest Pushback

None of this is yet proven at production scale, and an engineering audience deserves the asterisks named clearly.

The most polished I/O demos were exactly that: polished demos. The real test of Spark is whether it can complete a long-horizon task across a messy web of broken APIs, changing DOMs, and unexpected CAPTCHAs without getting trapped in a hallucination loop. The real test of Antigravity’s swarm architecture is whether it generalises beyond Google’s internal monorepo to the chaos of a real startup’s stack. The real test of generative UI in Search is whether ordinary users discover and trust it, or whether it stays a power-user feature that never breaks through.

The 93-subagent “operating system built from scratch” demo is the kind of claim that deserves the most scrutiny. Modern operating systems are extraordinarily complex artifacts. It is highly likely the agents leaned heavily on pre-existing Linux kernels, massive open-source boilerplate, or operated in a heavily constrained sandbox. The headline number is impressive. “From scratch” is doing a lot of work.

Google’s history with agentic demos has been uneven. Project Astra two years ago looked transformative on stage and is still slowly rolling out. The original Gemini launch had famously rocky months between announcement and reliable production behaviour. There is no reason to assume Spark and Omni won’t follow a similar arc, and the gap between “scripted keynote demo” and “flawless 24/7 background execution against the live internet” is the gap where most autonomous-agent products have died so far.

Google's structural weaknesses are also real. The Android XR ambition depends entirely on third-party hardware partners (Samsung for the frames, Qualcomm's Snapdragon for the silicon inside the glasses), so the eyewear story rests on partners executing. And the trust gap on autonomous agents handling money, calendars, and communication is unproven: one well-publicised Spark failure (the agent that booked the wrong flight, leaked the wrong email, spent the wrong amount) could set the entire category back by a year.

I don’t think any of these are reasons to disbelieve the thesis. They’re reasons to expect a bumpier rollout than the keynote suggested, on a longer timeline. The direction is right. The pace is the open question.

What to Do on Monday Morning

If you lead an engineering team, the takeaway from yesterday is not “switch to Gemini.” It’s that the unit of engineering work has changed underneath you, whether you’ve noticed or not.

Stop treating AI as a pair programmer for individual developers. Start architecting your codebase for agentic swarms.

Your bottleneck is no longer writing code. It's evaluation and orchestration. That means containerised environments, ironclad CI/CD pipelines, comprehensive test harnesses, and clear specifications that an agent can read and verify against. If 93 subagents can build a working OS for under $1,000, what limits your team is not how fast humans can type but how reliably you can verify what agents produce before it ships.
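Here is what "clear specifications that an agent can read and verify against" might look like in miniature: the spec is a list of executable examples, and every candidate a subagent produces is gated on passing all of them before it can ship. The `clamp` spec and the `subagent_a`/`subagent_b` candidates are hypothetical; in a real pipeline this role belongs to your test suite and CI.

```python
from typing import Callable, Dict, List, Tuple

Spec = List[Tuple[tuple, object]]  # (arguments, expected result)

# Hypothetical spec for clamp(x, lo, hi), written as executable examples.
CLAMP_SPEC: Spec = [
    ((5, 0, 10), 5),
    ((-3, 0, 10), 0),
    ((42, 0, 10), 10),
]


def passes(candidate: Callable, spec: Spec) -> bool:
    """True only if the candidate matches every example and raises nothing."""
    try:
        return all(candidate(*args) == expected for args, expected in spec)
    except Exception:
        return False


def gate(candidates: Dict[str, Callable], spec: Spec) -> Dict[str, bool]:
    """Verify each agent-produced candidate before anything ships."""
    return {name: passes(fn, spec) for name, fn in candidates.items()}


if __name__ == "__main__":
    candidates = {
        "subagent_a": lambda x, lo, hi: max(lo, min(x, hi)),
        "subagent_b": lambda x, lo, hi: min(lo, max(x, hi)),  # plausible-looking but wrong
    }
    print(gate(candidates, CLAMP_SPEC))  # prints {'subagent_a': True, 'subagent_b': False}
```

The human effort moves into writing `CLAMP_SPEC`, not `clamp`: the better the spec, the more candidate code you can afford to generate and discard.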

Your engineers need to move from writing code to reviewing and orchestrating systems. The teams that figure this out in the next six months will pull ahead of the teams that don't, and the gap will compound faster than in any developer-tooling shift of the last decade. The teams that don't will spend 2027 trying to retrofit agent-readiness onto a codebase architected for human pair-programming, and they will lose that race to whoever started in May.

That, and not any single product on the I/O stage, is what changed yesterday. Google built the stack. The rest of us have to decide whether to build for it.

Ashwin K is Head of Academics at Newton School of Technology, Bangalore, and former tech lead at CRED. He writes at the intersection of engineering and education and, increasingly, at the intersection of both with AI. FutureLab by Newton School of Technology is a community of engineering leaders working through what's actually shifting: not the announcements, but the decisions underneath them.

FutureLab's Substack
