just a tourist

The Subscription Was Doing Most of the Work

On June 1, 2026, GitHub Copilot will start charging customers per token used. The change is presented in the announcement post as a routine pricing update. It is not. It is the moment at which one of the largest mass-market software products of the last three years stops being a subscription and starts being a metered utility — and the reason for the change is that the prior model was losing the seller money on the most engaged users.

This post is about why subscription pricing was the load-bearing element of the AI services market, what it was hiding, and what changes once it ends.

The gym metaphor that broke

Subscription pricing works in domains where consumption variance between users is small relative to the average. A gym costs the same to operate whether a member visits twice a week or twice a year, and the heavy users don't cost more than the price they pay. Streaming services follow the same logic; cloud storage at consumer scale, mostly the same. Subscriptions implicitly average across a population, with the heavy-using minority subsidised by the light-using majority. The model is stable as long as the variance stays bounded.

Generative AI was sold using that same model from the beginning, and that turns out to have been the wrong choice in a quietly important way.

How wide the variance actually is

In October 2023 the Wall Street Journal reported that Microsoft was losing more than $20 per user per month on GitHub Copilot, with some users costing as much as $80 against a $10 subscription. The headline number was widely repeated. The number that mattered was the spread: worst-case cost running at 8× the subscription price, on a single product, against a flat fee. Cloud storage doesn't have that variance. Music streaming doesn't have that variance. AI services do, because tokens are not bandwidth.

In early 2026, similar numbers became public from the other end of the market. Anthropic's own documentation listed average daily costs for "agentic" Claude Code use at $13 per developer per day — call it $150–$250 per month per user — against subscription tiers that capped out below those numbers. The startup Cursor, briefly the exemplar of a pure-play AI subscription business, was reported by The Information in January 2026 to be running at –23% gross margins, or –31% if non-paying users were counted.

If we let S be the subscription price and C(u) be the actual marginal cost of serving user u, the population-average condition for a subscription business to break even is

S ≥ 𝔼ᵤ[C(u)].

This holds, on average, for the gym. For AI products in 2025, several public data points suggest that for a non-trivial slice of users, C(u) ≫ S; and that the slice is growing because the subscription itself encourages exactly the use that drives cost.
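The break-even condition can be made concrete with a toy simulation. All numbers below are hypothetical, loosely inspired by the reported $10 fee and $80 worst case; the point is only that a small heavy-tailed slice of users can push the population mean past the flat fee:

```python
import random
random.seed(0)

S = 10.0  # flat monthly subscription price (hypothetical)

# Hypothetical population: 85% light users, 15% heavy "agentic" users.
# The heavy tail, not the average, is what breaks the averaging trick.
def monthly_cost() -> float:
    if random.random() < 0.85:
        return random.uniform(0.5, 5.0)   # light user: well under S
    return random.uniform(30.0, 80.0)     # heavy user: multiples of S

costs = [monthly_cost() for _ in range(100_000)]
mean_cost = sum(costs) / len(costs)
loss_share = sum(c > S for c in costs) / len(costs)

print(f"E[C(u)] = ${mean_cost:.2f} vs S = ${S:.2f}")
print(f"share of users with C(u) > S: {loss_share:.0%}")
# Break-even needs S >= E[C(u)]; here a 15% heavy slice is enough to
# push the population mean past the flat fee even though most users
# cost almost nothing to serve.
```

The gym works because its cost distribution has no such tail; swap the two distributions above and the same code shows the subscription comfortably breaking even.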

What the subscription was hiding

The hidden subsidy was not, narrowly, the gap between price and cost. Sellers will tolerate that for a while if it buys market position. The hidden subsidy was the user's perception of free use. A monthly fee is paid once and then forgotten, and once forgotten the marginal cost of the next request feels like zero. Engineers ran agents in the background. Drafts were regenerated forty times instead of four. Long context windows were filled with logs nobody needed to read. None of this was pathological — it was the rational response to a price signal that said use as much as you want. The subscription, in other words, was not just a billing mechanism. It was producing the demand pattern that justified the infrastructure.

Sellers have been working around the resulting cost variance for some time, and the workaround is itself revealing. ChatGPT Plus does not operate on a fixed monthly quota — it operates on a system of rolling windows that shift by model, by system traffic, and by what OpenAI's documentation calls a "smart throttle" that tightens during peak hours. By March 2026 a Plus user was tracking six separate categories of cap (messages on the main model, advanced-reasoning messages, context window, file uploads, custom GPTs, image generation), each with its own reset cadence and dynamic adjustment. When a cap is hit, the user is silently routed to a smaller variant of the model rather than informed of the cost. The price stays flat; the variance is pushed into the user experience. This is a way of charging for consumption without admitting that consumption is being charged for.

Where that kind of throttling is the silent half of the management mechanism, the loud half is a steady stream of "celebration" resets. On April 28, 2026, an OpenAI representative posted in the developer forum: "I have reset Codex rate limits for ALL paid plans to celebrate a good week and allow everyone to build more with GPT-5.5." Three weeks earlier, Sam Altman had announced on X that he would reset Codex limits to celebrate Codex passing three million weekly users, and committed to repeating the gesture for every additional million users up to ten. The same April saw a new $100-per-month tier inserted between the existing $20 Plus and $200 Pro plans, carrying a launch promotion of ten times the Codex usage of Plus through May 31, plus temporary "higher five-hour" Codex bonuses on top. Each event was framed as generosity, and each was met by a wave of forum threads with names like "CODEX LIMITS — FINALLY GOOD after April 1st reset." Each also had the effect of training users to associate rate-limit changes with abundance rather than scarcity — to expect more headroom, more often, on emotional cues rather than financial ones.

This is a sophisticated piece of pricing psychology, and it points in the opposite direction of the "no longer sustainable" pivot the same companies are signalling elsewhere. The seller's right hand is migrating Codex pricing toward token-based metering — Codex switched from per-message to per-token pricing on April 2, 2026 — while the left hand is teaching users, every few weeks, that consumption has no upper bound and the seller will gladly raise it as a gift. The two motions are not contradictory if you read them as a single sequenced strategy: keep users emotionally trained to want more while quietly building the meter that will measure it. But for the user, it is a genuinely confusing signal. One day's "limit reset" arrives as a celebration; the next month's pricing announcement arrives as discipline. Both come from the same vendor.

When the pricing model changes, the demand pattern changes with it. This is the part that most coverage of AI economics in 2026 misses while arguing about whether $700 billion of hyperscaler capex will be paid back. The relevant question may not be how much demand exists but how much demand exists at marginal cost made visible. Those are different numbers.

The pivot

GitHub Copilot is not the first to make the switch — enterprise plans on most providers were already token-based — but it is the first consumer-facing service of comparable scale to do so. Anthropic moved in the same direction during April 2026, removing Claude Code from its lower subscription tier for new users and adding daily, weekly, and five-hour usage limits to remaining plans. Microsoft framed Copilot's change with unusually direct language: the prior pricing was "no longer sustainable." That phrase is rare in software pricing announcements, and worth pausing on.

Once the menu has prices on it, several second-order effects follow. Power users will reduce volume or accept the bill. Light users may cancel because they were never the target customer of the pricing model in the first place. Developers building tools on top of these APIs will rediscover an old discipline: caching, deduplication, prompt minimisation, smaller models for first-pass triage. Some of the productivity-gain studies from 2025, conducted on subsidised use, will need to be redone against actual costs. Budgets will be visible to managers who currently cannot see them.
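The rediscovered disciplines are easy to sketch. A minimal prompt-level cache, for instance, collapses byte-identical requests into a single metered call; `call_model` below is a hypothetical stand-in for any per-token-billed endpoint, not a real API:

```python
import hashlib

def call_model(prompt: str) -> str:
    # Hypothetical stand-in for the expensive, per-token-billed request.
    return f"response to: {prompt}"

_cache: dict[str, str] = {}
metered_calls = 0

def cached_call(prompt: str) -> str:
    """Serve byte-identical prompts from cache; only misses cost money."""
    global metered_calls
    key = hashlib.sha256(prompt.encode()).hexdigest()
    if key not in _cache:
        metered_calls += 1
        _cache[key] = call_model(prompt)
    return _cache[key]

# Regenerating "forty times instead of four" now costs one metered call.
for _ in range(40):
    cached_call("summarise this changelog")
print(metered_calls)  # 1
```

Under subscription pricing nobody bothers writing even this much; under metered pricing it is the first thing a cost-conscious team adds.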

The other half of the equation

The argument so far is one-sided. The demand-side story — variance, hidden subsidy, exposed marginal price — is real, but it is not the only thing happening. The cost side is moving in the opposite direction, fast.

Per-token inference cost for GPT-4-class capability has fallen roughly twentyfold since the original GPT-4 launched in March 2023, and the current generation of small but capable models — GPT-4o-mini, Claude Haiku, Gemini Flash, Llama 3.x at 8B parameters — runs cheaper still, at roughly $0.10–$0.30 per million input tokens against the $30 the original GPT-4 charged. The trend has multiple drivers running in parallel: better silicon (each new GPU generation costs roughly the same per unit but delivers more usable throughput), inference-time tricks (FP8, speculative decoding, mixture-of-experts routing), model distillation, and prompt-level optimisation. None of these has saturated.
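The twentyfold figure implies a steep annual slope. The arithmetic below just unpacks it; the 20× input and the March 2023 to June 2026 window are the hedged assumptions, taken from the paragraph above:

```python
# Implied annual cost decline if GPT-4-class inference fell ~20x
# between March 2023 and June 2026 (~3.25 years). Pure arithmetic on
# the figures quoted in the text; the "20x" itself is the assumption.
years = 3.25
total_drop = 20.0
annual_factor = total_drop ** (1 / years)   # per-year cost division

# At the same slope, the further drop over the next eighteen months:
future_drop = annual_factor ** 1.5

print(f"~{annual_factor:.1f}x cheaper per year")
print(f"a further ~{future_drop:.1f}x over the next 18 months")
```

That works out to roughly 2.5× cheaper per year, which is why an eighteen-month horizon is enough to change the unit economics materially.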

If the cost curve continues at anything close to its 2023–2026 slope, the unit economics that look broken in mid-2026 will look fine within eighteen months — at the same retail price. The "no longer sustainable" pricing of June 2026 may simply be the moment at which subsidy ends and natural cost decline takes over. Consumer subscriptions could disappear entirely, get reborn as token bundles, or be quietly re-introduced once per-token cost falls below a threshold where average users are profitable.

Two forces are running at the same time: pricing exposure makes demand legible, and cost compression makes today's marginal cost obsolete. The honest version of the AI-economics question in 2026 is which curve moves faster.

There is also a wrinkle. Cheaper inference does not automatically translate into smaller bills. The Jevons paradox — that efficiency gains in a useful resource often increase total consumption rather than reducing it — applies cleanly here. If using a frontier model agentically becomes 10× cheaper, the rational response of a development team is to use it 20× more. Total compute consumption may rise even as per-token cost falls, and the bills delivered to users in the new metered regime may stay close to where they are now, just spread across more output. Whether that helps or hurts the seller depends on whose margins improve faster.
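The wrinkle's arithmetic, made explicit — the 10× and 20× figures are the text's hypotheticals, not measurements:

```python
# Jevons-paradox arithmetic: a 10x price cut met by a 20x usage
# increase doubles the bill, even though every token got cheaper.
price_per_token = 1.0   # normalised units
tokens_used = 1.0

new_price = price_per_token / 10   # inference gets 10x cheaper
new_usage = tokens_used * 20       # team rationally uses 20x more

old_bill = price_per_token * tokens_used
new_bill = new_price * new_usage
print(new_bill / old_bill)  # 2.0
```

The seller's cost per token and the buyer's total bill can move in opposite directions at the same time, which is why "cheaper inference" and "bigger invoices" are not contradictory headlines.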

The structural observation

The most-cited debates about AI economics in 2026 — bubble or not, infrastructure justified or not — argue past each other in part because they share a hidden assumption. They assume that the demand pattern observed under subscription pricing is the demand pattern that will exist under metered pricing. The Copilot transition on June 1 is the cleanest test of that assumption to date. By the second half of 2026 we will have data, on a population of millions of users, on what AI usage looks like when its cost is legible request by request.

Whatever that number is, it will not be the same as the subscription number. The subscription was doing more work than it looked. Whether the cost curve gets there in time to put the work back is the open question of the next eighteen months.


Links: Microsoft cuts off some GitHub Copilot users to limit losses (Wall Street Journal) | Cursor margins reporting (The Information) | The Subprime AI Crisis Is Here (Where's Your Ed At) | AI's Economics Don't Make Sense (Where's Your Ed At) | Claude Code documentation (Anthropic)

#ai #business-models #economics #infrastructure #pricing