GitHub Copilot AI Credits and the Real Cost of Agents

On June 1, 2026, GitHub moved Copilot's active billing model over to usage-based AI Credits, replacing premium request units across the plans, and a lot of developers woke up to a product that felt different from the one they paid for the day before. The flat monthly fee is still there. What changed is that the meter is now visible. Instead of counting premium requests, every plan ships with a monthly allotment of GitHub AI Credits, one credit is worth a cent, and your usage gets calculated from the tokens an interaction consumes: input, output, and cached context, each priced per model, converted into credits, and drawn down against your budget. Existing annual Pro and Pro+ subscribers are the exception. They stay on the legacy request-based pricing until their term runs out.
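To make the conversion concrete, here is a minimal sketch of the arithmetic as the paragraph above describes it. The only figure taken from the announcement is that one credit is worth one cent. The `ModelRates` type, the function name, and the per-token rates are all placeholders I made up for illustration, not GitHub's published prices.

```python
from dataclasses import dataclass

# From the article: one AI Credit is worth one US cent.
CENTS_PER_CREDIT = 1


@dataclass(frozen=True)
class ModelRates:
    """Hypothetical per-model prices, in cents per million tokens."""
    input_cents_per_mtok: int
    output_cents_per_mtok: int
    cached_cents_per_mtok: int


def credits_for_interaction(rates: ModelRates, input_tokens: int,
                            output_tokens: int, cached_tokens: int) -> float:
    """Convert one interaction's token counts into credits."""
    cents = (
        input_tokens * rates.input_cents_per_mtok
        + output_tokens * rates.output_cents_per_mtok
        + cached_tokens * rates.cached_cents_per_mtok
    ) / 1_000_000
    return cents / CENTS_PER_CREDIT


# Placeholder rates for an imaginary model, NOT GitHub's real prices.
example_rates = ModelRates(input_cents_per_mtok=300,
                           output_cents_per_mtok=1500,
                           cached_cents_per_mtok=30)

used = credits_for_interaction(example_rates,
                               input_tokens=40_000,
                               output_tokens=2_000,
                               cached_tokens=100_000)
print(used)  # 18.0
```

With these invented rates, a single interaction that reads 40,000 fresh tokens, reuses 100,000 cached ones, and writes 2,000 costs 18 credits, or 18 cents. Swap in the real per-model rates for your plan and the shape of the calculation should stay the same.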

The reaction was about what you'd expect. The announcement discussion filled with hundreds of angry comments. Developers posted screenshots of allocations draining in hours. The Register found people already planning their exit, including one who said they'd burn their Pro+ credits in a week and then route the rest of the month through OpenRouter inside the same editor. The projections that went around ranged from mildly annoying to apocalyptic, with some developers floating bills that jumped from fifty dollars to three thousand. GitHub hasn't verified those numbers, and to be fair, the worst of them describe workflows that were already wasteful. But the anger is real, and I don't think it's misplaced. I just think half of it is pointed at the wrong thing, and the other half is pointed at exactly the right one.

Because here's what I keep coming back to. The sticker price didn't move. What moved is that the marginal cost of heavy usage is now exposed, and the meter didn't make agentic coding expensive. Agentic coding was always expensive. The meter just stopped hiding it. That part is honest. What Microsoft did around the meter is a different story, and that part is worth being angry about.

For the last year and change, Copilot ran on a useful fiction. You paid a flat fee, the assistant sat in your editor, and the actual economics of running large model inference stayed behind the curtain. That worked beautifully when Copilot was mostly autocomplete. Autocomplete is cheap and it's predictable. You type, it suggests, the marginal cost of any single suggestion is small enough that nobody felt exposed to it. The abstraction held because the thing underneath the abstraction was small.

Agentic coding broke that. When you tell an assistant to go fix a bug, you experience that as one request. The system experiences something very different. It reads multiple files, builds context, generates a patch, re-evaluates the errors, tries again, and hands you back output that's many times larger than the prompt you typed. One instruction from you, a long chain of token-consuming operations underneath. The cost was never tied to how much you typed. It was tied to how much the model did on your behalf, and with an agent, that's a number you mostly can't see while it's happening.
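A rough sketch of why one instruction turns into so many tokens. It assumes the simplest possible agent loop, where every step sends the entire accumulated context back to the model and nothing is cached. Real agents cache and trim context, so treat this as an upper-bound illustration. The function name and the step sizes are invented.

```python
def agent_session_tokens(prompt_tokens: int,
                         steps: list[tuple[int, int]]) -> tuple[int, int]:
    """Total (input, output) tokens for a naive agent loop.

    Each step is (tokens_added_to_context, tokens_generated).
    Assumption: the whole context is re-sent on every step, uncached.
    """
    context = prompt_tokens
    total_input = 0
    total_output = 0
    for added, generated in steps:
        context += added          # files read, error logs, tool results
        total_input += context    # the full context goes in again
        total_output += generated
        context += generated      # the model's own output joins the context
    return total_input, total_output


# A 50-token "go fix this bug" prompt followed by four invented steps:
# read files, write a patch, re-read the errors, try again.
total_input, total_output = agent_session_tokens(
    prompt_tokens=50,
    steps=[(6000, 300), (2000, 800), (1500, 400), (500, 600)],
)
print(total_input, total_output)  # 36600 2100
```

Fifty tokens typed, tens of thousands processed. The exact ratio depends entirely on the made-up step sizes, but the growth pattern is the point: the bill follows what the model did, not what you typed.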

So the flat subscription wasn't really pricing the work. It was averaging it. The light users subsidized the heavy ones, and GitHub ate the variance, and everyone got to pretend the dev loop was a fixed-cost tool instead of a metered cloud resource. That's a fine arrangement right up until the variance gets too big to eat. Frontier-model agents doing multi-file work across a whole session are exactly the kind of variance that breaks an averaging model. GitHub says the old premium-request model was no longer sustainable, that it had been absorbing the escalating inference cost itself, and on the narrow point I'll take them at their word. Inference genuinely costs money, a flat fee is a poor fit for long multi-step model sessions, and a company that has to pay for compute eventually has to charge for compute. None of that is villainy.

The villainy, or at least the part that earns the side-eye, is everything Microsoft bolted onto the transition that has nothing to do with the honest economics.

Start with the fallback. Under the old model, when you exhausted your premium requests, you could often drop to a cheaper model and keep working at no extra charge. GitHub confirmed that cushion is gone: usage is now governed entirely by your available credits and whatever budget controls an admin has set. So there's no soft landing between you and the meter anymore, which means the spending ceiling that used to be implicit is something you now have to build yourself or learn about the hard way. Then there are the annual plans, which are being wound down. If you're on one, you keep request-based pricing until it expires and then you get dropped to Copilot Free unless you re-subscribe to a monthly plan, and in the meantime the model multipliers on your annual plan went up on June 1. So the customers who committed to a year got their existing pricing made worse on the way out the door.

New signups got paused across Student, Pro, Pro+, and Max. GitHub frames this as a reliability measure to protect the experience for existing customers, and maybe it partly is. It also has the effect of closing the student trial pipeline that feeds paid conversions during the exact window when the loudest, most public anger is peaking and people are actively shopping for alternatives. You can decide for yourself how much of that timing is coincidence. And quietly, the strongest models got pulled from the cheaper tier, with Opus no longer available on Pro at all.

But the one that actually made me sit up is code review. As of June 1, when Copilot reviews a pull request on a private repository, it draws down GitHub Actions minutes in addition to AI Credits, billed at the same per-minute rates as any other Actions workflow. The same automated review now hits two separate meters at once. Public repos are unchanged (Actions minutes stay free there), so this isn't every review everywhere. But for the private repositories where most real work actually happens, GitHub took a thing that used to sit inside your flat fee, moved it onto agentic infrastructure, and arranged for it to spend from two budgets per run. Read that again, because it's the whole pattern in one feature.
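Here is the two-meter arithmetic as a small sketch. The rule it encodes comes from the paragraph above: every review spends AI Credits, and on private repositories it also spends Actions minutes. The per-review figures and the names `ReviewDraw` and `monthly_review_draw` are hypothetical, so measure your own numbers before trusting the output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReviewDraw:
    """What a month of automated reviews takes from each meter."""
    ai_credits: int
    billable_actions_minutes: int


def monthly_review_draw(reviews: int, credits_per_review: int,
                        minutes_per_review: int,
                        private_repo: bool) -> ReviewDraw:
    """Estimate the draw on both budgets for a month of reviews."""
    credits = reviews * credits_per_review
    # Per the article, Actions minutes are only billed on private repos.
    minutes = reviews * minutes_per_review if private_repo else 0
    return ReviewDraw(ai_credits=credits, billable_actions_minutes=minutes)


# Invented workload: 200 pull requests a month, 12 credits and
# 3 runner minutes per review.
private = monthly_review_draw(200, 12, 3, private_repo=True)
public = monthly_review_draw(200, 12, 3, private_repo=False)
print(private)  # ReviewDraw(ai_credits=2400, billable_actions_minutes=600)
print(public)   # ReviewDraw(ai_credits=2400, billable_actions_minutes=0)
```

Same workload, same credits, and 600 extra billable minutes purely because the repository is private.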

And it is a pattern, the kind that's hard to unsee once you've seen it. GitHub spent the past year-plus aggressively pushing developers toward agentic workflows, building product around them, marketing them as the future of the tool. Then it put the most expensive possible meter on exactly those workflows, removed the cushion that softened the blow, wound down annual-plan treatment, restricted model access on the cheaper tier, and made private-repo code review spend from two budgets at once. You can fully accept that inference costs real money and still notice that the structure was built to extract the maximum from the behavior they spent a year-plus training you into. Both things are true. The economics are real. The packaging is a squeeze.

What actually interests me, as a systems person, is the part nobody's talking about underneath all of it. We have decades of operational muscle for managing metered compute. Cloud budgets, cost alerts, capacity planning, rate limits, the entire discipline of FinOps exists because we learned the hard way that on-demand infrastructure will quietly bankrupt you if you don't watch it. We know how to do this. We just never did it for the thing running inside the IDE, because the thing inside the IDE didn't look like infrastructure. It looked like a feature you bought once a month.

That's the deeper shift, and it would have come for us no matter who shipped it first. GitHub isn't the first to meter AI usage this way. It's just the most visible mainstream developer platform to do it, which is why this is the version everyone's arguing about. Agentic coding is a compute workload wearing the costume of a software feature. The moment you let an agent run an autonomous session against your codebase, you're spinning up something closer to a short-lived compute job than a text-editor plugin, and it should have been on a cost model from the beginning. The meter is uncomfortable because it's forcing a category correction that was overdue. The dev loop is a cost center now. It always was. We just had a subscription that let us not think about it.

And once you frame it that way, even the panic reads differently. A developer reporting that a single session ate eight percent of their monthly Pro+ allocation isn't only describing a pricing scam. They're also describing an agent that did a lot of work, possibly more work than the task needed, with no governor on it. That's a workflow problem and a tooling-maturity problem sitting right next to the billing problem, the same way an unbounded cloud job is a workflow problem before it's an AWS problem. Microsoft built a meter designed to extract. You still have to decide how much you feed it.

So what do you actually do with this? A few things, and none of them are "rage-quit to OpenRouter," though that's a legitimate move if the economics genuinely don't work for you, and plenty of people are already making it.

First, treat model selection as a real decision instead of a default. The per-token rates vary enormously across models, and a frontier model running an agentic session is the single most expensive thing you can do in the product. A lot of the work people throw at agents doesn't need an agent at all, let alone the most capable one. Knowing when a cheap model or plain autocomplete is the right tool is now a cost skill, not just a quality preference. And completions stayed unlimited and unmetered on paid plans, which is GitHub quietly telling you where the cheap, predictable value still lives. Use that.

Second, set the budget controls before you need them, not after the bill. The spending caps and dashboards exist, and the user-level budgets can act as a hard stop on how many credits a single person burns in a cycle. The whole reason metered cloud horror stories happen is that nobody sets the alert until they've already had the incident. Don't be that team. And if you run automated code review on private repos, audit your pull request volume against your Actions minutes before the bill teaches you the number.
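If you want to reason about what a hard stop plus an early warning looks like, here is a minimal sketch of the logic. To be clear, this is not GitHub's budget API or how its controls are implemented. `CreditBudget`, `BudgetExceeded`, and the 80 percent alert threshold are all assumptions of mine, the kind of guard you might wrap around your own automation.

```python
class BudgetExceeded(Exception):
    """Raised when a charge would push spend past the hard limit."""


class CreditBudget:
    """Hypothetical per-user budget with an alert line and a hard stop."""

    def __init__(self, monthly_limit: int, alert_fraction: float = 0.8):
        self.monthly_limit = monthly_limit
        self.alert_fraction = alert_fraction
        self.spent = 0

    def charge(self, credits: int) -> bool:
        """Record spend. Returns True once the alert line is crossed."""
        if self.spent + credits > self.monthly_limit:
            # Refuse the charge and leave recorded spend unchanged.
            raise BudgetExceeded(
                f"{credits} credits would exceed the "
                f"{self.monthly_limit}-credit limit"
            )
        self.spent += credits
        return self.spent >= self.alert_fraction * self.monthly_limit


budget = CreditBudget(monthly_limit=1000)
first_alert = budget.charge(700)    # 700 spent, below the alert line
second_alert = budget.charge(150)   # 850 spent, past 80 percent
try:
    budget.charge(200)              # would reach 1050, so it is refused
    blocked = False
except BudgetExceeded:
    blocked = True
print(first_alert, second_alert, blocked, budget.spent)
```

The design choice worth copying is the order of operations: the alert fires while there is still room to react, and the hard stop refuses the work instead of billing it and apologizing later.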

Third, and this is the one that matters most for anyone running a pipeline, start treating agentic AI spend as a line item you forecast, not a surprise you absorb. If you can't predict what your tooling costs and how it behaves under load, you don't actually control your pipeline. You're just hoping. That was true before June 1. The meter just made it visible, and visibility, even visibility a vendor hands you for self-interested reasons, is not the enemy. It's the thing that lets you plan.
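Forecasting can start embarrassingly simple. The sketch below is a straight-line run-rate projection, the crudest possible model, and it assumes spend is roughly even across the cycle, which agentic usage often is not. The function name and every number in the example are invented.

```python
def project_cycle(credits_used: float, days_elapsed: int,
                  days_in_cycle: int, allotment: float) -> tuple[float, float]:
    """Linear run-rate forecast for one billing cycle.

    Returns (projected_total_credits, projected_overage_credits).
    Assumption: the daily burn rate so far continues unchanged.
    """
    if days_elapsed <= 0:
        raise ValueError("need at least one day of usage data")
    daily_rate = credits_used / days_elapsed
    projected = daily_rate * days_in_cycle
    overage = max(0.0, projected - allotment)
    return projected, overage


# Invented example: 1,200 credits burned in the first 10 days of a
# 30-day cycle, against a 3,000-credit allotment.
projected, overage = project_cycle(credits_used=1200, days_elapsed=10,
                                   days_in_cycle=30, allotment=3000)
print(projected, overage)  # 3600.0 600.0
```

At one cent per credit, that projected overage is six dollars. The number itself is less important than having it on day 10 instead of day 31.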

The discomfort everyone's feeling is the discomfort of an abstraction getting stripped away. For a while, the flat fee let us treat a metered, variable, genuinely expensive compute workload as if it were a fixed-cost convenience. That was always going to end, on this product or the next one, because the economics underneath were never fixed. GitHub happened to be the loudest about it, and Microsoft made sure to wring every dollar it could out of the transition on the way through.

The meter isn't a price hike. It's a mirror. The number it's showing you was there the whole time. Heavy users will pay more now. That's real, and I won't pretend otherwise, but the answer isn't to demand the curtain go back up. It's to read the meter, learn to plan around it, and pay close attention to who's holding it and what else they slipped into your bill while your eyes were on the headline number.

