Last quarter, I ran the numbers on our team’s AI tooling spend. What I found made me pause. The combined cost of AI coding assistants across a mid-sized engineering team had quietly crept past what we used to budget for cloud infrastructure. Nobody flagged it. Nobody had a policy around it. And nobody — from the engineers actually using the tools to the finance team signing off on expenses — had a shared understanding of what we were actually getting for that money.
This is not a unique situation. It’s happening quietly in engineering teams everywhere, and the silence around it is going to become a very loud problem very soon.
The Cost Nobody Budgeted For
Let’s talk numbers. GitHub Copilot runs at $19–39 per user per month. Cursor Pro sits at $20. Claude Code, Windsurf, and similar agentic tools push the ceiling toward $100–200 per engineer monthly when used heavily. For a 50-person engineering team at that heavy-use rate, you’re looking at $60,000 to $120,000 per year on AI coding tools alone. That’s real infrastructure-level spend, yet it’s being approved through individual expense reports, team budgets, and “just try it out” decisions with no centralised visibility.
The problem isn’t the cost itself. AI tools are delivering real value — faster prototyping, reduced context-switching, better test coverage. The problem is that organisations are making $100K+ yearly commitments through a series of $20 monthly decisions, with no framework to evaluate whether those decisions are the right ones.
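The arithmetic behind those figures is worth making explicit, because the gap between a $20 seat and a heavy-use seat is where the surprise comes from. Here is a minimal sketch using the prices quoted above; the function name and the flat-price assumption (every engineer on the same plan, no annual discounts) are mine, not a billing model from any vendor.

```python
# Rough annual-spend arithmetic for the per-seat prices quoted in the text.
# Assumes a flat monthly price per engineer and no discounts.

def annual_spend(engineers: int, monthly_per_engineer: float) -> float:
    """Yearly cost of one seat per engineer at a flat monthly price."""
    return engineers * monthly_per_engineer * 12


TEAM_SIZE = 50

baseline = annual_spend(TEAM_SIZE, 20)   # one $20 seat for everyone
low = annual_spend(TEAM_SIZE, 100)       # heavy agentic use, low end
high = annual_spend(TEAM_SIZE, 200)      # heavy agentic use, high end

print(f"One $20 seat each: ${baseline:,.0f} per year")
print(f"Heavy agentic use: ${low:,.0f} to ${high:,.0f} per year")
```

The same team lands anywhere between $12,000 and $120,000 a year depending on which plans people end up on, which is why per-seat thinking hides the total.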
How We Got Here: The “Just Expense It” Phase
Every major technology shift goes through an experimentation phase where the rules are loose by design. We did this with cloud adoption: teams spun up EC2 instances freely, and finance caught up later. We did it with SaaS tools: Slack, Notion, and Figma all made their way into organisations through individual expense reports before IT and procurement got involved.
AI coding tools followed the same path, but the experimentation phase is ending faster than most organisations realise. The “just try it, we’ll figure out governance later” season is over. Engineering leaders who don’t build a framework now will spend the next 18 months firefighting: duplicate subscriptions, unclear ROI, inconsistent tool usage, and security exposure from unvetted products handling proprietary codebases.
The window to get ahead of this is right now, before finance and procurement close it for you with a blanket policy that ignores nuance.
Not All AI Tools Are Created Equal — The Three-Tier Framework
The first thing an engineering leader needs to do is stop treating all AI tooling as one category. A Copilot subscription for a frontend developer is a very different investment than a Claude Code Max plan for a senior architect working on complex system design. Lumping them together leads to either over-cutting or under-governing.
Here’s a tiering framework I’ve found useful:
Tier 1 — Core Velocity Tools are tools that measurably reduce time-to-ship for most engineers on your team. These are non-negotiable investments if the ROI is clear. Think of a solid AI autocomplete tool for engineers writing repetitive CRUD logic, or a code review assistant that catches common bugs before they reach your CI pipeline. These justify the spend with minimal debate.
Tier 2 — Role-Specific Tools serve a narrower function — agentic coding environments for senior engineers, AI-powered architecture diagramming tools, or specialised database query assistants. These are valid but need to be evaluated per role, not blanket-approved for everyone.
Tier 3 — Nice-to-Have Experiments are tools that an individual engineer finds valuable but where the team-wide impact is unclear. These are fine to allow in a sandbox — but they shouldn’t be expensed without a short evaluation period and a clear exit criterion.
The exercise of sorting your current tooling into these three tiers alone will reveal redundancies you didn’t know you had.
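To make the sorting exercise concrete, here is a small sketch that groups an inventory by tier and purpose and reports any group containing more than one tool. The tool names, the `purpose` labels, and the tier assignments are all hypothetical placeholders; your own list will look different.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    name: str
    tier: int      # 1 = core velocity, 2 = role-specific, 3 = experiment
    purpose: str   # hypothetical label for the job the tool does


def find_redundancies(tools):
    """Return {(tier, purpose): [names]} for groups with more than one tool."""
    groups = defaultdict(list)
    for tool in tools:
        groups[(tool.tier, tool.purpose)].append(tool.name)
    return {key: sorted(names) for key, names in groups.items() if len(names) > 1}


# Illustrative inventory; every entry here is made up for the example.
inventory = [
    Tool("autocomplete-a", 1, "autocomplete"),
    Tool("autocomplete-b", 1, "autocomplete"),
    Tool("agent-a", 2, "agentic coding"),
    Tool("diagram-a", 2, "architecture diagrams"),
    Tool("agent-b", 3, "agentic coding"),
]

redundant = find_redundancies(inventory)
for (tier, purpose), names in redundant.items():
    print(f"Tier {tier} / {purpose}: {', '.join(names)}")
```

In this made-up inventory, the two Tier 1 autocomplete tools surface as a redundancy, while the Tier 3 agent stays separate from the Tier 2 one because it is still an experiment rather than an approved tool.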
How to Actually Measure Productivity (Without Lying to Yourself)
This is where most governance conversations fall apart. Engineering leaders reach for the easiest metrics — number of pull requests merged, lines of code written, sprint velocity — and end up with a distorted picture. AI tools can inflate all three of these while making your overall software quality worse.
A 2025 DORA report flagged that while AI-assisted teams are seeing PR volume increase by around 20% year-on-year, change failure rates are climbing alongside it. More code isn’t better code. Faster shipping isn’t safer shipping. Any productivity framework that doesn’t account for this is measuring the wrong thing.
What you actually want to track is the ratio of value delivered to defects introduced. That means looking at deployment frequency alongside change failure rate, measuring mean time to recovery on incidents that trace back to AI-generated code, and doing honest retrospectives on what your team is still manually fixing after AI generates a first draft. The goal is to know whether your AI tooling is increasing your team’s leverage or just increasing your team’s output — those are not the same thing.
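As a rough illustration of why output and leverage diverge, the sketch below computes change failure rate and mean time to recovery for two periods. The deployment records are invented for the example (they are not DORA data), and the `Deployment` type is a hypothetical stand-in for whatever your deployment and incident tooling actually exports.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Deployment:
    caused_failure: bool
    minutes_to_recover: Optional[float] = None  # set only for failed deployments


def change_failure_rate(deployments):
    """Share of deployments that led to a failure in production."""
    if not deployments:
        return 0.0
    failures = sum(1 for d in deployments if d.caused_failure)
    return failures / len(deployments)


def mean_time_to_recovery(deployments):
    """Average recovery time in minutes over failed deployments, or None."""
    times = [
        d.minutes_to_recover
        for d in deployments
        if d.caused_failure and d.minutes_to_recover is not None
    ]
    return sum(times) / len(times) if times else None


# Invented data: 10 deployments before adopting a tool, 12 after.
before = [Deployment(False)] * 9 + [Deployment(True, 60)]
after = [Deployment(False)] * 10 + [Deployment(True, 45), Deployment(True, 75)]

for label, period in (("before", before), ("after", after)):
    print(
        f"{label}: {len(period)} deployments, "
        f"failure rate {change_failure_rate(period):.1%}, "
        f"MTTR {mean_time_to_recovery(period)} min"
    )
```

In this toy data, deployments rise from 10 to 12 while the failure rate rises from 10% to roughly 17%. A dashboard that only counted deployments would call that a win.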
Building a Lightweight Governance Policy in a Day
Governance doesn’t have to mean bureaucracy. The mistake most organisations make is waiting until they have a perfect policy before implementing any policy. You don’t need a 30-page document. You need four clear answers.
Who approves AI tool spend? It should be the engineering manager, not the individual engineer. This isn’t about control — it’s about giving someone the visibility to spot when three people on the same team are paying for three different tools that do the same thing.
What’s the team budget ceiling? Set a per-engineer monthly ceiling that covers a Tier 1 tool and one Tier 2 tool. Make it explicit. Engineers shouldn’t be guessing whether their tool choice will get flagged at month-end.
How do we evaluate new tools? Define a short evaluation window — 30 days is enough. Ask for a simple write-up at the end: what problem does this solve, is it solving it, and what would we lose if we stopped using it? That’s it. You’re not running a formal RFP. You’re making a $20/month decision with slightly more rigour.
When do we review? Quarterly is sufficient for most teams. The review should be lightweight — 30 minutes, focused on utilisation data, any security flags, and whether the tier assignments still make sense.
This four-question framework takes a morning to fill out and saves months of back-and-forth when someone in finance eventually asks why your software tooling line has tripled.
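If it helps to see how small the policy really is, the four answers fit in a single record. This is only a sketch: the field names are mine, and the ceiling of $120 is a hypothetical figure chosen so that one $20 tool plus one $100 tool fits, not a recommendation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AiToolPolicy:
    approver: str = "engineering manager"         # who approves spend
    monthly_ceiling_per_engineer: float = 120.0   # hypothetical budget ceiling
    evaluation_window_days: int = 30              # how new tools are evaluated
    review_cadence: str = "quarterly"             # when the policy is reviewed


def within_ceiling(policy, current_monthly_spend, new_tool_monthly_cost):
    """True if adding the new tool keeps the engineer at or under the ceiling."""
    projected = current_monthly_spend + new_tool_monthly_cost
    return projected <= policy.monthly_ceiling_per_engineer


policy = AiToolPolicy()

# An engineer on a $20 tool asks for a $100 tool: exactly at the ceiling.
print(within_ceiling(policy, 20.0, 100.0))

# An engineer on a $39 tool asks for the same $100 tool: over the ceiling.
print(within_ceiling(policy, 39.0, 100.0))
```

A check like this makes the ceiling explicit enough that nobody has to guess at month-end. The approval itself still sits with the engineering manager.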
The Tool Sprawl Trap
Here’s something I’ve observed in fast-moving engineering teams: the more autonomy developers have to choose their own tools, the more likely you are to end up with a sprawl that creates problems no single tool was designed to cause.
Picture a team where some engineers use Cursor, some use Copilot, some use Claude Code, and someone in the corner is running Windsurf. On the surface this looks like healthy experimentation. Underneath, it means your team’s collective knowledge about AI-assisted development is fragmented. Best practices don’t transfer between tools. Prompt patterns that work in one environment don’t apply to another. And when something goes wrong in production — when AI-generated code causes an incident — diagnosing which tool contributed to which decision becomes genuinely hard.
Beyond productivity, there’s a security dimension that often gets overlooked. Each AI coding tool you allow access to your codebase is a product with its own data handling policies, telemetry practices, and training data agreements. Approving five tools without reviewing any of their data policies is a risk your security team hasn’t been asked to sign off on, but probably should be.
Standardising on one or two tools per tier is not about limiting engineers. It’s about concentrating your learning curve, your security review effort, and your feedback loops in places where they can compound.
What I’d Do If I Were Starting from Scratch Today
If I were building a governance framework for AI tooling from the ground up, I’d start with a simple audit: list every AI coding tool currently used by anyone on the team, what they’re paying, and what problem they say it solves. In most teams where I’ve seen this exercise done, the list surprises everyone, including the people who approved the tools.
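The audit itself needs nothing more than a flat list and a few totals. Here is a minimal sketch, assuming you can collect one row per subscription; the engineers, tool names, and prices below are placeholders, not real data.

```python
from collections import Counter, defaultdict

# Hypothetical audit rows: (engineer, tool, monthly cost in USD, stated problem).
subscriptions = [
    ("ana", "tool-a", 20.0, "autocomplete"),
    ("ana", "tool-b", 100.0, "agentic refactoring"),
    ("ben", "tool-a", 20.0, "autocomplete"),
    ("ben", "tool-c", 20.0, "autocomplete"),
    ("cho", "tool-c", 20.0, "autocomplete"),
]


def summarise(rows):
    """Return total monthly spend, seats per tool, and spend per engineer."""
    total = sum(cost for _, _, cost, _ in rows)
    seats_per_tool = Counter(tool for _, tool, _, _ in rows)
    spend_per_engineer = defaultdict(float)
    for engineer, _, cost, _ in rows:
        spend_per_engineer[engineer] += cost
    return total, seats_per_tool, dict(spend_per_engineer)


total, seats, per_engineer = summarise(subscriptions)

print(f"Monthly total: ${total:,.2f} (${total * 12:,.2f} per year)")
print(f"Seats per tool: {dict(seats)}")
print(f"Spend per engineer: {per_engineer}")
```

Even on five rows, the summary shows one engineer paying for two tools that solve the same stated problem, which is exactly the kind of thing nobody notices on individual expense reports.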
From there, I’d run each tool through the three-tier framework, set a per-engineer ceiling that covers Tier 1 and Tier 2 comfortably, and ask every engineer using a Tier 3 tool to either move it to Tier 2 with justification or sunset it at the next renewal.
I’d also introduce one rule that sounds simple but has a big impact: no AI coding tool gets access to production credentials or internal codebase data without a security review. This isn’t about blocking progress. It’s about making sure the speed gain from the tool doesn’t come with a risk you didn’t see coming.
Finally, I’d make the policy visible and revisable. Post it somewhere engineers can read it. Tell the team why it exists — not as a cost-cutting exercise but as a way to make sure the tools that genuinely help the team get properly resourced, and the ones that don’t get quietly retired before they become someone’s pet tool that nobody wants to cancel.
The Bottom Line
AI coding tools are here to stay, and the engineering teams that use them well will have a real advantage. But advantage requires intentionality. Right now, most organisations are accruing AI tool debt the same way they once accrued technical debt — one small, invisible decision at a time, with no one watching the total.
The engineering leaders who build governance frameworks now won’t be the ones who slow their teams down. They’ll be the ones who make sure their teams’ investment in these tools actually compounds, rather than quietly haemorrhaging money, attention, and security posture on tools that nobody evaluated and nobody is watching.
Getting ahead of this isn’t hard. But it does require someone deciding to look.
