Which AI pricing model protects your margin?
Enter your numbers once. Compare seat, usage, outcome and hybrid on the same business — and see which one survives a heavy-user tail and an inference-cost spike. Every pricing model is really one decision: who absorbs the inference variance — you, or the customer?
Start in QUICK mode with five numbers. Switch to FULL to add the cost-stress and hybrid controls. Hover the ⓘ for what each input means.
Nothing is saved or sent — your numbers stay in your browser and in the shareable link.
At the base case all four can look similar — the differences appear under a +50% inference cost spike and once a heavy-user tail is in the mix. Each card shows the stressed margin (big number), how much variance the model passes to the customer, how predictable the revenue is, and whether heavy users cover their own cost.
Hybrid is a two-part tariff: a predictable base fee plus a metered usage component. Slide the split and watch the trade-off — more base buys predictability, more metering buys margin protection.
What each pricing model is — and what it does to your variance
The same four base components in the financial model — Per Agent, Per Activity, Per Output, Per Outcome — combine into these four families. The difference that matters is who ends up holding the unpredictable inference cost.
Seat-based (Per Agent)
A flat fee per user per month. Simple, predictable revenue, ARR-friendly.
What it shows here: high predictability, but margin that collapses under a cost spike because the price can't move.
Best fit: uniform usage. Fails when: a heavy-user tail quietly eats the margin.
Usage-based (Per Activity / Output)
Price scales with consumption — per token, action, or unit.
What it shows here: margin holds under a spike (the customer's bill moves with cost), heavy users pay their way.
Best fit: high variance. Fails when: the customer needs a predictable budget — revenue gets lumpy.
Outcome-based (Per Outcome)
Charge per result — a resolved ticket, a qualified lead, a completed task.
What it shows here: looks aligned, but cost per outcome is volatile and the model can't pass a spike through.
Best fit: provable attribution + controlled cost. Fails when: outcomes are inference-heavy or disputed.
Hybrid (base + metered)
A base fee for predictable value plus a metered component for variable compute — a two-part tariff.
What it shows here: the only model that holds margin under a spike and keeps revenue predictable.
Best fit: most AI SaaS. Fails when: the product is too simple to justify metering.
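To make the two-part tariff concrete, here is a minimal sketch of one customer's monthly bill. It assumes the base share splits a target price that holds at expected usage; `hybrid_bill`, `target_arpu`, `base_share` and `usage_ratio` are illustrative names, not the calculator's internals.

```python
def hybrid_bill(target_arpu, base_share, usage_ratio):
    """Monthly bill for one customer under a two-part tariff (illustrative).

    target_arpu : what the customer pays at expected usage
    base_share  : fraction of target_arpu charged as a fixed base fee (0..1)
    usage_ratio : actual usage / expected usage (1.0 means "as planned")
    """
    if not 0.0 <= base_share <= 1.0:
        raise ValueError("base_share must be between 0 and 1")
    base_fee = base_share * target_arpu                            # fixed part
    metered_fee = (1.0 - base_share) * target_arpu * usage_ratio   # scales with usage
    return base_fee + metered_fee


# Slide the split: 100% base behaves like a seat price, 0% base like pure usage.
for share in (1.0, 0.6, 0.0):
    light = hybrid_bill(50.0, share, usage_ratio=0.5)
    heavy = hybrid_bill(50.0, share, usage_ratio=2.0)
    print(f"base share {share:.0%}: light user ${light:.2f}, heavy user ${heavy:.2f}")
```

The wider the gap between the light and heavy bill, the more of the usage variance the customer carries, and the less predictable that customer's revenue becomes.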
Reading your result like a CFO
Four readings behind the cards above — what each number is, and what good looks like.
Margin after a +50% spike
Healthy: stays well above 0%
This is the stress test. Inference prices move — a model deprecates, a provider raises rates, an outage forces a costlier fallback. The big number on each card is your AI-layer gross margin after a 50% inference-cost spike. If a model goes negative here (seat and outcome often do), every active user becomes a loss the moment costs move against you. The Variance Buffer input is your planning cushion against exactly this.
Source: Bessemer AI Pricing Playbook
Variance passed to the customer
seat 0% · hybrid = metered share · usage ~100%
This is the heart of the framework. A pricing model is a contract about who eats a cost spike. Per-seat passes 0% — you absorb all of it. Pure usage passes it through. Hybrid passes through exactly its metered share, which is why the base↔metered slider is the real control. The economics of two-part tariffs (Oi 1971; Png & Wang 2010) show the metered leg behaves like an insurance premium: the more uncertain the cost, the more you route through it.
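One way to read the pass-through idea in numbers is sketched below. It assumes the recovered share of the extra cost is billed at cost with no markup, and it rounds pure usage to exactly 100%. The function name and the example figures are illustrative, not the calculator's own formula.

```python
def margin_after_spike(revenue, inference_cost, pass_through, spike=0.50):
    """Gross margin per user after an inference-cost spike (illustrative).

    pass_through : share of the extra cost the contract recovers from the
                   customer (0.0 = seat, 1.0 = pure usage, metered share = hybrid)
    spike        : relative cost increase; 0.50 is the +50% stress case
    """
    extra_cost = inference_cost * spike
    stressed_cost = inference_cost + extra_cost
    # Assumption: the recovered part of the spike is billed at cost, no markup.
    stressed_revenue = revenue + pass_through * extra_cost
    return (stressed_revenue - stressed_cost) / stressed_revenue


hybrid_base_share = 0.6  # hypothetical slider position
pass_through_by_model = {
    "seat": 0.0,
    "usage": 1.0,
    "hybrid": 1.0 - hybrid_base_share,  # hybrid passes its metered share
}

for model, pass_through in pass_through_by_model.items():
    margin = margin_after_spike(50.0, 30.0, pass_through)
    print(f"{model:>6}: stressed margin {margin:.1%}")
```

All three start from the same 40% base margin ($50 revenue, $30 cost). Only the pass-through share decides how much of it survives the spike.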
Source: Png & Wang, Buyer Uncertainty and Two-Part Pricing
Heavy users pay their own cost
✓ they're covered · ✕ they're a subsidy
In AI, the top 5–15% of users can consume 4–6× the median's inference. Under a flat price they pay the same as everyone else, so a ✕ here means your most engaged customers are your least profitable. The canonical case is GitHub Copilot: reporting put Microsoft's average loss above $20/user/month, with the heaviest users costing as much as $80, which is why it moved to usage-based billing. High engagement can be worse than churn when the price is blind to consumption.
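The heavy-tail arithmetic can be sketched as follows. It treats the median cost as the cost of every non-heavy user, which is a simplification, and all names and example values are illustrative.

```python
def heavy_tail_check(price, median_cost, heavy_multiplier, heavy_share):
    """Does a flat price cover a heavy user's inference cost? (illustrative)

    Assumption: every non-heavy user costs `median_cost`, and every heavy
    user costs `median_cost * heavy_multiplier`.
    """
    heavy_cost = median_cost * heavy_multiplier
    blended_cost = (1.0 - heavy_share) * median_cost + heavy_share * heavy_cost
    return {
        "heavy_user_cost": heavy_cost,
        "blended_cost": blended_cost,
        "heavy_user_margin": (price - heavy_cost) / price,
        "heavy_users_covered": price >= heavy_cost,  # the card's check or cross
    }


# Flat $30 price, $10 median cost, 10% of users consuming 5x the median.
result = heavy_tail_check(price=30.0, median_cost=10.0,
                          heavy_multiplier=5.0, heavy_share=0.10)
for name, value in result.items():
    print(f"{name}: {value}")
```

In this example the blended cost ($14) still sits comfortably under the price, while each heavy user costs $50 against a $30 price. The average looks fine and the tail is a subsidy.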
Source: GitHub Copilot → usage-based billing
Revenue predictability (and your multiple)
predictable revenue = higher multiple
Protecting margin with pure usage has a cost on the other side: predictability. Investors pay more for revenue they can forecast — less predictable earnings carry a higher cost of capital (research puts it around 150–300 bps), and recurring revenue earns a higher multiple than transactional revenue. That's the second front: seat is predictable but margin-exposed; pure usage is margin-safe but lumpy; hybrid keeps a recurring base so you don't trade your valuation for your margin.
Source: Chen, The Subscription Economy (Columbia)
What this snapshot still cannot see
These also decide your round — and none can be answered by a single-month calculation:
- 36-month forecast under your chosen model
- SMB / Mid-Market / Enterprise segment split
- Cash flow, balance sheet, runway
- Cap table to exit and valuation
- LTV:CAC, Rule of 40, NDR, CAC payback, Burn Multiple
- Cohort retention and the cascade stress test
How this calculator works
Every model is scored on the same cost reality. Inference cost is set by usage, not by your pricing — so at the base case the four look alike. They diverge on three things: a +50% inference-cost spike, the heavy-user tail, and revenue predictability. The only thing the model itself changes is how much of a cost spike the contract lets you pass to the customer (pass-through).
The 9 inputs
| Input | Meaning |
|---|---|
| Average active users | Monthly active users generating AI calls; blended across tiers. |
| Target revenue / user / mo | The blended ARPU you intend to capture, before choosing the model. |
| LLM cost / user / mo | Median per-user inference (LLM API) spend per month. |
| Heavy User Multiplier | How much more inference the heavy tail consumes vs the median. |
| % Users that are Heavy | Share of users in that heavy tail. |
| AI Cost Variance Buffer | Planning multiplier on base AI COGS for a cost surprise. |
| HITL cost % | Human quality-assurance time as % of revenue — counted as COGS. |
| GPU + Vector DB fixed | Common AI infrastructure not allocated per user. |
| Hybrid base share | For hybrid: fixed base vs metered split (the two-part-tariff dial). |
Default values and why
| Variable | Default | Basis |
|---|---|---|
| Variance Buffer | 1.20× | Practitioner default, Series A — D. Perelygin |
| Heavy User Multiplier | 2.0× | Practitioner estimate; aligns with MS Copilot tier analysis |
| % Heavy users | 10% | Practitioner estimate, typical Pareto tail |
| HITL cost | 7% | Practitioner estimate, Series A median — D. Perelygin |
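The inputs and defaults in the tables above can be gathered into one structure, as in the sketch below. The field names are hypothetical, and the way the inputs combine into a base-case margin is an assumption made for illustration: the page does not publish its exact formula.

```python
from dataclasses import dataclass


@dataclass
class CalculatorInputs:
    # Inputs with no documented default: you must supply these.
    active_users: float
    target_arpu: float           # target revenue / user / month
    llm_cost_per_user: float     # median inference cost / user / month
    fixed_infra: float           # GPU + vector DB, per month
    hybrid_base_share: float     # 0..1, only used when scoring the hybrid model
    # Documented defaults from the "Default values" table.
    heavy_multiplier: float = 2.0
    heavy_share: float = 0.10
    variance_buffer: float = 1.20
    hitl_pct: float = 0.07       # share of revenue


def base_case_margin(x: CalculatorInputs) -> float:
    """AI-layer gross margin at the base case (assumed combination of inputs)."""
    revenue = x.active_users * x.target_arpu
    # Assumption: non-heavy users cost the median, heavy users a multiple of it.
    blended_cost = x.llm_cost_per_user * (
        (1.0 - x.heavy_share) + x.heavy_share * x.heavy_multiplier
    )
    # Assumption: the variance buffer multiplies the inference cost only.
    inference = x.active_users * blended_cost * x.variance_buffer
    hitl = x.hitl_pct * revenue
    cogs = inference + hitl + x.fixed_infra
    return (revenue - cogs) / revenue


example = CalculatorInputs(active_users=1000, target_arpu=50.0,
                           llm_cost_per_user=10.0, fixed_infra=5000.0,
                           hybrid_base_share=0.6)
print(f"base-case AI-layer margin: {base_case_margin(example):.1%}")
```

Because the pricing model does not appear anywhere in `base_case_margin`, all four models score the same here, which matches the page's point that they only diverge under stress.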
Framework & sources
Framework: Inference Variance Allocation. Two-part-tariff economics — Oi (1971), Sundararajan (2004), Png & Wang (2010), Wong (2018). Heavy-tail / inference cost — Bai et al. (2026), Gomes (2026). Outcome-pricing risk — Saig et al. (2024), Iyer et al. (2025). Revenue-predictability valuation — Francis et al., Dechow & Schrand (2004), Chen (2024). Benchmarks: Bessemer, ICONIQ. The "5×/10×" multiple gap is a market observation (Software Equity Group), not a settled academic figure. Full citations are in the AI SaaS pricing models guide.
What this calculator does NOT compute
A 36-month forecast · customer-segment split · cash flow, balance sheet, cap table, valuation · LTV:CAC, Rule of 40, NDR, payback, Burn Multiple · cohort retention · the cascade stress test. Those need the full AI SaaS Financial Model.
Common questions
The Gross Margin Calculator answers “what is my AI-layer margin right now?” This one answers “which pricing model protects that margin?” Use them together: check your margin first, then choose the model that keeps it positive under stress.
Because inference cost is driven by usage, not by how you price. At the calm base case the models collect a similar ARPU. The differences only appear under a cost spike, with a heavy-user tail, and in revenue predictability — which is exactly the point: the dashboard stays green until conditions move.
It’s how much of an inference-cost spike your contract lets you recover from the customer instead of absorbing it. Per-seat passes 0% (you eat it), pure usage passes ~100%, and hybrid passes its metered share. It is the single most important risk number on the page.
It should stay comfortably positive. A model that turns negative under a 50% spike is one bad token-price week from losing money on every user. Below ~25% post-spike is a warning; negative is a stop.
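The thresholds in this answer can be written as a small rule. The labels are illustrative, and the 25% line is the approximate figure quoted above, not a hard cutoff.

```python
def classify_stressed_margin(margin):
    """Label a post-spike gross margin, given as a fraction (0.25 means 25%)."""
    if margin < 0.0:
        return "stop"      # losing money on every user after the spike
    if margin < 0.25:
        return "warning"   # positive, but below the ~25% comfort line
    return "healthy"


for m in (-0.05, 0.10, 0.31):
    print(f"{m:+.0%} -> {classify_stressed_margin(m)}")
```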
No — uncapped per-seat is the problem. A flat seat price works when usage is uniform, or with caps, tiers, fair-use, or a metered overage on top. The calculator flags when your heavy tail makes a bare seat price unsafe.
Usually hybrid: a base platform fee plus metered usage or outcome bands. Agent workloads have very high, variable inference cost, so a flat per-agent price exposes you to runaway cost while a pure usage price alone makes the customer’s bill unpredictable.
There’s no email gate. Nothing is stored or sent. “Copy shareable link” puts your inputs in the URL so you can save or share them — your numbers stay in your browser.
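A shareable link that carries its own inputs needs no server-side storage. The sketch below shows the general technique with Python's standard library. The base URL and the parameter names are hypothetical, and the page's real link format may differ.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit

BASE_URL = "https://example.com/pricing-calculator"  # hypothetical address


def to_share_link(inputs):
    """Encode the inputs as a query string appended to the page URL."""
    return f"{BASE_URL}?{urlencode(inputs)}"


def from_share_link(url):
    """Read the inputs back out of a shared link as floats."""
    return {key: float(value) for key, value in parse_qsl(urlsplit(url).query)}


inputs = {"users": 1000, "arpu": 50, "llm": 10}  # hypothetical parameter names
link = to_share_link(inputs)
print(link)
print(from_share_link(link))
```

Everything needed to rebuild the scenario travels in the link itself, so whoever you send it to sees the same numbers without anything having been stored.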
This is one blended snapshot, illustrative by design. The full AI SaaS Financial Model computes the same logic across 36 months and three customer segments, plus cash flow, balance sheet, cap table, valuation and 87 cited benchmarks.
One snapshot is not a financial model.
To walk into a VC meeting you need the whole economy modelled — 36 months, three customer segments, AI Layer and Traditional GM decomposed, cash flow, balance sheet, cap table to exit, and 87 cited benchmarks. That is the AI SaaS Financial Model bundle — built by the same fractional CFO behind this calculator.