Free calculator · no email

Which AI pricing model protects your margin?

Enter your numbers once. Compare seat, usage, outcome and hybrid on the same business — and see which one survives a heavy-user tail and an inference-cost spike. Every pricing model is really one decision: who absorbs the inference variance — you, or the customer?

1 · Your inputs

Start in QUICK mode with five numbers. Switch to FULL to add the cost-stress and hybrid controls. Hover the ⓘ for what each input means.

Average active users: How many users generate AI usage each month.

Target revenue / user / mo: The blended $/user/mo you aim to charge (any model).

LLM cost / user / mo: Inference cost for a typical (median) user.

Heavy User Multiplier: How much more a heavy user burns vs the median.

% Users that are Heavy: Share of users in that heavy tail.

AI Cost Variance Buffer: Planning cushion on inference cost (1.20× default).

HITL cost: Human QA as % of revenue (counts as COGS).

GPU + Vector DB fixed: Fixed monthly AI infrastructure ($/mo).

Hybrid base share: 50%. Hybrid only: fixed base vs metered split.

Nothing is saved or sent — your numbers stay in your browser and in the shareable link.

2 · Your result

At the base case all four can look similar — the differences appear under a +50% inference cost spike and once a heavy-user tail is in the mix. Each card shows the stressed margin (big number), how much variance the model passes to the customer, how predictable the revenue is, and whether heavy users cover their own cost.

Hybrid is a two-part tariff: a predictable base fee plus a metered usage component. Slide the split and watch the trade-off — more base buys predictability, more metering buys margin protection.

base 50% · metered 50%
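The trade-off behind the slider can be sketched in a few lines of Python. This is an illustrative sketch, not the calculator's actual formula: it assumes the metered share of the contract recovers the same share of any extra inference cost, and that the base share is the recurring, forecastable part of revenue. The function name and the example figures ($40 revenue and $10 inference cost per user) are hypothetical.

```python
def hybrid_tradeoff(revenue, cost, base_share, spike=0.50):
    """Return (stressed_margin, recurring_share) for one user-month.

    Assumption: the metered share of the price recovers the same share of
    the extra inference cost; the base share is fixed recurring revenue.
    """
    metered_share = 1.0 - base_share
    extra_cost = cost * spike                 # added cost under the spike
    recovered = extra_cost * metered_share    # part billed on to the customer
    stressed_revenue = revenue + recovered
    stressed_cost = cost + extra_cost
    stressed_margin = (stressed_revenue - stressed_cost) / stressed_revenue
    return stressed_margin, base_share


# Hypothetical figures: $40 revenue and $10 inference cost per user per month.
for share in (0.25, 0.50, 0.75):
    margin, recurring = hybrid_tradeoff(revenue=40.0, cost=10.0, base_share=share)
    print(f"base {share:.0%}: stressed margin {margin:.1%}, recurring {recurring:.0%}")
```

Under these assumptions the stressed margin falls as the base share rises, while the recurring share of revenue rises with it. That is the same trade-off the slider exposes.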

AI Layer gross margin (base case)
Margin after +50% inference spike
Variance passed to the customer
Revenue predictability (multiple)

The four models

What each pricing model is — and what it does to your variance

The same four base components in the financial model — Per Agent, Per Activity, Per Output, Per Outcome — combine into these four families. The difference that matters is who ends up holding the unpredictable inference cost.

Company bears variance

Seat-based (Per Agent)

A flat fee per user per month. Simple, predictable revenue, ARR-friendly.

What it shows here: high predictability, but margin that collapses under a cost spike because the price can't move.

Best fit: uniform usage. Fails when: a heavy-user tail quietly eats the margin.

Customer bears variance

Usage-based (Per Activity / Output)

Price scales with consumption — per token, action, or unit.

What it shows here: margin holds under a spike (the customer's bill moves with cost), heavy users pay their way.

Best fit: high variance. Fails when: the customer needs a predictable budget — revenue gets lumpy.

Variance concentrated

Outcome-based (Per Outcome)

Charge per result — a resolved ticket, a qualified lead, a completed task.

What it shows here: looks aligned, but cost per outcome is volatile and the model can't pass a spike through.

Best fit: provable attribution + controlled cost. Fails when: outcomes are inference-heavy or disputed.

Shared, by design

Hybrid (base + metered)

A base fee for predictable value plus a metered component for variable compute — a two-part tariff.

What it shows here: the only model that holds margin under a spike and keeps revenue predictable.

Best fit: most AI SaaS. Fails when: the product is too simple to justify metering.

What your numbers mean

Reading your result like a CFO

Four readings behind the cards above — what each number is, and what good looks like.

Margin after a +50% spike: healthy if it stays well above 0%

This is the stress test. Inference prices move — a model deprecates, a provider raises rates, an outage forces a costlier fallback. The big number on each card is your AI-layer gross margin after a 50% inference-cost spike. If a model goes negative here (seat and outcome often do), every active user becomes a loss the moment costs move against you. The Variance Buffer input is your planning cushion against exactly this.

Source: Bessemer AI Pricing Playbook →
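A minimal sketch of the stress test for a price that cannot move (the seat case). It assumes the spike multiplies inference cost by 1.5 while revenue stays flat. The function names and the $30 / $22 figures are hypothetical and chosen only to show a margin turning negative.

```python
def gross_margin(revenue, cost):
    """Gross margin as a fraction of revenue."""
    return (revenue - cost) / revenue


def margin_after_spike(revenue, cost, spike=0.50):
    """Margin when inference cost rises by `spike` and the price is fixed."""
    return gross_margin(revenue, cost * (1 + spike))


# Hypothetical per-user figures: $30 revenue, $22 inference cost per month.
revenue, cost = 30.0, 22.0
print(f"base case:   {gross_margin(revenue, cost):.1%}")
print(f"after spike: {margin_after_spike(revenue, cost):.1%}")
```

With these figures a margin of roughly 27% at the base case becomes -10% after the spike, so every active user is served at a loss.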

Variance passed to the customer: seat 0% · hybrid = metered share · usage ~100%

This is the heart of the framework. A pricing model is a contract about who eats a cost spike. Per-seat passes 0% — you absorb all of it. Pure usage passes it through. Hybrid passes through exactly its metered share, which is why the base↔metered slider is the real control. The economics of two-part tariffs (Oi 1971; Png & Wang 2010) show the metered leg behaves like an insurance premium: the more uncertain the cost, the more you route through it.

Source: Png & Wang, Buyer Uncertainty and Two-Part Pricing →
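The pass-through figures above can be written as a small lookup. This is a sketch under stated assumptions: usage is rounded from ~100% to 1.0, outcome is set to 0.0 because the page says that model cannot pass a spike through, and the function names are hypothetical rather than taken from the calculator.

```python
# Share of a cost spike the contract passes to the customer, per model.
PASS_THROUGH = {"seat": 0.0, "usage": 1.0, "outcome": 0.0}


def pass_through(model, base_share=0.5):
    """Hybrid passes through exactly its metered share (1 - base share)."""
    if model == "hybrid":
        return 1.0 - base_share
    return PASS_THROUGH[model]


def absorbed_by_vendor(cost, model, base_share=0.5, spike=0.50):
    """Dollars of extra inference cost the vendor keeps on its own books."""
    extra_cost = cost * spike
    return extra_cost * (1.0 - pass_through(model, base_share))


# Hypothetical $10 inference cost per user: the +50% spike adds $5.
for model in ("seat", "usage", "outcome", "hybrid"):
    print(model, absorbed_by_vendor(10.0, model))
```

At a 50% base share the hybrid vendor absorbs half of the extra $5. Moving the slider changes only that split.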

Heavy users pay their own cost: ✓ they're covered · ✕ they're a subsidy

In AI, the top 5–15% of users can consume 4–6× the median's inference. Under a flat price they pay the same as everyone else, so a ✕ here means your most engaged customers are your least profitable. The canonical case is GitHub Copilot: 2023 reporting put Microsoft's average loss above $20/user/month, with the heaviest users costing far more, and Copilot has since moved toward usage-based billing. High engagement can be worse than churn when the price is blind to consumption.

Source: GitHub Copilot → usage-based billing →
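A sketch of the ✓ / ✕ check, assuming a heavy user's inference cost is the median cost times the Heavy User Multiplier and that the check compares that cost with what the heavy user actually pays. The function name and the figures ($30 flat price, $8 median cost, 5× multiplier) are hypothetical.

```python
def heavy_user_covered(price_paid, median_cost, heavy_multiplier):
    """Return (covered, heavy_cost) for one heavy user in one month."""
    heavy_cost = median_cost * heavy_multiplier
    return price_paid >= heavy_cost, heavy_cost


# Flat seat price: the heavy user pays the same $30 as everyone else.
covered, heavy_cost = heavy_user_covered(30.0, median_cost=8.0, heavy_multiplier=5.0)
print("seat: ", "✓" if covered else "✕", heavy_cost)

# Usage price (assumption: the bill scales with consumption, so 5x usage = 5x bill).
covered, heavy_cost = heavy_user_covered(30.0 * 5.0, median_cost=8.0, heavy_multiplier=5.0)
print("usage:", "✓" if covered else "✕", heavy_cost)
```

In the flat-price case the heavy user costs $40 against a $30 price, so the rest of the user base subsidises them.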

Revenue predictability (and your multiple): predictable revenue = higher multiple

Protecting margin with pure usage has a cost on the other side: predictability. Investors pay more for revenue they can forecast — less predictable earnings carry a higher cost of capital (research puts it around 150–300 bps), and recurring revenue earns a higher multiple than transactional revenue. That's the second front: seat is predictable but margin-exposed; pure usage is margin-safe but lumpy; hybrid keeps a recurring base so you don't trade your valuation for your margin.

Source: Chen, The Subscription Economy (Columbia) →

What this snapshot still cannot see

These also decide your round — and none can be answered by a single-month calculation:

36-month forecast under your chosen model
SMB / Mid-Market / Enterprise segment split
Cash flow, balance sheet, runway
Cap table to exit and valuation
LTV:CAC, Rule of 40, NDR, CAC payback, Burn Multiple
Cohort retention and the cascade stress test

Methodology

How this calculator works

Every model is scored on the same cost reality. Inference cost is set by usage, not by your pricing — so at the base case the four look alike. They diverge on three things: a +50% inference-cost spike, the heavy-user tail, and revenue predictability. The only thing the model itself changes is how much of a cost spike the contract lets you pass to the customer (pass-through).

The 9 inputs

Input	Meaning
Average active users	Monthly active users generating AI calls; blended across tiers.
Target revenue / user / mo	The blended ARPU you intend to capture, before choosing the model.
LLM cost / user / mo	Median per-user inference (LLM API) spend per month.
Heavy User Multiplier	How much more inference the heavy tail consumes vs the median.
% Users that are Heavy	Share of users in that heavy tail.
AI Cost Variance Buffer	Planning multiplier on base AI COGS for a cost surprise.
HITL cost %	Human quality-assurance time as % of revenue — counted as COGS.
GPU + Vector DB fixed	Common AI infrastructure not allocated per user.
Hybrid base share	For hybrid: fixed base vs metered split (the two-part-tariff dial).

Default values and why

Variable	Default	Basis
Variance Buffer	1.20×	Practitioner default, Series A — D. Perelygin
Heavy User Multiplier	2.0×	Practitioner estimate; aligns with MS Copilot tier analysis
% Heavy users	10%	Practitioner estimate, typical Pareto tail
HITL cost	7%	Practitioner estimate, Series A median — D. Perelygin
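One way to picture how the nine inputs and their defaults could combine into a base-case AI-layer margin. This is a hedged sketch, not the calculator's published formula: it assumes the heavy tail scales median inference cost by a blended factor, that the Variance Buffer multiplies inference cost, and that HITL and fixed infrastructure are added as COGS. The class, field and function names are hypothetical, as are the example values for users, revenue, cost and fixed infrastructure.

```python
from dataclasses import dataclass


@dataclass
class Inputs:
    active_users: int
    revenue_per_user: float        # target revenue / user / mo
    llm_cost_per_user: float       # median inference cost / user / mo
    fixed_infra: float             # GPU + vector DB, $/mo
    heavy_multiplier: float = 2.0  # default from the table above
    heavy_share: float = 0.10      # default from the table above
    variance_buffer: float = 1.20  # default from the table above
    hitl_pct: float = 0.07         # default from the table above
    hybrid_base_share: float = 0.50


def base_case_margin(x: Inputs) -> float:
    """AI-layer gross margin for one month under the assumptions above."""
    revenue = x.active_users * x.revenue_per_user
    # Blend median and heavy users into one cost factor (assumption).
    tail_factor = (1 - x.heavy_share) + x.heavy_share * x.heavy_multiplier
    inference = (x.active_users * x.llm_cost_per_user
                 * tail_factor * x.variance_buffer)
    hitl = revenue * x.hitl_pct
    cogs = inference + hitl + x.fixed_infra
    return (revenue - cogs) / revenue


# Hypothetical business: 1,000 users, $40 ARPU, $8 median cost, $2,000 fixed.
example = Inputs(active_users=1000, revenue_per_user=40.0,
                 llm_cost_per_user=8.0, fixed_infra=2000.0)
print(f"{base_case_margin(example):.1%}")
```

With the default 10% heavy share and 2.0× multiplier the blended tail factor is 1.1, so the tail adds 10% to median inference cost before the buffer is applied.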

Framework & sources

Framework: Inference Variance Allocation. Two-part-tariff economics — Oi (1971), Sundararajan (2004), Png & Wang (2010), Wong (2018). Heavy-tail / inference cost — Bai et al. (2026), Gomes (2026). Outcome-pricing risk — Saig et al. (2024), Iyer et al. (2025). Revenue-predictability valuation — Francis et al., Dechow & Schrand (2004), Chen (2024). Benchmarks: Bessemer, ICONIQ. The "5×/10×" multiple gap is a market observation (Software Equity Group), not a settled academic figure. Full citations are in the AI SaaS pricing models guide.

What this calculator does NOT compute

A 36-month forecast · customer-segment split · cash flow, balance sheet, cap table, valuation · LTV:CAC, Rule of 40, NDR, payback, Burn Multiple · cohort retention · the cascade stress test. Those need the full AI SaaS Financial Model.

Who built this

Dmitry Perelygin

ACMA · CGMA · MBA University of Manchester · 25+ years

Fractional CFO. I have sat on both sides of the table: raised capital as a CFO and evaluated deals from the buy side. The math here is the same financial work I do with AI SaaS founders preparing for their first round.

FAQ

Common questions

How is this different from the AI Gross Margin Calculator?

The Gross Margin Calculator answers “what is my AI-layer margin right now?” This one answers “which pricing model protects that margin?” Use them together: check your margin first, then choose the model that keeps it positive under stress.

Why do all four models look the same at the base case?

Because inference cost is driven by usage, not by how you price. At the calm base case the models collect a similar ARPU. The differences only appear under a cost spike, with a heavy-user tail, and in revenue predictability — which is exactly the point: the dashboard stays green until conditions move.

What does “variance passed to the customer” mean?

It’s how much of an inference-cost spike your contract lets you recover from the customer instead of absorbing it. Per-seat passes 0% (you eat it), pure usage passes ~100%, and hybrid passes its metered share. It is the single most important risk number on the page.

What’s a healthy margin after the +50% spike?

It should stay comfortably positive. A model that turns negative under a 50% spike is one bad token-price week from losing money on every user. Below ~25% post-spike is a warning; negative is a stop.
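The thresholds in that answer reduce to a three-way check. A minimal sketch: the function name and the label strings are hypothetical, and the 25% line is the approximate figure quoted above rather than a hard rule.

```python
def spike_verdict(stressed_margin, warning_below=0.25):
    """Classify a post-spike margin given as a fraction (0.30 = 30%)."""
    if stressed_margin < 0:
        return "stop"      # negative: losing money on every user
    if stressed_margin < warning_below:
        return "warning"   # positive, but thin
    return "healthy"


for m in (-0.10, 0.18, 0.40):
    print(f"{m:.0%}: {spike_verdict(m)}")
```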

Is per-seat pricing dead?

No — uncapped per-seat is the problem. A flat seat price works when usage is uniform, or with caps, tiers, fair-use, or a metered overage on top. The calculator flags when your heavy tail makes a bare seat price unsafe.
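A sketch of one of those fixes: a seat price with an included allowance and a metered overage on top. The function name, the 1,000-unit allowance and the $0.02 overage rate are hypothetical. A "unit" stands for whatever you meter, such as requests or thousands of tokens.

```python
def seat_with_overage(seat_price, included_units, used_units, overage_rate):
    """Monthly bill: flat seat fee plus a charge for usage above the allowance."""
    overage_units = max(0, used_units - included_units)
    return seat_price + overage_units * overage_rate


# Typical user stays inside the allowance and pays the flat $30.
print(seat_with_overage(30.0, included_units=1000, used_units=800, overage_rate=0.02))

# Heavy user burns 4,000 units: 3,000 over the allowance are billed at $0.02 each.
print(seat_with_overage(30.0, included_units=1000, used_units=4000, overage_rate=0.02))
```

The typical user still sees a predictable seat price, and only the heavy tail pays for the extra inference it consumes.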

How should I price AI agents?

Usually hybrid: a base platform fee plus metered usage or outcome bands. Agent workloads have very high, variable inference cost, so a flat per-agent price exposes you to runaway cost while a pure usage price alone makes the customer’s bill unpredictable.

Why no email?

There’s no email gate. Nothing is stored or sent. “Copy shareable link” puts your inputs in the URL so you can save or share them — your numbers stay in your browser.
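For readers curious how inputs can live in a link without a server, here is a sketch using Python's standard `urllib.parse`. It is an illustration of the idea only: the parameter names, the example.com URL and the use of a query string are assumptions, not the calculator's real link format.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def to_share_link(base_url, inputs):
    """Encode the inputs as a query string appended to the page URL."""
    return f"{base_url}?{urlencode(inputs)}"


def from_share_link(url):
    """Read the inputs back out of a shared link as floats."""
    query = parse_qs(urlsplit(url).query)
    return {name: float(values[0]) for name, values in query.items()}


# Hypothetical parameter names and URL.
link = to_share_link("https://example.com/calculator",
                     {"users": 1000, "arpu": 40, "base_share": 0.5})
print(link)
print(from_share_link(link))
```

Because the link itself carries the numbers, opening it restores them without anything being stored elsewhere.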

How accurate is this versus the full model?

This is one blended snapshot, illustrative by design. The full AI SaaS Financial Model computes the same logic across 36 months and three customer segments, plus cash flow, balance sheet, cap table, valuation and 87 cited benchmarks.

One snapshot is not a financial model.

To walk into a VC meeting you need the whole economy modelled — 36 months, three customer segments, AI Layer and Traditional GM decomposed, cash flow, balance sheet, cap table to exit, and 87 cited benchmarks. That is the AI SaaS Financial Model bundle — built by the same fractional CFO behind this calculator.

Learn how to build the full AI startup financial model →

5 files · 17 sheets · ~140 inputs · 87 cited benchmarks · from purchase to VC-ready in ~90 minutes

See the full AI SaaS model →

Not sure your margin even survives today? Start with the free AI Gross Margin Calculator →

Calculator version v1.0 · Methodology aligned with AI SaaS Financial Model bundle v1.2 · Numbers are illustrative projections from your inputs.

Educational utility — not professional financial advice. Nothing is saved or sent; your inputs live only in this page and the shareable link.