For management reporting and investor models, customer-facing AI inference should normally be treated as COGS / cost of revenue — it is incurred to deliver the paid product and scales with usage. Inference with no revenue attached (internal tools, experimentation, a free tier) is OpEx. Model training books separately (R&D or capitalized software). Statutory classification ultimately depends on your contracts, facts, and auditor judgment. The stakes are real: park inference in OpEx and you can show a ~80% gross margin on a product whose honest AI-native peers averaged near 52% in 2026 (ICONIQ).
- If the inference is required to deliver a paid customer output → model it as COGS / cost of revenue.
- If it supports internal work, experimentation, or a free tier with no enforceable revenue contract → normally outside cost of revenue (often sales/marketing or product OpEx).
- If it is training or fine-tuning → treat it separately from inference (R&D or capitalized software).
Most founders treat "where does inference go on the P&L?" as a bookkeeping detail to hand off. It isn't. The line you choose decides what your gross margin says — and gross margin is the number an investor uses to judge whether your revenue is worth funding. Put the cost that scales with usage in the wrong place and your dashboard stays green right up until the margin gives way.
This is a practical decision guide for three readers at once: the CFO or finance lead making the call, the accountant or controller who has to defend it, and the founder who has to explain the resulting margin to an investor. Each section gives you the thesis, the accounting standard behind it, how the Big Four read it, what the research says, what it means for your raise, and how to model it.
Who this is for: AI-SaaS founders and operators (seed through Series A), the finance leads building the investor model, and the accountants closing the books — anyone whose product carries real inference, GPU, vector-DB, or human-in-the-loop cost. If you want the number before the theory, the free AI Gross Margin Calculator restates your margin with inference in the right place.
The AI Inference COGS Test
Almost every case resolves with three questions. Three yeses point to cost of revenue.
- Is the inference call triggered by a paying customer?
- Is it required to deliver the contracted output?
- Does its cost vary with usage?
One honest nuance — and it is also the point: no line in the accounting codification says "inference is COGS." The standards give you a principle (costs incurred to fulfill a contract, matched to the revenue they produce) and a category (cost of sales). Applying that principle to a brand-new cost type is the CFO's job — which is exactly why companies get it wrong, and why "everyone else buries it in hosting" isn't a defense.
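The three questions can be sketched as a small classifier. This is a minimal illustration, not an accounting rule: the class, field names, and the `review_with_auditor` bucket are hypothetical, and the handling of mixed answers is an assumption rather than something the standards specify.

```python
from dataclasses import dataclass


@dataclass
class InferenceCost:
    """One inference cost stream, described by the three test questions."""
    name: str
    triggered_by_paying_customer: bool
    required_for_contracted_output: bool
    varies_with_usage: bool


def classify(cost: InferenceCost) -> str:
    """Return a management-reporting bucket for an inference cost.

    Assumption: all three answers must be yes for cost of revenue.
    Mixed answers are flagged for review instead of being auto-classified.
    """
    answers = (
        cost.triggered_by_paying_customer,
        cost.required_for_contracted_output,
        cost.varies_with_usage,
    )
    if all(answers):
        return "cost_of_revenue"
    if not cost.triggered_by_paying_customer:
        # No revenue attached: internal tools, experimentation, free tier.
        return "opex"
    return "review_with_auditor"
```

For example, `classify(InferenceCost("internal copilot", False, False, True))` lands in OpEx, while a paid chat completion that answers yes three times lands in cost of revenue.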
The decision table
The same logic in full. Where US GAAP and IFRS diverge, both are noted. Copy it into your model.
| Cost type | Normal treatment | Governing standard | Effect on GM |
|---|---|---|---|
| Customer-facing production inference | COGS | ASC 340-40 / 705; IFRS 15 §95 | Compresses |
| Internal-tool inference (no revenue) | OpEx | ASC 350-40 | None |
| Human-in-the-loop — delivery-tied (per paid output) | COGS | ASC 340-40 | Compresses |
| Human-in-the-loop — data labeling for training | OpEx (R&D) | ASC 730-10 | None |
| Eval / observability tied to live delivery | COGS (else OpEx) | ASC 340-40 / 730-10 | Compresses if delivery-tied |
| Initial model training | Capitalize or expense | ASC 350-40 / 985-20 / 730; IAS 38 (IFRS) | Below the line |
| Fine-tuning / retraining | OpEx / R&D (periodic) | ASC 730-10 / 350-40 | Below the line |
| Vector DB / RAG — serving retrieval | COGS | ASC 705 / IFRS 15 | Compresses |
| Vector DB / RAG — building the data asset | Capitalize | ASC 350-40 | Below the line |
| Hosting — customer-facing | COGS | ASC 705 / IFRS 15 | Compresses |
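If you want the table inside a model rather than a slide, one option is a plain lookup keyed by cost tag. The sketch below transcribes the unconditional rows; the tag names and the ledger shape are hypothetical, and the eval/observability row is left out because its treatment depends on whether it is delivery-tied.

```python
# Rows transcribed from the decision table. Tag names are illustrative only.
DECISION_TABLE = {
    "customer_inference": ("COGS", "compresses"),
    "internal_inference": ("OpEx", "none"),
    "hitl_delivery": ("COGS", "compresses"),
    "hitl_training_labels": ("OpEx (R&D)", "none"),
    "initial_training": ("Capitalize or expense", "below the line"),
    "fine_tuning": ("OpEx / R&D", "below the line"),
    "vector_db_serving": ("COGS", "compresses"),
    "vector_db_build": ("Capitalize", "below the line"),
    "hosting_customer": ("COGS", "compresses"),
}


def hits_gross_margin(tag: str) -> bool:
    """True when the table says this cost type compresses gross margin."""
    _treatment, effect = DECISION_TABLE[tag]
    return effect == "compresses"


def cost_of_revenue_total(ledger: list[tuple[str, float]]) -> float:
    """Sum the ledger amounts whose tag compresses gross margin."""
    return sum(amount for tag, amount in ledger if hits_gross_margin(tag))
```

An unknown tag raises `KeyError` on purpose: a cost that is not in the table should be classified by a person, not defaulted.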
Thesis 1 — Customer-facing inference is cost of revenue
If a model call is required to deliver the result a customer pays for, the clean management-accounting treatment is to model its cost as cost of revenue. Calling it "infrastructure" or "R&D" flatters the margin, but it isn't what the cost behavior supports.
The standard: Inference isn't named in the codification, so the treatment follows principle. Under ASC 340-40, costs that relate directly to a customer contract and are incurred to fulfill it fall within the contract-cost framework. Where customer-facing inference is consumed immediately to deliver the paid output and creates no separately recoverable asset, the management-reporting treatment is to expense it through cost of revenue (ASC 705), matched to the revenue it produces under ASC 606. IFRS 15 §§95–104 frames "costs to fulfil a contract" the same way. The standards don't name inference; the principle places it in cost of revenue.
The auditors: Public Big Four guidance on AI and software costs generally points back to existing software, contract-cost, and revenue-recognition principles rather than creating a special AI exception. KPMG's Software data costs Hot Topic (Feb 2026) applies the existing rules to AI software "without any specific exceptions"; reading that 2024–2026 practice, SFAI Labs concludes that where inference delivers the paid product, it is cost of revenue.
The research: Academic work shows why this is the structurally correct home. In The Economics of AI Inference (arXiv 2025), inference sits on a "production frontier" with a genuine, recurring marginal cost per call: it behaves like a variable cost of production, not a fixed overhead. A cost that re-incurs on every unit of delivery is, by nature, cost of revenue.
For the founder & the investor conversation: Inference in COGS produces an honest view of the AI layer, which is what a sophisticated investor underwrites, rather than the blended company number. Bessemer's State of AI 2025 split AI companies into "Supernovas" at roughly 25% gross margins on unoptimized infrastructure and "Shooting Stars" at roughly 60% after disciplined engineering, in the same market and the same year. Showing inference correctly placed signals you understand your own economics.
"Inference is 23% of revenue." It isn't. ICONIQ's figure is 23% of total AI product cost (per Vista Equity's attribution) — a different denominator and a much smaller share of revenue. Quoting it correctly is an easy way to look sharper than the room.
How to model it: Make inference its own variable cost-of-revenue line, driven by usage units, not a fixed monthly number lost in "hosting." For the full margin math and benchmark bands, see the companion AI Layer Gross Margin guide; this guide is about where the cost books, and that one is about what the resulting margin should be.
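A usage-driven inference line can be as simple as units times unit cost, feeding straight into gross margin. The sketch below assumes a single blended cost per usage unit; the function names and the example figures are illustrative, not taken from any template.

```python
def inference_cogs(usage_units: float, cost_per_unit: float) -> float:
    """Variable inference cost for the period, driven by paid usage units."""
    return usage_units * cost_per_unit


def gross_margin(revenue: float, cost_of_revenue: float) -> float:
    """Gross margin as a fraction of revenue."""
    if revenue <= 0:
        raise ValueError("revenue must be positive")
    return (revenue - cost_of_revenue) / revenue


# Illustrative month: 2,000,000 paid calls at $0.02 each, plus $10,000 of
# other cost of revenue (hosting, support), against $100,000 of revenue.
monthly_inference = inference_cogs(2_000_000, 0.02)
margin = gross_margin(100_000, monthly_inference + 10_000)
```

Because the line is driven by `usage_units`, doubling paid usage at flat pricing shows up in the margin immediately, which a fixed monthly "hosting" number would hide.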
Thesis 2 — The OpEx exception: inference with no revenue attached
The half of this question most pages skip: not all inference is COGS. Inference that isn't in a revenue-attribution chain is OpEx — and classifying it that way is correct, not a loophole.
The standard: A general-purpose internal AI tool (employees drafting documents, an internal copilot, a non-billed assistant) has no performance obligation and no contract behind it. Its compute is operating expense, treated as a cloud service under ASC 350-40 (no asset, no cost of sales). Cost-of-revenue classification applies only once the inference call is attached to something a customer pays for. Free-tier inference with no enforceable revenue contract is normally modeled outside cost of revenue, often as sales/marketing or product OpEx, unless your accounting policy treats free usage as part of a contract-specific fulfillment obligation.
The auditors: Deloitte's technology guidance draws the same internal-use-versus-hosting boundary, and KPMG's Software and website costs Handbook (Feb 2026) walks through the internal-use classification in detail. The dividing question is always: is this compute delivering a paid obligation, or running the business?
The research: The academic LCOAI metric (a "levelized cost of AI," modeled on levelized cost of energy) is useful here because it forces you to separate deployment contexts. The per-unit cost of a customer-serving deployment is a different number from internal experimentation, and conflating them misstates the margin in both directions.
For the founder & the investor conversation: Discipline cuts both ways. Don't inflate COGS with internal tooling either, because it makes the AI layer look worse than it is and an investor will catch the inconsistency. A clean split (customer-facing inference in COGS, internal inference in OpEx) is what makes the whole model auditable, and therefore credible.
How to model it: From day one, tag inference in two streams, customer-facing (COGS, scales with paid usage) and internal/free-tier (OpEx, scales with headcount and experimentation). A free tier with no revenue is a customer-acquisition cost, not cost of revenue, so model it as such.
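The two-stream tagging can be sketched as a mapping from stream to P&L line. The stream names and the `(stream, amount)` entry shape are hypothetical; in practice the tag would come from your billing export or request metadata.

```python
from collections import defaultdict

# Assumed stream tags. Free tier is mapped to OpEx on the premise that it has
# no enforceable revenue contract behind it.
STREAM_TO_LINE = {
    "customer_facing": "cost_of_revenue",
    "internal": "opex",
    "free_tier": "opex",
}


def split_inference_spend(entries: list[tuple[str, float]]) -> dict[str, float]:
    """Total inference spend per P&L line from (stream, amount) entries."""
    totals: dict[str, float] = defaultdict(float)
    for stream, amount in entries:
        totals[STREAM_TO_LINE[stream]] += amount
    return dict(totals)
```

An untagged or misspelled stream raises `KeyError`, which is the point: spend that cannot be attributed to a stream should not silently fall into either line.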
Thesis 3 — Training is not inference, and it books separately
The most common margin distortion isn't hiding inference in OpEx — it's smearing the two AI cost types together. Training is the one-time-and-periodic build; inference is the recurring delivery. They sit in different places, and blurring them flatters or wrecks the margin depending on which way you blur.
The standard: Initial training and fine-tuning are development activities, not delivery. Under US GAAP they are generally expensed as research and development as incurred (ASC 730-10-25-2); for software to be sold, ASC 985-20-25-1 expenses costs until technological feasibility, then capitalizes; for internal-use or hosted software, ASC 350-40 capitalizes qualifying development costs. IFRS diverges: IAS 38.54–57 requires capitalizing development-phase costs once its recognition criteria are met, where US GAAP would expense the R&D. That is a real difference international founders must flag. FASB ASU 2025-06 (September 2025) modernizes ASC 350-40: it removes the old "project stages," introduces a "probable-to-complete" threshold for when capitalization begins, and is effective for annual periods beginning after December 15, 2027 (early adoption permitted). The Board expects it to reduce how much companies capitalize, so a model built on the old stage-gates is worth revisiting.
The auditors: Deloitte's Accounting for the Development of Generative AI Software Products (Oct 2024) routes fine-tuning, adversarial training, and data-acquisition costs through ASC 985-20 or ASC 350-40, and expenses pre-feasibility data work under ASC 730-10. EY's To the Point on ASU 2025-06 (Sept 19, 2025) confirms the project-stage removal and the December 2027 effective date; KPMG's Hot Topic addresses training and retraining data costs specifically.
The research: The scale point that surprises founders is that training is the minority of lifetime cost. Some infrastructure analyses (S&P Global / 451 Research) estimate inference can account for the large majority of an AI system's lifetime cost, with training a minority, and Menlo Ventures found only 9% of production models are fine-tuned at all. The recurring delivery cost, inference, is what defines your margin, not the training run.
For the founder & the investor conversation: Don't bury recurring inference inside "R&D" to lift gross margin. An investor or auditor who sees a suspiciously high AI-product margin will ask where the per-call cost went, and "we classified it as research" is the answer that invites deeper diligence. The honest framing is the stronger one: training is capex-like (plan it, capitalize or expense it once, report it below the gross-margin line); inference is the recurring cost of revenue.
How to model it: Keep three separate lines, namely initial training (capitalized or R&D, below gross margin), retraining/fine-tuning (periodic and scheduled, not continuous), and inference (variable COGS). Don't let a quarterly retraining cost contaminate your per-unit delivery economics.
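A minimal sketch of the three-line split shows why it matters: only inference touches gross margin, so a retraining quarter changes the operating result but not the delivery economics. The function name and figures are illustrative, and the sketch assumes any initial training that is not capitalized is expensed as R&D.

```python
def pnl(revenue: float, inference: float, retraining: float,
        initial_training_expensed: float) -> dict[str, float]:
    """Keep the three AI cost lines apart.

    Only inference sits above the gross-margin line; both training lines
    are reported below it.
    """
    gross_profit = revenue - inference
    below_the_line = retraining + initial_training_expensed
    return {
        "gross_margin": gross_profit / revenue,
        "operating_result": gross_profit - below_the_line,
    }


quiet_quarter = pnl(200_000, 80_000, 0, 0)
retraining_quarter = pnl(200_000, 80_000, 30_000, 0)
```

Both quarters report the same gross margin; only `operating_result` moves when the retraining run lands.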
Thesis 4 — "Inference will get cheaper" doesn't move it out of COGS
The most common objection a founder hears — sometimes from their own board — is that inference prices are collapsing, so the cost doesn't really matter and shouldn't weigh on the margin. It's the argument used to justify parking inference in OpEx and waiting for the problem to disappear. It's weak on two counts.
The standard: Classification follows function, not a price forecast. As long as the cost is incurred to deliver the paid product, it is a cost to fulfill the contract; whether the unit price is rising or falling doesn't change where it books.
The auditors: Cost behavior drives the line. A variable, usage-tied cost is cost of revenue regardless of its trend. No Big Four guidance lets you reclassify a delivery cost to OpEx because you expect it to shrink.
The research: Nor is the cost shrinking as fast as the optimists claim. The Price of Progress (arXiv 2026) finds the price for a fixed level of model performance is falling closer to 10× per year than the 1,000× sometimes implied, which is real but gradual and uneven. The frontier pushes the other way: Epoch AI, summarizing Toby Ord, argues that reinforcement-learning and test-time "reasoning" scaling primarily increase inference cost, creating a "persistent economic burden." Tokens get cheaper, but each task uses more of them, so the net is not a free lunch.
For the founder & the investor conversation: You'll meet these objections often. Investors like Altimeter's Jamin Ball argue the marginal cost of creation is trending toward zero, and a16z's Martin Casado points to historical precedent where compute became almost free. The honest CFO answer isn't doom. It's that unit price does fall, but that doesn't fix the margin, because variability and heavy-user concentration compress it faster than price recovers it, and you don't get to assume a price cut you can't control. You can even concede the nuance (analyst Martin Alderson has shown input processing is far cheaper than generation) without giving up the line.
How to model it: Never model a price decline as a guarantee. Apply a Variance Buffer (≈1.2× on variable AI cost as a default) to stress for token-price swings and retry storms, and a Heavy-User Multiplier so a handful of power users don't quietly eat the margin of the many. A model that assumes inference only gets cheaper has never met a bad agent loop.
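One way to wire the two stress factors into a model is sketched below. The 1.2× default comes from the text; how the Heavy-User Multiplier is applied is not specified there, so treating it as a flat factor on variable AI cost (default 1.0, meaning no adjustment) is purely an assumption for illustration.

```python
def stressed_ai_cost(base_variable_cost: float,
                     variance_buffer: float = 1.2,
                     heavy_user_multiplier: float = 1.0) -> float:
    """Variable AI cost after applying both stress factors.

    variance_buffer: default 1.2, covering token-price swings and retries.
    heavy_user_multiplier: assumed flat factor for power-user concentration.
    """
    return base_variable_cost * variance_buffer * heavy_user_multiplier


def stressed_margin(revenue: float, base_variable_cost: float,
                    **stress: float) -> float:
    """Gross margin on the AI layer under the stressed cost."""
    return (revenue - stressed_ai_cost(base_variable_cost, **stress)) / revenue


# Illustrative: $100,000 revenue, $50,000 unstressed variable AI cost.
base_case = stressed_margin(100_000, 50_000, variance_buffer=1.0)
buffered = stressed_margin(100_000, 50_000)
```

Here the unstressed margin of 50% drops to 40% under the default buffer alone, before any heavy-user adjustment.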
Worked example: from the financial-model template
Take the template's demo company (figures illustrative). It reports a blended gross margin of 84.3% — the number that makes a deck look like classic SaaS. Apply the test above: pull customer-facing inference and delivery-tied human-in-the-loop into cost of revenue, and the isolated AI layer margin lands at 40.6%. The 84.3% wasn't wrong as a blend — it was averaging a healthy software layer over a thin AI layer and reporting the blend as if it were the whole story.
Stress it further. Hold pricing flat and let the LLM cost rise 50% — a realistic heavy-usage or model-switch scenario — and the AI layer margin falls toward an illustrative ~12%, dragging a once-comfortable LTV:CAC of 4.16 down toward roughly 2:1. None of that is visible in the blended 84.3%. That is why the classification isn't trivia: the line you pick determines whether you see the cliff while you can still steer.
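The stress arithmetic can be reproduced in a few lines. The sketch assumes pricing is flat and, by default, that the entire AI-layer cost is LLM cost; the template's own cost mix is not given here, so `llm_share_of_cost` is a hypothetical knob rather than a template parameter.

```python
def margin_after_cost_rise(base_margin: float, cost_rise: float,
                           llm_share_of_cost: float = 1.0) -> float:
    """AI-layer margin after LLM cost rises by cost_rise at flat pricing.

    base_margin and the result are fractions of AI-layer revenue.
    llm_share_of_cost is the assumed share of AI-layer cost that is LLM spend.
    """
    cost_share = 1.0 - base_margin           # cost as a share of revenue
    llm_cost = cost_share * llm_share_of_cost
    other_cost = cost_share - llm_cost       # held constant in the stress
    return 1.0 - (llm_cost * (1.0 + cost_rise) + other_cost)


stressed = margin_after_cost_rise(0.406, 0.50)
```

With a 40.6% starting margin, cost is 59.4% of revenue; a 50% rise takes it to 89.1%, leaving roughly 10.9%. That sits close to the illustrative ~12% above, and the small gap would be consistent with a slice of AI-layer cost that does not move with LLM pricing.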
What to do this week
In the models I pressure-test, the first red flag is the same one almost every time: an AI product reporting an 80%-plus gross margin with no separate inference line on the P&L. The cost didn't vanish — it was filed somewhere that flatters the margin.
Don't wait for diligence to do this for you. Pull every AI cost off the P&L and re-sort it with one question: is this consumed to deliver a paid output, or to build the product and run the business? Move customer-facing inference and delivery-tied human-in-the-loop into cost of revenue. Leave internal tooling and training where they belong. Then recompute the margin on the AI layer alone, and decide which lever — routing, caching, usage caps, pricing — to pull before the next board meeting.
See your real AI gross margin in minutes
The free AI Gross Margin Calculator restates your margin with inference in cost of revenue and shows exactly which line is doing the damage.
Open the AI Gross Margin Calculator
FAQ
Is AI inference cost COGS or OpEx?
Is model training a COGS cost?
Is AI inference cost part of gross margin?
What's the difference between AI inference and model training in accounting?
Does inference belong in COGS under GAAP and IFRS?
How do I record inference cost in my books?
Source notes
- Standards: FASB ASC 340-40 (costs to fulfill a contract), ASC 606, ASC 705, ASC 730-10, ASC 350-40, ASC 985-20; FASB ASU 2025-06 (Sept 2025, effective for annual periods beginning after Dec 15, 2027); IFRS 15 §§95–104; IAS 38.54–57. Standard text is copyrighted (FASB / IFRS Foundation) — paragraph references are given rather than reproduced text.
- Auditor guidance: Deloitte, Accounting for the Development of Generative AI Software Products (Oct 2024); EY, To the Point — FASB modernizes internal-use software (Sept 19, 2025); KPMG, Software data costs Hot Topic and Software and website costs Handbook (Feb 2026); PwC, Software costs guide (Nov 2025).
- Benchmarks: ICONIQ State of AI 2026 (AI-native gross margin ~52% in 2026, from 41% in 2024); Bessemer State of AI 2025 (~25% vs ~60% cohorts); Vista Equity (inference = 23% of AI product cost); S&P Global / 451 Research (inference a large majority of lifetime cost, estimate); Menlo Ventures (9% of production models fine-tuned).
- Academic: The Economics of AI Inference (arXiv:2510.26136); The Price of Progress (arXiv:2511.23455); Epoch AI / Toby Ord on the persistence of inference cost.
- Practitioner judgment: the AI Inference COGS Test, the Variance Buffer (~1.2×) and the Heavy-User Multiplier are the author's frameworks, presented as such.
This is an editorial decision framework, not accounting or financial advice — validate against your own facts with your auditor. Published June 2026.