AI Inference Cost: COGS or OpEx? CFO Guide


Quick answer

For management reporting and investor models, customer-facing AI inference should normally be treated as COGS (cost of revenue): it is incurred to deliver the paid product and scales with usage. Inference with no revenue attached (internal tools, experimentation, a free tier) is OpEx. Model training is booked separately, as R&D or capitalized software. Statutory classification ultimately depends on your contracts, facts, and auditor judgment. The stakes are real: park inference in OpEx and you can report a ~80% gross margin on a product whose AI-native peers averaged about 52% in 2026 (ICONIQ).

The decision rule

If the inference is required to deliver a paid customer output → model it as COGS / cost of revenue.
If it supports internal work, experimentation, or a free tier with no enforceable revenue contract → normally outside cost of revenue (often sales/marketing or product OpEx).
If it is training or fine-tuning → treat it separately from inference (R&D or capitalized software).

Most founders treat "where does inference go on the P&L?" as a bookkeeping detail to hand off. It isn't. The line you choose decides what your gross margin says — and gross margin is the number an investor uses to judge whether your revenue is worth funding. Put the cost that scales with usage in the wrong place and your dashboard stays green right up until the margin gives way.

This is a practical decision guide for three readers at once: the CFO or finance lead making the call, the accountant or controller who has to defend it, and the founder who has to explain the resulting margin to an investor. Each section gives you the thesis, the accounting standard behind it, how the Big Four read it, what the research says, what it means for your raise, and how to model it.

Who this is for: AI-SaaS founders and operators (seed through Series A), the finance leads building the investor model, and the accountants closing the books — anyone whose product carries real inference, GPU, vector-DB, or human-in-the-loop cost. If you want the number before the theory, the free AI Gross Margin Calculator restates your margin with inference in the right place.

Decision framework, not accounting advice. The standards references below are real and cited, but classification depends on your contracts, controls, and facts. Treat this as an editorial framework and validate it with your own auditor before you file.

The AI Inference COGS Test

Almost every case resolves with three questions.


Is the inference call triggered by a paying customer?
Is it required to deliver the contracted output?
Does its cost vary with usage?

Three "yes" answers → cost of revenue (COGS). Any "no" pushes the cost toward OpEx, R&D, or capitalized software, depending on its function.
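The three questions map onto a small classification helper. This is a minimal sketch, not the author's tool. The `AICost` record, its field names, and the `function` label used to route a "no" answer are hypothetical and were chosen only for illustration. The routing is a simplification of the decision table.

```python
from dataclasses import dataclass


@dataclass
class AICost:
    """One AI cost item to classify (hypothetical record shape)."""
    name: str
    triggered_by_paying_customer: bool
    required_for_contracted_output: bool
    varies_with_usage: bool
    # Hypothetical label consulted only when the test fails:
    # "build" (training, model development), "internal", "acquisition", ...
    function: str = "internal"


def classify(cost: AICost) -> str:
    """Apply the three-question AI Inference COGS Test."""
    if (cost.triggered_by_paying_customer
            and cost.required_for_contracted_output
            and cost.varies_with_usage):
        return "COGS"
    # Any "no": route by function. This mapping is an assumption,
    # simplified from the decision table; your auditor has the final say.
    if cost.function == "build":
        return "R&D or capitalized software"
    return "OpEx"


paid_inference = AICost("chat completions for paid seats", True, True, True)
internal_copilot = AICost("employee copilot", False, False, True)
fine_tune = AICost("quarterly fine-tune", False, False, False, function="build")

for item in (paid_inference, internal_copilot, fine_tune):
    print(f"{item.name}: {classify(item)}")
```

Free-tier usage fails the first question, so it lands in OpEx regardless of how usage-driven it is. That matches the OpEx exception discussed below.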

One honest nuance — and it is also the point: no line in the accounting codification says "inference is COGS." The standards give you a principle (costs incurred to fulfill a contract, matched to the revenue they produce) and a category (cost of sales). Applying that principle to a brand-new cost type is the CFO's job — which is exactly why companies get it wrong, and why "everyone else buries it in hosting" isn't a defense.

The AI Inference COGS Test, applied across every AI cost type.

The decision table

The same logic in full. Where US GAAP and IFRS diverge, both are noted. Copy it into your model.

Cost type	Normal treatment	Governing standard	Effect on gross margin
Customer-facing production inference	COGS	ASC 340-40 / 705; IFRS 15 §§95–104	Compresses
Internal-tool inference (no revenue)	OpEx	ASC 350-40	None (below gross margin)
Human-in-the-loop: delivery-tied (per paid output)	COGS	ASC 340-40	Compresses
Human-in-the-loop: data labeling for training	OpEx (R&D)	ASC 730-10	None (below gross margin)
Eval / observability tied to live delivery	COGS (else OpEx)	ASC 340-40 / 730-10	Compresses if delivery-tied
Initial model training	Capitalize or expense	ASC 350-40 / 985-20 / 730; IAS 38 (IFRS)	None (below gross margin)
Fine-tuning / retraining	OpEx / R&D (periodic)	ASC 730-10 / 350-40	None (below gross margin)
Vector DB / RAG: serving retrieval	COGS	ASC 705; IFRS 15	Compresses
Vector DB / RAG: building the data asset	Capitalize	ASC 350-40	None (below gross margin)
Hosting: customer-facing	COGS	ASC 705; IFRS 15	Compresses
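If you would rather keep the table in code than in a spreadsheet, one option is a lookup keyed by cost type. You then re-sort a cost ledger into cost of revenue and everything else. In this sketch the short keys, ledger amounts and revenue figure are hypothetical placeholders. Only the COGS versus non-COGS split mirrors the table.

```python
# Treatment per cost type, mirroring the decision table.
# Keys are hypothetical short labels for the table rows.
TREATMENT = {
    "customer_inference": "COGS",
    "internal_inference": "OpEx",
    "hitl_delivery": "COGS",
    "hitl_labeling": "R&D",
    "eval_live": "COGS",
    "eval_other": "OpEx",
    "initial_training": "Capitalize or R&D",
    "fine_tuning": "R&D",
    "rag_serving": "COGS",
    "rag_build": "Capitalize",
    "hosting_customer": "COGS",
}


def gross_margin(revenue: float, ledger: dict[str, float]) -> float:
    """Gross margin after moving every COGS-treated item into cost of revenue."""
    unknown = set(ledger) - set(TREATMENT)
    if unknown:
        # Refuse to guess: unclassified spend must be mapped explicitly.
        raise KeyError(f"unclassified cost types: {sorted(unknown)}")
    cogs = sum(amount for kind, amount in ledger.items()
               if TREATMENT[kind] == "COGS")
    return (revenue - cogs) / revenue


# Illustrative monthly ledger (made-up numbers).
ledger = {
    "customer_inference": 30_000,
    "hosting_customer": 8_000,
    "internal_inference": 4_000,
    "fine_tuning": 12_000,
}
print(f"{gross_margin(100_000, ledger):.1%}")  # 62.0%
```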

Thesis 1 — Customer-facing inference is cost of revenue

If a model call is required to deliver the result a customer pays for, the clean management-accounting treatment is to model its cost as cost of revenue. Calling it "infrastructure" or "R&D" flatters the margin, but it isn't what the cost behavior supports.

The standard: Inference isn't named in the codification, so the treatment follows principle. Under ASC 340-40, costs that relate directly to a customer contract and are incurred to fulfill it fall within the contract-cost framework. Where customer-facing inference is consumed immediately to deliver the paid output and creates no separately recoverable asset, the management-reporting treatment is to expense it through cost of revenue (ASC 705), matched to the revenue it produces under ASC 606. IFRS 15 §§95–104 frames "costs to fulfil a contract" the same way.

The auditors: Public Big Four guidance on AI and software costs generally points back to existing software, contract-cost, and revenue-recognition principles rather than creating a special AI exception. KPMG's Software data costs Hot Topic (Feb 2026) applies the existing rules to AI software "without any specific exceptions." Interpreting that 2024–2026 practice, SFAI Labs concludes that where inference delivers the paid product, it is cost of revenue.

The research: Academic work supports this placement. In The Economics of AI Inference (arXiv 2025), inference sits on a "production frontier" with a genuine, recurring marginal cost per call, so it behaves like a variable cost of production rather than a fixed overhead. A cost incurred again on every unit of delivery is, by nature, cost of revenue.

For the founder & the investor conversation: Inference in COGS gives an honest view of the AI layer, which is what a sophisticated investor underwrites, not the blended company number. Bessemer's State of AI 2025 split AI companies into "Supernovas" at roughly 25% gross margins on unoptimized infrastructure and "Shooting Stars" at roughly 60% after disciplined engineering: same market, same year. Showing inference correctly placed signals you understand your own economics.

⚠ Myth to retire

"Inference is 23% of revenue." It isn't. ICONIQ's figure is 23% of total AI product cost (per Vista Equity's attribution) — a different denominator and a much smaller share of revenue. Quoting it correctly is an easy way to look sharper than the room.

How to model it: Make inference its own variable cost-of-revenue line, driven by usage units, not a fixed monthly number lost in "hosting." For the full margin math and benchmark bands, see the companion AI Layer Gross Margin guide. This guide is about where the cost books; that one is about what the resulting margin should be.
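A usage-driven line can be as simple as units multiplied by a unit price for each unit type. The sketch below assumes token-based pricing. The unit types, prices and volumes are hypothetical. Real bills add caching discounts, retries and tiered rates that this ignores.

```python
# Hypothetical price per million tokens, by unit type (USD).
PRICE_PER_M_TOKENS = {"input": 3.00, "output": 15.00}


def inference_cogs(usage_tokens: dict[str, int]) -> float:
    """Variable cost of revenue: tokens consumed times price per token."""
    return sum(usage_tokens[kind] / 1_000_000 * PRICE_PER_M_TOKENS[kind]
               for kind in usage_tokens)


def ai_layer_margin(ai_revenue: float, usage_tokens: dict[str, int],
                    other_delivery_cost: float = 0.0) -> float:
    """AI-layer gross margin with inference booked as cost of revenue."""
    cogs = inference_cogs(usage_tokens) + other_delivery_cost
    return (ai_revenue - cogs) / ai_revenue


month = {"input": 2_000_000_000, "output": 400_000_000}  # paid usage only
print(inference_cogs(month))                     # 12000.0
print(f"{ai_layer_margin(20_000, month):.0%}")   # 40%
```

Because the line is computed from usage, it moves with volume month to month. That is exactly the behavior a fixed "hosting" number hides.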

Thesis 2 — The OpEx exception: inference with no revenue attached

The half of this question most pages skip: not all inference is COGS. Inference that isn't in a revenue-attribution chain is OpEx — and classifying it that way is correct, not a loophole.

The standard: A general-purpose internal AI tool (employees drafting documents, an internal copilot, a non-billed assistant) has no performance obligation and no contract behind it. Its compute is operating expense. Where it is bought as a cloud service, ASC 350-40 treats the hosting arrangement as a service contract, so the fees create no asset and no cost of sales. The classification switches only once the inference call is attached to something a customer pays for. Free-tier inference with no enforceable revenue contract is normally modeled outside cost of revenue, often as sales/marketing or product OpEx. The exception is when your accounting policy treats free usage as part of a contract-specific fulfillment obligation.

The auditors: Deloitte's technology guidance draws the same internal-use-versus-hosting boundary, and KPMG's Software and website costs Handbook (Feb 2026) walks through the internal-use classification in detail. The dividing question is always: is this compute delivering a paid obligation, or running the business?

The research: The academic LCOAI metric (a "levelized cost of AI," modeled on the levelized cost of energy) is useful here because it forces you to separate deployment contexts. The per-unit cost of a customer-serving deployment is a different number from that of internal experimentation, and conflating them misstates the margin in both directions.
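For intuition, a levelized cost per call can be computed by borrowing the standard LCOE structure: discounted lifetime costs divided by discounted lifetime output. This is an analogy under stated assumptions, not the paper's definition of LCOAI. The discount rate, cost streams and call volumes are hypothetical. Running it once per deployment context keeps the customer-serving and internal numbers apart.

```python
def levelized_cost_per_call(costs: list[float], calls: list[float],
                            discount_rate: float) -> float:
    """LCOE-style levelized cost: PV(costs) / PV(calls), one entry per year.

    Year 0 is undiscounted; year t is divided by (1 + r) ** t.
    """
    if len(costs) != len(calls):
        raise ValueError("costs and calls must cover the same years")
    pv_costs = sum(c / (1 + discount_rate) ** t for t, c in enumerate(costs))
    pv_calls = sum(q / (1 + discount_rate) ** t for t, q in enumerate(calls))
    return pv_costs / pv_calls


# Hypothetical three-year streams for two deployment contexts.
customer = levelized_cost_per_call([120_000, 150_000, 180_000],
                                   [40e6, 60e6, 90e6], 0.10)
internal = levelized_cost_per_call([30_000, 30_000, 30_000],
                                   [2e6, 2e6, 2e6], 0.10)
print(f"customer-serving: ${customer:.4f}/call, internal: ${internal:.4f}/call")
```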

For the founder & the investor conversation: Discipline cuts both ways. Don't inflate COGS with internal tooling either, because it makes the AI layer look worse than it is and an investor will catch the inconsistency. A clean split (customer-facing inference in COGS, internal inference in OpEx) is what makes the whole model auditable, and therefore credible.

How to model it: Tag inference in two streams from day one. The customer-facing stream is COGS and scales with paid usage. The internal/free-tier stream is OpEx and scales with headcount, experimentation and free-user volume. A free tier with no revenue is a customer-acquisition cost, not cost of revenue; model it as such.
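Tagging can happen at the request level. This is a minimal sketch. It assumes each inference log record carries a hypothetical `account_type` field ("paid", "free" or "internal") and a per-request `cost_usd`. The bucket names follow the text, with free-tier usage treated as customer-acquisition cost.

```python
from collections import defaultdict

# Assumed mapping from a hypothetical account_type tag to a P&L bucket.
BUCKET = {
    "paid": "COGS",
    "free": "OpEx: customer acquisition",
    "internal": "OpEx: internal tooling",
}


def split_streams(records: list[dict]) -> dict[str, float]:
    """Sum inference spend per P&L bucket from tagged usage records.

    An unknown or missing tag raises KeyError, so untagged spend cannot
    silently land in either stream.
    """
    totals: dict[str, float] = defaultdict(float)
    for rec in records:
        totals[BUCKET[rec["account_type"]]] += rec["cost_usd"]
    return dict(totals)


logs = [
    {"account_type": "paid", "cost_usd": 0.042},
    {"account_type": "paid", "cost_usd": 0.018},
    {"account_type": "free", "cost_usd": 0.011},
    {"account_type": "internal", "cost_usd": 0.007},
]
print(split_streams(logs))
```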

Thesis 3 — Training is not inference, and it books separately

The most common margin distortion isn't hiding inference in OpEx; it's smearing the two AI cost types together. Training is the build, done once and then repeated periodically; inference is the recurring delivery. They sit in different places, and blurring them flatters or wrecks the margin depending on which way you blur.

The standard: Initial training and fine-tuning are development activities, not delivery. Under US GAAP they are generally expensed as research and development when incurred (ASC 730-10-25). For software to be sold, ASC 985-20-25-1 expenses costs until technological feasibility is established, after which qualifying costs are capitalized. For internal-use or hosted software, ASC 350-40 capitalizes qualifying development costs. IFRS diverges. IAS 38.54 expenses research-phase costs, but IAS 38.57 requires development-phase costs to be capitalized once all of its criteria are met, where US GAAP would generally expense them as R&D. International founders must flag this real difference. FASB ASU 2025-06 (September 2025) also modernizes ASC 350-40: it removes the old "project stages" and introduces a "probable-to-complete" threshold for when capitalization begins. It is effective for annual periods beginning after December 15, 2027, with early adoption permitted. The Board expects it to reduce how much companies capitalize, so a model built on the old stage-gates is worth revisiting.
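The trigger is a fiscal year that begins after December 15, 2027. A calendar-year company therefore first applies ASU 2025-06 in its 2028 annual period, absent early adoption. The quick date check below is a sketch. It models only the mandatory date stated in the text, not early-adoption elections or the transition method, and the constant and function names are hypothetical.

```python
from datetime import date

# Mandatory effective date stated in the text: annual periods beginning
# after December 15, 2027 (early adoption permitted, not modeled here).
ASU_2025_06_THRESHOLD = date(2027, 12, 15)


def asu_2025_06_mandatory(fiscal_year_start: date) -> bool:
    """True if this annual period must apply ASU 2025-06."""
    return fiscal_year_start > ASU_2025_06_THRESHOLD


print(asu_2025_06_mandatory(date(2028, 1, 1)))  # True: calendar-year 2028
print(asu_2025_06_mandatory(date(2027, 7, 1)))  # False: FY starting July 2027
```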

The auditors: Deloitte's Accounting for the Development of Generative AI Software Products (Oct 2024) routes fine-tuning, adversarial training, and data-acquisition costs through ASC 985-20 or ASC 350-40, and expenses pre-feasibility data work under ASC 730-10. EY's To the Point on ASU 2025-06 (Sept 19, 2025) confirms the project-stage removal and the December 2027 effective date; KPMG's Hot Topic addresses training and retraining data costs specifically.

The research: The scale point that surprises founders is that training is usually the minority of lifetime cost. Some infrastructure analyses (S&P Global / 451 Research) estimate that inference can account for the large majority of an AI system's lifetime cost. Menlo Ventures found that only 9% of production models are fine-tuned at all. The recurring delivery cost, inference, defines your margin, not the training run.

For the founder & the investor conversation: Don't bury recurring inference inside "R&D" to lift gross margin. An investor or auditor who sees a suspiciously high AI-product margin will ask where the per-call cost went, and "we classified it as research" is the answer that invites deeper diligence. The honest framing is the stronger one. Training is capex-like: plan it, capitalize or expense it, and report it below the gross-margin line. Inference is the recurring cost of revenue.

How to model it: Keep three separate lines. Initial training is capitalized or R&D, below gross margin. Retraining/fine-tuning is periodic and scheduled, not continuous. Inference is variable COGS. Don't let a quarterly retraining cost contaminate your per-unit delivery economics.
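The effect of keeping retraining out of per-unit delivery cost shows up clearly in a quarter of made-up numbers. In this sketch the retraining spend lands in a single month. The function names and figures are hypothetical.

```python
def unit_delivery_cost(inference_cost: float, units: int) -> float:
    """Per-unit delivery cost: inference (variable COGS) only."""
    return inference_cost / units


def contaminated_unit_cost(inference_cost: float, retraining: float,
                           units: int) -> float:
    """What the per-unit figure looks like if retraining is folded in."""
    return (inference_cost + retraining) / units


# Hypothetical quarter: a $30k retraining run lands in the third month only.
months = [
    {"inference": 10_000, "retraining": 0, "units": 100_000},
    {"inference": 11_000, "retraining": 0, "units": 110_000},
    {"inference": 12_000, "retraining": 30_000, "units": 120_000},
]
for m in months:
    clean = unit_delivery_cost(m["inference"], m["units"])
    mixed = contaminated_unit_cost(m["inference"], m["retraining"], m["units"])
    print(f"clean ${clean:.3f}/unit vs mixed ${mixed:.3f}/unit")
```

The clean line stays flat at $0.10 per unit. The mixed line jumps in the retraining month even though nothing changed in delivery.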

Thesis 4 — "Inference will get cheaper" doesn't move it out of COGS

The most common objection a founder hears — sometimes from their own board — is that inference prices are collapsing, so the cost doesn't really matter and shouldn't weigh on the margin. It's the argument used to justify parking inference in OpEx and waiting for the problem to disappear. It's weak on two counts.

The standard: Classification follows function, not a price forecast. As long as the cost is incurred to deliver the paid product, it is a cost to fulfill the contract; whether the unit price is rising or falling doesn't change where it books.

The auditors: Cost behavior drives the line. A variable, usage-tied cost is cost of revenue regardless of its trend. None of the Big Four guidance cited here lets you reclassify a delivery cost to OpEx because you expect it to shrink.

The research: And it isn't shrinking as fast as the optimists claim. The Price of Progress (arXiv 2025) finds that the price for a fixed level of model performance is falling by closer to ~10× per year than the 1,000× sometimes implied. The decline is real, but gradual and uneven. The frontier pushes the other way. Epoch AI, summarizing Toby Ord, argues that reinforcement-learning and test-time "reasoning" scaling primarily increase inference cost, creating a "persistent economic burden." Cheaper tokens, but more tokens per task: the net is not a free lunch.
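The "cheaper tokens, more tokens per task" point is a simple product: cost per task = price per token × tokens per task. The sketch uses the ~10× annual price decline cited above. The starting price, tokens per task and token-growth rates are hypothetical inputs, not findings from either source.

```python
def cost_per_task(price_per_token: float, tokens_per_task: float) -> float:
    """Cost of one task = price per token x tokens the task consumes."""
    return price_per_token * tokens_per_task


def project_cost_per_task(price0: float, tokens0: float,
                          price_drop_per_year: float,
                          token_growth_per_year: float,
                          years: int) -> list[float]:
    """Cost per task for years 0..years under constant multiplicative trends."""
    return [
        cost_per_task(price0 / price_drop_per_year ** y,
                      tokens0 * token_growth_per_year ** y)
        for y in range(years + 1)
    ]


# Hypothetical start: $10 per million tokens, 2,000 tokens per task.
price0, tokens0 = 10 / 1_000_000, 2_000
# Price falls ~10x a year; tokens per task grow 5x (modest) or 10x (heavy reasoning).
for growth in (5, 10):
    path = project_cost_per_task(price0, tokens0, 10, growth, years=2)
    print(f"tokens x{growth}/yr:", [round(c, 4) for c in path])
```

With 5× token growth the cost per task only halves each year. With 10× growth a tenfold price cut leaves it flat.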

For the founder & the investor conversation: You'll meet these objections often. Investors like Altimeter's Jamin Ball argue the marginal cost of creation is trending toward zero, and a16z's Martin Casado points to historical precedent where compute became almost free. The honest CFO answer isn't doom. Unit prices do fall, but that doesn't fix the margin. Variability and heavy-user concentration compress it faster than price declines recover it, and you can't assume a price cut you don't control. You can even concede the nuance (analyst Martin Alderson has shown that input processing is far cheaper than generation) without giving up the line.

How to model it: Never model a price decline as a guarantee. Apply a Variance Buffer (≈1.2× on variable AI cost as a default) to stress for token-price swings and retry storms, and a Heavy-User Multiplier so a handful of power users don't quietly eat the margin of the many. A model that assumes inference only gets cheaper has never met a bad agent loop.
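Both stresses can be wired into a model as follows. This is a sketch: the 1.2× Variance Buffer is the default named in the text. The Heavy-User Multiplier is derived here as the top-decile users' mean usage divided by the overall mean, which is an assumption of this example, not the author's definition. The seat price and costs are hypothetical.

```python
VARIANCE_BUFFER = 1.2  # default named in the text


def heavy_user_multiplier(usage_per_user: list[float],
                          top_share: float = 0.1) -> float:
    """Assumed definition: mean usage of the top `top_share` of users
    divided by mean usage across all users."""
    ranked = sorted(usage_per_user, reverse=True)
    k = max(1, round(len(ranked) * top_share))
    top_mean = sum(ranked[:k]) / k
    overall_mean = sum(ranked) / len(ranked)
    return top_mean / overall_mean


def stressed_margin(price_per_user: float, avg_variable_cost: float,
                    multiplier: float = 1.0) -> float:
    """Per-user margin with the variance buffer and a usage multiplier applied.

    avg_variable_cost is the variable AI cost of an average user;
    multiplier scales it up for a heavier-than-average user.
    """
    cost = avg_variable_cost * VARIANCE_BUFFER * multiplier
    return (price_per_user - cost) / price_per_user


# Hypothetical: ten users on a flat $50 seat, one of them a power user.
usage = [1.0] * 9 + [11.0]
mult = heavy_user_multiplier(usage)          # 11 / 2 = 5.5
print(f"average user: {stressed_margin(50, 10):.0%}")        # 76%
print(f"power user:   {stressed_margin(50, 10, mult):.0%}")  # -32%
```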

Worked example: from the financial-model template

Take the template's demo company (figures illustrative). It reports a blended gross margin of 84.3% — the number that makes a deck look like classic SaaS. Apply the test above: pull customer-facing inference and delivery-tied human-in-the-loop into cost of revenue, and the isolated AI layer margin lands at 40.6%. The 84.3% wasn't wrong as a blend — it was averaging a healthy software layer over a thin AI layer and reporting the blend as if it were the whole story.

One company, three views. Only the blended number flatters; the AI layer is where growth actually compounds (figures illustrative, from the model template).

Stress it further. Hold pricing flat and let the LLM cost rise 50% — a realistic heavy-usage or model-switch scenario — and the AI layer margin falls toward an illustrative ~12%, dragging a once-comfortable LTV:CAC of 4.16 down toward roughly 2:1. None of that is visible in the blended 84.3%. That is why the classification isn't trivia: the line you pick determines whether you see the cliff while you can still steer.
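The stress figures can be reverse-engineered with one line of algebra, assuming price is flat and only the LLM line moves. Let m0 be the AI-layer margin and L the LLM cost as a share of AI-layer revenue. Raising the LLM cost by a fraction s gives m1 = m0 - s * L. Plugging in the illustrative 40.6% and ~12% at s = 0.5 implies the LLM line is roughly 57% of AI-layer revenue. That share is inferred from the example's rounded figures, not taken from the template, and the function names are hypothetical.

```python
def stressed_ai_margin(base_margin: float, llm_share: float,
                       shock: float) -> float:
    """AI-layer margin after the LLM cost rises by `shock` (0.5 = +50%), price flat."""
    return base_margin - shock * llm_share


def implied_llm_share(base_margin: float, stressed: float,
                      shock: float) -> float:
    """Back out the LLM cost as a share of AI-layer revenue from two margins."""
    return (base_margin - stressed) / shock


share = implied_llm_share(0.406, 0.12, 0.5)
print(f"implied LLM share of AI-layer revenue: {share:.1%}")  # 57.2%
print(f"check: {stressed_ai_margin(0.406, share, 0.5):.1%}")   # 12.0%
```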

84.3%: blended GM, the number that looks like SaaS
40.6%: AI-layer GM once inference sits in COGS
~12%: AI-layer GM under a +50% LLM-cost stress

What to do this week

In the models I pressure-test, the first red flag is the same one almost every time: an AI product reporting an 80%-plus gross margin with no separate inference line on the P&L. The cost didn't vanish — it was filed somewhere that flatters the margin.

Don't wait for diligence to do this for you. Pull every AI cost off the P&L and re-sort it with one question: is this consumed to deliver a paid output, or to build the product and run the business? Move customer-facing inference and delivery-tied human-in-the-loop into cost of revenue. Leave internal tooling and training where they belong. Then recompute the margin on the AI layer alone, and decide which lever — routing, caching, usage caps, pricing — to pull before the next board meeting.

See your real AI gross margin in minutes

The free AI Gross Margin Calculator restates your margin with inference in cost of revenue and shows exactly which line is doing the damage.


FAQ

Is AI inference cost COGS or OpEx?

Customer-facing inference should normally be modeled as COGS / cost of revenue: it is incurred to deliver the paid product and varies with usage, which fits the costs-to-fulfill-a-contract principle in ASC 340-40 and IFRS 15. Inference with no revenue attached — an internal employee tool, for example — is OpEx, because there is no performance obligation behind it. Statutory presentation can vary with your facts and auditor judgment.

Is model training a COGS cost?

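Normally no. Training is a development cost, not a delivery cost. Under US GAAP it is generally expensed as R&D (ASC 730) or capitalized as software (ASC 350-40 for internal-use/hosted, ASC 985-20 for software sold). Under IFRS, development costs that meet the IAS 38 criteria must be capitalized. Either way, the training spend itself sits below the gross-margin line, unlike inference, which is the recurring cost of revenue. If you capitalize, confirm with your auditor where the amortization is presented. Many software companies report amortization of capitalized software in cost of revenue.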

Is AI inference cost part of gross margin?

Yes — when customer-facing inference is in cost of revenue, it directly reduces gross margin. That is the whole reason the classification matters: moving inference to OpEx lifts reported gross margin without changing the economics, which is how an AI product can look like an 80% SaaS business while the layer that scales with usage runs far lower.

What's the difference between AI inference and model training in accounting?

Inference is the recurring cost of delivering output to a paying customer, so it's normally cost of revenue. Training (and fine-tuning) is the cost of building or improving the model — a development activity that is expensed as R&D or capitalized as software, and reported below the gross-margin line. One is delivery; the other is build.

Does inference belong in COGS under GAAP and IFRS?

No standard names "inference." The treatment follows the costs-to-fulfill-a-contract principle in ASC 340-40 and IFRS 15 §§95–104, plus cost-of-sales guidance in ASC 705. Applied to customer-facing inference, that principle places it in cost of revenue. Public Big Four guidance generally points back to these existing principles rather than creating a special AI exception — subject to your contracts and auditor judgment.

How do I record inference cost in my books?

Split inference into two streams. Customer-facing inference is a variable cost-of-revenue line, driven by usage units and matched to the revenue it serves. Internal or free-tier inference is OpEx. Keep training and retraining on separate lines (R&D or capitalized), and don't net them against inference — the delivery cost should stay visible in cost of revenue.

Source notes

Standards: FASB ASC 340-40 (costs to fulfill a contract), ASC 606, ASC 705, ASC 730-10, ASC 350-40, ASC 985-20; FASB ASU 2025-06 (Sept 2025, effective for annual periods beginning after Dec 15, 2027); IFRS 15 §§95–104; IAS 38.54–57. Standard text is copyrighted (FASB / IFRS Foundation) — paragraph references are given rather than reproduced text.
Auditor guidance: Deloitte, Accounting for the Development of Generative AI Software Products (Oct 2024); EY, To the Point — FASB modernizes internal-use software (Sept 19, 2025); KPMG, Software data costs Hot Topic and Software and website costs Handbook (Feb 2026); PwC, Software costs guide (Nov 2025).
Benchmarks: ICONIQ State of AI 2026 (AI-native gross margin ~52% in 2026, from 41% in 2024); Bessemer State of AI 2025 (~25% vs ~60% cohorts); Vista Equity (inference = 23% of AI product cost); S&P Global / 451 Research (inference a large majority of lifetime cost, estimate); Menlo Ventures (9% of production models fine-tuned).
Academic: The Economics of AI Inference (arXiv:2510.26136); The Price of Progress (arXiv:2511.23455); Epoch AI / Toby Ord on the persistence of inference cost.
Practitioner judgment: the AI Inference COGS Test, the Variance Buffer (~1.2×) and the Heavy-User Multiplier are the author's frameworks, presented as such.

This is an editorial decision framework, not accounting or financial advice — validate against your own facts with your auditor. Published June 2026.

About the author

Dmitry Perelygin is a fractional CFO based in Piedmont, Italy. ACMA / CGMA, MBA Manchester, twenty-five years inside the financial machinery of IT and SaaS companies, from listed groups to seed-stage AI startups. The AI Inference COGS Test and the layered-margin method in this guide are the same tools he uses to pressure-test AI-SaaS unit economics with advisory clients.

What to do next

Reading isn’t doing. Three options, in ascending order of investment:

Open the free AI Gross Margin Calculator. Eight fields, sixty seconds — get your AI-layer margin from your own assumptions.
Read the full method — How to Design an AI SaaS That Survives. 14 chapters, every cited benchmark, the complete bibliography in one volume.
Get the full AI SaaS financial model template. Seventeen sheets, the Helix AI demo, scenarios, cap table to exit, and a 1-page Investor Summary — inference modeled as cost of revenue from the start. View the bundle on Gumroad.

