Research Note No. 07 · Q1 2026 · 21 pp · Internal+

Commoditization of the base layer, Q1 2026.

Over the last six quarters, frontier-equivalent inference has fallen ≈ 94% on a dollars-per-megatoken basis. Six providers now ship within a narrow capability band. We argue the consequence is a relocation of margin from weights to runtime — and we sketch where it is accruing.

Inference · blended: $0.42/M tok (▼ 94% vs Q3 2024)
Frontier-equiv providers: 6 labs (+4 in 18 mo)
Capability spread · top 6: 3.1 pp (on internal eval suite)
§ 01 · Summary

What we believe, in a paragraph.

The base layer — pretrained frontier-grade models accessible by API — is commoditizing on every dimension that matters to an operator: price, latency, availability, and capability spread. This is not a claim about quality ceilings; labs still differentiate at the frontier. It is a claim about the median call, which is where economics lives. For the workloads that ninety percent of agent products actually run, choice of model is rapidly becoming a procurement decision, not a strategic one.

Key finding

A product shipping today can assume three or more providers within ±5% capability on its use case. Portability is no longer aspirational; it is table stakes. The strategic question is where the harness accrues value when the model does not.

§ 02 · Prices

Price collapse is compounding, not linear.

Blended input+output cost for frontier-equivalent inference has dropped from $7.10 / M tokens in Q3 2024 to $0.42 / M tokens in Q1 2026, a 94% decline over six quarters. The decline is not uniform across providers, but the floor has moved everywhere.
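The compounding claim is checkable from the two endpoints alone. A minimal sketch; the implied per-quarter rate is our derivation from the figures above, not a tracked series:

```python
# Implied constant quarterly decline from $7.10/M tok (Q3'24) to $0.42/M tok (Q1'26).
start, end = 7.10, 0.42
quarters = 6  # six quarter-over-quarter steps between the two endpoints

total_decline = 1 - end / start                       # the ~94% headline figure
quarterly_rate = 1 - (end / start) ** (1 / quarters)  # implied per-quarter decline

print(f"total decline: {total_decline:.1%}")              # ~94.1%
print(f"implied quarterly decline: {quarterly_rate:.1%}")  # ~37.6% per quarter
```

A linear decline over the same window would shave roughly 16 points per quarter; the observed curve is much closer to a constant-rate decay, which is what "compounding, not linear" means in practice.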

Fig. 1 · Blended inference price · frontier-equivalent tier

Dollars per million tokens, median across top-6 providers. Source: Priors tracking, public pricing + rate-limited probes.
[Chart: quarterly points Q3'24 through Q1'26, declining to $0.42; y-axis $0 to $7.00.]

Two notes on the curve. First, the cliff from Q2'25 forward coincides with the third-cohort open-weight releases; margin compression at the low end pulled paid APIs down with it. Second, the Q1'26 point is a blended median — the cheapest frontier-equivalent provider is meaningfully below it.

§ 03 · Providers

The field, ranked on four axes.

Table 1 · Frontier-equivalent providers, Q1 2026 · Priors internal scoring
Provider | Model · tier | Eval (Priors) | Latency p50 | $/M tok | Notes
Atlas Labs | Atlas-4 · reasoning | 87.2 | 680 ms | $0.31 | Best price-to-quality on long-context.
Kite | Kite-Ultra · 09 | 86.8 | 540 ms | $0.44 | Fastest at median. Tool-call reliability leads.
Meridian | Mer-3 · pro | 86.1 | 710 ms | $0.52 | Strong on code, weaker on retrieval-heavy tasks.
Nth Research | Nth-L2 | 85.4 | 820 ms | $0.28 | Cheapest at tier. Open-weight fork available.
Cardinal | C-Prime · 26 | 84.9 | 620 ms | $0.48 | Best structured-output adherence in class.
Orbit | Orbit-Max | 84.1 | 900 ms | $0.36 | Enterprise distribution; slower but sticky.

Eval scores use the Priors internal suite (v3.2), which weights long-horizon tool use, structured output fidelity, and retrieval fidelity above contest-style reasoning. The spread between provider 1 and provider 6 is 3.1 points. For comparison, the equivalent spread in Q3 2024 was 14.6.
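The headline number is a weighted blend of per-axis scores. A sketch of that blend; the axis weights and per-axis inputs below are hypothetical, since the note only states that tool use, structured output, and retrieval are weighted above contest-style reasoning:

```python
# Sketch of the weighted-average scoring behind Table 1. Weights are
# hypothetical stand-ins consistent with the stated v3.2 priorities.
WEIGHTS = {
    "tool_use": 0.35,
    "structured_output": 0.25,
    "retrieval": 0.25,
    "contest_reasoning": 0.15,
}

def suite_score(axis_scores: dict[str, float]) -> float:
    """Blend per-axis scores (0-100) into a single headline number."""
    assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9  # weights must sum to 1
    return sum(WEIGHTS[axis] * axis_scores[axis] for axis in WEIGHTS)

# Illustrative per-axis inputs only -- not any provider's actual results.
print(suite_score({
    "tool_use": 89.0,
    "structured_output": 88.0,
    "retrieval": 86.0,
    "contest_reasoning": 84.0,
}))
```

The design choice worth noting: under this kind of weighting, a 3.1-point spread on the headline number can hide larger gaps on individual axes, which is why § 04 breaks the axes out separately.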

§ 04 · Convergence

Capabilities are converging faster than interfaces.

Below: the same six providers scored on four operator-visible axes. The bars show how tightly the field is clustering. A year ago this chart had daylight between providers; today, on most axes, it does not.

Reasoning (eval) · Δ 3.1 pp
Tool-call fidelity · Δ 5.4 pp
Structured output · Δ 4.2 pp
Latency (p50, inv.) · Δ 12.7 pp
Price (inv.) · Δ 24.1 pp

The interesting story is the two unresolved axes: latency and price. Providers still differentiate meaningfully here, which is why procurement teams are reopening multi-vendor contracts. But the axes that determine whether a product can be built — reasoning, tools, structured output — are now essentially solved problems at the base layer.

§ 05 · Implications

Five consequences we're underwriting to.

  1. Model-portability becomes a product feature.

    Customers are asking for it in diligence calls. Startups that cannot swap providers with a config change lose procurement deals to ones that can.

  2. Margin relocates to the harness.

    Context, memory, tool-surface, and evaluation are where differentiation is now built — and where the next generation of platform companies will sit.

  3. Evaluation is the new observability.

    Shipping without evals is shipping on faith. The category is about two years behind where Datadog was in 2014; the opportunity is correspondingly large.

  4. Vertical runtimes beat horizontal ones, early.

    Taste compounds at the domain. A legal-agent runtime built by lawyers-turned-engineers will beat a horizontal platform for at least the next four years.

  5. Open-weight pressure sets the floor, not the ceiling.

    The frontier remains a lab story. But the median call — and therefore the unit economics of the products above — is governed by what Nth or its successors ship next.
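Implication 1 is concrete enough to sketch. A minimal provider-portable call path, with hypothetical provider names and a plain dict standing in for whatever config or secret store a real product would use:

```python
# Minimal sketch of model portability as a config concern. Provider names,
# model identifiers, and the request shape are hypothetical stand-ins; the
# point is that swapping the base layer touches one dict, not the call sites.
from dataclasses import dataclass

@dataclass(frozen=True)
class Provider:
    name: str
    model: str
    usd_per_mtok: float

PROVIDERS = {
    "atlas": Provider("Atlas Labs", "atlas-4-reasoning", 0.31),
    "nth": Provider("Nth Research", "nth-l2", 0.28),
}

ACTIVE = "atlas"  # the "config change" -- flip this key to re-route all traffic

def complete(prompt: str) -> str:
    p = PROVIDERS[ACTIVE]
    # A real client would POST to p's endpoint here; we only show the routing.
    return f"[{p.name}/{p.model}] {prompt}"

print(complete("Summarize the Q1 pricing table."))
```

The diligence question in implication 1 reduces to whether a product's call sites look like `complete(prompt)` or like a hard-coded dependency on one vendor's SDK.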

§ 06 · Watchlist

Companies we are tracking, not yet priced.

contextd

Drop-in retrieval runtime for agents with reversible writes. Spun out of a search lab. Watching for first enterprise design partners.

Stage · Seed · SF

Evalry

Eval-as-code for agent pipelines. Open-source core with managed cloud. Adoption curve looks like early Datadog.

Stage · Seed+ · NYC

Harness.law

Vertical runtime for legal research & drafting agents. Founded by ex-partners at a litigation firm. Distribution is real.

Stage · Series A · LON

We expect to underwrite two of the three within the year. We will publish a follow-up note on vertical runtimes in Q2.

§ 07 · Methodology & sources

Pricing data aggregated from published provider sheets and rate-limited API probes against identical prompt batches (n=12,400 per provider per quarter). Evaluation suite v3.2 combines internal tool-use tasks (n=240), structured-output fidelity (n=180), and long-context retrieval (n=120). Latency measured from us-east-1 and us-west-2 endpoints, p50 of 1000-token completions over 14 consecutive days.
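The latency metric above can be sketched in a few lines. Timings here are synthetic; a real probe would issue 1000-token completions from the us-east-1 and us-west-2 endpoints as described:

```python
# Sketch of the latency methodology: p50 (median) of per-request wall times
# collected over the probe window. Sample values are synthetic.
import statistics

def p50_ms(samples_ms: list[float]) -> float:
    """Median request latency, matching the note's p50 metric."""
    return statistics.median(samples_ms)

# Synthetic probe batch for one provider, one day.
samples = [640.0, 655.0, 680.0, 705.0, 910.0]  # one slow outlier in the tail
print(p50_ms(samples))  # median is robust to the tail: 680.0
```

Using p50 rather than a mean is what keeps one 900 ms tail request from moving the headline number, which matters when comparing providers whose tails differ more than their medians.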

Scores are model-level, not provider-level — a provider's best-in-class model is used for the table. Internal+ distribution: LPs and active portfolio companies.

Disclosure · Priors is invested in two of the companies referenced on the Watchlist. No investment in any model-provider above.