A model host or inference proxy serves AI agents at variable cost: short responses cost cents, long ones cost dollars, and the cost is only known after the work is done. A flat per-call price either overcharges short calls or underprices long ones. With the x402 v2 upto scheme, the buyer authorizes a maximum spend per call; the server settles for the actual amount, capped at that maximum. Each settlement reflects the real tokens generated, compute time, or output size. This is the right primitive for inference billing, and it is supported on Polygon’s mainnet and Amoy facilitators today.

Who this is for:
  • Hosted-model providers and inference proxies opening an agent-buyer channel
  • AI platform teams replacing token-bucket subscriptions with usage-based pricing
  • Commercial leaders at AI companies who need per-call billing without overcharging

How it works

upto scheme flow
1. Agent → API: POST /v1/chat/completions
2. API → Agent: 402 with scheme: upto, maxAmountRequired (per-call ceiling)
3. Agent → Facilitator: Authorizes max via Permit2 (no funds move yet)
4. API: Runs inference, measures actual usage (tokens, time, bytes)
5. API → Facilitator: Settles actual amount, ≤ authorized max
6. API → Agent: 200 OK with completion + PAYMENT-RESPONSE receipt

The upto scheme has two phases: at verification time, maxAmountRequired is the ceiling the buyer authorizes; at settlement time, the server passes the actual amount it computed from real usage. Replay protection comes from Permit2 nonces. Authorizations carry validAfter and deadline bounds so unsettled authorizations expire safely. Polygon’s mainnet (x402.polygon.technology) and Amoy (x402-amoy.polygon.technology) facilitators run x402 v2, so upto works today.
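
The two checks described above (amount within the authorized ceiling, and settlement inside the validity window) can be sketched as a small pre-settlement guard. This is an illustrative sketch only: the UptoAuthorization type, the canSettle function, and the inclusive treatment of both time bounds are assumptions for this example, not part of the x402 SDK, and the facilitator remains the authority on whether a settlement is accepted.

```typescript
// Hypothetical shape for illustration; the field names follow the prose above,
// not a confirmed SDK type.
interface UptoAuthorization {
  maxAmount: number;  // authorized ceiling, in token base units
  validAfter: number; // unix seconds; authorization is not usable before this
  deadline: number;   // unix seconds; authorization is not usable after this
}

// Returns true when a settlement for `actual` base units at time `nowSec`
// respects both the ceiling and the validity window.
// Assumption: both time bounds are treated as inclusive here.
function canSettle(auth: UptoAuthorization, actual: number, nowSec: number): boolean {
  const inWindow = nowSec >= auth.validAfter && nowSec <= auth.deadline;
  const withinCap = Number.isInteger(actual) && actual >= 0 && actual <= auth.maxAmount;
  return inWindow && withinCap;
}
```

Running a guard like this before calling the facilitator lets the server fail fast on an expired or undersized authorization instead of performing inference it cannot bill for.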

Get started

Add x402 v2 middleware with the upto scheme in front of your inference route. The buyer authorizes a maximum spend; your server settles the actual amount after measuring real usage. Polygon’s mainnet and Amoy facilitators run v2.

Install

bun install @x402/express @x402/core @x402/evm express
Full middleware setup, including configuration and the facilitator client, is in the x402 Quickstart for Sellers.

Declare a maximum per call

Return an accepts block with scheme: "upto" and a ceiling. maxAmountRequired is the maximum the buyer authorizes. USDC has six decimals, so 50000 = $0.05.
{
  "x402Version": 2,
  "accepts": [
    {
      "scheme": "upto",
      "network": "eip155:137",
      "maxAmountRequired": "50000",
      "description": "Inference on model-7b-instruct, up to 4096 output tokens"
    }
  ]
}
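
The six-decimal conversion is easy to get wrong with floating-point math, so one option is to convert from a decimal string using string operations. The sketch below builds the same accepts block shown above; toBaseUnits and buildPaymentRequired are hypothetical helper names for this example, and only the fields that appear in the JSON above are included.

```typescript
const USDC_DECIMALS = 6;

// Convert a decimal string such as "0.05" into base units ("50000")
// without going through floating-point arithmetic.
function toBaseUnits(amount: string, decimals: number = USDC_DECIMALS): string {
  if (!/^\d+(\.\d+)?$/.test(amount)) {
    throw new Error(`invalid amount: ${amount}`);
  }
  const [whole, frac = ""] = amount.split(".");
  if (frac.length > decimals) {
    throw new Error(`amount has more than ${decimals} decimal places`);
  }
  const digits = whole + frac.padEnd(decimals, "0");
  // Strip leading zeros but keep a single "0" for a zero amount.
  return digits.replace(/^0+(?=\d)/, "");
}

// Build the 402 response body with an upto ceiling (hypothetical helper).
function buildPaymentRequired(maxUsdc: string, description: string) {
  return {
    x402Version: 2,
    accepts: [
      {
        scheme: "upto",
        network: "eip155:137",
        maxAmountRequired: toBaseUnits(maxUsdc),
        description,
      },
    ],
  };
}
```

For example, toBaseUnits("0.05") returns "50000", matching the ceiling in the JSON above.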
After running inference, compute the real cost from your measured units and submit the actual amount to the facilitator’s /settle endpoint. The actual amount must be less than or equal to maxAmountRequired.
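
As a rough sketch of that computation, the function below prices a call by output tokens and clamps the result to the authorized ceiling. The Usage type, the settlementAmount name, and the per-1,000-token price are assumptions made for this example; a real service would substitute its own metering units and pass the result to the facilitator through the middleware described in the Quickstart.

```typescript
// Hypothetical metering record; a real service might also track time or bytes.
interface Usage {
  outputTokens: number;
}

// Compute the amount to settle, in token base units, as a string.
// pricePer1kTokens is a hypothetical price in base units per 1,000 output tokens.
function settlementAmount(
  usage: Usage,
  pricePer1kTokens: number,
  maxAmountRequired: string,
): string {
  const max = Number(maxAmountRequired);
  if (!Number.isSafeInteger(max) || max < 0) {
    throw new Error(`invalid ceiling: ${maxAmountRequired}`);
  }
  // Round up to a whole base unit, then cap at the authorized maximum.
  const raw = Math.ceil((usage.outputTokens * pricePer1kTokens) / 1000);
  return String(Math.min(raw, max));
}
```

With a price of 10000 base units ($0.01) per 1,000 tokens and the 50000 ceiling above, a 1,200-token response settles for 12000 base units, while a 10,000-token response is capped at 50000. Capping means the server absorbs any overage, so the ceiling and the model's output-token limit should be chosen together.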

Test as a buyer

polygon-agent x402-pay --url https://api.example.com/v1/chat/completions \
  --method POST \
  --body '{"model":"model-7b-instruct","messages":[{"role":"user","content":"hi"}]}'

Implementation

  • x402 Quickstart for Sellers: Add x402 v2 middleware to Express, Next.js, or Hono.
  • x402 How It Works: Verification vs. settlement phases, accepts shape, receipts.
  • Using the Polygon facilitator: Point middleware at the v2 facilitator on mainnet or Amoy.
  • Polygon Agent CLI: Test as a buyer with x402-pay.