AI Token Cost Calculator
Estimate API costs from token count, model, and usage volume.
Runs in your browser. Your data stays on your device.
Example result
Monthly AI cost: $315.00
Per user / month: $3.15
Tokens processed / month: 45.0M
Projected annual spend: $3,780.00

Model routing saves 70–90%
Route simple tasks (classification, extraction) to small models like Haiku or GPT-5 Nano. Reserve frontier models for complex reasoning. A routing layer that sends 90% of requests to a cheap model can cut your monthly API bill by an order of magnitude.
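A minimal sketch of that arithmetic. The per-MTok prices are the example figures quoted in this article (GPT-5 Nano-class and Claude Sonnet-class rates); the 30M input / 15M output monthly volume and the 90/10 traffic split are hypothetical assumptions.

```python
# Blended monthly cost with a routing layer. Prices are $ per million
# tokens (input, output), taken from the examples in this article.
CHEAP = {"input": 0.05, "output": 0.40}      # GPT-5 Nano-class pricing
FRONTIER = {"input": 3.00, "output": 15.00}  # Claude Sonnet-class pricing

def monthly_cost(price, input_tok, output_tok):
    """Dollar cost for one month of traffic at per-MTok prices."""
    return (input_tok * price["input"] + output_tok * price["output"]) / 1_000_000

def routed_cost(cheap_share, input_tok, output_tok):
    """Blended cost when `cheap_share` of requests go to the cheap model."""
    return (cheap_share * monthly_cost(CHEAP, input_tok, output_tok)
            + (1 - cheap_share) * monthly_cost(FRONTIER, input_tok, output_tok))

# Hypothetical workload: 30M input + 15M output tokens per month.
baseline = monthly_cost(FRONTIER, 30_000_000, 15_000_000)  # $315.00
blended = routed_cost(0.90, 30_000_000, 15_000_000)        # $38.25
savings = 1 - blended / baseline                           # ~88%
```

With these assumed prices, routing 90% of traffic to the cheap model cuts the bill from $315 to about $38, squarely in the 70–90% range.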

Prompt caching
If your system prompt or context documents are repeated across requests, prompt caching reduces input costs by up to 90%. Both Anthropic and OpenAI support caching — it pays for itself after a single cached read.
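As a rough sketch of the saving, assuming the full 90% discount on cached input reads and ignoring any cache-write premium a provider may charge; the request volume, prompt size, and $3/MTok base rate are all illustrative assumptions.

```python
# Input-cost savings from caching a repeated system prompt.
# Assumes the article's "up to 90% off" figure for cached reads and
# ignores any cache-write premium; all volumes are hypothetical.
BASE_INPUT = 3.00    # $/MTok, uncached input price
CACHED_READ = 0.30   # $/MTok at a 90% cache discount

def monthly_input_cost(requests, cached_tok, fresh_tok, caching=True):
    """Input cost when `cached_tok` per request can be served from cache."""
    read_price = CACHED_READ if caching else BASE_INPUT
    return (requests * cached_tok * read_price
            + requests * fresh_tok * BASE_INPUT) / 1_000_000

# 100k requests/month, 2,000-token system prompt + 500 fresh tokens each:
naive = monthly_input_cost(100_000, 2_000, 500, caching=False)  # $750.00
cached = monthly_input_cost(100_000, 2_000, 500)                # $210.00
```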
What are tokens?
Tokens are the units of text that language models process. A token is roughly three-quarters of a word in English — about 4 characters. A full page of text is approximately 750 tokens. Every API call consumes input tokens (your prompt, system instructions, and context) and output tokens (the model's response). Both are metered separately, and output tokens are significantly more expensive than input tokens.
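The rules of thumb above can be turned into quick estimators. These are rough planning numbers only; real tokenizers vary by model and language.

```python
# Token estimates from the rules of thumb above: ~4 characters or
# ~0.75 words per English token. Rough estimates, not tokenizer output.
def tokens_from_chars(char_count: int) -> int:
    return round(char_count / 4)

def tokens_from_words(word_count: int) -> int:
    return round(word_count / 0.75)

page_tokens = tokens_from_words(560)   # a ~560-word page ≈ 750 tokens
```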
How LLM API pricing works
All major LLM providers (OpenAI, Anthropic, Google, xAI, DeepSeek) use pay-per-token pricing, with costs quoted per million tokens (MTok). Output tokens typically cost 3x to 8x more than input tokens: Claude Sonnet 4.6, for example, charges $3 per million input tokens but $15 per million output tokens. This asymmetry means that optimizing response length is often the most effective cost lever.
Input vs. output tokens
Input tokens include everything you send to the model: the system prompt, conversation history, retrieved context (in RAG setups), and the user's message. Output tokens are the model's generated response. In production applications, input tokens often outnumber output tokens by 3:1 or more — but output tokens dominate the cost because of their higher per-token price. A 1,000-token prompt with a 500-token response on Claude Sonnet 4.6 costs $0.003 (input) + $0.0075 (output) = $0.0105.
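The worked example above, in code, using the $3/$15 per-MTok rates this article quotes for Claude Sonnet 4.6:

```python
# Cost of one API call: a 1,000-token prompt and a 500-token response
# at $3 input / $15 output per million tokens.
IN_PRICE, OUT_PRICE = 3.00, 15.00  # $ per million tokens

def call_cost(input_tokens: int, output_tokens: int) -> float:
    return (input_tokens * IN_PRICE + output_tokens * OUT_PRICE) / 1_000_000

cost = call_cost(1_000, 500)  # 0.003 + 0.0075 = $0.0105
```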
Choosing the right model
Not every task requires a frontier model. Simple classification, extraction, or summarization tasks perform well on smaller models like Claude Haiku 4.5 ($1/$5 per MTok) or GPT-5 Nano ($0.05/$0.40). Complex reasoning, code generation, and multi-step analysis benefit from mid-tier models like Claude Sonnet 4.6 or GPT-5.2. Reserve flagship models (Claude Opus 4.6, GPT-5.2 Pro) for tasks where quality directly impacts business outcomes. A model routing strategy — where requests are classified and sent to the cheapest capable model — can reduce API costs by 70-90%.
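To see what tier choice is worth, the same workload can be priced across tiers. Prices are the figures quoted in this article; the 30M input / 15M output monthly volume is a hypothetical assumption.

```python
# Pricing one monthly workload on three model tiers.
# Prices are ($/MTok input, $/MTok output) as quoted in this article.
TIERS = {
    "GPT-5 Nano":        (0.05, 0.40),
    "Claude Haiku 4.5":  (1.00, 5.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
}

def workload_cost(in_price, out_price,
                  input_tok=30_000_000, output_tok=15_000_000):
    return (input_tok * in_price + output_tok * out_price) / 1_000_000

costs = {name: workload_cost(*prices) for name, prices in TIERS.items()}
```

Under these assumptions the spread is roughly $7.50 (Nano) to $105 (Haiku) to $315 (Sonnet) per month for identical traffic, which is why routing misclassified-as-hard tasks downward matters so much.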
Cost optimization strategies
Prompt caching reduces input costs by up to 90% for repeated context (system prompts, reference documents). Batch processing offers 50% discounts for non-urgent workloads. Shorter, more precise prompts reduce both input and output tokens. Setting a max_tokens limit on responses prevents unexpectedly long outputs. These optimizations compound — caching plus batching plus prompt optimization can reduce effective costs by 80% or more compared to naive implementation.
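The compounding can be sketched as multiplicative factors. The specific factors below (caching halving blended cost, the 50% batch discount, prompt trimming cutting 20% of tokens) are illustrative assumptions, not provider guarantees.

```python
# Compounding cost levers. Each factor is the fraction of cost
# remaining after that optimization; factors here are illustrative.
def effective_cost(base, cache_factor=0.5, batch_factor=0.5, trim_factor=0.8):
    """Multiplicative stacking: caching, then batch discount, then trimming."""
    return base * cache_factor * batch_factor * trim_factor

optimized = effective_cost(315.00)   # ~$63.00
reduction = 1 - optimized / 315.00   # ~80% below the naive bill
```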
When to use this calculator
Use this calculator before integrating an LLM API into your product to estimate infrastructure costs. Use it to compare providers and model tiers for the same workload. Use it to build the AI cost line in your financial projections or pitch deck. The cost-per-user-per-month metric is particularly useful for SaaS founders modeling unit economics — it translates abstract token costs into a figure that fits directly into a P&L statement.