AI Token Cost Calculator
Estimate API costs from token count, model, and usage volume.
Runs in your browser. Your data stays on your device.
Example result
Monthly AI cost: $315.00
Per user / month: $3.15
Tokens processed / month: 45.0M
Projected annual spend: $3,780.00

Model routing saves 70–90%
Route simple tasks (classification, extraction) to small models like Haiku or GPT-5 Nano. Reserve frontier models for complex reasoning. A routing layer that sends 90% of requests to a cheap model can cut your monthly API bill by an order of magnitude.
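A minimal sketch of that arithmetic. The per-MTok prices are the example figures quoted in this article (GPT-5 Nano-class and Claude Sonnet-class rates); the 30M input / 15M output monthly volume and the 90/10 traffic split are hypothetical assumptions.

```python
# Blended monthly cost with a routing layer. Prices are $ per million
# tokens (input, output), taken from the examples in this article.
CHEAP = {"input": 0.05, "output": 0.40}      # GPT-5 Nano-class pricing
FRONTIER = {"input": 3.00, "output": 15.00}  # Claude Sonnet-class pricing

def monthly_cost(price, input_tok, output_tok):
    """Dollar cost for one month of traffic at per-MTok prices."""
    return (input_tok * price["input"] + output_tok * price["output"]) / 1_000_000

def routed_cost(cheap_share, input_tok, output_tok):
    """Blended cost when `cheap_share` of requests go to the cheap model."""
    return (cheap_share * monthly_cost(CHEAP, input_tok, output_tok)
            + (1 - cheap_share) * monthly_cost(FRONTIER, input_tok, output_tok))

# Hypothetical workload: 30M input + 15M output tokens per month.
baseline = monthly_cost(FRONTIER, 30_000_000, 15_000_000)  # $315.00
blended = routed_cost(0.90, 30_000_000, 15_000_000)        # $38.25
savings = 1 - blended / baseline                           # ~88%
```

With these assumed prices, routing 90% of traffic to the cheap model cuts the bill from $315 to about $38, squarely in the 70–90% range.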

Prompt caching
If your system prompt or context documents are repeated across requests, prompt caching reduces input costs by up to 90%. Both Anthropic and OpenAI support caching — it pays for itself after a single cached read.
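As a rough sketch of the saving, assuming the full 90% discount on cached input reads and ignoring any cache-write premium a provider may charge; the request volume, prompt size, and $3/MTok base rate are all illustrative assumptions.

```python
# Input-cost savings from caching a repeated system prompt.
# Assumes the article's "up to 90% off" figure for cached reads and
# ignores any cache-write premium; all volumes are hypothetical.
BASE_INPUT = 3.00    # $/MTok, uncached input price
CACHED_READ = 0.30   # $/MTok at a 90% cache discount

def monthly_input_cost(requests, cached_tok, fresh_tok, caching=True):
    """Input cost when `cached_tok` per request can be served from cache."""
    read_price = CACHED_READ if caching else BASE_INPUT
    return (requests * cached_tok * read_price
            + requests * fresh_tok * BASE_INPUT) / 1_000_000

# 100k requests/month, 2,000-token system prompt + 500 fresh tokens each:
naive = monthly_input_cost(100_000, 2_000, 500, caching=False)  # $750.00
cached = monthly_input_cost(100_000, 2_000, 500)                # $210.00
```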
What are tokens?
Tokens are the units of text that language models process. A token is roughly three-quarters of a word in English — about 4 characters. A full page of text is approximately 750 tokens. Every API call consumes input tokens (your prompt, system instructions, and context) and output tokens (the model's response). Both are metered separately, and output tokens are significantly more expensive than input tokens.
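The rules of thumb above can be turned into quick estimators. These are rough planning numbers only; real tokenizers vary by model and language.

```python
# Token estimates from the rules of thumb above: ~4 characters or
# ~0.75 words per English token. Rough estimates, not tokenizer output.
def tokens_from_chars(char_count: int) -> int:
    return round(char_count / 4)

def tokens_from_words(word_count: int) -> int:
    return round(word_count / 0.75)

page_tokens = tokens_from_words(560)   # a ~560-word page ≈ 750 tokens
```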
How LLM API pricing works
All major LLM providers (OpenAI, Anthropic, Google, xAI, DeepSeek) use pay-per-token pricing, with costs quoted per million tokens (MTok). Output tokens typically cost 3x to 8x more than input tokens: Claude Sonnet 4.6, for example, charges $3 per million input tokens but $15 per million output tokens. This asymmetry means that optimizing response length is often the most effective cost lever.
Input vs. output tokens
Input tokens include everything you send to the model: the system prompt, conversation history, retrieved context (in RAG setups), and the user's message. Output tokens are the model's generated response. In production applications, input tokens often outnumber output tokens by 3:1 or more — but output tokens dominate the cost because of their higher per-token price. A 1,000-token prompt with a 500-token response on Claude Sonnet 4.6 costs $0.003 (input) + $0.0075 (output) = $0.0105.
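The worked example above, in code, using the $3/$15 per-MTok rates this article quotes for Claude Sonnet 4.6:

```python
# Cost of one API call: a 1,000-token prompt and a 500-token response
# at $3 input / $15 output per million tokens.
IN_PRICE, OUT_PRICE = 3.00, 15.00  # $ per million tokens

def call_cost(input_tokens: int, output_tokens: int) -> float:
    return (input_tokens * IN_PRICE + output_tokens * OUT_PRICE) / 1_000_000

cost = call_cost(1_000, 500)  # 0.003 + 0.0075 = $0.0105
```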
Choosing the right model
Not every task requires a frontier model. Simple classification, extraction, or summarization tasks perform well on smaller models like Claude Haiku 4.5 ($1/$5 per MTok) or GPT-5 Nano ($0.05/$0.40). Complex reasoning, code generation, and multi-step analysis benefit from mid-tier models like Claude Sonnet 4.6 or GPT-5.2. Reserve flagship models (Claude Opus 4.6, GPT-5.2 Pro) for tasks where quality directly impacts business outcomes. A model routing strategy — where requests are classified and sent to the cheapest capable model — can reduce API costs by 70-90%.
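To see what tier choice is worth, the same workload can be priced across tiers. Prices are the figures quoted in this article; the 30M input / 15M output monthly volume is a hypothetical assumption.

```python
# Pricing one monthly workload on three model tiers.
# Prices are ($/MTok input, $/MTok output) as quoted in this article.
TIERS = {
    "GPT-5 Nano":        (0.05, 0.40),
    "Claude Haiku 4.5":  (1.00, 5.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
}

def workload_cost(in_price, out_price,
                  input_tok=30_000_000, output_tok=15_000_000):
    return (input_tok * in_price + output_tok * out_price) / 1_000_000

costs = {name: workload_cost(*prices) for name, prices in TIERS.items()}
```

Under these assumptions the spread is roughly $7.50 (Nano) to $105 (Haiku) to $315 (Sonnet) per month for identical traffic, which is why routing misclassified-as-hard tasks downward matters so much.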
Cost optimization strategies
Prompt caching reduces input costs by up to 90% for repeated context (system prompts, reference documents). Batch processing offers 50% discounts for non-urgent workloads. Shorter, more precise prompts reduce both input and output tokens. Setting a max_tokens limit on responses prevents unexpectedly long outputs. These optimizations compound — caching plus batching plus prompt optimization can reduce effective costs by 80% or more compared to naive implementation.
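The compounding can be sketched as multiplicative factors. The specific factors below (caching halving blended cost, the 50% batch discount, prompt trimming cutting 20% of tokens) are illustrative assumptions, not provider guarantees.

```python
# Compounding cost levers. Each factor is the fraction of cost
# remaining after that optimization; factors here are illustrative.
def effective_cost(base, cache_factor=0.5, batch_factor=0.5, trim_factor=0.8):
    """Multiplicative stacking: caching, then batch discount, then trimming."""
    return base * cache_factor * batch_factor * trim_factor

optimized = effective_cost(315.00)   # ~$63.00
reduction = 1 - optimized / 315.00   # ~80% below the naive bill
```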
When to use this calculator
Use this calculator before integrating an LLM API into your product to estimate infrastructure costs. Use it to compare providers and model tiers for the same workload. Use it to build the AI cost line in your financial projections or pitch deck. The cost-per-user-per-month metric is particularly useful for SaaS founders modeling unit economics — it translates abstract token costs into a figure that fits directly into a P&L statement.