How Much Does 1 Million AI Tokens Cost in 2026?
One million AI tokens costs between $0.06 and $75 in 2026 depending on the model and direction (input vs output). Output tokens cost 3–5× more than input tokens on most providers because generation is compute-bound. This guide shows exactly what 1M tokens costs across 22 models and helps you calculate your real bill before committing budget. For real-time pricing across every model, use our Token & Pricing Comparator.
The "1 million tokens" unit is the standard pricing reference because it's roughly the unit at which provider bills become non-trivial. A 100k-request/month chatbot at typical sizes burns through 200M+ tokens — multiply by your blended per-million rate and that's your monthly inference bill.
What does 1 million tokens cost across all major models?
The complete 2026 pricing table, per million tokens of input and output (output is typically the dominant cost):
| Model | Input ($ / 1M tokens) | Output ($ / 1M tokens) |
|---|---|---|
| Amazon Nova Lite | $0.06 | $0.24 |
| Mistral Small 3 | $0.20 | $0.60 |
| Google Gemini 2.5 Flash | $0.30 | $2.50 |
| DeepSeek V3 | $0.27 | $1.10 |
| GPT-5 mini | $0.40 | $1.60 |
| Cohere Command R | $0.15 | $0.60 |
| Claude Haiku 4.5 | $0.80 | $4.00 |
| Amazon Nova Pro | $0.80 | $3.20 |
| DeepSeek R1 | $0.55 | $2.19 |
| Llama 4 70B (Together) | $0.88 | $0.88 |
| Mistral Large 2 | $2.00 | $6.00 |
| Cohere Command R+ | $2.50 | $10.00 |
| OpenAI GPT-4o | $2.50 | $10.00 |
| Google Gemini 2.5 Pro | $2.50 | $15.00 |
| Claude Sonnet 4.6 | $3.00 | $15.00 |
| Llama 4 405B (Together) | $3.50 | $3.50 |
| xAI Grok 4 | $5.00 | $25.00 |
| OpenAI o3 | $10.00 | $40.00 |
| OpenAI GPT-5 | $10.00 | $30.00 |
| Claude Opus 4.7 | $15.00 | $75.00 |
That's a 1,250× spread between cheapest input (Nova Lite $0.06) and most expensive output (Opus 4.7 $75). The strategy that wins in 2026 is tiered model routing: cheap model for 80% of requests, premium model only when needed.
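To see what that routing split does to the blended rate, here is a minimal sketch in Python. It uses the Haiku 4.5 and Sonnet 4.6 prices from the table; the 80/20 routing split and the 80/20 input/output token mix are illustrative assumptions, not measurements.

```python
# Minimal sketch: blended $/1M tokens under tiered routing.
# Prices come from the table above; the routing and token-mix splits are assumptions.
PRICES = {  # model: (input $/1M, output $/1M)
    "claude-haiku-4.5": (0.80, 4.00),
    "claude-sonnet-4.6": (3.00, 15.00),
}

def blended_rate(model: str, input_share: float = 0.8) -> float:
    """Effective $/1M tokens for a given input/output token mix."""
    inp, out = PRICES[model]
    return input_share * inp + (1 - input_share) * out

# Route 80% of requests to Haiku and 20% to Sonnet, same token mix everywhere.
routed = 0.8 * blended_rate("claude-haiku-4.5") + 0.2 * blended_rate("claude-sonnet-4.6")
all_sonnet = blended_rate("claude-sonnet-4.6")

print(f"All Sonnet:     ${all_sonnet:.2f} / 1M tokens")  # $5.40
print(f"Tiered routing: ${routed:.2f} / 1M tokens")      # ~$2.23, roughly 60% cheaper
```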
Why does output cost more than input?
Three reasons output is structurally more expensive than input:
- Sequential generation. Output tokens are produced one at a time — token N depends on tokens 1 to N-1. Each output token requires a full forward pass through the model. Input tokens can be processed in parallel (one pass for the entire prompt).
- Memory bandwidth dominates. At inference time, the bottleneck is reading the model weights from GPU HBM for each output token, not the compute. Output is ~5× more bandwidth-intensive per token.
- GPU utilization patterns. Output generation underutilizes large GPU clusters (small batch = low parallelism). Providers price this opportunity cost.
Practical implication: if your workload is heavy on input (RAG, document analysis), the input price dominates your blended rate and the output-to-input ratio matters little; if output dominates, prefer models with a lower ratio. Claude Sonnet 4.6 has a 5:1 ratio; Llama on Together is 1:1 because they don't price the two directions differently.
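A quick sketch of how the price ratio interacts with the workload mix, using the Sonnet 4.6 and Together Llama 4 70B prices from the table; the 95/5 and 40/60 token splits are illustrative assumptions.

```python
# Effective $/1M tokens for two price structures under different token mixes.
# Prices are from the table above; the workload mixes are illustrative assumptions.
def effective_rate(input_price: float, output_price: float, input_fraction: float) -> float:
    return input_fraction * input_price + (1 - input_fraction) * output_price

models = [("Claude Sonnet 4.6 (5:1 ratio)", 3.00, 15.00),
          ("Llama 4 70B on Together (1:1)", 0.88, 0.88)]

for name, inp, out in models:
    rag = effective_rate(inp, out, 0.95)    # RAG / document analysis: mostly input
    gen = effective_rate(inp, out, 0.40)    # generation-heavy: mostly output
    print(f"{name}: input-heavy ${rag:.2f}/1M, output-heavy ${gen:.2f}/1M")
# Sonnet's effective rate swings ~3x with the mix; the 1:1 model doesn't move.
```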
How big is 1 million tokens in real terms?
Practical sizing for 1 million tokens:
- ~750,000 words in English text (1.33 tokens/word average)
- ~4 average novels of prose (200k words each)
- ~3 million characters of code (more tokens than English due to syntax)
- ~80 hours of transcribed speech at 150 words/minute
- ~600 typical chatbot conversations of 10 turns each (~1,700 tokens/conversation)
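If you want to turn document sizes into token estimates, a rough converter based on the averages above looks like this. Real counts vary by tokenizer, so treat these ratios as approximations and use your provider's token counter for exact numbers.

```python
# Rough size-to-token conversions using the averages above.
# Real counts vary by tokenizer; use your provider's token counter for exact numbers.
TOKENS_PER_ENGLISH_WORD = 1.33   # ~750k words per 1M tokens
CODE_CHARS_PER_TOKEN = 3         # code tokenizes denser than prose

def tokens_from_words(words: int) -> int:
    return round(words * TOKENS_PER_ENGLISH_WORD)

def tokens_from_code_chars(chars: int) -> int:
    return round(chars / CODE_CHARS_PER_TOKEN)

print(tokens_from_words(750_000))         # 997,500 -> roughly 1M tokens
print(tokens_from_code_chars(3_000_000))  # 1,000,000 tokens
```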
For a real production calibration:
- A customer-support chatbot: 200k-300k tokens per 1,000 conversations
- A code-completion product (Copilot-style): 100k-500k tokens per active user per day
- A research-agent product (Devin-style): 50k-200k tokens per task
Use these to calibrate your monthly forecast: take your daily tokens, multiply by 30, divide by 1 million, then multiply by your model's per-million rate.
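As a function, the forecast looks like the sketch below. The user count and per-user token figure are placeholders drawn from the calibration ranges above, and the $5 per million blended rate is the Sonnet figure worked out later in this guide.

```python
# Monthly spend forecast: daily tokens -> dollars at a blended per-million rate.
def monthly_cost(daily_tokens: float, rate_per_million: float, days: int = 30) -> float:
    return daily_tokens * days / 1_000_000 * rate_per_million

# Example: a code-completion product at 300k tokens per active user per day,
# 50 active users, billed at a ~$5/1M blended rate (hypothetical numbers).
daily_tokens = 300_000 * 50
print(f"${monthly_cost(daily_tokens, rate_per_million=5.00):,.0f} per month")  # $2,250
```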
What does the median app actually pay per month?
An industry survey of 2025-2026 AI startup AWS bills shows the median monthly LLM spend by application category:
| Application | Monthly tokens (median) | Monthly spend (Sonnet 4.6 blended, ~$9/1M) |
|---|---|---|
| Internal AI tools | 10M | $90 |
| B2B SaaS with AI features | 50M | $450 |
| Customer support chatbot | 150M | $1,350 |
| Coding assistant | 400M | $3,600 |
| Consumer chat product | 2,000M (2B) | $18,000 |
| AI agent platform | 10,000M (10B) | $90,000 |
The 40× spread between consumer chat and B2B SaaS reflects raw user volume difference, not architectural choice. Even an "expensive" model is fine at B2B scale.
How do I calculate my per-million-token cost?
For a chatbot with 2,000 input tokens + 400 output tokens per request, using Claude Sonnet 4.6 at $3 input + $15 output per million:
Per request:
Input: 2000 × $3 / 1M = $0.006
Output: 400 × $15 / 1M = $0.006
Total: $0.012
Per 100,000 requests:
$0.012 × 100k = $1,200
That's effectively $5 per million tokens blended ($1,200 of bill across 240M tokens).
Notice the blended rate depends on your input-to-output ratio. A RAG-heavy workload with 95% input and 5% output sees a much cheaper effective rate on the same model. Plug your specific numbers into our Token & Pricing Comparator for real-time math.
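The same arithmetic as a short script you can adapt, using the Sonnet 4.6 prices from the table; the 9,500-input / 500-output request shape is an illustrative stand-in for the 95%/5% RAG split mentioned above.

```python
# Reproduces the per-request math above, plus the blended-rate sensitivity to mix.
INPUT_RATE, OUTPUT_RATE = 3.00, 15.00   # Claude Sonnet 4.6, $ per 1M tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    return input_tokens / 1e6 * INPUT_RATE + output_tokens / 1e6 * OUTPUT_RATE

chat = request_cost(2_000, 400)
print(f"Per request:       ${chat:.3f}")             # $0.012
print(f"Per 100k requests: ${chat * 100_000:,.0f}")  # $1,200

# Blended $/1M depends on the input:output mix, not just the model.
print(f"Chat blend (2,000 in / 400 out): ${chat / 2_400 * 1e6:.2f} / 1M")  # $5.00
rag = request_cost(9_500, 500)                       # 95% input / 5% output
print(f"RAG blend  (9,500 in / 500 out): ${rag / 10_000 * 1e6:.2f} / 1M")  # $3.60
```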
How do I reduce per-million-token cost in 2026?
Three highest-leverage moves:
1. Switch models (5-50× reduction possible)
Most workloads run fine on Claude Haiku 4.5 ($0.80 input, $4 output) instead of Claude Sonnet 4.6 ($3 input, $15 output), a roughly 4× cost cut. Dropping to Gemini 2.5 Flash ($0.30 / $2.50) cuts roughly another 2×, depending on your input/output mix. Always run a 100-example eval before switching; many workloads tolerate Haiku-class quality.
2. Prompt caching (40-80% input cost reduction)
For RAG workloads where the same context (system prompt + retrieved documents) gets reused across multiple queries, Anthropic charges only 10% of normal input price on cache hits. OpenAI charges 50%. Google 25%. Real-world cache hit rates are 50-70% steady-state.
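Here's a minimal sketch of what a given cache hit rate does to the effective input rate, using the 10%-of-input cache-read price quoted above for Anthropic. Cache-write premiums and provider-specific TTL rules are not modeled, so check current pricing before relying on these numbers.

```python
# Effective input $/1M tokens with prompt caching.
# Cache reads bill at 10% of the normal input rate (Anthropic, per the text above);
# misses are modeled at the full rate, and cache-write premiums are ignored here.
def cached_input_rate(base_rate: float, hit_rate: float, read_multiplier: float = 0.10) -> float:
    return hit_rate * base_rate * read_multiplier + (1 - hit_rate) * base_rate

base = 3.00                       # Claude Sonnet 4.6 input, $/1M
for hit_rate in (0.5, 0.7):       # steady-state range quoted above
    rate = cached_input_rate(base, hit_rate)
    print(f"{hit_rate:.0%} hit rate: ${rate:.2f}/1M input ({1 - rate / base:.0%} cheaper)")
```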
3. Batch APIs (50% discount on non-realtime)
OpenAI's Batch API charges 50% of normal pricing for jobs that can wait up to 24 hours; Anthropic's batch API offers a similar 50% discount, and most providers have some form of batch pricing. Use it for nightly summarization, content moderation backfills, embedding generation, and evaluation runs.
A typical mature production workload combines all three: tiered routing + caching + batch. Total reduction vs naive: 70-90%. Plug your numbers into the LLM Monthly Cost Estimator to see the projection.
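A back-of-envelope sketch of how the three levers compound, starting from the ~$1,200/month chatbot example above; every multiplier here is an assumed savings factor, not a benchmark.

```python
# Naive vs optimized monthly bill, stacking the three levers above.
# All multipliers are illustrative assumptions, not measurements.
MONTHLY_TOKENS = 240e6   # the 100k-request chatbot example above
NAIVE_BLENDED = 5.00     # all-Sonnet blended $/1M from the same example

naive = MONTHLY_TOKENS / 1e6 * NAIVE_BLENDED

optimized = naive
optimized *= 0.45   # tiered routing: assume ~55% off by sending most requests to Haiku
optimized *= 0.60   # prompt caching on reused context: assume ~40% off
optimized *= 0.90   # batch API for offline jobs: assume ~10% off the remainder

print(f"Naive:     ${naive:,.0f}/month")       # $1,200
print(f"Optimized: ${optimized:,.0f}/month")   # ~$292, about a 76% reduction
```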
What hidden costs make the per-million rate misleading?
Five line items that aren't in the headline rate:
- Output rate limits. Some providers throttle output tokens/minute. Bursting traffic incurs queuing latency, not cost — but user experience suffers.
- Failed generations. Safety refusals, malformed JSON outputs, mid-stream disconnects. Real-world wastage is 3-8% of token budget.
- Speculative decoding. Some providers charge for speculatively-generated tokens that get rejected. Adds 5-15% to the bill.
- Long context surcharges. Google Vertex charges 2× per token for contexts >128k. Anthropic has no surcharge but TTFT degrades.
- Cross-region transfer. Self-hosted models incur egress fees that the per-token rate doesn't capture.
Budget a 15-20% buffer above your raw per-million-token math. The Agent Dev Cost Calculator bakes this in as the "inference tax" default of 30% — appropriate for agentic workloads.
What is the cheapest path to 1 million tokens in 2026?
If you only care about per-million-token cost (not quality or reliability):
- Amazon Nova Lite at $0.06 input, $0.24 output — 100M tokens each of input and output for $30 total
- DeepSeek V3 at $0.27 input, $1.10 output — strong reasoning at cheap pricing
- Self-hosted Llama 4 8B on rented H100 — break-even at ~500M tokens/month
- Together Llama 4 8B at $0.22 / $0.22 — open-weight on hosted infra
For the cheapest flagship-class quality, Claude Sonnet 4.6 at $3 / $15 is the sweet spot. GPT-5 at $10 / $30 is premium-priced; rarely worth it over Sonnet unless you specifically need OpenAI ecosystem features.
The 2026 best practice is to track effective cost per resolved task, not cost per million tokens. A workload that uses 30% fewer tokens on Sonnet (because it gets answers right faster) can beat one that uses cheap Haiku tokens but loops 3× on bad outputs. Measure outcomes, not unit cost.
For comprehensive cost modeling across token + GPU + vector DB + everything else, use the Agent Dev Cost Calculator. For just the LLM piece across 22 models, use the Token & Pricing Comparator.