AI Inference Benchmark & Cost

Benchmark inference speed and cost per million tokens across hardware (H100, A100, consumer GPUs) and models.

Benchmarks refreshed: 2026-05-01

Cheapest: DeepInfra ($69.00/mo)

Fastest: SambaNova (580 tok/s)

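| Host | Tokens/sec | TTFT | Response time | $ / 1M output tokens | Total / mo |
|---|---|---|---|---|---|
| DeepInfra | 70 | 410 ms | 7.55 s | $0.60 | $69.00 |
| SambaNova | 580 | 110 ms | 0.97 s | $0.60 | $90.00 |
| Groq | 320 | 180 ms | 1.74 s | $0.79 | $98.50 |
| Cerebras | 450 | 120 ms | 1.23 s | $0.85 | $107.50 |
| Together | 92 | 320 ms | 5.75 s | $0.88 | $132.00 |
| Fireworks | 110 | 290 ms | 4.84 s | $0.90 | $135.00 |
| Self-host (H100 SXM ×4, vLLM), AWS p5 spot reference | 85 | 380 ms | 6.26 s | $1.95 | $292.50 |
| Self-host (B200 ×4) | 165 | 220 ms | 3.25 s | $2.10 | $315.00 |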

Numbers are for batch size 1 with streaming decode (the chat UX case). Batched production back ends can reach 5–20× higher tokens/sec at the same per-token cost. Cross-check against artificialanalysis.ai for the latest figures.
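
The response-time column appears consistent with TTFT plus decode time for a fixed reply length. The sketch below reproduces it under one loud assumption: a 500-token reply, which is inferred from the table rows and not stated anywhere on the page. The helper name `response_time_s` is hypothetical.

```python
def response_time_s(tokens_per_sec: float, ttft_ms: float, output_tokens: int = 500) -> float:
    """Estimate streaming response time: time to first token plus decode time.

    output_tokens=500 is an assumption inferred from the table, not a documented value.
    """
    if tokens_per_sec <= 0:
        raise ValueError("tokens_per_sec must be positive")
    return ttft_ms / 1000.0 + output_tokens / tokens_per_sec


# (tokens/sec, TTFT in ms) copied from the table above
hosts = {
    "DeepInfra": (70, 410),
    "SambaNova": (580, 110),
    "Groq": (320, 180),
}

for name, (tps, ttft) in hosts.items():
    print(f"{name}: {response_time_s(tps, ttft):.2f} s")
```

Passing a different `output_tokens` value gives a rough idea of how each host would feel for shorter or longer replies, though real latency also depends on prompt length and load.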

Frequently Asked Questions

How accurate are these calculators?
Pricing is sourced from official provider documentation and refreshed monthly. Real bills can vary by 5–15% due to caching, batching, and region.
Are prices in USD?
Yes. All prices are quoted in USD, which is the providers' billing currency.
How often is data updated?
Pricing tables are reviewed and updated on the first of every month.
Can I trust these for budgeting?
Use them as estimates. For production budgets, always validate with a 1-week pilot using your real workload.
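
The 5–15% variance quoted in the first answer can be turned into a rough budgeting band. This is a minimal sketch, assuming the variance applies symmetrically around the table estimate (the page does not say whether real bills tend to run above or below). The helper name `budget_band` is hypothetical.

```python
def budget_band(estimate_usd: float, variance: float = 0.15) -> tuple[float, float]:
    """Return (low, high) monthly cost bounds for a fractional variance.

    Symmetric variance is an assumption; real bills may skew one way.
    """
    if not 0 <= variance < 1:
        raise ValueError("variance must be in [0, 1)")
    return estimate_usd * (1 - variance), estimate_usd * (1 + variance)


# $98.50/mo is the Groq estimate from the table; 0.15 is the upper end of the quoted range.
low, high = budget_band(98.50, 0.15)
print(f"${low:.2f} to ${high:.2f}")
```

A band like this is only a planning aid; the 1-week pilot recommended above remains the way to validate a production budget.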