
GPU Cloud Pricing 2026: AWS vs RunPod vs Vast.ai

An honest 2026 comparison of GPU rental prices across AWS, GCP, Azure, RunPod, Vast.ai, Lambda Labs, and more — H100, A100, and B200 hourly rates.

6 min read · By AITOT Editorial

GPU cloud pricing in 2026 spans a more than 8× range for identical hardware: an NVIDIA H100 rents for $1.49/hour on Hyperbolic and $12.29/hour on AWS, same GPU, same generation. The difference is reliability, networking, ecosystem, and how willing you are to handle the rough edges. This guide compares 12 providers across the GPU lineup that matters in 2026 (H100, H200, A100, B200, L40S, RTX 4090) so you can pick the right vendor for your workload.

For real-time math on monthly cost, including optional electricity, use our GPU Pricing Calculator. For tokens/sec and dollar-per-million-tokens at each provider, see the Inference Benchmark.

Which GPU should you actually rent in 2026?

A quick decision tree based on workload type (a code sketch follows the list):

  • LLM inference (70B class) — H100 SXM is the sweet spot. Move to B200 if your throughput requirement exceeds 150 tokens/sec/user.
  • LLM inference (405B class) — B200 ×8 is the new floor; H100 ×8 still works but takes ~50% longer.
  • Fine-tuning (LoRA on 7B-70B) — A100-80GB or H100-PCIe; PCIe is fine because LoRA isn't NVLink-bound.
  • Full pre-training — H100 SXM5 with NVLink, minimum 8-GPU node. Skip A100 unless budget is brutal.
  • Embedding generation or batch inference — L40S or even RTX 4090 if you don't need >24GB VRAM.
  • Experimentation — RTX A6000 (48GB) on Vast.ai under $1/hour, or RTX 4090 if 24GB is enough.
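
Encoded as a lookup table, the tree above is easy to drop into provisioning scripts. A minimal sketch in Python; the workload labels and the `pick_gpu` helper are illustrative inventions, and the recommendations simply mirror this article's categories:

```python
# Hypothetical helper encoding the decision tree above. Keys and values
# mirror this article's categories, not any official sizing guide.
GPU_BY_WORKLOAD = {
    "inference-70b":    "H100 SXM (B200 if you need >150 tok/s/user)",
    "inference-405b":   "B200 x8 (H100 x8 works but ~50% slower)",
    "finetune-lora":    "A100-80GB or H100 PCIe (LoRA isn't NVLink-bound)",
    "pretraining":      "H100 SXM5 with NVLink, 8-GPU node minimum",
    "batch-embeddings": "L40S, or RTX 4090 if 24GB VRAM is enough",
    "experimentation":  "RTX A6000 48GB (<$1/hr on Vast.ai) or RTX 4090",
}

def pick_gpu(workload: str) -> str:
    """Return this article's recommended GPU for a workload category."""
    return GPU_BY_WORKLOAD.get(workload, "unrecognized workload; start with H100 PCIe")

print(pick_gpu("finetune-lora"))  # -> A100-80GB or H100 PCIe (...)
```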

The most common mistake teams make is renting H100 SXM5 when they actually need H100 PCIe. The PCIe version is about 35% cheaper at most providers, and for any workload that fits on a single GPU the trade-off is usually worth it: you give up NVLink (irrelevant on one GPU) plus some TDP and memory bandwidth.

What does an H100 actually cost across providers?

H100 SXM5 80GB hourly rates as of May 2026, sorted cheapest first:

| Provider | On-demand | Spot / community | Notes |
| --- | --- | --- | --- |
| Hyperbolic | $1.49 | | Spot-style; community reliability |
| RunPod (Community) | $2.39 | $1.65 | Cheapest with decent uptime |
| Vast.ai | $2.40 | $1.80 | 24-hour median; community |
| RunPod (Secure) | $2.99 | $1.99 | Datacenter-grade |
| Lambda Labs | $2.99 | | Reserved pricing improves further |
| CoreWeave | $3.30 | | Enterprise; contract usually required |
| Paperspace | $5.95 | | Friendly UI; consumer-grade pricing |
| GCP A3 (us-central1) | $11.06 | $5.50 | Per-GPU from A3 8-GPU node |
| AWS p5 (us-east-1) | $12.29 | $6.40 | Per-GPU from p5.48xlarge |
| Azure ND-H100-v5 | $12.96 | $6.80 | Per-GPU |

The on-demand price spread is 8.7×. The spot price spread is 4×. Which end of the range you pick depends on how much your workload benefits from the cloud's networking, IAM, and existing data residency.

A real-world rule of thumb: if your training run is inside an existing VPC with proprietary data, the AWS/GCP/Azure tax is worth paying. If you're doing research, distillation, fine-tuning, or inference for a startup, hyperscaler GPU rates are 4-8× overpriced for what you get.

What about B200, the new flagship?

The Blackwell B200 (192GB HBM3e, 1,000W TDP) shipped to clouds in late 2025. By May 2026 reliable supply has reached:

  • RunPod (Secure): $6.39/hour on-demand — cheapest production-grade
  • Crusoe Cloud: $5.50/hour on-demand
  • Lambda Labs: $6.95/hour reserved
  • AWS (p6e instances): $18-21/hour per GPU — limited regions
  • GCP A3 Ultra: $13.40/hour on-demand

For inference serving, B200 delivers ~165 tokens/sec on Llama 4 70B at batch=1 versus ~85 tokens/sec on H100 SXM, a roughly 1.9× throughput gain. At about 1.6-2× the hourly cost, depending on which providers you pair, B200 wins or breaks even for sustained inference. For one-off experimentation, H100 is still cheaper to spin up.
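
The cleanest way to compare the two is dollars per million tokens rather than dollars per hour. A minimal sketch, using the throughput figures above and two on-demand prices from this article; treat both as assumptions and re-benchmark for your own model and serving stack:

```python
def usd_per_million_tokens(usd_per_hour: float, tokens_per_sec: float) -> float:
    """Convert an hourly GPU price plus sustained throughput into $/1M tokens."""
    tokens_per_hour = tokens_per_sec * 3600
    return usd_per_hour / (tokens_per_hour / 1_000_000)

# Prices and throughputs taken from this article (assumptions, not fresh benchmarks):
h100 = usd_per_million_tokens(2.99, 85)   # RunPod Secure H100 -> ~$9.77
b200 = usd_per_million_tokens(5.50, 165)  # Crusoe B200        -> ~$9.26
print(f"H100 ~${h100:.2f}/M tokens, B200 ~${b200:.2f}/M tokens")
```

The general rule: B200 wins per token whenever its price premium over H100 is smaller than its throughput ratio (~1.9× here).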

If you have access to the new GB200 NVL72 rack (72 B200s with NVLink switch), inference throughput scales sub-linearly past 8 GPUs — but rack-scale rentals are still gated behind enterprise contracts in 2026.

How much can you save with spot or community GPUs?

Spot saves 30-70% in exchange for eviction risk. The risk varies by provider:

| Tier | Eviction frequency | Best for |
| --- | --- | --- |
| AWS Spot, GCP Preemptible | Median 1-3 days uptime | Long-running training with checkpointing |
| Azure Low Priority | Similar to AWS | Same |
| RunPod Community | Hours to days | Inference experiments, batch jobs |
| Vast.ai community | Minutes to hours, highly variable | Research only |

A safe pattern is mixed-tier deployment: keep on-demand capacity for the baseline serving rate, and burst onto spot for traffic peaks. Tools like SkyPilot, Kueue, and dstack make this practical.

For training runs, modern frameworks (PyTorch Lightning, DeepSpeed, Hugging Face Accelerate) checkpoint every N steps. With 5-minute checkpointing on a 24-hour training run, a single eviction costs at most 5 minutes of lost progress plus restart time: roughly a $2 loss against the $100 you save by running at 50% off a $200 job. Spot wins decisively.
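
That trade-off is easy to model explicitly. A back-of-the-envelope sketch; the eviction count and the 15-minute recovery figure are assumptions, not provider data:

```python
def spot_cost(od_rate: float, discount: float, useful_hours: float,
              evictions: int, lost_min_per_eviction: float) -> float:
    """Effective cost of a checkpointed spot run: the discounted rate paid
    over the useful hours plus the hours redone after each eviction."""
    spot_rate = od_rate * (1 - discount)
    wasted_hours = evictions * lost_min_per_eviction / 60
    return spot_rate * (useful_hours + wasted_hours)

# Assumed: $8.33/hr on-demand (a ~$200 24-hour run), 50% spot discount,
# 2 evictions, each losing ~5 min of progress plus ~10 min of restart.
on_demand = 8.33 * 24
spot = spot_cost(8.33, 0.50, 24, evictions=2, lost_min_per_eviction=15)
print(f"on-demand ~${on_demand:.0f}, spot ~${spot:.0f}")  # ~$200 vs ~$102
```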

What hidden costs should you watch for?

Headline GPU prices exclude these line items that frequently double the real bill:

  • Egress bandwidth. AWS charges $0.09/GB egress. For inference apps streaming long outputs to thousands of users, egress can rival GPU cost.
  • Storage. EBS, GCP Persistent Disk, and Azure Managed Disks bill separately. Plan $50-200/month for a 1TB attached volume.
  • Networking between regions. Cross-region transfer is $0.02-0.10/GB and adds up quickly for distributed training.
  • Snapshots / images. Custom AMIs and snapshots are billed at storage tier.
  • Idle instances. The most expensive GPU is the one running with no traffic. Use auto-shutdown and queue-based serving.
  • Reserved instance lock-in. 1-year and 3-year commitments save 30-60%, but committed capacity you stop using can end up costing more than on-demand retail.

For a complete breakdown including optional electricity cost (TDP × PUE × hours × your rate), see the GPU Pricing Calculator.
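
The electricity term is simple enough to reproduce outside the calculator. A minimal sketch; the 700W TDP, 1.3 PUE, and $0.12/kWh figures are illustrative assumptions:

```python
def electricity_cost(tdp_watts: float, pue: float,
                     hours: float, usd_per_kwh: float) -> float:
    """TDP x PUE x hours x rate, with watts converted to kilowatts."""
    return tdp_watts / 1000 * pue * hours * usd_per_kwh

# Illustrative: one 700W H100 SXM at PUE 1.3, 24/7 for a month, $0.12/kWh.
print(f"${electricity_cost(700, 1.3, 730, 0.12):.2f}/month")  # ~$79.72
```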

When should you self-host versus rent?

The crossover point in 2026 (a break-even sketch follows the list):

  • Renting wins below ~4,000 GPU-hours per month per GPU type (~5.5 GPUs running 24/7). Below this, the operational overhead of running your own datacenter racks isn't worth it.
  • Co-location wins between 4,000-15,000 GPU-hours. Lease space in an existing datacenter, buy the GPUs outright (~$30k/H100), and pay $0.10/kWh for power + $200/U/month for space.
  • Owning wins above 15,000 GPU-hours per month per type. You amortize the GPU cost over 2-3 years and pay marginal cost for power.
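
To sanity-check those thresholds, compare a rental rate against an owned GPU's amortized hourly cost. A sketch under stated assumptions ($30k card, 3-year amortization, the colo power and space figures above); it deliberately ignores the platform-engineering cost discussed below:

```python
def owned_hourly_cost(capex: float, amort_months: int, tdp_watts: float,
                      pue: float, usd_per_kwh: float,
                      space_usd_per_gpu_month: float) -> float:
    """Amortized per-GPU-hour cost of owned hardware in a colo."""
    hours = 730  # hours per month
    capex_hr = capex / (amort_months * hours)
    power_hr = tdp_watts / 1000 * pue * usd_per_kwh
    space_hr = space_usd_per_gpu_month / hours
    return capex_hr + power_hr + space_hr

# Assumed: $30k H100, 36-month amortization, 700W at PUE 1.3, $0.10/kWh,
# 8-GPU node in 8U at $200/U/month -> $200 of space per GPU per month.
own = owned_hourly_cost(30_000, 36, 700, 1.3, 0.10, 200)
print(f"~${own:.2f}/GPU-hour owned vs ~$2.99 rented")  # ~$1.51
```

At ~$1.51/hour the owned card costs about $1,100/month whether or not it's busy, so it beats a $2.99 rental only above roughly 370 GPU-hours/month per card, which is why utilization drives the crossover.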

That's roughly: small startup → rent (probably RunPod or Lambda). Mid-scale AI infra team → mix of reserved cloud + co-lo. Hyperscale (>50 GPUs) → either AWS/GCP enterprise contract or own DC.

The hidden factor most teams under-budget is operations: GPU drivers, CUDA versions, firmware updates, power/cooling alarms, hardware RMA. A 32-GPU cluster needs at least 0.5 FTE of platform engineering even in a managed colo.

Putting it all together

Plug your hours/day, GPU type, and pricing tier into the GPU Pricing Calculator to see a sortable monthly cost across 12 providers. If you're also paying for inference at scale, cross-reference with the Inference Benchmark — sometimes a pricier-per-hour provider wins on dollar-per-million-tokens because their throughput is higher. And for agentic workloads where compute is just one line item, the Agent Dev Cost Calculator breaks out compute alongside orchestration and observability.

We re-verify every price in this article against the provider's official page on the first of every month. Last verified: May 1, 2026.