
GPU Cloud Pricing 2026: AWS vs RunPod vs Vast.ai

An honest 2026 comparison of GPU rental prices across AWS, GCP, Azure, RunPod, Vast.ai, Lambda Labs, and more — H100, A100, and B200 hourly rates.

6 min read · By AITOT Editorial

GPU cloud pricing in 2026 spans a more than 8× range for identical hardware: an NVIDIA H100 rents for $1.49/hour on Hyperbolic and $12.29/hour on AWS, same GPU, same generation. The difference is reliability, networking, ecosystem, and how willing you are to handle the rough edges. This guide compares 12 providers across the GPU lineup that matters in 2026 (H100, H200, A100, B200, L40S, RTX 4090) so you can pick the right vendor for your workload.

For real-time math on monthly cost, including optional electricity, use our GPU Pricing Calculator. For tokens/sec and dollar-per-million-tokens at each provider, see the Inference Benchmark.

Which GPU should you actually rent in 2026?

A quick decision tree based on workload type (a code sketch follows the list):

  • LLM inference (70B class) — H100 SXM is the sweet spot. Move to B200 if your throughput requirement exceeds 150 tokens/sec/user.
  • LLM inference (405B class) — B200 ×8 is the new floor; H100 ×8 still works but takes ~50% longer.
  • Fine-tuning (LoRA on 7B-70B) — A100-80GB or H100-PCIe; PCIe is fine because LoRA isn't NVLink-bound.
  • Full pre-training — H100 SXM5 with NVLink, minimum 8-GPU node. Skip A100 unless budget is brutal.
  • Embedding generation or batch inference — L40S or even RTX 4090 if you don't need >24GB VRAM.
  • Experimentation — RTX A6000 (48GB) on Vast.ai under $1/hour, or RTX 4090 if 24GB is enough.
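
Encoded as a lookup table, the tree above is easy to drop into provisioning scripts. A minimal sketch in Python; the workload labels and the `pick_gpu` helper are illustrative inventions, and the recommendations simply mirror this article's categories:

```python
# Hypothetical helper encoding the decision tree above. Keys and values
# mirror this article's categories, not any official sizing guide.
GPU_BY_WORKLOAD = {
    "inference-70b":    "H100 SXM (B200 if you need >150 tok/s/user)",
    "inference-405b":   "B200 x8 (H100 x8 works but ~50% slower)",
    "finetune-lora":    "A100-80GB or H100 PCIe (LoRA isn't NVLink-bound)",
    "pretraining":      "H100 SXM5 with NVLink, 8-GPU node minimum",
    "batch-embeddings": "L40S, or RTX 4090 if 24GB VRAM is enough",
    "experimentation":  "RTX A6000 48GB (<$1/hr on Vast.ai) or RTX 4090",
}

def pick_gpu(workload: str) -> str:
    """Return this article's recommended GPU for a workload category."""
    return GPU_BY_WORKLOAD.get(workload, "unrecognized workload; start with H100 PCIe")

print(pick_gpu("finetune-lora"))  # -> A100-80GB or H100 PCIe (...)
```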

The most common mistake teams make is renting H100 SXM5 when they actually need H100 PCIe. The PCIe version is about 35% cheaper at most providers, and for any workload that fits on a single GPU the trade-off is usually worth it: you give up NVLink (irrelevant on one GPU) plus some TDP and memory bandwidth.

What does an H100 actually cost across providers?

H100 SXM5 80GB hourly rates as of May 2026, sorted cheapest first:

| Provider | On-demand | Spot / community | Notes |
| --- | --- | --- | --- |
| Hyperbolic | $1.49 | | Spot-style; community reliability |
| RunPod (Community) | $2.39 | $1.65 | Cheapest with decent uptime |
| Vast.ai | $2.40 | $1.80 | 24-hour median; community |
| RunPod (Secure) | $2.99 | $1.99 | Datacenter-grade |
| Lambda Labs | $2.99 | | Reserved pricing improves further |
| CoreWeave | $3.30 | | Enterprise; contract usually required |
| Paperspace | $5.95 | | Friendly UI; consumer-grade pricing |
| GCP A3 (us-central1) | $11.06 | $5.50 | Per-GPU from A3 8-GPU node |
| AWS p5 (us-east-1) | $12.29 | $6.40 | Per-GPU from p5.48xlarge |
| Azure ND-H100-v5 | $12.96 | $6.80 | Per-GPU |

The on-demand price spread is 8.7×. The spot price spread is 4×. Which end of the range you pick depends on how much your workload benefits from the cloud's networking, IAM, and existing data residency.

A real-world rule of thumb: if your training run is inside an existing VPC with proprietary data, the AWS/GCP/Azure tax is worth paying. If you're doing research, distillation, fine-tuning, or inference for a startup, hyperscaler GPU rates are 4-8× overpriced for what you get.

What about B200, the new flagship?

The Blackwell B200 (192GB HBM3e, 1,000W TDP) shipped to clouds in late 2025. By May 2026 reliable supply has reached:

  • RunPod (Secure): $6.39/hour on-demand — cheapest production-grade
  • Crusoe Cloud: $5.50/hour on-demand
  • Lambda Labs: $6.95/hour reserved
  • AWS (p6e instances): $18-21/hour per GPU — limited regions
  • GCP A3 Ultra: $13.40/hour on-demand

For inference serving, B200 delivers ~165 tokens/sec on Llama 4 70B at batch=1 versus ~85 tokens/sec on H100 SXM, a roughly 1.9× throughput gain. At about 1.6-2× the hourly cost, depending on which providers you pair, B200 wins or breaks even for sustained inference. For one-off experimentation, H100 is still cheaper to spin up.
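
The cleanest way to compare the two is dollars per million tokens rather than dollars per hour. A minimal sketch, using the throughput figures above and two on-demand prices from this article; treat both as assumptions and re-benchmark for your own model and serving stack:

```python
def usd_per_million_tokens(usd_per_hour: float, tokens_per_sec: float) -> float:
    """Convert an hourly GPU price plus sustained throughput into $/1M tokens."""
    tokens_per_hour = tokens_per_sec * 3600
    return usd_per_hour / (tokens_per_hour / 1_000_000)

# Prices and throughputs taken from this article (assumptions, not fresh benchmarks):
h100 = usd_per_million_tokens(2.99, 85)   # RunPod Secure H100 -> ~$9.77
b200 = usd_per_million_tokens(5.50, 165)  # Crusoe B200        -> ~$9.26
print(f"H100 ~${h100:.2f}/M tokens, B200 ~${b200:.2f}/M tokens")
```

The general rule: B200 wins per token whenever its price premium over H100 is smaller than its throughput ratio (~1.9× here).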

If you have access to the new GB200 NVL72 rack (72 B200s with NVLink switch), inference throughput scales sub-linearly past 8 GPUs — but rack-scale rentals are still gated behind enterprise contracts in 2026.

How much can you save with spot or community GPUs?

Spot saves 30-70% in exchange for eviction risk. The risk varies by provider:

| Tier | Eviction frequency | Best for |
| --- | --- | --- |
| AWS Spot, GCP Preemptible | Median 1-3 days uptime | Long-running training with checkpointing |
| Azure Low Priority | Similar to AWS | Same |
| RunPod Community | Hours to days | Inference experiments, batch jobs |
| Vast.ai community | Minutes to hours, highly variable | Research only |

A safe pattern is mixed-tier deployment: keep on-demand capacity for the baseline serving rate, and burst onto spot for traffic peaks. Tools like SkyPilot, Kueue, and dstack make this practical.

For training runs, modern frameworks (PyTorch Lightning, DeepSpeed, Hugging Face Accelerate) checkpoint every N steps. With 5-minute checkpointing on a 24-hour training run, a single eviction costs at most 5 minutes of lost progress plus restart time: roughly a $2 loss against the $100 you save by running at 50% off a $200 job. Spot wins decisively.
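
That trade-off is easy to model explicitly. A back-of-the-envelope sketch; the eviction count and the 15-minute recovery figure are assumptions, not provider data:

```python
def spot_cost(od_rate: float, discount: float, useful_hours: float,
              evictions: int, lost_min_per_eviction: float) -> float:
    """Effective cost of a checkpointed spot run: the discounted rate paid
    over the useful hours plus the hours redone after each eviction."""
    spot_rate = od_rate * (1 - discount)
    wasted_hours = evictions * lost_min_per_eviction / 60
    return spot_rate * (useful_hours + wasted_hours)

# Assumed: $8.33/hr on-demand (a ~$200 24-hour run), 50% spot discount,
# 2 evictions, each losing ~5 min of progress plus ~10 min of restart.
on_demand = 8.33 * 24
spot = spot_cost(8.33, 0.50, 24, evictions=2, lost_min_per_eviction=15)
print(f"on-demand ~${on_demand:.0f}, spot ~${spot:.0f}")  # ~$200 vs ~$102
```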

What hidden costs should you watch for?

Headline GPU prices exclude these line items that frequently double the real bill:

  • Egress bandwidth. AWS charges $0.09/GB egress. For inference apps streaming long outputs to thousands of users, egress can rival GPU cost.
  • Storage. EBS, GCP Persistent Disk, and Azure Managed Disks bill separately. Plan $50-200/month for a 1TB attached volume.
  • Networking between regions. Cross-region transfer is $0.02-0.10/GB and adds up quickly for distributed training.
  • Snapshots / images. Custom AMIs and snapshots are billed at storage tier.
  • Idle instances. The most expensive GPU is the one running with no traffic. Use auto-shutdown and queue-based serving.
  • Reserved instance lock-in. 1-year and 3-year commitments save 30-60%, but committed capacity you stop using can end up costing more than on-demand retail.

For a complete breakdown including optional electricity cost (TDP × PUE × hours × your rate), see the GPU Pricing Calculator.
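
The electricity term is simple enough to reproduce outside the calculator. A minimal sketch; the 700W TDP, 1.3 PUE, and $0.12/kWh figures are illustrative assumptions:

```python
def electricity_cost(tdp_watts: float, pue: float,
                     hours: float, usd_per_kwh: float) -> float:
    """TDP x PUE x hours x rate, with watts converted to kilowatts."""
    return tdp_watts / 1000 * pue * hours * usd_per_kwh

# Illustrative: one 700W H100 SXM at PUE 1.3, 24/7 for a month, $0.12/kWh.
print(f"${electricity_cost(700, 1.3, 730, 0.12):.2f}/month")  # ~$79.72
```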

When should you self-host versus rent?

The crossover point in 2026 (a break-even sketch follows the list):

  • Renting wins below ~4,000 GPU-hours per month per GPU type (~5.5 GPUs running 24/7). Below this, the operational overhead of running your own datacenter racks isn't worth it.
  • Co-location wins between 4,000-15,000 GPU-hours. Lease space in an existing datacenter, buy the GPUs outright (~$30k/H100), and pay $0.10/kWh for power + $200/U/month for space.
  • Owning wins above 15,000 GPU-hours per month per type. You amortize the GPU cost over 2-3 years and pay marginal cost for power.
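
To sanity-check those thresholds, compare a rental rate against an owned GPU's amortized hourly cost. A sketch under stated assumptions ($30k card, 3-year amortization, the colo power and space figures above); it deliberately ignores the platform-engineering cost discussed below:

```python
def owned_hourly_cost(capex: float, amort_months: int, tdp_watts: float,
                      pue: float, usd_per_kwh: float,
                      space_usd_per_gpu_month: float) -> float:
    """Amortized per-GPU-hour cost of owned hardware in a colo."""
    hours = 730  # hours per month
    capex_hr = capex / (amort_months * hours)
    power_hr = tdp_watts / 1000 * pue * usd_per_kwh
    space_hr = space_usd_per_gpu_month / hours
    return capex_hr + power_hr + space_hr

# Assumed: $30k H100, 36-month amortization, 700W at PUE 1.3, $0.10/kWh,
# 8-GPU node in 8U at $200/U/month -> $200 of space per GPU per month.
own = owned_hourly_cost(30_000, 36, 700, 1.3, 0.10, 200)
print(f"~${own:.2f}/GPU-hour owned vs ~$2.99 rented")  # ~$1.51
```

At ~$1.51/hour the owned card costs about $1,100/month whether or not it's busy, so it beats a $2.99 rental only above roughly 370 GPU-hours/month per card, which is why utilization drives the crossover.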

That's roughly: small startup → rent (probably RunPod or Lambda). Mid-scale AI infra team → mix of reserved cloud + co-lo. Hyperscale (>50 GPUs) → either AWS/GCP enterprise contract or own DC.

The hidden factor most teams under-budget is operations: GPU drivers, CUDA versions, firmware updates, power/cooling alarms, hardware RMA. A 32-GPU cluster needs at least 0.5 FTE of platform engineering even in a managed colo.

Putting it all together

Plug your hours/day, GPU type, and pricing tier into the GPU Pricing Calculator to see a sortable monthly cost across 12 providers. If you're also paying for inference at scale, cross-reference with the Inference Benchmark — sometimes a pricier-per-hour provider wins on dollar-per-million-tokens because their throughput is higher. And for agentic workloads where compute is just one line item, the Agent Dev Cost Calculator breaks out compute alongside orchestration and observability.

We re-verify every price in this article against the provider's official page on the first of every month. Last verified: May 1, 2026.