
Cheapest Vector Database in 2026: Self-Host vs Managed

The cheapest vector database in 2026 is self-hosted pgvector ($20/month) for small RAG. At scale, Turbopuffer ($0.04/GB) beats every managed alternative.

7 min read · By AITOT Editorial

The cheapest vector database in 2026 depends entirely on scale. For under 5M vectors with low query rate, self-hosted Postgres pgvector on a $20/month VM beats every managed option. For 10M+ vectors with infrequent queries, Turbopuffer at $0.04/GB storage is dramatically cheaper than alternatives. For very high query rate (>1M queries/day), Qdrant's per-node pricing wins. This guide walks through five scale tiers with cost math at each. For real-time comparison across 9 providers with your specific numbers, use our Vector DB Cost Estimator.

What is the cheapest vector database at each scale?

| Workload | Vectors | Queries/day | Cheapest option | Approx. monthly |
| --- | --- | --- | --- | --- |
| Experiment / POC | 100k | 1,000 | Self-hosted pgvector | $20 (VM only) |
| Small RAG | 1M | 5,000 | Supabase pgvector | $25 |
| Medium RAG | 10M | 50,000 | Pinecone Serverless or self-hosted pgvector | $40-100 |
| Large RAG | 100M | 200,000 | Turbopuffer | $250-400 |
| Enterprise | 1B+ | 1M+ | Turbopuffer or self-hosted Qdrant cluster | $1,000-3,000 |

Notice the cheapest provider changes at every scale tier. No single vector DB wins at all scales.
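The table can be read as a simple decision rule. A minimal sketch of that rule (the thresholds come from the tiers above; `pick_vector_db` is an illustrative helper, not anyone's real API):

```python
def pick_vector_db(vectors: int, queries_per_day: int) -> str:
    """Rough tier picker based on the scale table above."""
    if vectors <= 5_000_000:
        # Small scale: pgvector (self-hosted or managed) is cheapest.
        return "pgvector (self-hosted or Supabase)"
    if queries_per_day >= 1_000_000:
        # Very high query rate: per-node pricing wins over per-query pricing.
        return "Qdrant (self-hosted cluster)"
    # 10M+ vectors at a moderate query rate: object-storage-backed wins.
    return "Turbopuffer"

print(pick_vector_db(1_000_000, 5_000))      # small RAG tier
print(pick_vector_db(100_000_000, 200_000))  # large RAG tier
```

The boundaries are fuzzy in practice; treat the function as a starting point for your own numbers, not a verdict.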

Why is pgvector so cheap?

Three reasons pgvector dominates the budget end of the market:

1. Zero managed-service markup

Pinecone, Qdrant Cloud, Weaviate Cloud all charge platform fees on top of raw storage and compute. pgvector runs on any Postgres host — even a $4/month Hetzner VM works for small workloads. No vector-DB-specific markup.

2. Mature HNSW indexing

pgvector gained HNSW indexing in v0.5 (2023), and v0.8 (released late 2024) added iterative index scans with tighter PostgreSQL planner integration. Recall and speed are within 10-20% of dedicated vector DBs for typical RAG workloads. The historical "pgvector is slow" reputation no longer applies.

3. Operational reuse

If you already have Postgres in your stack (most products do), pgvector adds essentially zero ops overhead. Backups, monitoring, security patching, IAM — your existing Postgres infrastructure covers it all. Compare that with running and maintaining Pinecone, Qdrant, or another dedicated vector DB as a separate system.

The catch: pgvector struggles above ~50M vectors or >100 queries/second. Above those thresholds, dedicated vector DBs start to win on latency and throughput.
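Those thresholds can be encoded as a quick sanity check. A sketch using the ~50M-vector and ~100 QPS cutoffs from the paragraph above (rules of thumb, not hard limits):

```python
def fits_pgvector(vectors: int, queries_per_second: float) -> bool:
    """Rule of thumb: pgvector struggles above ~50M vectors or >100 QPS."""
    return vectors <= 50_000_000 and queries_per_second <= 100

# 10M vectors at 50k queries/day (~0.6 QPS) is comfortably in range.
print(fits_pgvector(10_000_000, 50_000 / 86_400))  # True
```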

How does Turbopuffer beat everyone at large scale?

Turbopuffer's architecture is fundamentally different. Instead of keeping vectors in RAM (Pinecone, Qdrant, Weaviate), Turbopuffer stores them in S3 with intelligent caching. The trade-off is latency:

  • Pinecone Serverless (RAM-resident): 30-80ms warm reads
  • Turbopuffer (S3-backed): 200-500ms cold, 50-150ms warm

For RAG where queries take 1-2 seconds anyway (generation dominates), the extra 100-300ms is imperceptible. But the storage cost difference is massive:

| Provider | Storage / GB-month |
| --- | --- |
| Pinecone Serverless | $0.33 |
| Qdrant Cloud (per-node, no per-GB rate) | effective ~$10-20 at 1GB nodes |
| Weaviate Cloud | $0.10 + node fees |
| Supabase pgvector | $0.125 |
| Turbopuffer | $0.04 |

A 100M-vector index at 1536-dim float32 is ~600GB (100M × 1536 dims × 4 bytes ≈ 572GB raw, plus HNSW overhead). Monthly storage at ~600GB:

  • Pinecone: $200
  • Weaviate: $60
  • Supabase: $75
  • Turbopuffer: $24

That's an 8× cost difference at scale. Combined with cheaper query pricing ($3/M reads vs $8.25/M for Pinecone), Turbopuffer is dramatically cheaper at 10M+ vectors.
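The storage math above, worked in code. The per-GB rates come from the comparison table; the 1.05 overhead factor is an assumption standing in for "modest HNSW overhead":

```python
RATES = {  # $/GB-month, from the comparison table above
    "Pinecone Serverless": 0.33,
    "Weaviate Cloud": 0.10,
    "Supabase pgvector": 0.125,
    "Turbopuffer": 0.04,
}

def storage_gb(vectors: int, dim: int = 1536, overhead: float = 1.05) -> float:
    """Float32 footprint in GB, with an assumed HNSW overhead factor."""
    return vectors * dim * 4 * overhead / 1024**3

gb = storage_gb(100_000_000)  # ~600 GB, matching the article's example
for provider, rate in RATES.items():
    print(f"{provider}: ${gb * rate:,.0f}/month")
```

Swap in your own vector count and dimension; the ranking between providers holds at any size, since the rates differ by a constant factor.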

The mature 2026 pattern is two-tier storage: hot tier on Pinecone/Qdrant for recent/high-query data, cold tier on Turbopuffer for the archive.

When does Pinecone Serverless actually win on cost?

Pinecone wins below ~5M vectors with moderate query rate. Math for 1M vectors with 30k queries/day:

Pinecone Serverless:
  Storage: 1M × 1536 × 4 bytes × 1.4 overhead / (1024^3) = 8.0 GB
  Storage cost: 8 GB × $0.33 = $2.64/month
  Read cost: 30k/day × 30 days × $8.25 per 1M = $7.43/month
  Write cost: negligible, ignored
  Total: ~$10/month

vs Qdrant Cloud (1GB node minimum):

Node cost: $76/month flat (need 2 nodes for 8GB)
Total: $152/month

vs self-hosted pgvector on $20 VM:

VM cost: $20/month
Operational time: 1-2 hours/month for monitoring/backups
Total monetary: $20/month, time cost = real

For this workload, Pinecone wins on absolute cost ($10 vs $20 vs $152). The reason is that Pinecone Serverless storage is genuinely cheap per-GB and your query rate is below the threshold where per-query pricing dominates.
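The Pinecone arithmetic above as a reusable sketch. Prices and the 1.4 overhead factor are taken from the worked example; `serverless_monthly_cost` is an illustrative name, and write costs are ignored as in the text:

```python
def serverless_monthly_cost(
    vectors: int,
    queries_per_day: int,
    dim: int = 1536,
    storage_rate: float = 0.33,  # $/GB-month
    read_rate: float = 8.25,     # $ per 1M reads
    overhead: float = 1.4,       # index overhead factor from the example
) -> float:
    """Storage + read cost per month; writes assumed negligible."""
    gb = vectors * dim * 4 * overhead / 1024**3
    storage = gb * storage_rate
    reads = queries_per_day * 30 * read_rate / 1_000_000
    return storage + reads

# 1M vectors at 30k queries/day lands at roughly $10/month.
print(round(serverless_monthly_cost(1_000_000, 30_000), 2))
```

Raising `queries_per_day` shows where per-query pricing starts to dominate and the Pinecone advantage erodes.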

When does it pay to run your own Qdrant cluster?

Above 10M vectors AND >100k queries/day. Self-hosted Qdrant on rented hardware:

Hardware: 2× $20 Hetzner VMs (1 primary + 1 replica)
Total: $40/month
Capacity: ~20M vectors at 1536-dim with HNSW
Query throughput: ~1000 QPS

vs Qdrant Cloud equivalent: $300-500/month for 2 nodes of similar size.

For high-query workloads with predictable patterns, self-hosting Qdrant is dramatically cheaper than the managed version. The trade-off is operational complexity: backups, failover, and monitoring are on you. Realistic platform-engineering overhead: 4-8 hours/month for a properly tooled setup.

What about MongoDB Atlas Vector Search and Redis Vector?

Both are "we already use this database" options, not standalone vector DB choices:

  • MongoDB Atlas Vector Search: bundled into Atlas pricing starting M10 at $57/month. Useful if you're already on Atlas and want vector search alongside document storage. Not the cheapest standalone choice.
  • Redis Vector: bundled into Redis Cloud at higher tiers. Useful for sub-millisecond query latency (ad serving, recommendations). Expensive per-vector.

If you're not already on MongoDB or Redis, neither is the cheapest path. They're consolidation plays, not cost-optimization plays.

What are the hidden costs of "free tier" vector DBs?

Five gotchas:

1. Pinecone "Starter" plan: $0 but capped

Pinecone's $0 Starter tier supports 1M vectors total, but only on legacy pod-based architecture (not Serverless). For real production, you'll need Serverless or a paid pod tier.

2. Qdrant Cloud free tier: useful for experiments only

1GB storage and 1M vectors free. Sufficient for testing, not production. Upgrade is to a $76/month per-node tier.

3. Weaviate Cloud free: heavily limited

100MB storage. Truly POC-only.

4. Supabase free tier: no pgvector access at scale

The Supabase free tier includes only a 500MB database. To use pgvector usefully, you need the Pro plan ($25/month).

5. Self-host on free trials

DigitalOcean, AWS, and GCP all offer free-trial credits of $200-300, which covers 1-3 months of a small Postgres + pgvector workload. Useful for kicking the tires, but not sustainable.

What is the cheapest production architecture for RAG in 2026?

Three reference architectures by scale:

MVP / startup (under 1M vectors)

  • Vector DB: Supabase pgvector on Pro plan ($25/month)
  • Embeddings: OpenAI text-embedding-3-small ($0.02/M tokens)
  • Generation: Claude Haiku 4.5 ($0.80/$4.00 per million tokens)
  • Total monthly: $40-80 depending on traffic

Growth-stage (1M-10M vectors)

  • Vector DB: Pinecone Serverless ($40-150/month based on traffic)
  • Embeddings: Voyage 3 ($0.06/M tokens — better retrieval, mid-cost)
  • Generation: Claude Sonnet 4.6 ($3.00/$15.00 per million tokens) with Haiku fallback
  • Total monthly: $300-2,000

Scale (10M+ vectors)

  • Vector DB: Turbopuffer for cold storage + Qdrant Cloud for hot tier
  • Embeddings: Voyage 3 Large ($0.18/M tokens — top retrieval)
  • Generation: Claude Sonnet 4.6 with intelligent routing
  • Total monthly: $2,000-10,000+
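The three architectures can be compared with one back-of-the-envelope formula: vector DB flat cost plus embedding and generation token costs. A hedged sketch, using the token prices quoted above and made-up traffic volumes you should replace with your own:

```python
def rag_monthly_cost(
    vector_db_flat: float,   # $/month for the vector DB tier
    embed_tokens_m: float,   # tokens embedded per month, in millions
    embed_rate: float,       # $/M embedding tokens
    gen_in_m: float,         # generation input tokens/month, millions
    gen_out_m: float,        # generation output tokens/month, millions
    gen_in_rate: float,      # $/M input tokens
    gen_out_rate: float,     # $/M output tokens
) -> float:
    """Total monthly RAG stack cost: storage + embeddings + generation."""
    return (vector_db_flat
            + embed_tokens_m * embed_rate
            + gen_in_m * gen_in_rate
            + gen_out_m * gen_out_rate)

# MVP tier: Supabase Pro ($25) plus illustrative traffic at Haiku-class prices.
mvp = rag_monthly_cost(25, embed_tokens_m=100, embed_rate=0.02,
                       gen_in_m=20, gen_out_m=5,
                       gen_in_rate=0.80, gen_out_rate=4.00)
print(f"${mvp:.2f}/month")
```

With these assumed volumes the MVP tier lands in the $40-80 band quoted above; generation tokens, not the vector DB, dominate the bill, which is typical for RAG.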

For real-time forecasting across this stack, use our RAG Total Cost Calculator (composite tool) and Vector DB Cost Estimator (vector DB specifically).

What's coming for vector DB pricing in 2026?

Three trends to watch:

  1. Object-storage-backed indexes (Turbopuffer-style) becoming the default for cost-sensitive workloads. Expect Pinecone, Qdrant, Weaviate to launch similar tiers by end of 2026.
  2. pgvector quantization improvements. Half-precision and int8 indexes natively in Postgres, cutting storage 50-75%.
  3. Embedding-DB consolidation. Provider-side embeddings + vector storage as a single bundle. OpenAI is rumored to launch this; if so it could compress pricing across the market.

The vector DB landscape is more diverse than the LLM landscape — there's no single winner. Pick by scale and operational preference. The right choice today may be the wrong choice 12 months from now as your traffic scales.

For broader infrastructure planning, the RAG Total Cost Calculator bundles vector DB cost with embedding + generation costs. The Embeddings Cost Calculator handles the upstream pricing.

The cheapest vector database is the one that matches your scale. Don't pick what's cheapest at someone else's scale.