Vector Database Pricing 2026: Pinecone vs Qdrant vs Supabase
A practical 2026 vector database cost comparison — Pinecone, Qdrant, Weaviate, Supabase pgvector, Turbopuffer, and more, with real RAG workload examples.
Vector database pricing in 2026 ranges from nearly free (self-hosted Postgres pgvector on a small VM) to $400+ per month for the same 1-million-vector RAG workload, depending on provider, query rate, and quantization choices. This guide breaks down nine providers across realistic RAG workloads (100k to 100M vectors) so you can pick the right one for your scale. For real-time comparison across your exact numbers, use our Vector DB Cost Estimator.
The vector DB is usually 10-25% of an AI app's total infrastructure bill — small enough to ignore at MVP scale, large enough to dominate decisions at production scale. The good news: the math is more predictable than LLM token costs, because it scales linearly with vectors, dimensions, and queries.
What does a vector database actually charge for?
Three line items appear on every vector DB bill:
- Storage — usually billed per GB-month of indexed data. Index overhead (HNSW typically 1.3-1.5×) means stored bytes are 30-50% larger than raw vectors.
- Reads — billed per million queries, or bundled into a node-hour rate. Hybrid search (vector + keyword) often costs 2× a pure vector query.
- Writes — billed per million upserts. Updating a document means deleting and re-inserting its vectors into the HNSW graph, so frequent updates can dominate the bill.
A fourth hidden item: plan minimums. Most managed providers have a $25-$200/month floor before per-usage billing even kicks in. For tiny experiments, that floor is the entire bill.
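In code, the whole model is a few lines. Here's a minimal sketch with hypothetical placeholder rates (not any specific provider's prices), just to show how the line items combine:

```python
def monthly_cost(
    vectors: int,
    dims: int,
    queries_per_day: int,
    writes_per_day: int,
    bytes_per_value: float = 4.0,      # float32
    index_overhead: float = 1.4,       # HNSW typically 1.3-1.5x
    storage_per_gb: float = 0.30,      # hypothetical $/GB-month
    reads_per_million: float = 8.00,   # hypothetical $/M queries
    writes_per_million: float = 4.00,  # hypothetical $/M upserts
    plan_minimum: float = 25.0,        # the monthly floor
) -> float:
    """Storage + reads + writes, with the plan minimum as a floor."""
    gb = vectors * dims * bytes_per_value * index_overhead / 1024**3
    storage = gb * storage_per_gb
    reads = queries_per_day * 30 / 1e6 * reads_per_million
    writes = writes_per_day * 30 / 1e6 * writes_per_million
    return max(plan_minimum, storage + reads + writes)
```

Note how `max(plan_minimum, ...)` captures the hidden item: at small scale, the usage terms barely matter and the floor is the bill.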
What is the cheapest vector DB at each scale?
The cheapest provider depends sharply on scale. Here's a breakdown across four common RAG workload sizes, using float32 1536-dimension OpenAI-style embeddings:
| Workload | Vectors | Queries/day | Cheapest provider | Approx. monthly |
|---|---|---|---|---|
| Small RAG (proof-of-concept) | 100k | 5,000 | Self-hosted pgvector | $20 (VM only) |
| Small RAG (managed) | 100k | 5,000 | Supabase pgvector | $25 |
| Medium RAG | 1M | 50,000 | Pinecone Serverless | $40-60 |
| Large RAG | 10M | 200,000 | Turbopuffer | $35-80 |
| Enterprise | 100M | 1M | Turbopuffer or self-host | $300-800 |
Turbopuffer is the surprise winner at large scale because its object-storage architecture trades cold-read latency (200-500ms vs 30-80ms warm) for radically cheaper storage. For RAG where queries can wait 500ms, that trade is almost always worth it.
How does Pinecone Serverless pricing actually work?
Pinecone Serverless bills three line items separately, then sums:
- Storage: $0.33 per GB-month of indexed data
- Reads: $8.25 per million read units (1 RU ≈ 1 query × 1KB result)
- Writes: $4.00 per million upserts
A worked example for 1M vectors at 1536 dim with 50k queries/day and 5k writes/day:
```
storage:  1M × 1536 × 4 bytes × 1.4 overhead / 1024³ = 8.0 GB
          8.0 GB × $0.33/GB          = $2.64 / month
reads:    50,000/day × 30            = 1.5M reads / month
          1.5M × $8.25/M             = $12.38 / month
writes:   5,000/day × 30             = 150k writes / month
          0.15M × $4.00/M            = $0.60 / month
total:    $2.64 + $12.38 + $0.60     = $15.62 / month
```
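Plugging the same numbers into the `monthly_cost` sketch from earlier (with Pinecone's published rates and no plan floor) reproduces the total:

```python
# 1M vectors, 1536 dims, 50k queries/day, 5k writes/day at Pinecone's rates
cost = monthly_cost(
    1_000_000, 1536, 50_000, 5_000,
    storage_per_gb=0.33, reads_per_million=8.25,
    writes_per_million=4.00, plan_minimum=0,
)
print(f"${cost:.2f}")  # → $15.62
```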
That's the bare minimum. In practice, metadata and tags add another 10-30% to storage. Still, Pinecone Serverless is genuinely cheap at this scale — the headline price chart looks expensive until you do the math.
The catch: read units scale with the amount of data a query scans, so above ~50M vectors the read pricing dominates. At 10M read units/month against a 50M-vector index, you'd pay $82.50 just for reads. Pod-based Pinecone (or migrating to Qdrant / Turbopuffer) becomes cheaper at that point.
Is Qdrant cheaper than Pinecone?
It depends entirely on query rate.
Qdrant Cloud charges per node-hour, not per query. Their starter Hybrid Cloud node (1GB RAM, 1 vCPU) runs $0.105/hour, about $76/month. You get unlimited queries up to the node's CPU capacity (roughly 50-100 QPS for vector search).
| Scenario | Pinecone Serverless | Qdrant Cloud |
|---|---|---|
| 1M vectors, 10k queries/day | $7 | $76 |
| 1M vectors, 100k queries/day | $40 | $76 |
| 1M vectors, 1M queries/day | $260 | $76 (likely 2 nodes = $152) |
| 10M vectors, 100k queries/day | $90 | $200 |
Pinecone wins on low-query-rate workloads (because storage is cheap). Qdrant wins on high-query-rate workloads (because predictable per-node pricing dominates per-query pricing past a certain threshold).
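You can solve for the crossover directly: find the monthly query volume where per-query billing equals the flat node price. A sketch using the rates quoted above (it ignores Pinecone's storage line and assumes one read unit per query, so treat the result as a rough threshold):

```python
READ_PRICE_PER_M = 8.25  # $/million read units (Pinecone Serverless)
NODE_PRICE = 76.0        # $/month for one Qdrant starter node

# Monthly query volume where per-query cost equals one flat-priced node
break_even = NODE_PRICE / READ_PRICE_PER_M * 1e6
print(f"break-even: {break_even / 30:,.0f} queries/day")  # ~307,000
```

Below roughly 300k queries/day, per-query billing wins; above it, the flat-priced node is cheaper, which matches the table.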
Pro tip: if you're already running Postgres, pgvector on Supabase or Neon is even cheaper than either Qdrant or Pinecone for under 10M vectors at moderate query rates. The trade-off is features (pgvector's HNSW is competitive on recall but lacks some advanced capabilities, like native sparse-dense hybrid search); the upside is operational simplicity (one DB to manage instead of two).
How much can quantization save?
A lot. Precision converts directly to storage cost:
| Precision | Bytes/value | Storage vs float32 | Recall hit |
|---|---|---|---|
| float32 | 4 | 100% | baseline |
| float16 | 2 | 50% | ~0.5% |
| int8 | 1 | 25% | ~5% |
| binary | 0.125 | 3% | ~15% (rerank required) |
For 100M float32 1536-dim vectors, raw storage is about 570GB. Drop to int8 and it's roughly 142GB: at Pinecone's $0.33/GB-month that's $47/month instead of $190/month, a four-figure saving annually.
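A quick sketch to reproduce those figures for any corpus size, using the $0.33/GB-month Pinecone storage rate quoted earlier (raw vector bytes only, before index overhead):

```python
BYTES_PER_VALUE = {"float32": 4, "float16": 2, "int8": 1, "binary": 1 / 8}

def raw_storage(vectors: int, dims: int, precision: str) -> float:
    """Raw vector storage in GB (no index overhead)."""
    return vectors * dims * BYTES_PER_VALUE[precision] / 1024**3

for p in BYTES_PER_VALUE:
    gb = raw_storage(100_000_000, 1536, p)
    print(f"{p:>8}: {gb:7.1f} GB  ${gb * 0.33:8,.2f}/month")
# float32: 572.2 GB $188.83/month ... binary: 17.9 GB $5.90/month
```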
Binary quantization is the most aggressive option but requires a reranking pass with the original float32 vectors (or with a cross-encoder) for production-quality recall. Hosted rerankers such as Cohere's Rerank API and Voyage AI's reranker make this practical.
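Here's a minimal numpy sketch of that two-stage shape: a Hamming-distance scan over packed sign bits to get a shortlist, then a rerank with the original float32 vectors. In production the first stage runs inside the vector DB; this just illustrates the mechanics on synthetic data:

```python
import numpy as np

rng = np.random.default_rng(0)
corpus = rng.standard_normal((10_000, 1536)).astype(np.float32)
query = rng.standard_normal(1536).astype(np.float32)

# Stage 1: binary quantization (one sign bit per dimension, 32x smaller)
bits = np.packbits(corpus > 0, axis=1)   # (10_000, 192) uint8
qbits = np.packbits(query > 0)           # (192,) uint8

# Hamming distance = popcount(XOR); keep a generous shortlist
hamming = np.unpackbits(bits ^ qbits, axis=1).sum(axis=1)
shortlist = np.argsort(hamming)[:100]

# Stage 2: rerank the shortlist with the original float32 vectors
scores = corpus[shortlist] @ query
top10 = shortlist[np.argsort(-scores)[:10]]
```

The shortlist size (100 here for a top-10 result) is the knob: a bigger shortlist recovers more recall at the cost of more float32 work.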
When should you use Postgres pgvector instead?
The pgvector decision tree:
- Use pgvector if you have under 10M vectors, under 100 queries/sec, and already run Postgres. The operational simplicity beats any niche feature.
- Use a purpose-built vector DB if you have over 10M vectors, over 1,000 queries/sec, need sparse-dense hybrid search, or are doing serious metadata filtering with high cardinality.
- Use Turbopuffer if you're cost-bound and can tolerate 200-500ms cold reads. Object-storage backing is decisive at large scale.
- Use Weaviate / Qdrant if you need built-in modules (CLIP, multi-vector, multi-tenant ACL) without writing them yourself.
The pgvector ecosystem matured significantly in 2024-2025. Native HNSW indexing, IVFFlat for cold storage, half-precision (halfvec) support, and hybrid search via Postgres full-text make it competitive for most real-world RAG workloads. The Supabase team's pgvector v0.8 benchmarks are within 10-20% of dedicated vector DBs for under-10M-vector workloads.
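For a sense of how simple the setup is, here's a minimal sketch using psycopg 3 and the `pgvector` Python package (table and connection details are placeholders):

```python
import numpy as np
import psycopg
from pgvector.psycopg import register_vector

conn = psycopg.connect("dbname=rag")  # placeholder connection string
conn.execute("CREATE EXTENSION IF NOT EXISTS vector")
register_vector(conn)

conn.execute("""
    CREATE TABLE IF NOT EXISTS chunks (
        id bigserial PRIMARY KEY,
        body text,
        embedding vector(1536)
    )
""")
# HNSW index; cosine distance is the usual choice for text embeddings
conn.execute("""
    CREATE INDEX IF NOT EXISTS chunks_embedding_idx
    ON chunks USING hnsw (embedding vector_cosine_ops)
""")
conn.commit()

# Nearest-neighbour query: <=> is pgvector's cosine-distance operator
q = np.random.rand(1536).astype(np.float32)  # stand-in for a real embedding
rows = conn.execute(
    "SELECT id, body FROM chunks ORDER BY embedding <=> %s LIMIT 10", (q,)
).fetchall()
```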
What about MongoDB Atlas Vector Search and Redis Vector?
Both are good "we already use this database" options:
- MongoDB Atlas Vector Search is bundled into Atlas pricing starting at M10 ($57/month). For teams already on MongoDB, the operational and querying integration is genuinely valuable — JSON metadata filtering with vector search in one query.
- Redis Vector is included in Redis Cloud pricing. Sub-millisecond query latency is the headline feature; it's the right choice for ad serving, recommendation, and other ultra-low-latency use cases.
Neither is the cheapest at any specific scale, but both can be the right choice when "consolidate vendors" is more valuable than "minimize line-item cost".
How do I actually pick?
Use this decision sequence:
- Estimate vector count and query rate for the next 12 months, not just MVP day-one. Vector DBs are sticky — migration is painful.
- Estimate quantization tolerance by running a small recall benchmark with int8 vs float32 against your actual reranker (see the sketch after this list). Most teams find ≤2% recall loss is acceptable.
- Pick on total monthly cost at your 12-month target, not headline price. Use our Vector DB Cost Estimator to plug in numbers across all 9 providers in one shot.
- Layer in the qualitative factors: do you need built-in CLIP / multi-tenancy / GDPR EU residency / hybrid search?
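On point 2, the benchmark is a few lines of numpy: quantize a sample of your real embeddings to int8, run top-k against both versions, and measure overlap. Brute force is fine at benchmark sizes (the data below is synthetic; substitute your own vectors and queries):

```python
import numpy as np

def topk(db: np.ndarray, q: np.ndarray, k: int = 10) -> np.ndarray:
    return np.argsort(-(db @ q))[:k]

def int8_recall(db: np.ndarray, queries: np.ndarray, k: int = 10) -> float:
    scale = np.abs(db).max() / 127.0               # symmetric scalar quantization
    db8 = np.round(db / scale).astype(np.int8)
    dequant = db8.astype(np.float32) * scale
    hits = sum(
        len(set(topk(db, q, k)) & set(topk(dequant, q, k))) for q in queries
    )
    return hits / (k * len(queries))

rng = np.random.default_rng(0)
db = rng.standard_normal((50_000, 1536)).astype(np.float32)
qs = rng.standard_normal((100, 1536)).astype(np.float32)
print(f"recall@10: {int8_recall(db, qs):.3f}")
```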
A common 2026 pattern is two-tier storage: hot tier on Pinecone or Qdrant for the past 30 days of content (high query rate), cold tier on Turbopuffer for older archives (rare queries, dirt-cheap storage). This split saves 40-60% on a real production RAG bill.
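A sketch of the routing side of that pattern; the tier clients and their `query` method are placeholders, not real SDK calls:

```python
def two_tier_search(query_vector, hot_tier, cold_tier, k: int = 10,
                    recent_only: bool = False):
    """Query the hot tier always; fan out to the cold archive only when needed.

    hot_tier / cold_tier are placeholder clients; substitute your
    Pinecone/Qdrant and Turbopuffer SDK calls and result types.
    """
    results = hot_tier.query(vector=query_vector, top_k=k)
    if not recent_only:
        # Cold tier can add 200-500ms on a cold read; merge by score
        results += cold_tier.query(vector=query_vector, top_k=k)
    return sorted(results, key=lambda m: m.score, reverse=True)[:k]
```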
Don't over-optimize at MVP scale. The total vector DB bill for a small AI app is probably under $50/month — engineer time spent shaving that bill is engineer time not spent improving retrieval quality, which is a much bigger lever for product success.