AI Embeddings Pricing 2026: OpenAI vs Voyage vs Cohere vs Jina
Compare 17 embedding models by cost per 1M tokens in 2026 — OpenAI 3-small/large, Voyage 3, Cohere v3, Jina v4, BGE-M3, Nomic, and more.
AI embeddings pricing in 2026 spans a roughly 22× range, from $0.008/M tokens on hosted open-weight models like BGE-M3 to $0.18/M on Voyage 3 Large. Embedding is the cheapest line item in most RAG bills, but at very large corpus sizes (100M+ tokens) it becomes a real number — and choosing the wrong model has compound consequences because re-embedding to switch later is expensive. This guide compares 17 embedding models with worked-out scenarios. For real-time pricing across all 17, use our AI Embeddings Cost Calculator.
What does embedding pricing actually look like in 2026?
Cost per 1M tokens, sorted cheapest first:
| Model | $/M tokens | Dimensions | Max input | Notes |
|---|---|---|---|---|
| Together BGE-M3 | $0.008 | 1024 | 8192 | Open-weight |
| Together bge-large-en | $0.008 | 1024 | 512 | |
| Fireworks Nomic Embed | $0.008 | 768 | 8192 | |
| Jina v3 | $0.012 | 1024 | 8192 | Configurable |
| Jina v4 | $0.018 | 2048 | 32000 | Configurable |
| OpenAI text-embedding-3-small | $0.02 | 1536 | 8191 | Matryoshka |
| Voyage 3 Lite | $0.02 | 512 | 32000 | |
| AWS Titan Embed v2 | $0.02 | 1024 | 8192 | Matryoshka |
| Google text-embedding-005 | $0.025 | 768 | 2048 | |
| Voyage 3 | $0.06 | 1024 | 32000 | |
| Cohere embed-english-v3.0 | $0.10 | 1024 | 512 | |
| Cohere embed-multilingual-v3.0 | $0.10 | 1024 | 512 | |
| Mistral mistral-embed | $0.10 | 1024 | 8192 | |
| Google gemini-embedding-exp | $0.10 | 3072 | 8192 | Configurable |
| OpenAI text-embedding-3-large | $0.13 | 3072 | 8191 | Matryoshka |
| Voyage 3 Large | $0.18 | 1024 | 32000 | Top MTEB |
| Voyage code-3 | $0.18 | 1024 | 32000 | Code-specialized |
For most production RAG, the sweet-spot picks are OpenAI 3-small at $0.02/M (broad capability, well-supported) and Voyage 3 at $0.06/M (better retrieval at moderate cost premium).
Which embedding model should I use in 2026?
Decision tree by use case:
- General-purpose retrieval, English-heavy — OpenAI text-embedding-3-small at $0.02/M. Best supported ecosystem; works with every vector DB.
- Multilingual content — Cohere embed-multilingual-v3.0 at $0.10/M, or Voyage 3 at $0.06/M.
- Code search — Voyage code-3 at $0.18/M. Specifically trained for code retrieval; beats general-purpose embeddings by 15–25%.
- Best retrieval quality at any price — Voyage 3 Large at $0.18/M. Top of MTEB benchmark 2025–2026.
- Self-host break-even territory (>50M tokens/month) — BGE-M3 or Nomic Embed running on your own GPU.
- Long documents (book chapters, full papers) — Voyage 3 or Jina v4 at 32k-token max input. Avoids chunking artifacts.
- EU data residency — Mistral mistral-embed at $0.10/M. European hosting.
- AWS-native stack — Titan Embed v2 at $0.02/M. Bundled with Bedrock.
A common 2026 pattern is two-tier embeddings: embed the bulk of the corpus with cheap BGE-M3 or Jina v3, then re-embed only the highest-traffic 10% with Voyage 3 Large for premium retrieval. Saves 60–80% versus embedding everything with the premium model.
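A quick sanity check on that pattern, using list prices from the table above (the 10% hot-tier share and the choice of Jina v3 as the cheap tier are assumptions to tune to your own traffic):

```python
# Two-tier embedding: cheap model for the whole corpus, premium for the hot 10%.
CORPUS_MTOK = 50        # corpus size in millions of tokens
CHEAP_RATE = 0.012      # Jina v3, $/M tokens
PREMIUM_RATE = 0.18     # Voyage 3 Large, $/M tokens
HOT_FRACTION = 0.10     # assumed share re-embedded with the premium model

all_premium = CORPUS_MTOK * PREMIUM_RATE
two_tier = CORPUS_MTOK * CHEAP_RATE + CORPUS_MTOK * HOT_FRACTION * PREMIUM_RATE
print(f"all premium ${all_premium:.2f}, two-tier ${two_tier:.2f}, "
      f"saving {1 - two_tier / all_premium:.0%}")
# all premium $9.00, two-tier $1.50, saving 83%
```

This is the one-time pass only; embedding queries against both indexes and periodic refreshes pull realized savings back toward the 60–80% quoted above.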
How do I calculate total embedding cost for my RAG corpus?
The formula (token counts in millions, rates in $ per million tokens):
one_time = corpus_tokens × per_million_rate
monthly_refresh = corpus_tokens × refresh_fraction_per_month × per_million_rate
monthly_query = query_tokens_per_month × per_million_rate
year_one = one_time + (monthly_refresh + monthly_query) × 12
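A minimal Python sketch of the same formula (nothing here is provider-specific; the rate is whatever $/M figure you pull from the table above):

```python
def embedding_cost(corpus_mtok: float, rate: float,
                   refresh_fraction_per_month: float = 0.0,
                   query_mtok_per_month: float = 0.0) -> dict:
    """Year-one embedding spend. Token counts in millions, rate in $/M tokens."""
    one_time = corpus_mtok * rate
    monthly_refresh = corpus_mtok * refresh_fraction_per_month * rate
    monthly_query = query_mtok_per_month * rate
    return {
        "one_time": one_time,
        "monthly": monthly_refresh + monthly_query,
        "year_one": one_time + (monthly_refresh + monthly_query) * 12,
    }
```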
A worked example: a 50M-token corpus (about 50,000 typical documents at 1,000 tokens each), with 25% of the corpus re-embedded each month (a full refresh roughly every four months) and 5M query tokens/month:
OpenAI text-embedding-3-small ($0.02/M):
One-time: 50M × $0.02 = $1.00
Monthly refresh: 50M × 0.25 × $0.02 = $0.25/mo
Monthly query: 5M × $0.02 = $0.10/mo
Year 1: $1 + ($0.35 × 12) = $5.20
Voyage 3 Large ($0.18/M):
One-time: $9.00
Monthly: $3.15
Year 1: $46.80
OpenAI text-embedding-3-large ($0.13/M):
Year 1: $33.80
BGE-M3 self-hosted on a $0.99/hr L40S (≈ $0.003/M effective; see the self-hosting section below):
Year 1: under $1 in compute (but +$200 ops time)
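Plugging the scenario above into the sketch from the formula section reproduces these figures:

```python
for name, rate in [("text-embedding-3-small", 0.02), ("voyage-3-large", 0.18)]:
    c = embedding_cost(corpus_mtok=50, rate=rate,
                       refresh_fraction_per_month=0.25, query_mtok_per_month=5)
    print(f"{name}: one-time ${c['one_time']:.2f}, "
          f"monthly ${c['monthly']:.2f}, year one ${c['year_one']:.2f}")
# text-embedding-3-small: one-time $1.00, monthly $0.35, year one $5.20
# voyage-3-large: one-time $9.00, monthly $3.15, year one $46.80
```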
The headline cost difference is real but small in absolute dollars. The real cost of choosing wrong is the re-embedding pass when you switch models: re-embedding a 50M-token corpus costs $1 on OpenAI 3-small, $3 on Voyage 3, or $9 on Voyage 3 Large. Trivial unless your corpus is 5B+ tokens. So pick on retrieval quality, not embedding cost.
What are Matryoshka embeddings and why do they matter?
Matryoshka representation learning trains the model so that truncating the output vector at any point still gives a useful embedding. This is huge for storage cost:
- OpenAI text-embedding-3-large outputs 3072 dimensions. Storing 1M vectors at 4 bytes per dimension (float32) is about 12.3 GB.
- Truncated to 512 dimensions: about 2 GB. 6× cheaper storage with typically 3–5% recall loss.
- Truncated to 256: about 1 GB. 12× cheaper with 8–12% recall loss.
Matryoshka-compatible models in 2026:
- OpenAI text-embedding-3-small (1536 → 256/512/768/1024)
- OpenAI text-embedding-3-large (3072 → 256/512/1024/1536)
- Voyage 3 family (1024 → 128/256/512/768)
- Google gemini-embedding-exp (3072 → 768/1536)
- AWS Titan Embed v2 (1024 → 256/512)
- Jina v3 and v4 (configurable at request time)
Cohere v3 family and Mistral mistral-embed do NOT support truncation. If you anticipate downsizing later, pick from the configurable list above.
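A minimal client-side sketch of Matryoshka truncation, assuming the model returns full-dimension vectors (several of the configurable providers also accept an output-dimension parameter at request time, which does the same thing server-side):

```python
import numpy as np

def truncate_embedding(vec: np.ndarray, dims: int) -> np.ndarray:
    """Keep the leading `dims` components of a Matryoshka embedding, then re-normalize."""
    head = vec[:dims]
    return head / np.linalg.norm(head)

# Storage impact for 1M vectors of text-embedding-3-large at float32:
for dims in (3072, 512, 256):
    print(f"{dims} dims: {1_000_000 * dims * 4 / 1e9:.2f} GB")
# 3072 dims: 12.29 GB / 512 dims: 2.05 GB / 256 dims: 1.02 GB
```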
What hidden costs come with embeddings?
Four line items teams under-budget:
- Chunking strategy compute. Semantic chunking needs its own LLM calls; even simple fixed-window chunking with overlap re-embeds the overlapping tokens. Budget $5–$20/M corpus tokens for high-quality LLM-assisted chunking.
- Re-embedding when switching models. Adopting a better model means re-embedding your whole corpus. Plan for ~$10 per 100M tokens.
- Embedding query inflation. Every user query gets embedded. A naive implementation embeds the raw query (~30 tokens), but hybrid search and HyDE re-write queries first to 300+ tokens — 10× the embedding cost per query.
- Storage in vector DB. Embedding cost is trivial compared to storing and serving the vectors: at float32 and 1536 dimensions, every 1M vectors is roughly 6 GB before index overhead and replication. Vector DB cost dominates embedding cost above 1M vectors (see the sketch after this list).
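A rough estimator for that storage-versus-embedding comparison (the per-GB-month vector DB rate is a placeholder assumption; substitute your provider's pricing):

```python
def rag_footprint(corpus_mtok: float, chunk_tokens: int, dims: int,
                  embed_rate: float, db_dollars_per_gb_month: float) -> None:
    """Compare one-time embedding cost with ongoing raw vector storage cost."""
    n_vectors = corpus_mtok * 1_000_000 / chunk_tokens
    raw_gb = n_vectors * dims * 4 / 1e9            # float32, before index overhead
    embed_cost = corpus_mtok * embed_rate          # one-time embedding pass
    storage_month = raw_gb * db_dollars_per_gb_month
    print(f"{n_vectors:,.0f} vectors, {raw_gb:.1f} GB raw, "
          f"embed once ${embed_cost:.2f}, store ${storage_month:.2f}/month")

# 1B-token corpus, 1,000-token chunks, OpenAI 3-small, hypothetical $0.30/GB-month
rag_footprint(corpus_mtok=1000, chunk_tokens=1000, dims=1536,
              embed_rate=0.02, db_dollars_per_gb_month=0.30)
# 1,000,000 vectors, 6.1 GB raw, embed once $20.00, store $1.84/month
```

Over a year the storage line alone roughly matches the one-time embedding bill, before counting index overhead, replication, or query compute.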
For the full RAG bill including vector DB, retrieval, reranking, and generation, see our RAG Total Cost Calculator. For embedding-only forecasting, use the Embeddings Cost Calculator.
When does it make sense to self-host embeddings?
The break-even math:
- Hosted API floor: $0.008/M (BGE-M3 on Together)
- L40S GPU rented at $0.99/hour: embeds roughly 5M tokens/minute = 300M tokens/hour
- Effective self-hosted cost on an L40S: $0.99 ÷ 300 ≈ $0.003/M tokens
So renting a GPU for self-hosted embeddings is roughly 3× cheaper than the cheapest hosted API. But:
- The GPU runs whether you're embedding or not. If you only have 10M tokens to embed per month, you're paying $0.99 × 720 = $713 in idle GPU time for work the cheapest hosted API would do for about $0.08.
- Break-even is roughly 50M tokens/month — above this, self-hosting wins.
- Operations overhead is real. Plan 0.1–0.2 FTE platform engineering for a production embedding endpoint.
For batch embedding jobs (re-embedding a corpus on demand), spin up a GPU just for the job: 100M tokens takes about 20 minutes on an L40S, costing $0.33. Hosted APIs would cost $0.80–$18 depending on model. Big win for one-off batches.
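A small sketch of the batch-versus-hosted comparison, using the throughput assumption above (5M tokens/minute on an L40S is a ballpark; benchmark your own model and batch size):

```python
GPU_DOLLARS_PER_HOUR = 0.99   # rented L40S
GPU_MTOK_PER_HOUR = 300       # ~5M tokens/minute, assumed embedding throughput

def batch_embed_cost(mtok: float, hosted_rate: float) -> tuple[float, float]:
    """Return (self-hosted GPU cost, hosted API cost) for a one-off batch job."""
    gpu_hours = mtok / GPU_MTOK_PER_HOUR
    return gpu_hours * GPU_DOLLARS_PER_HOUR, mtok * hosted_rate

gpu, api = batch_embed_cost(100, hosted_rate=0.02)   # 100M tokens vs OpenAI 3-small
print(f"GPU: ${gpu:.2f}  hosted: ${api:.2f}")        # GPU: $0.33  hosted: $2.00
```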
How often should I switch embedding models?
Less often than you'd think. The cost is the re-embedding pass. Practical guideline:
- Stay if your current model is within 10% of the current best on your specific retrieval benchmark.
- Switch when a new model offers >15% improvement on your benchmark AND your corpus is small enough that re-embedding cost is under 5% of annual RAG budget (a quick check is sketched after this list).
- Adopt new models in parallel for a few weeks before fully cutting over — keep both vector indexes, do A/B retrieval.
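One way to encode that rule of thumb (the thresholds are the ones stated above; the benchmark lift and budget figures are yours to supply):

```python
def should_switch(benchmark_lift: float, corpus_mtok: float,
                  new_rate: float, annual_rag_budget: float) -> bool:
    """Rule of thumb: switch only on a >15% retrieval lift with a cheap re-embed."""
    reembed_cost = corpus_mtok * new_rate
    return benchmark_lift > 0.15 and reembed_cost < 0.05 * annual_rag_budget

# 50M-token corpus moving to Voyage 3 Large, 18% measured lift, $5,000/yr RAG budget
print(should_switch(0.18, corpus_mtok=50, new_rate=0.18, annual_rag_budget=5000))
# True: the $9 re-embed is well under 5% of the budget
```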
The Embeddings Cost Calculator compares 17 models with your specific corpus size and query rate, including both one-time and recurring costs. For broader RAG architecture decisions, the Vector DB Cost Estimator and RAG Total Cost Calculator extend the analysis to the full stack.
Embeddings pricing changes more slowly than LLM token pricing — major shifts roughly twice a year. We refresh the calculator on the first of every month with verified rates from each provider's pricing page.