AI Embeddings Pricing 2026: OpenAI vs Voyage vs Cohere vs Jina
Compare 17 embedding models by cost per 1M tokens in 2026 — OpenAI 3-small/large, Voyage 3, Cohere v3, Jina v4, BGE-M3, Nomic, and more.
AI embeddings pricing in 2026 spans a roughly 22× range, from $0.008/M tokens on hosted open-weight models like BGE-M3 to $0.18/M on Voyage 3 Large. Embedding is the cheapest line item in most RAG bills, but at very large corpus sizes (100M+ tokens) it becomes a real number — and choosing the wrong model has compound consequences because re-embedding to switch later is expensive. This guide compares 17 embedding models with worked-out scenarios. For real-time pricing across all 17, use our AI Embeddings Cost Calculator.
What does embedding pricing actually look like in 2026?
Cost per 1M tokens, sorted cheapest first:
| Model | $/M tokens | Dimensions | Max input | Notes |
|---|---|---|---|---|
| Together BGE-M3 | $0.008 | 1024 | 8192 | Open-weight |
| Together bge-large-en | $0.008 | 1024 | 512 | |
| Fireworks Nomic Embed | $0.008 | 768 | 8192 | |
| Jina v3 | $0.012 | 1024 | 8192 | Configurable |
| Jina v4 | $0.018 | 2048 | 32000 | Configurable |
| OpenAI text-embedding-3-small | $0.02 | 1536 | 8191 | Matryoshka |
| Voyage 3 Lite | $0.02 | 512 | 32000 | |
| AWS Titan Embed v2 | $0.02 | 1024 | 8192 | Matryoshka |
| Google text-embedding-005 | $0.025 | 768 | 2048 | |
| Voyage 3 | $0.06 | 1024 | 32000 | |
| Cohere embed-english-v3.0 | $0.10 | 1024 | 512 | |
| Cohere embed-multilingual-v3.0 | $0.10 | 1024 | 512 | |
| Mistral mistral-embed | $0.10 | 1024 | 8192 | |
| Google gemini-embedding-exp | $0.10 | 3072 | 8192 | Configurable |
| OpenAI text-embedding-3-large | $0.13 | 3072 | 8191 | Matryoshka |
| Voyage 3 Large | $0.18 | 1024 | 32000 | Top MTEB |
| Voyage code-3 | $0.18 | 1024 | 32000 | Code-specialized |
For most production RAG, the sweet-spot picks are OpenAI 3-small at $0.02/M (broad capability, well-supported) and Voyage 3 at $0.06/M (better retrieval at moderate cost premium).
Which embedding model should I use in 2026?
Decision tree by use case:
- General-purpose retrieval, English-heavy — OpenAI text-embedding-3-small at $0.02/M. Best supported ecosystem; works with every vector DB.
- Multilingual content — Cohere embed-multilingual-v3.0 at $0.10/M, or Voyage 3 at $0.06/M.
- Code search — Voyage code-3 at $0.18/M. Specifically trained for code retrieval; beats general-purpose embeddings by 15–25%.
- Best retrieval quality at any price — Voyage 3 Large at $0.18/M. Top of MTEB benchmark 2025–2026.
- Self-host break-even territory (>50M tokens/month) — BGE-M3 or Nomic Embed running on your own GPU.
- Long documents (book chapters, full papers) — Voyage 3 or Jina v4 at 32k-token max input. Avoids chunking artifacts.
- EU data residency — Mistral mistral-embed at $0.10/M. European hosting.
- AWS-native stack — Titan Embed v2 at $0.02/M. Bundled with Bedrock.
A common 2026 pattern is two-tier embeddings: embed the bulk of the corpus with cheap BGE-M3 or Jina v3, then re-embed only the highest-traffic 10% with Voyage 3 Large for premium retrieval. Saves 60–80% versus embedding everything with the premium model.
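A quick sanity check on that pattern, using list prices from the table above (the 10% hot-tier share and the choice of Jina v3 as the cheap tier are assumptions to tune to your own traffic):

```python
# Two-tier embedding: cheap model for the whole corpus, premium for the hot 10%.
CORPUS_MTOK = 50        # corpus size in millions of tokens
CHEAP_RATE = 0.012      # Jina v3, $/M tokens
PREMIUM_RATE = 0.18     # Voyage 3 Large, $/M tokens
HOT_FRACTION = 0.10     # assumed share re-embedded with the premium model

all_premium = CORPUS_MTOK * PREMIUM_RATE
two_tier = CORPUS_MTOK * CHEAP_RATE + CORPUS_MTOK * HOT_FRACTION * PREMIUM_RATE
print(f"all premium ${all_premium:.2f}, two-tier ${two_tier:.2f}, "
      f"saving {1 - two_tier / all_premium:.0%}")
# all premium $9.00, two-tier $1.50, saving 83%
```

This is the one-time pass only; embedding queries against both indexes and periodic refreshes pull realized savings back toward the 60–80% quoted above.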
How do I calculate total embedding cost for my RAG corpus?
The formula (token counts in millions, rates in $ per million tokens):
one_time = corpus_tokens × per_million_rate
monthly_refresh = corpus_tokens × refresh_fraction_per_month × per_million_rate
monthly_query = query_tokens_per_month × per_million_rate
year_one = one_time + (monthly_refresh + monthly_query) × 12
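A minimal Python sketch of the same formula (nothing here is provider-specific; the rate is whatever $/M figure you pull from the table above):

```python
def embedding_cost(corpus_mtok: float, rate: float,
                   refresh_fraction_per_month: float = 0.0,
                   query_mtok_per_month: float = 0.0) -> dict:
    """Year-one embedding spend. Token counts in millions, rate in $/M tokens."""
    one_time = corpus_mtok * rate
    monthly_refresh = corpus_mtok * refresh_fraction_per_month * rate
    monthly_query = query_mtok_per_month * rate
    return {
        "one_time": one_time,
        "monthly": monthly_refresh + monthly_query,
        "year_one": one_time + (monthly_refresh + monthly_query) * 12,
    }
```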
A worked example: a 50M-token corpus (about 50,000 typical documents at 1,000 tokens each), with 25% of the corpus re-embedded each month (a full refresh roughly every four months) and 5M query tokens/month:
OpenAI text-embedding-3-small ($0.02/M):
One-time: 50M × $0.02 = $1.00
Monthly refresh: 50M × 0.25 × $0.02 = $0.25/mo
Monthly query: 5M × $0.02 = $0.10/mo
Year 1: $1 + ($0.35 × 12) = $5.20
Voyage 3 Large ($0.18/M):
One-time: $9.00
Monthly: $3.15
Year 1: $46.80
OpenAI text-embedding-3-large ($0.13/M):
Year 1: $33.80
BGE-M3 self-hosted on a $0.99/hr L40S (≈ $0.003/M effective; see the self-hosting section below):
Year 1: under $1 in compute (but +$200 ops time)
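Plugging the scenario above into the sketch from the formula section reproduces these figures:

```python
for name, rate in [("text-embedding-3-small", 0.02), ("voyage-3-large", 0.18)]:
    c = embedding_cost(corpus_mtok=50, rate=rate,
                       refresh_fraction_per_month=0.25, query_mtok_per_month=5)
    print(f"{name}: one-time ${c['one_time']:.2f}, "
          f"monthly ${c['monthly']:.2f}, year one ${c['year_one']:.2f}")
# text-embedding-3-small: one-time $1.00, monthly $0.35, year one $5.20
# voyage-3-large: one-time $9.00, monthly $3.15, year one $46.80
```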
The headline cost difference is real but small in absolute dollars. The real cost of choosing wrong is the re-embedding pass when you switch models: re-embedding a 50M-token corpus costs $1 on OpenAI 3-small, $3 on Voyage 3, or $9 on Voyage 3 Large. Trivial unless your corpus is 5B+ tokens. So pick on retrieval quality, not embedding cost.
What are Matryoshka embeddings and why do they matter?
Matryoshka representation learning trains the model so that truncating the output vector at any point still gives a useful embedding. This is huge for storage cost:
- OpenAI text-embedding-3-large outputs 3072 dimensions. Storing 1M vectors at 4 bytes per dimension (float32) is about 12.3 GB.
- Truncated to 512 dimensions: about 2 GB. 6× cheaper storage with typically 3–5% recall loss.
- Truncated to 256: about 1 GB. 12× cheaper with 8–12% recall loss.
Matryoshka-compatible models in 2026:
- OpenAI text-embedding-3-small (1536 → 256/512/768/1024)
- OpenAI text-embedding-3-large (3072 → 256/512/1024/1536)
- Voyage 3 family (1024 → 128/256/512/768)
- Google gemini-embedding-exp (3072 → 768/1536)
- AWS Titan Embed v2 (1024 → 256/512)
- Jina v3 and v4 (configurable at request time)
Cohere v3 family and Mistral mistral-embed do NOT support truncation. If you anticipate downsizing later, pick from the configurable list above.
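A minimal client-side sketch of Matryoshka truncation, assuming the model returns full-dimension vectors (several of the configurable providers also accept an output-dimension parameter at request time, which does the same thing server-side):

```python
import numpy as np

def truncate_embedding(vec: np.ndarray, dims: int) -> np.ndarray:
    """Keep the leading `dims` components of a Matryoshka embedding, then re-normalize."""
    head = vec[:dims]
    return head / np.linalg.norm(head)

# Storage impact for 1M vectors of text-embedding-3-large at float32:
for dims in (3072, 512, 256):
    print(f"{dims} dims: {1_000_000 * dims * 4 / 1e9:.2f} GB")
# 3072 dims: 12.29 GB / 512 dims: 2.05 GB / 256 dims: 1.02 GB
```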
What hidden costs come with embeddings?
Four line items teams under-budget:
- Chunking strategy compute. Semantic chunking needs its own LLM calls; even simple fixed-window chunking with overlap re-embeds the overlapping tokens. Budget $5–$20/M corpus tokens for high-quality LLM-assisted chunking.
- Re-embedding when switching models. Adopting a better model means re-embedding your whole corpus. Plan for ~$10 per 100M tokens.
- Embedding query inflation. Every user query gets embedded. A naive implementation embeds the raw query (~30 tokens), but hybrid search and HyDE re-write queries first to 300+ tokens — 10× the embedding cost per query.
- Storage in vector DB. Embedding cost is trivial compared to storing and serving the vectors: at float32 and 1536 dimensions, every 1M vectors is roughly 6 GB before index overhead and replication. Vector DB cost dominates embedding cost above 1M vectors (see the sketch after this list).
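A rough estimator for that storage-versus-embedding comparison (the per-GB-month vector DB rate is a placeholder assumption; substitute your provider's pricing):

```python
def rag_footprint(corpus_mtok: float, chunk_tokens: int, dims: int,
                  embed_rate: float, db_dollars_per_gb_month: float) -> None:
    """Compare one-time embedding cost with ongoing raw vector storage cost."""
    n_vectors = corpus_mtok * 1_000_000 / chunk_tokens
    raw_gb = n_vectors * dims * 4 / 1e9            # float32, before index overhead
    embed_cost = corpus_mtok * embed_rate          # one-time embedding pass
    storage_month = raw_gb * db_dollars_per_gb_month
    print(f"{n_vectors:,.0f} vectors, {raw_gb:.1f} GB raw, "
          f"embed once ${embed_cost:.2f}, store ${storage_month:.2f}/month")

# 1B-token corpus, 1,000-token chunks, OpenAI 3-small, hypothetical $0.30/GB-month
rag_footprint(corpus_mtok=1000, chunk_tokens=1000, dims=1536,
              embed_rate=0.02, db_dollars_per_gb_month=0.30)
# 1,000,000 vectors, 6.1 GB raw, embed once $20.00, store $1.84/month
```

Over a year the storage line alone roughly matches the one-time embedding bill, before counting index overhead, replication, or query compute.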
For the full RAG bill including vector DB, retrieval, reranking, and generation, see our RAG Total Cost Calculator. For embedding-only forecasting, use the Embeddings Cost Calculator.
When does it make sense to self-host embeddings?
The break-even math:
- Hosted API floor: $0.008/M (BGE-M3 on Together)
- L40S GPU rented at $0.99/hour: embeds roughly 5M tokens/minute = 300M tokens/hour
- Effective self-hosted cost on an L40S: $0.99 ÷ 300 ≈ $0.003/M tokens
So renting a GPU for self-hosted embeddings is roughly 3× cheaper than the cheapest hosted API. But:
- The GPU runs whether you're embedding or not. If you only have 10M tokens to embed per month, you're paying $0.99 × 720 = $713 in idle GPU time for work the cheapest hosted API would do for about $0.08.
- Break-even is roughly 50M tokens/month — above this, self-hosting wins.
- Operations overhead is real. Plan 0.1–0.2 FTE platform engineering for a production embedding endpoint.
For batch embedding jobs (re-embedding a corpus on demand), spin up a GPU just for the job: 100M tokens takes about 20 minutes on an L40S, costing $0.33. Hosted APIs would cost $0.80–$18 depending on model. Big win for one-off batches.
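A small sketch of the batch-versus-hosted comparison, using the throughput assumption above (5M tokens/minute on an L40S is a ballpark; benchmark your own model and batch size):

```python
GPU_DOLLARS_PER_HOUR = 0.99   # rented L40S
GPU_MTOK_PER_HOUR = 300       # ~5M tokens/minute, assumed embedding throughput

def batch_embed_cost(mtok: float, hosted_rate: float) -> tuple[float, float]:
    """Return (self-hosted GPU cost, hosted API cost) for a one-off batch job."""
    gpu_hours = mtok / GPU_MTOK_PER_HOUR
    return gpu_hours * GPU_DOLLARS_PER_HOUR, mtok * hosted_rate

gpu, api = batch_embed_cost(100, hosted_rate=0.02)   # 100M tokens vs OpenAI 3-small
print(f"GPU: ${gpu:.2f}  hosted: ${api:.2f}")        # GPU: $0.33  hosted: $2.00
```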
How often should I switch embedding models?
Less often than you'd think. The cost is the re-embedding pass. Practical guideline:
- Stay if your current model is within 10% of the current best on your specific retrieval benchmark.
- Switch when a new model offers >15% improvement on your benchmark AND your corpus is small enough that re-embedding cost is under 5% of annual RAG budget (a quick check is sketched after this list).
- Adopt new models in parallel for a few weeks before fully cutting over — keep both vector indexes, do A/B retrieval.
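One way to encode that rule of thumb (the thresholds are the ones stated above; the benchmark lift and budget figures are yours to supply):

```python
def should_switch(benchmark_lift: float, corpus_mtok: float,
                  new_rate: float, annual_rag_budget: float) -> bool:
    """Rule of thumb: switch only on a >15% retrieval lift with a cheap re-embed."""
    reembed_cost = corpus_mtok * new_rate
    return benchmark_lift > 0.15 and reembed_cost < 0.05 * annual_rag_budget

# 50M-token corpus moving to Voyage 3 Large, 18% measured lift, $5,000/yr RAG budget
print(should_switch(0.18, corpus_mtok=50, new_rate=0.18, annual_rag_budget=5000))
# True: the $9 re-embed is well under 5% of the budget
```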
The Embeddings Cost Calculator compares 17 models with your specific corpus size and query rate, including both one-time and recurring costs. For broader RAG architecture decisions, the Vector DB Cost Estimator and RAG Total Cost Calculator extend the analysis to the full stack.
Embeddings pricing changes more slowly than LLM token pricing — major shifts roughly twice a year. We refresh the calculator on the first of every month with verified rates from each provider's pricing page.