AI Embeddings Pricing 2026: OpenAI vs Voyage vs Cohere vs Jina
Compare 17 embedding models by cost per 1M tokens in 2026: OpenAI 3-small/large, Voyage 3, Cohere v3, Jina v4, BGE-M3, Nomic.
AI embeddings pricing in 2026 spans roughly 22×, from $0.008/M tokens for hosted open-weight models such as BGE-M3 to $0.18/M for Voyage 3 Large. Embeddings are the cheapest line item on most RAG bills. This guide compares 17 models. For real-time pricing, use the AI Embeddings Cost Calculator.
What does embedding pricing actually look like in 2026?
Cost per 1M tokens, cheapest first:
| Model | $/M tokens | Dimensions | Max input (tokens) | Notes |
|---|---|---|---|---|
| Together BGE-M3 | $0.008 | 1024 | 8192 | Open-weight |
| Together bge-large-en | $0.008 | 1024 | 512 | |
| Fireworks Nomic Embed | $0.008 | 768 | 8192 | |
| Jina v3 | $0.012 | 1024 | 8192 | Configurable |
| Jina v4 | $0.018 | 2048 | 32000 | Configurable |
| OpenAI text-embedding-3-small | $0.02 | 1536 | 8191 | Matryoshka |
| Voyage 3 Lite | $0.02 | 512 | 32000 | |
| AWS Titan Embed v2 | $0.02 | 1024 | 8192 | Matryoshka |
| Google text-embedding-005 | $0.025 | 768 | 2048 | |
| Voyage 3 | $0.06 | 1024 | 32000 | |
| Cohere embed-english-v3.0 | $0.10 | 1024 | 512 | |
| Cohere embed-multilingual-v3.0 | $0.10 | 1024 | 512 | |
| Mistral mistral-embed | $0.10 | 1024 | 8192 | |
| Google gemini-embedding-exp | $0.10 | 3072 | 8192 | Configurable |
| OpenAI text-embedding-3-large | $0.13 | 3072 | 8191 | Matryoshka |
| Voyage 3 Large | $0.18 | 1024 | 32000 | Top MTEB |
| Voyage code-3 | $0.18 | 1024 | 32000 | Code-specialized |
For most production RAG, the sweet-spot picks are OpenAI 3-small at $0.02/M and Voyage 3 at $0.06/M.
Which embedding model should you pick in 2026?
- General-purpose retrieval: OpenAI text-embedding-3-small at $0.02/M.
- Multilingual content: Cohere embed-multilingual-v3.0 at $0.10/M or Voyage 3 at $0.06/M.
- Code search: Voyage code-3 at $0.18/M.
- Best retrieval quality: Voyage 3 Large at $0.18/M.
- Self-hosting past break-even (>50M tokens/month): BGE-M3 or Nomic Embed.
- Long documents: Voyage 3 or Jina v4, both with a 32k-token max input.
- EU data residency: Mistral mistral-embed at $0.10/M.
- AWS-native stack: Titan Embed v2 at $0.02/M.
A common 2026 pattern is two-tier embeddings: embed the bulk corpus with a cheap model such as BGE-M3 or Jina v3, then re-embed the top 10% of the corpus by traffic with Voyage 3 Large.
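The two-tier pattern can be sketched as a selection step plus a cost estimate. This is a minimal illustration, not a reference implementation: the helper names `select_hot_docs` and `two_tier_cost` are hypothetical, and it assumes the "top 10% of traffic" maps to roughly 10% of corpus tokens, which may not hold if hot documents are unusually long or short.

```python
def select_hot_docs(hit_counts, fraction=0.10):
    """Return the doc ids with the most retrieval hits (top `fraction` of docs).

    `hit_counts` is a hypothetical mapping of doc id -> number of times the
    doc was retrieved; how you collect it depends on your logging.
    """
    keep = max(1, int(len(hit_counts) * fraction))
    ranked = sorted(hit_counts, key=hit_counts.get, reverse=True)
    return ranked[:keep]


def two_tier_cost(corpus_tokens, hot_fraction, cheap_rate, premium_rate):
    """One-time embedding cost in USD; rates are USD per 1M tokens.

    Assumes the whole corpus is embedded with the cheap model and the hot
    share is embedded a second time with the premium model.
    """
    bulk = corpus_tokens / 1_000_000 * cheap_rate
    hot = corpus_tokens * hot_fraction / 1_000_000 * premium_rate
    return bulk + hot


if __name__ == "__main__":
    # 50M-token corpus, BGE-M3 ($0.008/M) for bulk, Voyage 3 Large ($0.18/M) for the hot 10%.
    blended = two_tier_cost(50_000_000, 0.10, 0.008, 0.18)
    all_premium = 50_000_000 / 1_000_000 * 0.18
    print(f"two-tier: ${blended:.2f}, all premium: ${all_premium:.2f}")
```

Note that the two tiers produce vectors in different embedding spaces, so in practice each tier would need its own index and queries would have to be embedded with the matching model.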
How do you calculate total embedding cost for a RAG corpus?
one_time = corpus_tokens × per_million_rate
monthly_refresh = corpus_tokens × refreshes_per_month × per_million_rate
monthly_query = query_tokens_per_month × per_million_rate
year_one = one_time + (monthly_refresh + monthly_query) × 12
Example: a 50M-token corpus (~50,000 docs), with 25% of the corpus refreshed each month (refreshes_per_month = 0.25) and 5M query tokens/month:
OpenAI 3-small ($0.02/M):
One-time: $1.00
Monthly: $0.35
Year 1: $5.20
Voyage 3 Large ($0.18/M):
Year 1: $46.80
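The formulas above translate directly into a few lines of Python. This is a small sketch that assumes token counts are passed as raw counts and the rate is in USD per 1M tokens; the function name `embedding_costs` is just an illustrative choice.

```python
def embedding_costs(corpus_tokens, refreshes_per_month,
                    query_tokens_per_month, per_million_rate):
    """Return one-time, monthly, and year-one embedding cost in USD."""
    rate = per_million_rate / 1_000_000  # USD per single token
    one_time = corpus_tokens * rate
    monthly_refresh = corpus_tokens * refreshes_per_month * rate
    monthly_query = query_tokens_per_month * rate
    monthly = monthly_refresh + monthly_query
    return {
        "one_time": one_time,
        "monthly": monthly,
        "year_one": one_time + monthly * 12,
    }


if __name__ == "__main__":
    # Same scenario as the worked example: 50M corpus, 25% refresh, 5M query tokens.
    for name, rate in [("OpenAI 3-small", 0.02), ("Voyage 3 Large", 0.18)]:
        c = embedding_costs(50_000_000, 0.25, 5_000_000, rate)
        print(f"{name}: one-time ${c['one_time']:.2f}, "
              f"monthly ${c['monthly']:.2f}, year 1 ${c['year_one']:.2f}")
```

Swapping in a different rate from the pricing table is enough to compare any two models for the same workload.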
What are Matryoshka embeddings?
Matryoshka embeddings let you truncate the output vector to a shorter prefix and still use it for retrieval. For OpenAI 3-large at 3072 dimensions, stored as float32:
- 3072 dims: about 12.3 GB for 1M vectors
- 512 dims: about 2.05 GB. Storage is 6× cheaper with 3–5% recall loss.
- 256 dims: about 1.02 GB. Storage is 12× cheaper with 8–12% recall loss.
Matryoshka-compatible models: the OpenAI 3 family, the Voyage 3 family, Google gemini-embedding-exp, AWS Titan v2, and Jina v3/v4.
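To make the storage trade-off concrete, here is a rough sketch of client-side truncation and the resulting storage footprint. It assumes float32 storage (4 bytes per value) and raw vector bytes only, with no index overhead. The helpers `truncate_embedding` and `storage_bytes` are illustrative names, and re-normalizing after truncation is an assumption that suits cosine or dot-product search. Some providers also expose a request parameter that returns shorter vectors directly.

```python
import math


def truncate_embedding(vector, dims):
    """Keep the first `dims` components and re-normalize to unit length."""
    if not 0 < dims <= len(vector):
        raise ValueError("dims must be between 1 and len(vector)")
    head = list(vector[:dims])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0:
        return head  # all-zero prefix: nothing to normalize
    return [x / norm for x in head]


def storage_bytes(num_vectors, dims, bytes_per_value=4):
    """Raw vector storage in bytes (float32 by default, no index overhead)."""
    return num_vectors * dims * bytes_per_value


if __name__ == "__main__":
    full = storage_bytes(1_000_000, 3072)
    for dims in (3072, 512, 256):
        size = storage_bytes(1_000_000, dims)
        print(f"{dims} dims: {size / 1e9:.2f} GB ({full // size}x smaller than 3072)")
```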
What are the hidden costs of embeddings?
- Chunking strategy compute. Semantic chunking with an LLM costs $5–$20 per 1M corpus tokens.
- Re-embedding when you switch models. About $10 per 100M tokens at $0.10/M.
- Query embedding inflation. Hybrid search and HyDE rewrite queries to 300+ tokens.
- Vector DB storage. The cost of embedding is trivial compared with the cost of storing the vectors.
For the full RAG bill, see the RAG Cost Calculator.
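When should you self-host embeddings?
- Hosted API floor: $0.008/M
- Rented L40S GPU at $0.99/hour: 300M tokens/hour
- Effective self-hosted cost on the L40S: $0.003/M tokens
At full utilization, the GPU is about 2.4× cheaper per token than the cheapest API ($0.0033 vs $0.008 per 1M tokens). But a rented GPU bills by the hour whether or not it is processing tokens. Break-even is around 50M tokens/month.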
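The self-hosting arithmetic can be checked with a short sketch. The function names are illustrative, and `billed_hours_per_month` is an assumption you need to supply yourself: an always-on instance and one that spins up only for batch jobs give very different break-even volumes, so treat the result as sensitive to that input.

```python
def gpu_cost_per_million(hourly_rate, tokens_per_hour):
    """Effective USD per 1M tokens when the GPU is fully utilized."""
    return hourly_rate / (tokens_per_hour / 1_000_000)


def breakeven_tokens_per_month(hourly_rate, billed_hours_per_month,
                               api_rate_per_million):
    """Monthly token volume at which the API bill equals the GPU bill.

    Treats the GPU bill as a fixed monthly cost (hourly rate x billed hours)
    and ignores engineering and operations overhead.
    """
    gpu_monthly_cost = hourly_rate * billed_hours_per_month
    return gpu_monthly_cost / api_rate_per_million * 1_000_000


if __name__ == "__main__":
    # Figures from the list above: L40S at $0.99/hour, 300M tokens/hour.
    per_m = gpu_cost_per_million(0.99, 300_000_000)
    print(f"effective GPU cost: ${per_m:.4f} per 1M tokens")
    # Try a few billing assumptions against the $0.008/M hosted floor.
    for hours in (1, 24, 730):
        tokens = breakeven_tokens_per_month(0.99, hours, 0.008)
        print(f"{hours} billed hours/month -> break-even at {tokens / 1e6:,.0f}M tokens")
```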
How often should you switch embedding models?
- Stay put if your current model is within 10% of the best benchmark score.
- Switch when a new model offers >15% improvement.
- Run the new model in parallel for a few weeks before cutting over.
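The thresholds above can be turned into a simple decision helper. This is one possible reading, not a definitive rule: it assumes both thresholds are measured as relative gain over your current model's score on your own retrieval benchmark, and it treats the gap between 10% and 15% as a "keep evaluating" zone, which the list leaves unspecified. The function name `switch_decision` is hypothetical.

```python
def switch_decision(current_score, candidate_score,
                    stay_within=0.10, switch_above=0.15):
    """Return 'stay', 'evaluate', or 'switch' based on relative benchmark gain."""
    if current_score <= 0:
        raise ValueError("current_score must be positive")
    gain = (candidate_score - current_score) / current_score
    if gain > switch_above:
        return "switch"   # clear win: start the parallel run, then cut over
    if gain <= stay_within:
        return "stay"     # current model is close enough to the best
    return "evaluate"     # in between: test on your own data before deciding


if __name__ == "__main__":
    for candidate in (63.0, 67.5, 70.0):
        print(candidate, switch_decision(60.0, candidate))
```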
The Embeddings Cost Calculator compares 17 models. It is refreshed on the 1st of every month.