Embedding Models Explained: Choosing the Right One

Not all embeddings are created equal. OpenAI's text-embedding-3, Cohere's embed-v3, and open-source alternatives each have strengths. Here's how to choose and benchmark for your specific needs.

What Embeddings Actually Capture

Embeddings are dense vector representations of text that capture semantic meaning. Similar concepts cluster together in the vector space. Quality depends on training data, architecture, and the specific tasks the model was optimized for.
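To make "similar concepts cluster together" concrete, here is a minimal sketch of cosine similarity, a common way to compare two embeddings. The function name and the three toy vectors are invented for illustration; real embeddings have hundreds or thousands of dimensions and come from a model.

```javascript
// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same length');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // a zero vector has no direction
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy 3-dimensional vectors, made up for illustration only.
const cat = [0.9, 0.1, 0.0];
const kitten = [0.8, 0.2, 0.1];
const invoice = [0.0, 0.1, 0.9];

console.log(cosineSimilarity(cat, kitten) > cosineSimilarity(cat, invoice)); // true
```

Scores near 1 mean the vectors point in almost the same direction; scores near 0 mean they are unrelated.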

Comparing Popular Models

OpenAI text-embedding-3: Excellent quality, easy API, variable dimensions, relatively expensive at scale.

Cohere embed-v3: Strong multilingual support, search-optimized variants, competitive pricing.

Open-source (BGE, E5, Instructor): No API costs, full control, can run locally, requires infrastructure.

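import OpenAI from 'openai';
import { pipeline } from '@xenova/transformers';

// OpenAI (the client reads OPENAI_API_KEY from the environment)
const openai = new OpenAI();
const response = await openai.embeddings.create({
  model: 'text-embedding-3-small',
  input: 'Your text here',
  dimensions: 512 // Fewer dimensions: smaller index, faster search
});
const openaiEmbedding = response.data[0].embedding; // array of 512 numbers

// Local with Transformers.js, using the ONNX export of BAAI/bge-small-en-v1.5
const embedder = await pipeline('feature-extraction', 'Xenova/bge-small-en-v1.5');
// Pool the per-token vectors into one sentence vector and L2-normalize it
const output = await embedder('Your text here', { pooling: 'mean', normalize: true });
const localEmbedding = Array.from(output.data); // 384 numbers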

Dimensionality: Bigger Isn't Always Better

Higher dimensions capture more nuance but increase storage, slow down search, and can introduce noise. 384 to 512 dimensions are often sufficient; 1024 to 1536 can pay off in complex domains. Benchmark on your actual data to find the sweet spot.
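The storage side of this trade-off is simple arithmetic. The sketch below assumes vectors are stored as 32-bit floats (4 bytes per value) and ignores index overhead and metadata, so treat the numbers as a lower bound. The function name and the one-million-vector corpus size are illustrative assumptions.

```javascript
// Raw storage for embedding vectors, ignoring index overhead and metadata.
// bytesPerValue defaults to 4, which assumes float32 storage.
function estimateStorageBytes(numVectors, dimensions, bytesPerValue = 4) {
  return numVectors * dimensions * bytesPerValue;
}

const GIB = 1024 ** 3;
const corpusSize = 1000000; // hypothetical corpus of one million chunks

for (const dims of [384, 512, 1024, 1536]) {
  const gib = estimateStorageBytes(corpusSize, dims) / GIB;
  console.log(`${dims} dims: ${gib.toFixed(2)} GiB`);
}
```

Storage grows linearly with dimension count, so moving from 384 to 1536 dimensions quadruples the raw footprint for the same corpus.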

Domain-Specific vs General-Purpose

General-purpose models work for most use cases. Domain-specific models (medical, legal, scientific) can significantly outperform them on specialized content. Fine-tuning is an option, but it requires training data and expertise.

Benchmarking for Your Data

Create a test set of queries paired with their known relevant documents. Measure recall@k (the fraction of relevant documents that show up in the top k results) on your actual queries. Compare latency and cost. Test on edge cases specific to your domain.

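// `search(model, query, k)` is assumed to return the ids of the top-k results.
async function benchmarkModel(model, testPairs, k = 10) {
  const recalls = [];
  for (const { query, relevant } of testPairs) {
    if (relevant.length === 0) continue; // recall is undefined without relevant items
    const retrieved = new Set(await search(model, query, k));
    const hits = relevant.filter(r => retrieved.has(r)).length;
    recalls.push(hits / relevant.length);
  }
  // Mean recall@k across all queries
  return recalls.length ? recalls.reduce((sum, r) => sum + r, 0) / recalls.length : 0;
}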
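The benchmark relies on a `search` helper that is not shown. As a rough sketch of what could sit behind it for a small test set, the code below does brute-force top-k retrieval over precomputed vectors and scores one query with recall@k. The names `topK` and `recallAtK`, the `{ id, vector }` corpus shape, and the toy vectors are all assumptions made for illustration; a production setup would more likely use a vector database or an approximate nearest-neighbor index.

```javascript
// Dot product; equals cosine similarity when both vectors are unit length.
function dot(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

// Brute-force top-k: score every document, sort descending, keep k ids.
function topK(queryVector, corpus, k = 10) {
  return corpus
    .map(({ id, vector }) => ({ id, score: dot(queryVector, vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map(({ id }) => id);
}

// recall@k: fraction of the relevant ids that appear among the retrieved ids.
function recallAtK(retrievedIds, relevantIds) {
  if (relevantIds.length === 0) return 0;
  const retrieved = new Set(retrievedIds);
  const hits = relevantIds.filter((id) => retrieved.has(id)).length;
  return hits / relevantIds.length;
}

// Toy corpus with made-up 2-dimensional unit vectors.
const corpus = [
  { id: 'doc-a', vector: [1, 0] },
  { id: 'doc-b', vector: [0.8, 0.6] },
  { id: 'doc-c', vector: [0, 1] },
];

const retrievedIds = topK([1, 0], corpus, 2); // ['doc-a', 'doc-b']
console.log(recallAtK(retrievedIds, ['doc-a', 'doc-c'])); // 0.5
```

Brute force is exact, which makes it a useful baseline: any recall lost after switching to an approximate index can be attributed to the index rather than to the embedding model.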

The right embedding model depends on your data, requirements, and constraints. Benchmark on YOUR data: marketing claims don't capture the nuances of your specific use case.