Embeddings are the core of OpenGround’s semantic search. They transform text chunks into numerical vectors that capture meaning, enabling the system to find relevant documentation even when query words don’t exactly match.

What Are Embeddings?

An embedding is a dense vector representation of text. Similar concepts have vectors that are close together in high-dimensional space.
# Example (simplified)
"how to install package"  → [0.23, -0.15, 0.87, ...] (384 dimensions)
"package installation"    → [0.25, -0.14, 0.89, ...] (similar vector)
"weather forecast"        → [-0.42, 0.76, -0.12, ...] (different vector)
OpenGround uses cosine similarity to measure how close vectors are:
  • Score near 1.0 = very similar meaning
  • Score near 0.0 = unrelated
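Cosine similarity is simple to compute by hand. A minimal stdlib-only sketch (the function name and toy 3-dimensional vectors are illustrative, not OpenGround's actual code):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product divided by the product of the
    vector magnitudes. Ranges from -1.0 to 1.0."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional vectors standing in for real 384-dim embeddings
install = [0.23, -0.15, 0.87]
installation = [0.25, -0.14, 0.89]
weather = [-0.42, 0.76, -0.12]

print(cosine_similarity(install, installation))  # close to 1.0
print(cosine_similarity(install, weather))       # negative: unrelated
```

In practice both backends L2-normalize their output vectors, in which case cosine similarity reduces to a plain dot product.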

Embedding Backends

OpenGround supports two embedding backends with different trade-offs:

FastEmbed

Default - Lightweight, ONNX-based
  • Smaller install size
  • CPU-optimized by default
  • Optional GPU support (experimental)
  • Fastest for CPU inference

Sentence-Transformers

Full-featured - PyTorch-based
  • Larger install size
  • Automatic GPU/MPS detection
  • Better GPU performance
  • More model options

Backend Selection

From config.py:62, the default backend:
DEFAULT_EMBEDDING_BACKEND = "fastembed"
Change it with:
openground config set embeddings.embedding_backend sentence-transformers

FastEmbed Backend

FastEmbed uses ONNX Runtime for inference (from embeddings.py:93-116):
@lru_cache(maxsize=1)
def get_fastembed_model(model_name: str, use_cuda: bool = True):
    """Get a cached instance of TextEmbedding (fastembed)."""
    from fastembed import TextEmbedding
    
    if use_cuda:
        try:
            return TextEmbedding(
                model_name=model_name,
                providers=["CUDAExecutionProvider"],
            )
        except ValueError:
            check_gpu_compatibility()
    
    return TextEmbedding(
        model_name=model_name,
        providers=["CPUExecutionProvider"],
    )
ONNX (Open Neural Network Exchange) is a portable model format; ONNX Runtime executes ONNX models with optimized kernels. FastEmbed converts PyTorch models to ONNX for faster CPU inference.

Installation Options

# Smallest install, CPU-only
uv tool install 'openground[fastembed]'
pip install 'openground[fastembed]'
GPU support requires matching CUDA drivers and cuDNN versions. See ONNX Runtime CUDA docs for requirements.

GPU Compatibility Check

OpenGround automatically detects GPU availability (from embeddings.py:44-90):
def check_gpu_compatibility() -> None:
    """Check for GPU compatibility and provide optimization tips."""
    gpu_hardware = is_gpu_hardware_available()  # nvidia-smi check
    
    # Check if fastembed-gpu is installed
    has_gpu_pkg = False
    try:
        version("fastembed-gpu")
        has_gpu_pkg = True
    except PackageNotFoundError:
        pass
    
    # Check for functional GPU via onnxruntime
    functional_gpu = False
    try:
        import onnxruntime as ort
        functional_gpu = "CUDAExecutionProvider" in ort.get_available_providers()
    except ImportError:
        pass
    
    # Provide helpful hints
    if gpu_hardware and not has_gpu_pkg:
        hint("GPU detected! Install the GPU version for faster performance:")
        hint("   uv tool install 'openground[fastembed-gpu]'\n")
    
    elif gpu_hardware and has_gpu_pkg and not functional_gpu:
        error("GPU package is installed but CUDA is not functional.")
        # Suggest fixes...

Sentence-Transformers Backend

Sentence-Transformers uses PyTorch with automatic hardware acceleration (from embeddings.py:14-25):
@lru_cache(maxsize=1)
def get_st_model(model_name: str):
    """Get a cached instance of SentenceTransformer."""
    from sentence_transformers import SentenceTransformer
    
    # Automatically uses:
    # - CUDA on NVIDIA GPUs
    # - MPS on Apple Silicon
    # - CPU otherwise
    return SentenceTransformer(model_name)

Installation

# Automatically uses GPU if available
uv tool install openground
pip install openground
The default openground package includes sentence-transformers with automatic GPU/MPS/CPU support. This is the easiest option if you have a GPU.

Embedding Models

From config.py:58-60, the default model:
DEFAULT_EMBEDDING_MODEL = "BAAI/bge-small-en-v1.5"
DEFAULT_EMBEDDING_DIMENSIONS = 384

Why BGE-Small-EN-v1.5?

  • English-focused: Strong performance on English text (it is an English-only model)
  • Compact: 384 dimensions (vs 768 for larger models)
  • Fast: Smaller vectors = faster search
  • Quality: Strong performance on MTEB benchmarks

Changing Models

Important: Changing the embedding model requires re-embedding all documentation. Models are not compatible with each other.

Step 1: Choose a Model

Browse models on Hugging Face. Look for:
  • Dimensions: 384-768 (smaller = faster)
  • Language: Match your docs (multilingual, en, etc.)
  • Size: Smaller models = faster inference

Step 2: Update Configuration

# Set model and dimensions
openground config set embeddings.embedding_model "sentence-transformers/all-MiniLM-L6-v2"
openground config set embeddings.embedding_dimensions 384

Step 3: Delete Existing Embeddings

# Remove all embedded data
openground nuke embeddings
This deletes the LanceDB table but preserves raw documentation.

Step 4: Re-embed Documentation

# Re-embed all libraries
openground embed
This processes all raw data with the new model.

Model Compatibility Validation

OpenGround stores embedding metadata in the LanceDB schema (from ingest.py:159-177):
schema = pa.schema(
    [
        # ... fields ...
        pa.field("vector", pa.list_(pa.float32(), embedding_dimensions)),
    ],
    metadata={
        "embedding_backend": embedding_backend,
        "embedding_model": embedding_model,
    }
)
When adding new documentation, OpenGround validates the model matches (from ingest.py:111-142):
def _validate_table_metadata(table: Table, backend: str, model: str) -> None:
    stored_metadata = _get_table_metadata(table)
    stored_backend = stored_metadata["embedding_backend"]
    stored_model = stored_metadata["embedding_model"]
    
    if stored_backend != backend or stored_model != model:
        raise ValueError(
            f"Embedding configuration mismatch detected!\n\n"
            f"This table was created with:\n"
            f"  Backend: {stored_backend}\n"
            f"  Model: {stored_model}\n\n"
            f"Current configuration is:\n"
            f"  Backend: {backend}\n"
            f"  Model: {model}\n\n"
            f"To resolve this, you can:\n"
            f"  1. Change your config to match the table's original settings\n"
            f"  2. Run `openground nuke embeddings` and then `openground embed`\n"
        )
This prevents mixing embeddings from different models, which would break search quality.

Embedding Generation

From embeddings.py:207-234, the main generation function:
def generate_embeddings(
    texts: Iterable[str],
    show_progress: bool = True,
) -> list[list[float]]:
    """Generate embeddings for documents using the specified backend."""
    
    config = get_effective_config()
    backend = config["embeddings"]["embedding_backend"]
    
    if backend == "fastembed":
        return _generate_embeddings_fastembed(texts, show_progress)
    elif backend == "sentence-transformers":
        return _generate_embeddings_sentence_transformers(texts, show_progress)
    else:
        raise ValueError(f"Invalid embedding backend: {backend}")

Batch Processing

Both backends process embeddings in batches for efficiency (from config.py:65):
DEFAULT_BATCH_SIZE = 32
From embeddings.py:119-160 (sentence-transformers example):
def _generate_embeddings_sentence_transformers(
    texts: Iterable[str],
    show_progress: bool = True,
) -> list[list[float]]:
    config = get_effective_config()
    batch_size = config["embeddings"]["batch_size"]  # 32
    model_name = config["embeddings"]["embedding_model"]
    model = get_st_model(model_name)
    
    texts_list = list(texts)
    all_embeddings = []
    
    with tqdm(total=len(texts_list), desc="Generating embeddings") as pbar:
        for i in range(0, len(texts_list), batch_size):
            batch = texts_list[i : i + batch_size]
            batch_embeddings = model.encode(
                sentences=batch,
                batch_size=len(batch),
                normalize_embeddings=True,  # L2 normalization
                convert_to_numpy=True,
                show_progress_bar=False,
            )
            all_embeddings.extend(list(batch_embeddings))
            pbar.update(len(batch))
    
    return all_embeddings
Increase batch_size if you have a GPU with lots of VRAM:
openground config set embeddings.batch_size 64

FastEmbed Passage Embedding

FastEmbed distinguishes between passage (document) and query embeddings (from embeddings.py:163-204):
def _generate_embeddings_fastembed(
    texts: Iterable[str],
    show_progress: bool = True,
) -> list[list[float]]:
    # ...
    model = get_fastembed_model(model_name)
    
    for i in range(0, len(texts_list), batch_size):
        batch = texts_list[i : i + batch_size]
        # Use passage_embed for document chunks
        batch_embeddings = list(model.passage_embed(batch))
        all_embeddings.extend([emb.tolist() for emb in batch_embeddings])
Some models are trained differently for documents vs. queries. FastEmbed uses passage_embed() for document chunks and would use query_embed() for search queries (though OpenGround currently uses passage_embed for both).

Embedding Dimensions

From config.py:60:
DEFAULT_EMBEDDING_DIMENSIONS = 384
Dimension count affects storage size and search speed: each vector occupies dimensions × 4 bytes (float32).
# 384 dimensions
384 × 4 bytes = 1,536 bytes per vector

# 768 dimensions (larger model)
768 × 4 bytes = 3,072 bytes per vector (2x storage)

# For 10,000 chunks:
384-dim: ~15 MB
768-dim: ~30 MB
Rule of thumb: Stick with the model’s native dimensions. Don’t try to change dimensions independently from the model.
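The arithmetic above can be wrapped in a quick estimator (the helper name is illustrative):

```python
def embedding_storage_bytes(num_chunks: int, dimensions: int) -> int:
    """Raw vector storage: each float32 component takes 4 bytes."""
    return num_chunks * dimensions * 4

for dims in (384, 768):
    mb = embedding_storage_bytes(10_000, dims) / 1024 / 1024
    print(f"{dims}-dim, 10k chunks: {mb:.1f} MB")
# 384-dim: ~14.6 MB (the "~15 MB" above); 768-dim: exactly double
```

This counts only the raw vectors; LanceDB adds some index and metadata overhead on top.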

Configuration Examples

Optimal CPU Performance

# Lightweight FastEmbed with small model
openground config set embeddings.embedding_backend fastembed
openground config set embeddings.embedding_model "BAAI/bge-small-en-v1.5"
openground config set embeddings.embedding_dimensions 384
openground config set embeddings.batch_size 32

GPU Performance (NVIDIA)

# Sentence-Transformers with larger model
openground config set embeddings.embedding_backend sentence-transformers
openground config set embeddings.embedding_model "BAAI/bge-base-en-v1.5"
openground config set embeddings.embedding_dimensions 768
openground config set embeddings.batch_size 64  # Larger batches for GPU

Apple Silicon (M1/M2/M3)

# Sentence-Transformers with MPS acceleration
openground config set embeddings.embedding_backend sentence-transformers
openground config set embeddings.embedding_model "BAAI/bge-small-en-v1.5"
openground config set embeddings.embedding_dimensions 384
openground config set embeddings.batch_size 32
Apple Silicon automatically uses MPS (Metal Performance Shaders) via sentence-transformers. No special configuration needed.

Chunking Strategy

Before embedding, documents are split into chunks (from config.py:66-67):
DEFAULT_CHUNK_SIZE = 800
DEFAULT_CHUNK_OVERLAP = 200
From ingest.py:52-76, using LangChain’s text splitter:
from langchain_text_splitters import RecursiveCharacterTextSplitter

def chunk_document(page: ParsedPage) -> list[dict]:
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=800,
        chunk_overlap=200
    )
    chunks = splitter.split_text(page["content"])
    
    for idx, chunk in enumerate(chunks):
        records.append({
            "content": chunk,
            "chunk_index": idx,
            # ... metadata ...
        })

Why 800 Characters?

  • Context window: Most embedding models handle 512 tokens well
  • 800 chars ≈ 200 tokens: Safe margin for tokenization
  • Not too small: Preserves context
  • Not too large: Enables precise retrieval

Why 200 Character Overlap?

Chunk 1: [------------------------]  (chars 0-800)
Chunk 2:              [------------------------]  (chars 600-1400)
                      ^200 overlap^
Overlap ensures:
  • Information spanning boundaries isn’t lost
  • Better retrieval for queries matching boundary content
  • 25% overlap provides good coverage without excessive duplication
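The overlap behavior can be sketched as a character-based sliding window. This is a simplification: OpenGround actually uses LangChain's RecursiveCharacterTextSplitter, which also tries to break on paragraph and sentence boundaries; `chunk_text` here is a hypothetical stand-in.

```python
def chunk_text(text: str, chunk_size: int = 800, chunk_overlap: int = 200) -> list[str]:
    """Sliding-window chunker: each chunk starts chunk_size - chunk_overlap
    characters after the previous one, so adjacent chunks share
    chunk_overlap characters."""
    step = chunk_size - chunk_overlap  # 600 with the defaults
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start : start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks

doc = "".join(str(i % 10) for i in range(1400))
chunks = chunk_text(doc)
print([len(c) for c in chunks])  # [800, 800] — chars 0-800 and 600-1400
```

The last 200 characters of chunk 1 equal the first 200 characters of chunk 2, so content near the boundary is embedded twice and retrievable from either chunk.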

Adjusting Chunking

# Larger chunks (more context, less precise)
openground config set embeddings.chunk_size 1200
openground config set embeddings.chunk_overlap 300

# Smaller chunks (more precise, less context)
openground config set embeddings.chunk_size 512
openground config set embeddings.chunk_overlap 128
Changing chunk settings requires re-embedding:
openground nuke embeddings
openground embed

Model Caching

Both backends use @lru_cache to load models once (from embeddings.py:14 and 93):
@lru_cache(maxsize=1)
def get_st_model(model_name: str):
    return SentenceTransformer(model_name)

@lru_cache(maxsize=1)
def get_fastembed_model(model_name: str, use_cuda: bool = True):
    return TextEmbedding(model_name=model_name, ...)
Models are:
  1. Downloaded from Hugging Face (first run)
  2. Cached locally in ~/.cache/huggingface/
  3. Loaded into memory once per process
  4. Reused for all embedding operations
The first run downloads the model (~100-500MB depending on the model). Subsequent runs load it from the local cache.
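The caching behavior is easy to demonstrate with a stdlib-only sketch, using a counter in place of the expensive model load (`get_model` is a stand-in for `get_st_model` / `get_fastembed_model`):

```python
from functools import lru_cache

load_count = 0

@lru_cache(maxsize=1)
def get_model(model_name: str) -> str:
    """Stand-in for a model loader: the expensive load runs only once
    per process for a given model_name."""
    global load_count
    load_count += 1
    return f"model({model_name})"

get_model("BAAI/bge-small-en-v1.5")
get_model("BAAI/bge-small-en-v1.5")  # cache hit, no reload
print(load_count)  # 1
```

Because maxsize=1, switching to a different model_name evicts the previously loaded model, so at most one model stays in memory per process.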

Performance Comparison

FastEmbed (default backend) — best for most users and CPU-only machines:
  • ~500 chunks/sec (CPU)
  • Lightweight install
  • Low memory usage
  • No GPU setup hassle
Performance varies by hardware. These are approximate estimates for the default model.
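At that rate, a rough wall-clock estimate for an embedding run (hypothetical helper, using the ~500 chunks/sec figure above):

```python
def embed_time_seconds(num_chunks: int, chunks_per_sec: float = 500.0) -> float:
    """Back-of-the-envelope embedding time at a given throughput."""
    return num_chunks / chunks_per_sec

print(f"{embed_time_seconds(10_000):.0f} s")  # ~20 s for 10k chunks on CPU
```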
