## Quick Start
If you have an NVIDIA GPU, install `fastembed-gpu` to enable GPU acceleration; OpenGround detects the GPU and uses it automatically.

## GPU Detection
OpenGround performs automatic GPU hardware detection using two methods (embeddings.py:28-41):
- First attempts to run `nvidia-smi` (NVIDIA System Management Interface)
- On Linux, falls back to checking for the `/dev/nvidia0` device file
- Returns `True` only if GPU hardware is detected
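A minimal sketch of this two-step detection, assuming a Linux host; the function name is illustrative, not OpenGround's actual code:

```python
import os
import shutil
import subprocess

def gpu_available() -> bool:
    """Detect NVIDIA GPU hardware: try nvidia-smi first, then /dev/nvidia0."""
    if shutil.which("nvidia-smi"):
        try:
            # nvidia-smi exits non-zero when no GPU/driver is usable
            subprocess.run(["nvidia-smi"], capture_output=True, check=True)
            return True
        except subprocess.CalledProcessError:
            pass
    # Linux fallback: the NVIDIA kernel driver exposes a device file
    return os.path.exists("/dev/nvidia0")
```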
## Compatibility Checking
OpenGround performs comprehensive compatibility checking on startup (embeddings.py:44-91):
- GPU Hardware: Is an NVIDIA GPU physically present?
- GPU Package: Is `fastembed-gpu` installed?
- Functional GPU: Is CUDA properly configured in onnxruntime?
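The three checks can be sketched as follows; `gpu_compatibility_report` is a hypothetical helper, and the `fastembed-gpu` distribution name is taken from the list above:

```python
from importlib import metadata, util

def gpu_compatibility_report(gpu_present: bool) -> dict:
    """Summarize the three startup checks: hardware, package, functional CUDA."""
    try:
        metadata.version("fastembed-gpu")
        package_installed = True
    except metadata.PackageNotFoundError:
        package_installed = False

    cuda_functional = False
    if util.find_spec("onnxruntime"):
        import onnxruntime
        cuda_functional = (
            "CUDAExecutionProvider" in onnxruntime.get_available_providers()
        )

    return {
        "gpu_hardware": gpu_present,
        "gpu_package": package_installed,
        "cuda_functional": cuda_functional,
    }
```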
## Compatibility Scenarios
### Scenario 1: GPU Detected, No GPU Package
An NVIDIA GPU is present but `fastembed-gpu` is not installed, so embeddings run on the CPU; install `fastembed-gpu` to enable acceleration.
### Scenario 2: GPU Package, No GPU Hardware
`fastembed-gpu` is installed but no NVIDIA GPU is detected, so OpenGround falls back to the CPU.
### Scenario 3: GPU Package + Hardware, But CUDA Non-Functional
Both the package and the hardware are present, but CUDA is not properly configured in onnxruntime; GPU initialization fails and OpenGround falls back to the CPU with diagnostics.
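A sketch of how the three compatibility flags map onto these scenarios; the function and its messages are illustrative, not OpenGround's actual output:

```python
def describe_scenario(gpu_hardware: bool, gpu_package: bool,
                      cuda_functional: bool) -> str:
    """Map the three compatibility checks onto the scenarios above."""
    if gpu_hardware and not gpu_package:
        # Scenario 1: hardware present, package missing
        return "GPU detected but fastembed-gpu is not installed; running on CPU"
    if gpu_package and not gpu_hardware:
        # Scenario 2: package present, hardware missing
        return "fastembed-gpu installed but no GPU hardware; falling back to CPU"
    if gpu_hardware and gpu_package and not cuda_functional:
        # Scenario 3: both present, CUDA broken
        return "GPU and package present but CUDA is non-functional; falling back to CPU"
    return "GPU acceleration available" if gpu_hardware else "CPU-only setup"
```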
## Fastembed GPU Implementation
Fastembed uses ONNX Runtime's CUDA execution provider (embeddings.py:93-116):
- Uses `CUDAExecutionProvider` for GPU acceleration
- Automatically falls back to CPU if GPU initialization fails
- Calls `check_gpu_compatibility()` on failure to show helpful diagnostics
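A hedged sketch of the try-CUDA-then-fall-back pattern, assuming a fastembed version whose `TextEmbedding` accepts a `providers` argument; `load_embedder` is illustrative and the model name is fastembed's default:

```python
def load_embedder(model_name: str = "BAAI/bge-small-en-v1.5"):
    """Try the CUDA execution provider first; fall back to CPU on any failure."""
    try:
        from fastembed import TextEmbedding
    except ImportError:
        return None  # fastembed not installed in this environment
    try:
        # Ask ONNX Runtime for the CUDA execution provider
        return TextEmbedding(model_name, providers=["CUDAExecutionProvider"])
    except Exception:
        # This is where OpenGround runs its diagnostics (check_gpu_compatibility)
        return TextEmbedding(model_name)  # CPU fallback
```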
## CUDA Setup Requirements

### Prerequisites
- NVIDIA GPU (compute capability 3.5 or higher)
- NVIDIA Driver (compatible with your GPU)
- CUDA Toolkit (version compatible with onnxruntime-gpu)
- cuDNN (version compatible with onnxruntime-gpu)
### Checking Your Setup
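To check which execution providers ONNX Runtime can use, run something like the following (a guarded variant of the usual `get_available_providers()` check):

```python
def available_providers() -> list:
    """List ONNX Runtime execution providers, or [] if onnxruntime is absent."""
    try:
        import onnxruntime
    except ImportError:
        return []
    return onnxruntime.get_available_providers()

print(available_providers())
```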
You should see `CUDAExecutionProvider` in the list of providers.
### Version Compatibility
Common compatible versions:
- ONNX Runtime 1.16+: CUDA 11.8 or 12.x, cuDNN 8.x
- ONNX Runtime 1.15: CUDA 11.6/11.7, cuDNN 8.x
## Sentence-Transformers GPU Support
Sentence-transformers has broader GPU compatibility since it uses PyTorch: the model runs on any device PyTorch supports, selected via the `device` argument (e.g. `SentenceTransformer(name, device="cuda")`).

## Performance Optimization

### Batch Size Tuning
GPUs benefit from larger batch sizes:
- CPU: 16-32
- GPU (8GB VRAM): 64-128
- GPU (16GB+ VRAM): 128-256
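The tiers above can be turned into a simple heuristic; a sketch that reads total VRAM via `nvidia-smi` (the thresholds mirror the list above, and the function name is illustrative):

```python
import subprocess

def suggested_batch_size() -> int:
    """Pick a batch size from detected VRAM; fall back to a CPU-sized default."""
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=memory.total",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True,
        )
        vram_mb = int(out.stdout.splitlines()[0])
    except (OSError, subprocess.CalledProcessError, ValueError, IndexError):
        return 32          # CPU default
    if vram_mb >= 16_000:
        return 128         # 16 GB+ VRAM
    if vram_mb >= 8_000:
        return 64          # 8 GB VRAM
    return 32
```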
### Memory Considerations
GPU memory usage depends on:
- Model size: Larger embedding models need more VRAM
- Batch size: Larger batches need more VRAM
- Sequence length: Longer documents need more VRAM

If you run out of GPU memory:
- Reduce `batch_size`
- Use a smaller embedding model
- Chunk documents into smaller pieces
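Chunking can be as simple as a fixed character window with overlap; a minimal sketch (the sizes are illustrative):

```python
def chunk(text: str, max_chars: int = 1000, overlap: int = 100) -> list:
    """Split a long document into overlapping character-window chunks."""
    step = max_chars - overlap
    return [text[start:start + max_chars] for start in range(0, len(text), step)]
```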
## Performance Comparison
Typical speedups with GPU acceleration:

| Setup | Speed (docs/sec) | Relative |
|---|---|---|
| CPU (8 cores) | ~50-100 | 1x |
| GPU (fastembed) | ~500-1000 | 10x |
| GPU (sentence-transformers) | ~800-1500 | 15x |
Actual performance varies based on hardware, model size, document length, and batch size.
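To measure throughput on your own setup, a minimal timing harness (the lambda below is a stand-in embedding function, not a real backend):

```python
import time

def docs_per_second(embed_fn, docs, batch_size: int = 64) -> float:
    """Measure embedding throughput over a list of documents."""
    start = time.perf_counter()
    for i in range(0, len(docs), batch_size):
        embed_fn(docs[i:i + batch_size])
    elapsed = max(time.perf_counter() - start, 1e-9)
    return len(docs) / elapsed

# Stand-in: pretend each document embeds to a 384-dim zero vector
rate = docs_per_second(lambda batch: [[0.0] * 384 for _ in batch], ["doc"] * 1000)
```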
## Troubleshooting

### "CUDAExecutionProvider not available"
Cause: onnxruntime-gpu is not properly installed or CUDA is misconfigured.

Solutions:
- Verify the CUDA installation: `nvidia-smi`
- Reinstall onnxruntime-gpu: `pip install --force-reinstall onnxruntime-gpu`
- Check CUDA version compatibility
- Install a matching cuDNN version
### "CUDA out of memory"
Cause: Batch size or model too large for GPU VRAM.

Solutions:
- Reduce `batch_size` in config
- Use a smaller embedding model
- Close other GPU-intensive applications
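The batch-size reduction can be automated with a retry loop; a sketch where `embed_fn` is any embedding callable and the `RuntimeError` match stands in for however your backend surfaces OOM errors:

```python
def embed_with_backoff(embed_fn, docs, batch_size: int = 128, min_batch: int = 8):
    """Retry embedding with a halved batch size when the GPU runs out of memory."""
    while batch_size >= min_batch:
        try:
            vectors = []
            for i in range(0, len(docs), batch_size):
                vectors.extend(embed_fn(docs[i:i + batch_size]))
            return vectors
        except RuntimeError:  # e.g. "CUDA out of memory"
            batch_size //= 2  # halve and retry
    raise RuntimeError("out of GPU memory even at the minimum batch size")
```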
### "GPU not detected"
Cause: nvidia-smi not found or GPU driver issue.

Solutions:
- Install or update the NVIDIA drivers
- Verify the GPU is recognized: `lspci | grep -i nvidia`
- Check that the GPU is enabled in BIOS/UEFI
### Slow Performance Despite GPU
Causes:
- Batch size too small (GPU underutilized)
- Data transfer bottleneck
- CPU preprocessing overhead

Solutions:
- Increase the batch size gradually
- Profile with `nvidia-smi dmon` during indexing
- Ensure SSD storage for faster I/O
## Environment Variables
Useful CUDA-related environment variables include `CUDA_VISIBLE_DEVICES`, which controls which GPUs are visible to ONNX Runtime and PyTorch.

## Next Steps
- Learn about Embedding Backends to choose between fastembed and sentence-transformers
- Explore Hybrid Search to understand how embeddings are used in queries