OpenGround is an on-device RAG (Retrieval-Augmented Generation) system designed to give AI agents controlled access to documentation. Everything runs locally: no external APIs, no data leaves your machine.

Documentation Index
Fetch the complete documentation index at: https://mintlify.com/poweroutlet2/openground/llms.txt
Use this file to discover all available pages before exploring further.
System Overview
OpenGround follows a pipeline architecture with four main stages.

Architecture Stages
1. Source Layer
The source layer handles documentation ingestion from multiple source types. See the Sources page for detailed information.

Supported Sources:
- Git Repositories: Clone and extract documentation from specific branches/tags
- Sitemaps: Crawl and extract web documentation following sitemap.xml
- Local Paths: Process documentation from local directories
- `extract/git.py`: Handles git repository cloning with sparse checkout
- `extract/sitemap.py`: Fetches and parses sitemaps, respects robots.txt
- `extract/local_path.py`: Processes local file system paths
- `extract/common.py`: Shared file processing logic
2. Processing Layer
The processing layer transforms raw documentation into searchable chunks.

Text Extraction

OpenGround supports multiple documentation formats. Supported file types: `.md`, `.mdx`, `.rst`, `.txt`, `.ipynb`, `.html`, `.htm`

Document Chunking
Documents are split into overlapping chunks for better retrieval (from `ingest.py:52-76`).
Chunk overlap ensures that context isn’t lost at chunk boundaries, improving retrieval quality.
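The overlap logic can be sketched as follows. This is an illustrative implementation, not the actual code from `ingest.py`; it uses the 800-character chunk size and 200-character overlap described later in this page, and the function name is hypothetical.

```python
def chunk_text(text: str, chunk_size: int = 800, overlap: int = 200) -> list[str]:
    """Split text into overlapping chunks (illustrative sketch only)."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    if not text:
        return []
    step = chunk_size - overlap  # how far the window advances each iteration
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # last window already reached the end of the document
    return chunks
```

Because each chunk starts `overlap` characters before the previous one ends, a sentence straddling a boundary appears intact in at least one chunk.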
Embedding Generation
Each chunk is converted to a vector embedding using a local model. See Embeddings for details.

3. Storage Layer
OpenGround uses LanceDB for storing both vector embeddings and full-text search indices.

Why LanceDB?
- Columnar storage: Efficient for vector operations
- Built-in BM25: Full-text search without external dependencies
- Local-first: No server setup required
- PyArrow integration: Fast data serialization
Schema Structure
The LanceDB table schema is defined in `ingest.py:163-177`.
Full-Text Index
After ingesting chunks, OpenGround creates a BM25 full-text search index (from `ingest.py:223-226`).
4. Query/Client Layer
The client layer exposes documentation through two interfaces:

CLI Commands
MCP Server
The Model Context Protocol (MCP) server exposes OpenGround to AI agents.

Data Flow Example
Let’s trace a complete flow from adding documentation to searching it.

Add Documentation
- Git extractor clones repo with sparse checkout
- Filters for `.md`, `.mdx` files in `docs/`
- Extracts content and metadata
- Saves to `~/.local/share/openground/raw_data/fastapi/v0.100.0/`
Chunk & Embed
- Load parsed pages from raw_data directory
- Split each page into 800-character chunks with 200-char overlap
- Generate embeddings for all chunks (batch size: 32)
- Store in LanceDB with metadata
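The batching step above can be sketched generically. Here `embed_fn` stands in for whatever local model is configured (with sentence-transformers it could wrap a `model.encode` call); the function name and signature are illustrative, not OpenGround's actual API.

```python
from typing import Callable

def embed_in_batches(
    chunks: list[str],
    embed_fn: Callable[[list[str]], list[list[float]]],
    batch_size: int = 32,
) -> list[list[float]]:
    """Embed chunks in fixed-size batches (illustrative sketch).

    With sentence-transformers, embed_fn might be something like
    lambda batch: model.encode(batch).tolist().
    """
    vectors: list[list[float]] = []
    for i in range(0, len(chunks), batch_size):
        batch = chunks[i:i + batch_size]  # at most batch_size chunks per call
        vectors.extend(embed_fn(batch))
    return vectors
```

Batching keeps memory usage bounded and lets the embedding model amortize per-call overhead across 32 chunks at a time.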
Configuration
OpenGround’s behavior is controlled through a hierarchical configuration system (from `config.py`).
XDG Compliance
OpenGround follows the XDG Base Directory Specification (from `config.py:10-24`):
- Config: `$XDG_CONFIG_HOME/openground` or `~/.config/openground`
- Data: `$XDG_DATA_HOME/openground` or `~/.local/share/openground`
- Windows: Uses `AppData/Local/openground`
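A minimal sketch of this resolution order, assuming standard environment-variable fallbacks (the helper name is illustrative; the real logic lives in `config.py:10-24`):

```python
import os
import sys
from pathlib import Path

def openground_dirs() -> tuple[Path, Path]:
    """Resolve (config_dir, data_dir) following XDG conventions (sketch)."""
    if sys.platform == "win32":
        # On Windows both config and data live under AppData/Local.
        base = Path(os.environ.get("LOCALAPPDATA",
                                   Path.home() / "AppData" / "Local"))
        d = base / "openground"
        return d, d
    # On POSIX, honor the XDG env vars, falling back to the spec defaults.
    config_base = Path(os.environ.get("XDG_CONFIG_HOME", Path.home() / ".config"))
    data_base = Path(os.environ.get("XDG_DATA_HOME",
                                    Path.home() / ".local" / "share"))
    return config_base / "openground", data_base / "openground"
```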
Component Isolation
Each component is designed for independence:
- Extractors output standardized `ParsedPage` objects
- Ingestion works with any `ParsedPage` source
- Query operates on LanceDB tables regardless of source
- Embedding backends are swappable (sentence-transformers ↔ fastembed)

This isolation enables:
- Adding new source types without changing ingestion
- Swapping embedding models without changing extraction
- Independent testing of each component
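As an illustration of such a standardized interchange object (the field names here are guesses for the sketch, not the actual `ParsedPage` definition from the codebase):

```python
from dataclasses import dataclass, field

@dataclass
class ParsedPage:
    """Hypothetical sketch of a standardized extractor output object."""
    content: str                                  # extracted document text
    source: str                                   # e.g. "git", "sitemap", "local_path"
    url: str = ""                                 # original location, if any
    metadata: dict = field(default_factory=dict)  # extractor-specific extras
```

Because every extractor emits the same shape, the ingestion layer never needs to know whether a page came from a cloned repo, a crawled sitemap, or a local directory.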
Next Steps
Sources
Learn how OpenGround extracts documentation from git, sitemaps, and local paths
Embeddings
Understand embedding backends, models, and dimensions
Search
Deep dive into hybrid search with vector similarity and BM25
Configuration
Customize OpenGround’s behavior with config options