Overview
OpenGround can crawl and extract documentation from websites using XML sitemaps. This is ideal for documentation hosted on platforms like Mintlify, Docusaurus, or any site that provides a sitemap.
Basic Usage
Add documentation with sitemap URL
Use the add command with a sitemap URL:
openground add library-name \
--source https://docs.example.com/sitemap.xml \
-y
The -y flag skips the confirmation prompt between extract and ingest.
Verify the library was added
List all libraries in your database:
openground list-libraries
# or
openground ls
Filtering URLs
Using Filter Keywords
Use --filter-keyword to extract only URLs that contain specific strings:
openground add numpy \
--source https://numpy.org/doc/sitemap.xml \
--filter-keyword docs/ \
-y
Multiple Filter Keywords
Specify multiple keywords by using the flag multiple times. URLs matching any keyword will be included:
openground add library-name \
--source https://docs.example.com/sitemap.xml \
--filter-keyword docs \
--filter-keyword blog \
--filter-keyword tutorials \
-y
This will include URLs containing “docs” OR “blog” OR “tutorials”.
No Filtering
If no filter keywords are provided, all URLs from the sitemap are extracted:
# Extracts all URLs from sitemap
openground add library-name \
--source https://docs.example.com/sitemap.xml \
-y
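The matching rules described above (a URL is kept if it contains any keyword, and every URL is kept when no keyword is given) can be approximated with grep. This is a minimal sketch that assumes plain, case-sensitive substring matching against the full URL; filter_urls is a hypothetical helper, not part of OpenGround.

```shell
#!/bin/sh
# filter_urls [KEYWORD...]: read one URL per line on stdin and print the URLs
# that contain at least one KEYWORD (OR semantics).
# Hypothetical helper; assumes plain, case-sensitive substring matching.
filter_urls() {
  # No keywords: pass every URL through unchanged.
  if [ "$#" -eq 0 ]; then
    cat
    return
  fi
  # Rebuild the arguments as "-e KW1 -e KW2 ..." so each keyword is its own
  # fixed-string pattern (the loop list is expanded once, before any shifts).
  for kw in "$@"; do
    shift
    set -- "$@" -e "$kw"
  done
  grep -F "$@"
}

# Example:
# printf '%s\n' https://example.com/docs/intro https://example.com/blog/post \
#   https://example.com/pricing | filter_urls docs blog tutorials
```

Under this substring assumption, a keyword such as docs also matches a hostname like docs.example.com, so a path-style keyword such as /docs/ is more selective.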
Handling Query Parameters
Trimming Query Parameters
Some sitemaps list the same page several times with different query parameters. Use --trim-query-params to strip the parameters and avoid duplicates:
openground add library-name \
--source https://docs.example.com/sitemap.xml \
--trim-query-params \
-y
This converts:
https://docs.example.com/page?v=1 → https://docs.example.com/page
https://docs.example.com/page?v=2 → https://docs.example.com/page (deduplicated)
Only use --trim-query-params if the query parameters don’t affect the page content. Some sites use query parameters to render different content.
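The conversion shown above amounts to cutting each URL at the first ? and then keeping only the first occurrence of each result. A minimal sketch with sed and awk, assuming that is all the flag does (trim_query_params is a hypothetical helper; OpenGround's own handling may differ):

```shell
#!/bin/sh
# trim_query_params: read one URL per line on stdin, drop everything from the
# first "?" onward, and print each resulting URL once, in order of first
# appearance. Hypothetical helper for illustration only.
trim_query_params() {
  # In a sed basic regex "?" is a literal character.
  sed 's/?.*$//' | awk '!seen[$0]++'
}

# Example:
# printf '%s\n' 'https://docs.example.com/page?v=1' \
#   'https://docs.example.com/page?v=2' | trim_query_params
```

Using awk instead of sort -u keeps the sitemap's original order.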
Version Handling
Sitemap sources always use the version “latest”. The --version flag is ignored for sitemap sources.
Sitemap-based documentation is assumed to be the current/latest version. If you need version-specific documentation:
Check whether the site publishes version-specific sitemaps, and add each one as a separate library:
openground add mylib-v1 --source https://v1.docs.example.com/sitemap.xml -y
openground add mylib-v2 --source https://v2.docs.example.com/sitemap.xml -y
Alternatively, use a git repository source (if one is available) for proper version management.
Concurrency Control
By default, OpenGround uses the concurrency limit from your config. You can change it with openground config:
# Set extraction concurrency in config
openground config set extraction.concurrency_limit 20
# View current setting
openground config get extraction.concurrency_limit
A higher limit speeds up extraction but can overwhelm some servers (see Rate Limiting under Troubleshooting).
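The limit presumably caps how many pages are fetched at the same time. The same idea can be illustrated with xargs -P, which runs at most a fixed number of commands in parallel. This is an analogy under that assumption, not how OpenGround fetches pages; run_bounded is a hypothetical helper.

```shell
#!/bin/sh
# run_bounded LIMIT COMMAND...: read one item per line on stdin and run
# "COMMAND... ITEM" for each one, with at most LIMIT jobs running at once.
# Hypothetical helper; xargs -P is supported by GNU and BSD xargs but is not
# part of POSIX.
run_bounded() {
  limit=$1
  shift
  xargs -P "$limit" -n 1 "$@"
}

# Example (needs network access): fetch pages with at most 5 requests in flight.
# printf '%s\n' https://docs.example.com/a https://docs.example.com/b \
#   | run_bounded 5 curl -fsS -o /dev/null
```

Because jobs finish in any order, output from parallel commands is not guaranteed to follow the input order.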
All Available Flags
openground add LIBRARY [OPTIONS]
Arguments
LIBRARY - Name of the library (required)
Options
--source, -s TEXT - Root sitemap URL (e.g., https://docs.example.com/sitemap.xml)
--filter-keyword, -f TEXT - Filter for URLs (can be specified multiple times)
--trim-query-params - Remove query parameters from URLs to avoid duplicates
--yes, -y - Skip confirmation prompt between extract and ingest
--sources-file TEXT - Path to a custom sources.json file
The following flags are for git sources only and are ignored for sitemaps:
--version, -v - Sitemaps always use “latest”
--docs-path, -d - Not applicable to sitemaps
Using Sources Files
When you add documentation with --source, OpenGround automatically saves the configuration to ~/.openground/sources.json:
First time: Add with source
openground add numpy \
--source https://numpy.org/doc/sitemap.xml \
--filter-keyword docs/ \
-y
This saves the configuration, including the filter keywords.
Later: Add by name only
# Uses saved configuration
openground add numpy -y
The source URL and filter keywords are retrieved from sources.json.
See Managing sources.json files for more details.
Auto-Detection
OpenGround can auto-detect sitemap sources:
# These are automatically detected as sitemaps
openground add lib1 --source https://docs.example.com/sitemap.xml -y
openground add lib2 --source https://example.com/docs/sitemap_index.xml -y
Detection rules:
URL ends with .xml
URL contains “sitemap” (case-insensitive)
If detection fails, OpenGround defaults to sitemap with a warning
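The two detection rules can be expressed as a small shell test. This sketch mirrors only the rules listed above (looks_like_sitemap is a hypothetical name) and does not reproduce OpenGround's detection of other source types:

```shell
#!/bin/sh
# looks_like_sitemap URL: succeed (exit 0) if URL ends with ".xml" or contains
# "sitemap" in any letter case; fail otherwise.
# Hypothetical helper mirroring the documented rules.
looks_like_sitemap() {
  case $1 in
    *.xml) return 0 ;;
  esac
  # Lower-case the URL so the "sitemap" check is case-insensitive.
  lower=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case $lower in
    *sitemap*) return 0 ;;
  esac
  return 1
}

# Example:
# looks_like_sitemap https://example.com/docs/sitemap_index.xml && echo sitemap
```

When neither rule matches, OpenGround still treats the source as a sitemap but prints a warning.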
Updating Documentation
To refresh documentation from a sitemap:
openground update library-name -y
OpenGround compares content hashes and updates only the pages that changed.
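The idea behind hash-based updates: keep a content hash for each page from the previous run, hash the freshly extracted pages, and re-process only URLs whose hash is new or different. The sketch below assumes a simple "HASH URL" file per run (for example, one hash per page computed with sha256sum); both the format and the changed_pages helper are hypothetical, and OpenGround's own bookkeeping may differ.

```shell
#!/bin/sh
# changed_pages OLD NEW: OLD and NEW are files with one "HASH URL" pair per
# line. Print the URLs in NEW that are missing from OLD or whose hash changed;
# only those pages would need to be re-processed.
# Hypothetical helper and file format, for illustration only.
changed_pages() {
  awk 'FILENAME == ARGV[1] { old[$2] = $1; next }
       !($2 in old) || old[$2] != $1 { print $2 }' "$1" "$2"
}

# Example:
# changed_pages hashes-previous.txt hashes-current.txt
```

Unchanged pages are skipped entirely, which is what makes an update cheaper than a full re-add.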
Advanced: Direct Extract Command
For advanced use cases, you can run the extraction step on its own with the extract-sitemap command:
openground extract-sitemap \
--sitemap-url https://docs.example.com/sitemap.xml \
--library library-name \
--filter-keyword docs \
--concurrency-limit 10 \
--trim-query-params
Then run the embedding step separately:
openground embed library-name --version latest
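When you script the two steps, it helps to skip embedding if extraction fails. A minimal sketch, assuming extract-sitemap exits with a non-zero status on failure and accepts the optional flags shown above in any combination; extract_then_embed is a hypothetical wrapper:

```shell
#!/bin/sh
# extract_then_embed LIBRARY SITEMAP_URL [EXTRA_FLAGS...]: run extract-sitemap,
# then embed only if extraction succeeded.
# Hypothetical wrapper; it uses only the commands and flags shown above.
extract_then_embed() {
  library=$1
  sitemap_url=$2
  shift 2
  # Remaining arguments (e.g. --filter-keyword docs) go to extract-sitemap.
  openground extract-sitemap --sitemap-url "$sitemap_url" --library "$library" "$@" || return 1
  openground embed "$library" --version latest
}

# Usage:
# extract_then_embed library-name https://docs.example.com/sitemap.xml \
#   --filter-keyword docs --concurrency-limit 10 --trim-query-params
```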
Examples
Basic Sitemap
openground add numpy \
--source https://numpy.org/doc/sitemap.xml \
--filter-keyword docs/ \
-y
Real-World Example: Mintlify Documentation
# Add Mintlify's own documentation
openground add mintlify \
--source https://mintlify.com/sitemap.xml \
--filter-keyword /docs \
-y
# Query it
openground query "how to add code blocks" --library mintlify
Docusaurus Site
openground add docusaurus \
--source https://docusaurus.io/sitemap.xml \
--filter-keyword /docs/ \
-y
Entire Site Without Filtering
# Useful for small documentation sites
openground add smalldocs \
--source https://smalldocs.example.com/sitemap.xml \
-y
Troubleshooting
Too Many Pages
If extraction pulls in too many irrelevant pages:
Use more specific filter keywords
Replace one broad keyword with several narrow ones; keywords are combined with OR, so every keyword you add widens the match
Consider using a git repository source instead, if one is available
# Instead of this (too broad)
openground add lib --source https://example.com/sitemap.xml -y
# Do this (more specific)
openground add lib \
--source https://example.com/sitemap.xml \
--filter-keyword /api-reference/ \
-y
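Before settling on a filter, it can help to count how many sitemap URLs a keyword would match. A rough sketch using curl, grep, and sed, assuming a regular (non-index) sitemap with each <loc> element on a single line; sitemap_urls is a hypothetical helper, and grep -F only approximates OpenGround's matching:

```shell
#!/bin/sh
# sitemap_urls: read sitemap XML on stdin and print each <loc> URL on its own
# line. A rough text-based sketch, not a real XML parser: it ignores XML
# escaping (e.g. &amp;), and for a sitemap index it prints the child sitemap
# URLs rather than the pages. Hypothetical helper.
sitemap_urls() {
  grep -o '<loc>[^<]*</loc>' | sed -e 's|<loc>||' -e 's|</loc>||'
}

# Usage: count the URLs a keyword would match before running "openground add".
# curl -fsS https://example.com/sitemap.xml | sitemap_urls | grep -cF /api-reference/
```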
Duplicate URLs
If you’re seeing duplicate pages with different query parameters:
openground add library \
--source https://docs.example.com/sitemap.xml \
--trim-query-params \
-y
Rate Limiting
If the target server is rate-limiting you:
# Reduce concurrency
openground config set extraction.concurrency_limit 5
# Then try again
openground add library --source https://docs.example.com/sitemap.xml -y