search_documents_tool

Overview

The search_documents_tool runs a hybrid search (vector-embedding similarity combined with BM25 keyword matching) against the OpenGround documentation database and returns relevant content from official library documentation.

Always call list_libraries_tool first to verify which libraries and versions are available before searching.

Parameters

query

string

required

The search query text, used for both semantic (embedding-based) and keyword (BM25) search. Example: “How do I configure async timeout settings?”

library_name

string

required

The name of the library to search within. Must exactly match (case-sensitive) one of the available libraries. Examples: “fastapi”, “django”, “react”

version

string

required

The version of the library documentation to search. Must exactly match one of the available versions. Examples: “3.11.0”, “latest”, “v4.5.2”

Return Format

Returns a formatted string. The format depends on the outcome: matches found, no matches, unknown library, or unknown version.

Successful Search

Found {count} match(es).
1. **{title}**: "{content_snippet}" (Source: {url}, Version: {version}, score={similarity_score})
   To get full page content: {"tool": "get_full_content", "url": "{url}", "version": "{version}"}
2. **{title}**: "{content_snippet}" (Source: {url}, Version: {version}, score={similarity_score})
   To get full page content: {"tool": "get_full_content", "url": "{url}", "version": "{version}"}
...
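The result template above is regular enough to parse on the client side. The following is a minimal sketch, assuming real output follows the template exactly; SearchHit, HIT_RE, and parse_hits are hypothetical names for illustration, not part of the tool.

```python
import re
from typing import NamedTuple


class SearchHit(NamedTuple):
    title: str
    snippet: str
    url: str
    version: str
    score: float


# Mirrors the documented result line. Assumption: the server emits exactly
# this layout; adjust the pattern if the real output differs.
HIT_RE = re.compile(
    r'^\d+\.\s+\*\*(?P<title>.+?)\*\*:\s+"(?P<snippet>.*)"\s+'
    r'\(Source:\s+(?P<url>\S+),\s+Version:\s+(?P<version>[^,]+),\s+'
    r'score=(?P<score>[-+0-9.eE]+)\)\s*$'
)


def parse_hits(response: str) -> list[SearchHit]:
    """Collect every numbered result line; other lines are ignored."""
    hits = []
    for line in response.splitlines():
        m = HIT_RE.match(line.strip())
        if m:
            hits.append(
                SearchHit(
                    title=m["title"],
                    snippet=m["snippet"],
                    url=m["url"],
                    version=m["version"].strip(),
                    score=float(m["score"]),
                )
            )
    return hits
```

Lines that do not match, such as the "Found ..." header and the "To get full page content" hints, are skipped rather than treated as errors.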

No Matches

Found 0 matches.

Library Not Found

Library '{library_name}' not found. Available libraries: {comma_separated_list}

Version Not Found

Version '{version}' not found for library '{library_name}'. Available versions: {comma_separated_list}
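Because every outcome comes back as a plain string, a caller may want to distinguish the four documented cases before acting on the response. A minimal sketch, assuming the first line of the response follows the templates above; classify_response and the returned dictionary keys are hypothetical.

```python
import re

# Patterns copied from the documented templates; real output may differ.
FOUND_RE = re.compile(r"^Found (\d+) match")
LIBRARY_RE = re.compile(r"^Library '(.+?)' not found\. Available libraries: (.*)$")
VERSION_RE = re.compile(
    r"^Version '(.+?)' not found for library '(.+?)'\. Available versions: (.*)$"
)


def _split_list(text: str) -> list[str]:
    """Split a comma-separated list and drop empty entries."""
    return [item.strip() for item in text.split(",") if item.strip()]


def classify_response(response: str) -> dict:
    """Classify a response by inspecting its first line."""
    lines = response.strip().splitlines()
    first = lines[0].strip() if lines else ""
    if m := FOUND_RE.match(first):
        count = int(m.group(1))
        return {"kind": "matches" if count else "no_matches", "count": count}
    if m := LIBRARY_RE.match(first):
        return {"kind": "library_not_found", "available": _split_list(m.group(2))}
    if m := VERSION_RE.match(first):
        return {"kind": "version_not_found", "available": _split_list(m.group(3))}
    return {"kind": "unknown"}
```

The "available" list in the two error cases can be used to retry with a valid library name or version.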

Response Fields

Each search result includes:

title

string

The title of the documentation page or section

content

string

The full text content of the matching chunk (not truncated)

url

string

The source URL of the documentation page

version

string

The version of the documentation

score

number

The relevance score from the hybrid search algorithm (lower is more relevant for distance-based metrics)

tool_hint

json

A JSON object with parameters for calling get_full_content_tool to retrieve the complete page content

Example Usage

Basic Search

{
  "query": "How do I handle authentication middleware?",
  "library_name": "fastapi",
  "version": "0.104.0"
}

Response:

Found 3 matches.
1. **Security - First Steps**: "OAuth2 with Password (and hashing), Bearer with JWT tokens. You can use OAuth2 with password flow for authentication..." (Source: https://fastapi.tiangolo.com/tutorial/security/first-steps/, Version: 0.104.0, score=0.2341)
   To get full page content: {"tool": "get_full_content", "url": "https://fastapi.tiangolo.com/tutorial/security/first-steps/", "version": "0.104.0"}
2. **Middleware**: "You can add middleware to FastAPI applications. A middleware is a function that works with every request..." (Source: https://fastapi.tiangolo.com/tutorial/middleware/, Version: 0.104.0, score=0.2876)
   To get full page content: {"tool": "get_full_content", "url": "https://fastapi.tiangolo.com/tutorial/middleware/", "version": "0.104.0"}
...

Invalid Library

{
  "query": "async functions",
  "library_name": "nonexistent-lib",
  "version": "1.0.0"
}

Response:

Library 'nonexistent-lib' not found. Available libraries: django, fastapi, flask, react, vue

How It Works

Hybrid Search Algorithm

The tool uses a two-pronged search approach:

Semantic Search: Generates embeddings for the query using BAAI/bge-small-en-v1.5 (384 dimensions) and performs vector similarity search
BM25 Keyword Search: Performs traditional keyword matching using the BM25 algorithm
Fusion: Merges the two result lists into a single ranking, so a result can surface through semantic similarity, keyword overlap, or both
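This page does not say which fusion method the tool uses. As an illustration only, the sketch below shows reciprocal rank fusion (RRF), one common way to merge a vector ranking with a BM25 ranking. The function name, the constant k=60, and the document IDs are assumptions, and the fused ordering is unrelated to the score field the tool reports.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked ID lists; each list is ordered best-first.

    Each document earns 1 / (k + rank) per list it appears in, so items
    ranked well by both searches rise to the top.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


vector_ranking = ["a", "b", "c"]  # hypothetical semantic-search order
bm25_ranking = ["b", "d", "a"]    # hypothetical keyword-search order
fused = reciprocal_rank_fusion([vector_ranking, bm25_ranking])
```

In this example "b" and "a" lead the fused list because both searches returned them, while "d" and "c" each appear in only one ranking.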

Caching Behavior

Library metadata: Cached after first call to list_libraries_with_versions
Embedding model: Pre-loaded in background thread during server startup
Database connection: Connection pooling for efficient query execution

Query Configuration

The number of results returned is controlled by the top_k configuration:

Default: 5 results
Configurable via ~/.config/openground/config.json under query.top_k
Maximum recommended: 20 results for optimal performance
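To check which top_k value is in effect, the configuration file can be read directly. A minimal sketch, assuming that "under query.top_k" means a nested object of the form {"query": {"top_k": N}}; falling back to the default on a missing or invalid file is a choice made for this example, not documented server behavior, and the function names are hypothetical.

```python
import json
from pathlib import Path

DEFAULT_TOP_K = 5  # the documented default


def top_k_from_config(config: object) -> int:
    """Return query.top_k from a parsed config, or the default if absent or invalid."""
    if not isinstance(config, dict):
        return DEFAULT_TOP_K
    query = config.get("query")
    if not isinstance(query, dict):
        return DEFAULT_TOP_K
    value = query.get("top_k")
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(value, int) and not isinstance(value, bool) and value > 0:
        return value
    return DEFAULT_TOP_K


def load_top_k(path: Path) -> int:
    """Read the config file; fall back to the default if it is missing or malformed."""
    try:
        return top_k_from_config(json.loads(path.read_text(encoding="utf-8")))
    except (OSError, ValueError):
        return DEFAULT_TOP_K
```

A typical call would be load_top_k(Path.home() / ".config" / "openground" / "config.json").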

Best Practices

Always verify library and version availability first:

Call list_libraries_tool to get available options
Select the appropriate library and version
Then call search_documents_tool with exact matches

Library names and versions are case-sensitive. Ensure exact matches to avoid “not found” errors.
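The verification step can be enforced in client code before the search call is made. A minimal sketch, assuming the output of list_libraries_tool has already been turned into a mapping from library name to version list; this page does not describe that output format, so the mapping and build_search_args are hypothetical.

```python
def build_search_args(
    query: str,
    library_name: str,
    version: str,
    available: dict[str, list[str]],
) -> dict[str, str]:
    """Validate exact, case-sensitive matches, then build the tool arguments."""
    if library_name not in available:
        # Offer a hint when only the letter case differs.
        near = [name for name in available if name.lower() == library_name.lower()]
        hint = f" Did you mean {near[0]!r}?" if near else ""
        raise ValueError(f"Unknown library {library_name!r}.{hint}")
    if version not in available[library_name]:
        raise ValueError(
            f"Unknown version {version!r} for {library_name!r}; "
            f"available: {', '.join(available[library_name])}"
        )
    return {"query": query, "library_name": library_name, "version": version}
```

Failing early with a local error avoids a round trip that would only return a "not found" message.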

Effective Query Writing

Be specific: “JWT token validation” instead of “security”
Use technical terms: “async/await” instead of “asynchronous code”
Include context: “React hooks useState” instead of just “hooks”
Avoid over-qualification: Don’t include the library name in the query (results are already filtered to that library)

Following Up on Results

When you need more context from a search result:

Use the embedded tool_hint JSON object
Call get_full_content_tool with the provided url and version
This retrieves the complete documentation page (all chunks reassembled)
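The hints can be pulled out of a search response programmatically. A minimal sketch, assuming each hint sits on its own line after the prefix shown in the return format; extract_tool_hints is a hypothetical helper, and how the follow-up call is dispatched depends on your client.

```python
import json

HINT_PREFIX = "To get full page content:"


def extract_tool_hints(response: str) -> list[dict]:
    """Return the JSON hint objects embedded in a search response, in order."""
    hints = []
    for line in response.splitlines():
        line = line.strip()
        if not line.startswith(HINT_PREFIX):
            continue
        try:
            hint = json.loads(line[len(HINT_PREFIX):].strip())
        except json.JSONDecodeError:
            continue  # skip hints that are not valid JSON
        if isinstance(hint, dict):
            hints.append(hint)
    return hints
```

Each returned dictionary carries the url and version values to pass on when requesting the full page.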

Troubleshooting

Empty Results

If you get Found 0 matches:

Try broader search terms
Check if you’re searching the correct library version
Verify the documentation actually covers that topic
Try searching related terms or concepts

Slow Performance

If searches are taking too long:

Wait for background initialization to complete (check server logs)
Reduce top_k in configuration
Ensure database is on SSD storage
Check whether the embedding model is already loaded (if startup pre-loading has not finished, the first query has to wait for it)

Accuracy Issues

If results aren’t relevant:

Refine your query to be more specific
Try using exact technical terminology from the documentation
Increase top_k to see more results
Consider that the content might not exist in that version’s documentation
