Knowledge Base Management
How to structure, upload, and maintain your knowledge base so HoverBot delivers accurate, grounded responses with source attribution.
How RAG-Powered Responses Work
user query → embedding generation → vector similarity search across knowledge chunks → top-k relevant chunks retrieved → chunks + query passed to LLM with grounding instructions → response generated with source attribution → confidence score assigned based on chunk relevance
Every response is grounded in your knowledge base content. The system retrieves the most relevant chunks, passes them as context to the language model, and includes source references so users can verify answers.
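The retrieval flow above can be sketched in a few lines. This is a minimal illustration, not HoverBot's implementation: the `embed` function is a toy bag-of-words stand-in for a real embedding model, the chunk fields (`source`, `text`) and the prompt wording are assumptions, and the confidence value is simply the top similarity score.

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding: lowercase term counts (stand-in for a real embedding model)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, chunks, k=2):
    """Score every chunk against the query and return the top-k (score, chunk) pairs."""
    q = embed(query)
    scored = [(cosine(q, embed(c["text"])), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

def build_prompt(query, hits):
    """Combine retrieved chunks and the query with a grounding instruction (wording is hypothetical)."""
    context = "\n".join(f"[{c['source']}] {c['text']}" for _, c in hits)
    return f"Answer using only the context below and cite sources.\n{context}\nQuestion: {query}"

# Hypothetical knowledge chunks
CHUNKS = [
    {"source": "billing.md", "text": "You can cancel your subscription from the billing page"},
    {"source": "setup.md", "text": "Install the widget by pasting the script tag"},
    {"source": "hours.md", "text": "Support is available on weekdays"},
]

query = "how do I cancel my subscription"
hits = retrieve(query, CHUNKS, k=2)
confidence = hits[0][0]            # assumption: confidence = best chunk similarity
prompt = build_prompt(query, hits)  # this string would be sent to the LLM
print(hits[0][1]["source"], round(confidence, 2))
```

In a production system the embedding and search steps would be handled by an embedding model and a vector index; the control flow stays the same.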
Supported Content Sources
| Source Type | Format | Sync Method |
|---|---|---|
| Website pages | URL crawl | Scheduled re-crawl (daily/weekly) |
| Documents | PDF, DOCX, TXT, Markdown | Upload via dashboard or API |
| Help center articles | Zendesk Guide, Intercom Articles | API sync with change detection |
| FAQ databases | CSV, JSON | Bulk upload with structured Q&A pairs |
| API endpoints | REST JSON | Real-time lookup for dynamic data |
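For the FAQ bulk-upload row, it can help to validate Q&A pairs before uploading. The sketch below converts a CSV into a JSON list of pairs. The column names (`question`, `answer`, `category`) are assumptions for illustration; check the dashboard or API documentation for the schema HoverBot actually expects.

```python
import csv
import io
import json

# Hypothetical CSV export; column names are assumptions, not a documented schema
CSV_TEXT = """question,answer,category
How do I cancel my subscription?,You can cancel your subscription from the Billing page.,billing
,An answer with no question.,misc
"""

def load_qa_pairs(csv_text):
    """Parse CSV rows into Q&A dicts, skipping rows missing a question or an answer."""
    pairs = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        question = (row.get("question") or "").strip()
        answer = (row.get("answer") or "").strip()
        if question and answer:
            pairs.append({
                "question": question,
                "answer": answer,
                "category": (row.get("category") or "").strip(),
            })
    return pairs

pairs = load_qa_pairs(CSV_TEXT)
print(json.dumps(pairs, indent=2))
```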
Content Structuring Best Practices
Write for retrieval, not just reading
Each document should cover one clear topic. Avoid long pages that mix multiple subjects. The chunking algorithm works best when content is logically organized with descriptive headings.
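To see why descriptive headings matter, consider a simple heading-based splitter. HoverBot's actual chunking algorithm is not described here, so the function below is only an assumed approximation: it starts a new chunk at every line beginning with `#` and keeps the heading alongside the text.

```python
def chunk_by_heading(markdown_text):
    """Split text into chunks, one per heading, keeping the heading as chunk context."""
    chunks = []
    heading, body = "Untitled", []
    for line in markdown_text.splitlines():
        if line.startswith("#"):
            if body:  # close the previous chunk before starting a new one
                chunks.append({"heading": heading, "text": " ".join(body)})
            heading, body = line.lstrip("#").strip(), []
        elif line.strip():
            body.append(line.strip())
    if body:
        chunks.append({"heading": heading, "text": " ".join(body)})
    return chunks

# Hypothetical document with one topic per heading
DOC = """# Cancel a subscription
Open Billing and choose Cancel plan.

# Update a payment method
Open Billing and choose Edit card.
"""

chunks = chunk_by_heading(DOC)
for c in chunks:
    print(c["heading"], "->", c["text"])
```

A page that mixes several subjects under one heading would come out as a single large chunk, which is harder to match to a specific question.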
Include the question in the answer
FAQ-style content that restates the question in the answer text improves retrieval accuracy, because the answer chunk then contains the same key terms as the user's query. Instead of starting with "Yes, you can...", write "You can cancel your subscription by..."
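A rough way to see the effect is to count the terms an answer shares with a likely query. Token overlap is only a crude proxy, since real embeddings also capture meaning, and the example strings below are hypothetical.

```python
import re

def tokens(text):
    """Lowercase word tokens, ignoring punctuation."""
    return set(re.findall(r"\w+", text.lower()))

def shared_terms(query, answer):
    """Number of distinct terms the answer shares with the query."""
    return len(tokens(query) & tokens(answer))

query = "Can I cancel my subscription"
bare_answer = "Yes, you can do that from the billing page"
restated_answer = "You can cancel your subscription from the billing page"

bare_overlap = shared_terms(query, bare_answer)
restated_overlap = shared_terms(query, restated_answer)
print(bare_overlap, restated_overlap)
```

The restated answer shares "cancel" and "subscription" with the query, while the bare answer shares only "can".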
Keep content current
Stale content is the top cause of inaccurate responses. Set up scheduled re-crawls for your website and review the "low-confidence responses" report weekly to identify outdated content.
Use metadata for context
Tag documents with categories, product areas, or audience segments. HoverBot uses metadata to boost relevance when the user's context matches a specific tag.
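One plausible way a metadata boost could work is to add a small bonus to a chunk's similarity score for each tag that matches the user's context. The additive formula, the `boost` value, and the field names below are assumptions for illustration, not HoverBot's documented scoring.

```python
def boosted_score(similarity, chunk_tags, user_tags, boost=0.1):
    """Add a fixed bonus per tag shared between the chunk and the user's context."""
    matching = set(chunk_tags) & set(user_tags)
    return similarity + boost * len(matching)

# Hypothetical candidates with similar raw similarity
candidates = [
    {"id": "pricing-smb", "similarity": 0.74, "tags": ["pricing", "smb"]},
    {"id": "pricing-enterprise", "similarity": 0.70, "tags": ["pricing", "enterprise"]},
]
user_tags = ["enterprise"]  # e.g. derived from the visitor's account or page

ranked = sorted(
    candidates,
    key=lambda c: boosted_score(c["similarity"], c["tags"], user_tags),
    reverse=True,
)
print([c["id"] for c in ranked])
```

Without the boost the SMB pricing chunk would rank first; with a matching audience tag the enterprise chunk overtakes it.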
Chunk-Level Quality Metrics
- Retrieval hit rate: Percentage of queries where the correct chunk appears in the top-k results
- Source attribution coverage: Percentage of responses that include a verifiable source link
- Staleness score: Time elapsed since each content chunk was last updated; chunks are flagged when this age exceeds a set threshold
- Gap detection: Queries that return no relevant chunks, indicating missing knowledge base content
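The four metrics above can be computed from simple logs. The sketch below assumes hypothetical record shapes (an evaluation set with expected chunk IDs, responses with an optional `source_url`, chunks with an `updated` date, and a query log with a `top_score`), and the thresholds are placeholder values.

```python
from datetime import date

def hit_rate(results):
    """Share of queries whose expected chunk appears in the retrieved top-k IDs."""
    return sum(1 for expected, retrieved in results if expected in retrieved) / len(results)

def attribution_coverage(responses):
    """Share of responses that carry a source link."""
    return sum(1 for r in responses if r.get("source_url")) / len(responses)

def stale_chunks(chunks, today, max_age_days=180):
    """IDs of chunks whose last update is older than the threshold (assumed 180 days)."""
    return [c["id"] for c in chunks if (today - c["updated"]).days > max_age_days]

def gaps(query_log, min_score=0.3):
    """Queries whose best chunk scored below the relevance floor (assumed 0.3)."""
    return [q["query"] for q in query_log if q["top_score"] < min_score]

# Hypothetical sample data
results = [("c1", ["c1", "c4"]), ("c2", ["c3", "c5"]), ("c3", ["c3", "c1"])]
responses = [{"source_url": "https://example.com/billing"}, {"source_url": None}]
chunks = [
    {"id": "c1", "updated": date(2023, 1, 1)},
    {"id": "c2", "updated": date(2024, 5, 1)},
]
query_log = [
    {"query": "cancel plan", "top_score": 0.82},
    {"query": "export to xml", "top_score": 0.05},
]

print(round(hit_rate(results), 2))
print(attribution_coverage(responses))
print(stale_chunks(chunks, today=date(2024, 6, 1)))
print(gaps(query_log))
```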
Maintenance Workflow
- Review the weekly "unresolved queries" report to identify knowledge gaps
- Add or update content for the top-5 unresolved query clusters
- Re-index updated content (automatic for crawled sources, manual trigger for uploads)
- Verify improved responses using the conversation replay tool
- Archive outdated content that no longer applies to your product or service
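The second workflow step can be prioritized with a simple count. This sketch assumes the unresolved-queries report can be exported as records that already carry a `cluster` label, which is a hypothetical field name.

```python
from collections import Counter

def top_clusters(unresolved, n=5):
    """Return the n most frequent clusters as (cluster, count) pairs."""
    counts = Counter(item["cluster"] for item in unresolved)
    return counts.most_common(n)

# Hypothetical export of the weekly unresolved-queries report
unresolved = [
    {"query": "how do refunds work", "cluster": "refunds"},
    {"query": "refund timeline", "cluster": "refunds"},
    {"query": "can I get my money back", "cluster": "refunds"},
    {"query": "set up SSO", "cluster": "sso"},
    {"query": "SAML login", "cluster": "sso"},
    {"query": "export chat logs", "cluster": "export"},
]

priorities = top_clusters(unresolved, n=5)
for cluster, count in priorities:
    print(cluster, count)
```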