Feature Deep Dive

Knowledge Base Management

How to structure, upload, and maintain your knowledge base so HoverBot delivers accurate, grounded responses with source attribution.

How RAG-Powered Responses Work

user query
  → embedding generation
  → vector similarity search across knowledge chunks
  → top-k relevant chunks retrieved
  → chunks + query passed to LLM with grounding instructions
  → response generated with source attribution
  → confidence score assigned based on chunk relevance

Every response is grounded in your knowledge base content. The system retrieves the most relevant chunks, passes them as context to the language model, and includes source references so users can verify answers.
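The flow above can be sketched in a few lines of Python. This is a minimal illustration, not HoverBot's implementation: the bag-of-words `embed` function stands in for a real embedding model, and the chunk fields (`source`, `text`), the prompt wording, and the use of the top similarity score as the confidence value are all assumptions made for the example.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy stand-in for a real embedding model: bag-of-words term counts.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, chunks: list[dict], k: int = 2) -> list[tuple[float, dict]]:
    # Score every chunk against the query and keep the k most similar.
    q = embed(query)
    scored = [(cosine(q, embed(c["text"])), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

def build_prompt(query: str, hits: list[tuple[float, dict]]) -> str:
    # Pass retrieved chunks plus grounding instructions to the language model.
    context = "\n".join(f"[{c['source']}] {c['text']}" for _, c in hits)
    return (
        "Answer using only the context below and cite the bracketed sources.\n"
        f"Context:\n{context}\n"
        f"Question: {query}"
    )

# Hypothetical knowledge chunks, each carrying its source for attribution.
chunks = [
    {"source": "billing.md", "text": "You can cancel your subscription from the billing page"},
    {"source": "shipping.md", "text": "Shipping takes three to five business days"},
    {"source": "account.md", "text": "Reset your password from the login screen"},
]

query = "how do I cancel my subscription"
hits = retrieve(query, chunks, k=2)
prompt = build_prompt(query, hits)
confidence = hits[0][0]  # assumed: confidence taken from the best chunk's relevance
```

In a production system the embedding step would call a trained model and the search would run against a vector index, but the shape of the pipeline is the same: score, take the top k, and hand the chunks with their sources to the model.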

Supported Content Sources

Source Type	Format	Sync Method
Website pages	URL crawl	Scheduled re-crawl (daily/weekly)
Documents	PDF, DOCX, TXT, Markdown	Upload via dashboard or API
Help center articles	Zendesk Guide, Intercom Articles	API sync with change detection
FAQ databases	CSV, JSON	Bulk upload with structured Q&A pairs
API endpoints	REST JSON	Real-time lookup for dynamic data
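For FAQ databases, one way to prepare structured Q&A pairs is to validate a CSV export and convert it to JSON before upload. The sketch below assumes column names `question` and `answer`; these are hypothetical, so check the field names HoverBot's bulk upload actually expects.

```python
import csv
import io
import json

def faq_csv_to_records(csv_text: str) -> list[dict]:
    # Parse Q&A rows and drop any row missing a question or an answer.
    reader = csv.DictReader(io.StringIO(csv_text))
    records = []
    for row in reader:
        question = (row.get("question") or "").strip()
        answer = (row.get("answer") or "").strip()
        if question and answer:
            records.append({"question": question, "answer": answer})
    return records

# Hypothetical export: the second data row has no question and is skipped.
sample = (
    "question,answer\n"
    "How do I cancel?,You can cancel your subscription from the billing page.\n"
    ",Orphan answer with no question\n"
)

records = faq_csv_to_records(sample)
payload = json.dumps(records, indent=2)  # JSON body ready for a bulk upload
```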

Content Structuring Best Practices

Write for retrieval, not just reading

Each document should cover one clear topic. Avoid long pages that mix multiple subjects. The chunking algorithm works best when content is logically organized with descriptive headings.
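To see why descriptive headings help, consider a simple heading-based splitter. HoverBot's actual chunking algorithm is not described here; the sketch below assumes Markdown input with `#` headings and shows only the general idea that each heading starts a new chunk, so a page covering one topic per heading yields clean, self-contained chunks.

```python
def chunk_by_heading(markdown: str) -> list[dict]:
    # Start a new chunk at every Markdown heading; keep the heading as a label.
    chunks = []
    heading, body = "Untitled", []

    def flush():
        text = "\n".join(body).strip()
        if text:  # skip headings that have no content under them
            chunks.append({"heading": heading, "text": text})

    for line in markdown.splitlines():
        if line.startswith("#"):
            flush()
            heading, body = line.lstrip("#").strip(), []
        else:
            body.append(line)
    flush()
    return chunks

doc = (
    "# Refunds\n"
    "Refunds are issued within 5 days.\n"
    "\n"
    "# Shipping\n"
    "Orders ship in 2 days.\n"
)
sections = chunk_by_heading(doc)
```

A page that mixes subjects under a single heading would come out as one large chunk covering several topics, which is harder to match against a specific query.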

Include the question in the answer

FAQ-style content that restates the question in the answer text improves retrieval accuracy, because the answer then shares key terms with the queries it should match. Instead of starting with "Yes, you can...", write "You can cancel your subscription by...".
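A rough way to check this on your own content is to compare word overlap between a likely user query and each phrasing of the answer. The Jaccard overlap used below is only a lexical proxy chosen for illustration; real embedding models also capture meaning, but the direction of the effect is usually the same.

```python
import re

def tokens(text: str) -> set[str]:
    # Lowercase word set, ignoring punctuation.
    return set(re.findall(r"[a-z']+", text.lower()))

def jaccard(a: str, b: str) -> float:
    # Shared words divided by all distinct words across both texts.
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0

query = "how do I cancel my subscription"
terse = "Yes, you can, from the billing page."
restated = "You can cancel your subscription by opening the billing page."

terse_score = jaccard(query, terse)        # no shared words with the query
restated_score = jaccard(query, restated)  # shares "cancel" and "subscription"
```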

Keep content current

Stale content is the top cause of inaccurate responses. Set up scheduled re-crawls for your website and review the "low-confidence responses" report weekly to identify outdated content.

Use metadata for context

Tag documents with categories, product areas, or audience segments. HoverBot uses metadata to boost relevance when the user's context matches a specific tag.
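One plausible way a metadata boost could work is to add a small bonus to a chunk's similarity score for every tag it shares with the user's context. The additive formula, the `boost` value, and the `tags` field below are assumptions for illustration, not HoverBot's documented behavior.

```python
def rerank_with_tags(hits, user_tags, boost=0.1):
    # hits: list of (similarity_score, chunk); chunk["tags"] is a list of strings.
    boosted = []
    for score, chunk in hits:
        matches = len(set(chunk.get("tags", [])) & set(user_tags))
        boosted.append((score + boost * matches, chunk))
    return sorted(boosted, key=lambda pair: pair[0], reverse=True)

# Two near-tied chunks; the user is known to be on the hypothetical "starter" plan.
hits = [
    (0.62, {"id": "enterprise-pricing", "tags": ["enterprise"]}),
    (0.60, {"id": "starter-pricing", "tags": ["starter", "billing"]}),
]
reranked = rerank_with_tags(hits, user_tags={"starter"})
```

Here the starter-plan chunk overtakes the slightly higher-scoring enterprise chunk because its tag matches the user's context.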

Chunk-Level Quality Metrics

Retrieval hit rate: Percentage of queries where the correct chunk appears in the top-k results
Source attribution coverage: Percentage of responses that include a verifiable source link
Staleness score: Time elapsed since each content chunk was last updated, flagged when it exceeds a configured threshold
Gap detection: Queries that return no relevant chunks, indicating missing knowledge base content
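The four metrics above can be computed from an evaluation set and a query log along these lines. The record layouts, the 180-day staleness threshold, and the 0.3 relevance cutoff are hypothetical values chosen for the sketch.

```python
from datetime import date

def retrieval_hit_rate(results):
    # results: list of (expected_chunk_id, retrieved_chunk_ids)
    hits = sum(1 for expected, retrieved in results if expected in retrieved)
    return hits / len(results) if results else 0.0

def attribution_coverage(responses):
    # Share of responses that carry at least one source link.
    with_source = sum(1 for r in responses if r.get("sources"))
    return with_source / len(responses) if responses else 0.0

def stale_chunks(chunks, today, max_age_days=180):
    # Flag chunks whose last update is older than the threshold.
    return [c["id"] for c in chunks if (today - c["updated"]).days > max_age_days]

def knowledge_gaps(query_log, min_score=0.3):
    # Queries whose best chunk fell below the relevance cutoff.
    return [q["query"] for q in query_log if q["top_score"] < min_score]

eval_results = [("c1", ["c1", "c7"]), ("c2", ["c9", "c2"]), ("c3", ["c3"]), ("c4", ["c8", "c5"])]
responses = [{"sources": ["https://example.com/billing"]}, {"sources": []}]
kb_chunks = [
    {"id": "c1", "updated": date(2024, 1, 1)},
    {"id": "c2", "updated": date(2024, 12, 1)},
]
query_log = [
    {"query": "cancel subscription", "top_score": 0.82},
    {"query": "do you support SSO", "top_score": 0.05},
]

hit_rate = retrieval_hit_rate(eval_results)
coverage = attribution_coverage(responses)
stale = stale_chunks(kb_chunks, today=date(2025, 1, 1))
gaps = knowledge_gaps(query_log)
```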

Maintenance Workflow

Review the weekly "unresolved queries" report to identify knowledge gaps
Add or update content for the top-5 unresolved query clusters
Re-index updated content (automatic for crawled sources, manual trigger for uploads)
Verify improved responses using the conversation replay tool
Archive outdated content that no longer applies to your product or service
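The first two steps of this workflow amount to grouping unresolved queries and ranking the groups by volume. The sketch below uses a deliberate simplification: it groups queries by normalized text, whereas a real clustering step would more likely group by embedding similarity. The sample queries are invented for the example.

```python
import re
from collections import Counter

def normalize(query: str) -> str:
    # Lowercase, drop punctuation, and collapse whitespace.
    cleaned = re.sub(r"[^a-z0-9 ]", "", query.lower())
    return " ".join(cleaned.split())

def top_unresolved(queries, n=5):
    # Count identical normalized queries and return the n most frequent.
    return Counter(normalize(q) for q in queries).most_common(n)

unresolved = [
    "Refund policy?", "refund policy", "REFUND POLICY!",
    "change email", "Change email?",
    "api rate limits",
]
top = top_unresolved(unresolved)
```

The highest-count groups are the ones to address first when adding or updating content.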
