Feature Deep Dive

Knowledge Base Management

How to structure, upload, and maintain your knowledge base so HoverBot delivers accurate, grounded responses with source attribution.

How RAG-Powered Responses Work

user query
  → embedding generation
  → vector similarity search across knowledge chunks
  → top-k relevant chunks retrieved
  → chunks + query passed to LLM with grounding instructions
  → response generated with source attribution
  → confidence score assigned based on chunk relevance

Every response is grounded in your knowledge base content. The system retrieves the most relevant chunks, passes them as context to the language model, and includes source references so users can verify answers.
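The flow above can be sketched in a few lines of Python. This is a minimal illustration, not HoverBot's implementation: the bag-of-words `embed` function stands in for a real embedding model, the chunk texts and `example.com` URLs are made up, and using the top similarity as the confidence score is an assumption.

```python
import math
import re
from collections import Counter

# Toy knowledge chunks; texts and source URLs are hypothetical examples.
CHUNKS = [
    {"text": "You can cancel your subscription from the billing page.",
     "source": "https://example.com/billing"},
    {"text": "Password resets are sent to your registered email address.",
     "source": "https://example.com/account"},
    {"text": "Invoices are issued on the first day of each month.",
     "source": "https://example.com/invoices"},
]

def embed(text):
    """Stand-in embedding: word counts (a real system would use a learned model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, chunks, k=2):
    """Score every chunk against the query and return the top-k (score, chunk) pairs."""
    q = embed(query)
    scored = [(cosine(q, embed(c["text"])), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

def build_prompt(query, hits):
    """Combine retrieved chunks and the query with grounding instructions."""
    context = "\n".join(
        f"[{i + 1}] {c['text']} (source: {c['source']})" for i, (_, c) in enumerate(hits)
    )
    return (
        "Answer using only the numbered context below and cite the sources you use.\n"
        f"{context}\n\nQuestion: {query}"
    )

query = "How do I cancel my subscription?"
hits = retrieve(query, CHUNKS)
prompt = build_prompt(query, hits)
# Assumption: confidence is simply the best chunk's similarity score.
confidence = hits[0][0] if hits else 0.0
```

The `prompt` string is what would be sent to the language model; the numbered context lines are what make source attribution possible in the answer.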

Supported Content Sources

Source Type          | Format                           | Sync Method
Website pages        | URL crawl                        | Scheduled re-crawl (daily/weekly)
Documents            | PDF, DOCX, TXT, Markdown         | Upload via dashboard or API
Help center articles | Zendesk Guide, Intercom Articles | API sync with change detection
FAQ databases        | CSV, JSON                        | Bulk upload with structured Q&A pairs
API endpoints        | REST JSON                        | Real-time lookup for dynamic data

Content Structuring Best Practices

Write for retrieval, not just reading

Each document should cover one clear topic. Avoid long pages that mix multiple subjects. The chunking algorithm works best when content is logically organized with descriptive headings.
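To see why descriptive headings help, here is a rough sketch of heading-based chunking. HoverBot's actual chunking algorithm is not described here, so the `chunk_by_heading` function and its Markdown-style heading rule are assumptions used only for illustration.

```python
import re

def chunk_by_heading(markdown_text):
    """Split text at Markdown headings; each chunk keeps its heading as a title.

    Hypothetical helper: illustrates one-topic-per-chunk splitting only.
    """
    chunks = []
    title, body = "Untitled", []
    for line in markdown_text.splitlines():
        match = re.match(r"^#{1,6}\s+(.*)", line)
        if match:
            # A new heading closes the previous chunk if it has any content.
            if any(part.strip() for part in body):
                chunks.append({"title": title, "text": "\n".join(body).strip()})
            title, body = match.group(1).strip(), []
        else:
            body.append(line)
    if any(part.strip() for part in body):
        chunks.append({"title": title, "text": "\n".join(body).strip()})
    return chunks

doc = """# Cancel a subscription
You can cancel your subscription from the billing page.

# Reset a password
Password resets are sent to your registered email address.
"""
chunks = chunk_by_heading(doc)
```

With one topic per heading, each chunk is self-contained. A long page that mixes subjects under a single heading would produce one large chunk covering several topics, which is harder to match to a specific query.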

Include the question in the answer

FAQ-style content that restates the question in the answer text improves retrieval accuracy, because the answer then shares key terms with the queries users are likely to type. Instead of starting with "Yes, you can...", write "You can cancel your subscription by...".
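One way to audit existing FAQ entries for this is a simple term-overlap check. The sketch below is an assumption-laden illustration: the `restates_question` helper, the stopword list, and the overlap threshold are all hypothetical and not part of HoverBot.

```python
import re

# Small illustrative stopword list; a real check would use a fuller one.
STOPWORDS = {"can", "i", "my", "how", "do", "the", "a", "to", "is", "what"}

def restates_question(question, answer, min_overlap=2):
    """Return True if the answer repeats at least min_overlap key terms from the question."""
    q_terms = set(re.findall(r"[a-z0-9]+", question.lower())) - STOPWORDS
    a_terms = set(re.findall(r"[a-z0-9]+", answer.lower()))
    return len(q_terms & a_terms) >= min_overlap

question = "Can I cancel my subscription?"
weak = "Yes, you can do that from the billing page."
strong = "You can cancel your subscription by opening the billing page."

weak_ok = restates_question(question, weak)      # False: no key terms repeated
strong_ok = restates_question(question, strong)  # True: "cancel" and "subscription" repeated
```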

Keep content current

Stale content is the top cause of inaccurate responses. Set up scheduled re-crawls for your website and review the "low-confidence responses" report weekly to identify outdated content.

Use metadata for context

Tag documents with categories, product areas, or audience segments. HoverBot uses metadata to boost relevance when the user's context matches a specific tag.
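As a rough illustration of metadata boosting, the sketch below adds a fixed bonus to a chunk's similarity score for each tag that matches the user's context. The additive formula, the `boost` value, and the field names are assumptions; the actual weighting HoverBot applies is not specified here.

```python
def boosted_score(similarity, chunk_tags, user_context, boost=0.1):
    """Add a fixed bonus per tag shared between the chunk and the user's context.

    Hypothetical scoring rule, for illustration only.
    """
    matches = len(set(chunk_tags) & set(user_context))
    return similarity + boost * matches

# Made-up candidates: the general FAQ is slightly more similar before boosting.
candidates = [
    {"id": "billing-faq", "similarity": 0.62, "tags": ["billing", "enterprise"]},
    {"id": "general-faq", "similarity": 0.65, "tags": ["general"]},
]
user_context = ["enterprise"]

ranked = sorted(
    candidates,
    key=lambda c: boosted_score(c["similarity"], c["tags"], user_context),
    reverse=True,
)
```

Here the enterprise-tagged chunk moves ahead of the general one because the user's context matches its tag, even though its raw similarity is slightly lower.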

Chunk-Level Quality Metrics

  • Retrieval hit rate: Percentage of queries where the correct chunk appears in the top-k results
  • Source attribution coverage: Percentage of responses that include a verifiable source link
  • Staleness score: Time elapsed since each chunk's source content was last updated; chunks are flagged when this exceeds a set threshold
  • Gap detection: Queries that return no relevant chunks, indicating missing knowledge base content
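The four metrics above can be computed from simple logs. The following is a minimal sketch: the record shapes (`sources`, `updated`, `top_score`) and the thresholds are assumptions chosen for the example, not HoverBot's data model.

```python
from datetime import date

def retrieval_hit_rate(results):
    """results: (expected_chunk_id, retrieved_ids) pairs from a labeled query set."""
    hits = sum(1 for expected, retrieved in results if expected in retrieved)
    return hits / len(results) if results else 0.0

def attribution_coverage(responses):
    """Share of responses that carry at least one source link."""
    with_source = sum(1 for r in responses if r.get("sources"))
    return with_source / len(responses) if responses else 0.0

def stale_chunks(chunks, today, max_age_days=180):
    """IDs of chunks whose last update is older than the threshold (assumed 180 days)."""
    return [c["id"] for c in chunks if (today - c["updated"]).days > max_age_days]

def gap_queries(query_log, min_score=0.3):
    """Queries whose best chunk scored below the relevance floor (assumed 0.3)."""
    return [q["text"] for q in query_log if q["top_score"] < min_score]

# Made-up sample data for illustration.
hit_rate = retrieval_hit_rate([
    ("a", ["a", "b"]), ("c", ["a", "b"]), ("d", ["d"]), ("e", ["e", "f"]),
])
coverage = attribution_coverage([{"sources": ["https://example.com/billing"]}, {"sources": []}])
stale = stale_chunks(
    [{"id": "pricing", "updated": date(2024, 1, 1)}, {"id": "setup", "updated": date(2024, 12, 1)}],
    today=date(2025, 1, 1),
)
gaps = gap_queries([
    {"text": "Do you support SSO?", "top_score": 0.1},
    {"text": "How do I cancel?", "top_score": 0.8},
])
```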

Maintenance Workflow

  1. Review the weekly "unresolved queries" report to identify knowledge gaps
  2. Add or update content for the five largest unresolved query clusters
  3. Re-index updated content (automatic for crawled sources, manual trigger for uploads)
  4. Verify improved responses using the conversation replay tool
  5. Archive outdated content that no longer applies to your product or service
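Steps 1 and 2 depend on grouping similar unresolved queries. The sketch below shows one simple way to do that by normalizing each query to its key terms and counting duplicates. This keyword grouping is a deliberately simplified stand-in: how HoverBot builds its query clusters is not described here, and the stopword list and sample queries are made up.

```python
import re
from collections import Counter

# Small illustrative stopword list.
STOPWORDS = {"how", "do", "i", "can", "my", "the", "a", "to", "is", "what", "for"}

def cluster_key(query):
    """Reduce a query to its sorted key terms so rephrasings share one key."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    return " ".join(sorted(set(words) - STOPWORDS))

def top_clusters(unresolved_queries, n=5):
    """Return the n most frequent clusters as (key, count) pairs."""
    counts = Counter(cluster_key(q) for q in unresolved_queries)
    return counts.most_common(n)

unresolved = [
    "How do I export invoices?",
    "Can I export my invoices?",
    "export invoices",
    "What is the refund policy?",
    "refund policy",
    "How do I change my avatar?",
]
report = top_clusters(unresolved)
```

In this sample, invoice export is the largest cluster, so it would be the first topic to add or update in the knowledge base.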