
HoverBot Technical Overview

Architecture, data flow, and service-level objectives for production chatbot deployments.

System architecture at a glance

HoverBot uses a policy-first architecture: requests pass compliance and safety gates before model inference, then flow through validation and analytics loops.

Architecture pipeline

  1. Ingress: auth, rate limit, policy gate
  2. PII / Safety: entity detection, redaction
  3. Retrieval: context ranking, prompt build
  4. Model Router: tier selection, generation
  5. Validation: compliance, escalation
  6. Audit Loop: logging, analytics, optimization

Median end-to-end latency: 1.1 s · p95: 2.4 s
Policy gates are enforced before model inference.

Architecture layers

Layer 1: Ingress and policy gate
Requests enter through channel adapters and pass rate limiting, tenant policy checks, and auth controls.

Layer 2: PII and safety preprocessing
Sensitive entities are detected and redacted before retrieval and generation when policies are enabled.

Layer 3: Retrieval and context assembly
Grounded documents are selected and scored using intent and confidence-aware routing.

Layer 4: Model routing and generation
Conversation requests route to configured model tiers based on complexity and latency targets.

Layer 5: Validation and escalation
Responses pass compliance checks and can escalate to humans on low confidence or safety triggers.

Layer 6: Audit and analytics loop
All critical events are logged to support incident response and weekly optimization cycles.
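As an illustration of the audit loop, a critical event can be captured as one structured JSON object per line for log shipping. The field names and schema below are assumptions for the sketch, not HoverBot's actual log format.

```python
import json
import time
import uuid
from dataclasses import dataclass, asdict

# Hypothetical audit event schema; field names are illustrative only.
@dataclass
class AuditEvent:
    event_id: str
    tenant_id: str
    stage: str        # e.g. "ingress", "pii", "retrieval", "router", "validation"
    outcome: str      # e.g. "pass", "redacted", "escalated"
    latency_ms: float
    timestamp: float

def emit_audit_event(tenant_id: str, stage: str, outcome: str, latency_ms: float) -> str:
    """Serialize one audit event as a single JSON line."""
    event = AuditEvent(
        event_id=str(uuid.uuid4()),
        tenant_id=tenant_id,
        stage=stage,
        outcome=outcome,
        latency_ms=latency_ms,
        timestamp=time.time(),
    )
    return json.dumps(asdict(event))

line = emit_audit_event("tenant-42", "validation", "escalated", 180.0)
```

One-event-per-line JSON keeps the log stream easy to ship to analytics without batching logic in the request path.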

Data flow

  1. User message -> channel adapter -> tenant auth and policy checks
  2. PII classifier -> redaction policy -> safe payload assembly
  3. Retriever -> ranked context pack -> prompt builder
  4. Model router -> generation tier selection -> response candidate
  5. Response validator -> compliance checks -> escalation if needed
  6. Audit log + analytics events -> weekly optimization backlog
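The six stages above can be sketched as a sequential pipeline. Everything here is a stub under stated assumptions: the function names, the message shape, and the trivial keyword-overlap retriever are illustrative, not HoverBot's API; a real implementation would be asynchronous and policy-driven.

```python
# Minimal sketch of the data flow above; all names are hypothetical.

def check_auth_and_policy(message: dict) -> dict:
    # Stage 1: tenant auth and policy gate (stubbed as a presence check)
    if not message.get("tenant_id"):
        raise PermissionError("unknown tenant")
    return message

def redact_pii(message: dict) -> dict:
    # Stage 2: placeholder redaction standing in for the PII classifier
    email = message.get("user_email")
    text = message["text"].replace(email, "[EMAIL]") if email else message["text"]
    return {**message, "text": text}

def build_prompt(message: dict, corpus: list[str]) -> dict:
    # Stage 3: rank documents by naive keyword overlap, keep top 3
    ranked = sorted(
        corpus,
        key=lambda d: -len(set(d.split()) & set(message["text"].split())),
    )
    return {**message, "context": ranked[:3]}

def generate(message: dict) -> dict:
    # Stage 4: model routing + generation (stubbed response and confidence)
    return {
        **message,
        "response": f"Answer grounded in {len(message['context'])} docs",
        "confidence": 0.9,
    }

def validate(message: dict, threshold: float = 0.85) -> dict:
    # Stage 5: compliance check and escalation decision
    return {**message, "escalate": message["confidence"] < threshold}

def handle(message: dict, corpus: list[str]) -> dict:
    message = check_auth_and_policy(message)
    message = redact_pii(message)
    message = build_prompt(message, corpus)
    message = generate(message)
    message = validate(message)
    # Stage 6: audit log would fire here; this sketch just returns the result
    return message
```

The ordering matters: redaction runs before retrieval and generation, so no unmasked PII reaches the prompt builder or model tiers.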

Operational targets

Target | Objective | Notes
End-user response latency (p95) | < 2.5 s | Varies by channel network and retrieval depth
Production availability | 99.9% monthly target | SLA commitments available on enterprise contracts
Sustained request throughput | 120 req/s per region target | Autoscaling profile tuned per tenant tier
Retriever hit rate (top-3) | > 88% | Measured on intent-labeled validation packs refreshed monthly
PII masking precision / recall | 0.97 / 0.94 target | Entity-level scoring across multilingual redaction evaluation sets
Low-confidence auto escalation rate | 8-15% | Calibrated to preserve quality while containing manual queue load

Targets are published SLOs and are reviewed during monthly reliability reviews.
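To illustrate how a latency objective such as the p95 < 2.5 s target is checked, here is a minimal nearest-rank percentile computation over a window of samples. The window data is synthetic, and this document does not describe the actual telemetry pipeline.

```python
import math

def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered))  # 1-based nearest rank
    return ordered[rank - 1]

# Synthetic 30-day window of end-to-end latencies in seconds
window = [0.8, 1.0, 1.1, 1.2, 1.4, 1.6, 1.9, 2.1, 2.3, 2.4]
p95 = percentile(window, 95)   # 2.4
slo_met = p95 < 2.5            # True: within the published objective
```

Production systems usually compute percentiles from streaming histograms rather than sorting raw samples, but the definition being checked is the same.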

Modeling and control notes

  • Retrieval-augmented generation is used for grounded domain responses.
  • Confidence-based model routing balances latency, cost, and response quality.
  • Deterministic fallback and escalation paths are used for sensitive intents.
  • Closed-loop review process uses unresolved conversations to improve weekly releases.

Model selection and evaluation

HoverBot routes traffic across model tiers using a policy + confidence strategy. Fast tiers handle deterministic intents and high-confidence retrieval responses, while advanced tiers handle multi-step reasoning, ambiguous requests, and multilingual edge cases.

Model tier architecture

Tier | Use case | Selection criteria
Fast tier | Deterministic intents, FAQ, order status, high-confidence retrieval | Confidence > 0.85, single-turn, low ambiguity
Advanced tier | Multi-step reasoning, ambiguous queries, multilingual edge cases | Confidence < 0.85, multi-turn, cross-domain context
Safety fallback | Policy-flagged intents, PII-adjacent queries, payment/fraud topics | Deterministic rules, no model inference; direct escalation or scripted response
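The tier selection above can be sketched as a rule cascade. The 0.85 confidence threshold and the safety-fallback-first ordering come from the table; the request shape, function name, and the specific intent labels in the safety set are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass
class Request:
    intent: str
    confidence: float
    turn_count: int
    cross_domain: bool

# Hypothetical deterministic safety list; real policy sets are tenant-configured.
SAFETY_FLAGGED = {"payment", "fraud", "pii_adjacent"}

def select_tier(req: Request) -> str:
    # Safety fallback short-circuits before any model inference
    if req.intent in SAFETY_FLAGGED:
        return "safety_fallback"
    # Fast tier: high confidence, single-turn, low ambiguity
    if req.confidence > 0.85 and req.turn_count == 1 and not req.cross_domain:
        return "fast"
    # Everything else routes to the advanced tier
    return "advanced"

assert select_tier(Request("order_status", 0.92, 1, False)) == "fast"
assert select_tier(Request("fraud", 0.99, 1, False)) == "safety_fallback"
assert select_tier(Request("advisory", 0.70, 3, True)) == "advanced"
```

Checking the safety set first is the key design choice: a policy-flagged intent never reaches a model tier, no matter how confident the classifier is.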

Data provenance

  • Base models are sourced from commercial LLM providers. No customer conversation data is used for base model training.
  • Retrieval-augmented generation grounds responses in tenant-scoped knowledge bases uploaded and managed by each customer.
  • Tenant-scoped fine-tuning is available as an opt-in feature with explicit data processing agreements. Fine-tuning datasets are isolated per tenant and never shared across accounts.
  • Evaluation and benchmark datasets are constructed from anonymized, consent-cleared conversation samples with all PII stripped before use.

Evaluation methodology

  • Model families are benchmarked by intent class: transactional, advisory, and escalation-sensitive flows.
  • Routing thresholds are tuned using offline evaluation and weekly production replay datasets.
  • Quality scoring combines answer correctness, grounded citation coverage, and policy-compliance pass rates.
  • Safety evaluation tracks refusal appropriateness, redaction correctness, and escalation decision quality.
  • Hallucination rate is monitored using a grounding verification pipeline that compares generated claims against source documents in the knowledge base. Current hallucination rate target: < 5% on grounded intents.
  • Safety refusal calibration ensures the model refuses genuinely unsafe requests (target: > 98% refusal rate) while minimizing false refusals on benign queries (target: < 2% false positive rate).
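The entity-level precision/recall and refusal-calibration figures above reduce to standard confusion-matrix arithmetic. A minimal sketch follows; all counts are synthetic, chosen only to land near the published targets.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Entity-level precision and recall from confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall

# PII masking example: 970 correctly redacted entities, 30 false redactions,
# 62 missed entities (synthetic counts)
p, r = precision_recall(tp=970, fp=30, fn=62)
# p == 0.97 exactly; r is approximately 0.94

def refusal_rates(refused_unsafe: int, total_unsafe: int,
                  refused_benign: int, total_benign: int) -> tuple[float, float]:
    """Refusal rate on unsafe prompts and false-refusal rate on benign ones."""
    return refused_unsafe / total_unsafe, refused_benign / total_benign
```

The two refusal numbers pull against each other, which is why both targets (> 98% refusal on unsafe, < 2% false refusal on benign) are tracked jointly rather than optimizing either alone.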

Evaluation packs are refreshed each release cycle using unresolved and low-confidence conversations from the prior week, then replayed across candidate routing configurations before rollout. Full definitions are published in the benchmark methodology white paper.
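The replay step can be sketched as scoring each candidate routing configuration against the prior week's labeled conversations and keeping the best one. The data shape, the single-threshold configuration, and the accuracy scoring rule are all assumptions for this sketch.

```python
# Minimal replay sketch: pick the routing threshold that best matches
# replay labels. Item shape is illustrative: {"confidence": float,
# "needs_advanced": bool} where the label says which tier was needed.

ReplayItem = dict

def replay_accuracy(threshold: float, items: list[ReplayItem]) -> float:
    """Fraction of replay items routed to the tier their label says they needed."""
    correct = 0
    for item in items:
        routed_advanced = item["confidence"] < threshold
        if routed_advanced == item["needs_advanced"]:
            correct += 1
    return correct / len(items)

def pick_threshold(candidates: list[float], items: list[ReplayItem]) -> float:
    return max(candidates, key=lambda t: replay_accuracy(t, items))

replay_set = [
    {"confidence": 0.90, "needs_advanced": False},
    {"confidence": 0.60, "needs_advanced": True},
    {"confidence": 0.80, "needs_advanced": True},
    {"confidence": 0.95, "needs_advanced": False},
]
best = pick_threshold([0.75, 0.85, 0.95], replay_set)  # 0.85 on this data
```

Running candidates offline against last week's traffic before rollout means a threshold change is validated on real conversation distributions rather than synthetic benchmarks alone.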

Performance benchmark snapshot

Benchmark | Current range | Measurement window
Grounded answer accuracy | 89-93% | Weekly replay set, high-volume support intents
Retriever relevance (top-3 context) | 88-92% | Intent-labeled corpus across e-commerce, SaaS, and real estate domains
PII masking precision / recall | 0.97 / 0.94 | Multilingual entity evaluation packs and redaction replay tests
Median / p95 response latency | 1.1 s / 2.4 s | 30-day rolling production telemetry by region
Escalation decision accuracy | 91-95% | Human-review adjudication on low-confidence and sensitive intents

Full benchmark definitions and sampling assumptions are published in the methodology and feature white papers.