HoverBot Technical Overview
Architecture, data flow, and service-level objectives for production chatbot deployments.
System architecture at a glance
HoverBot uses a policy-first architecture: requests pass compliance and safety gates before model inference, then flow through validation and analytics loops.
Architecture pipeline
- Layer 1: Ingress (auth, rate limit, policy gate)
- Layer 2: PII / Safety (entity detection, redaction)
- Layer 3: Retrieval (context ranking, prompt build)
- Layer 4: Model Router (tier selection, generation)
- Layer 5: Validation (compliance, escalation)
- Layer 6: Audit Loop (logging, analytics, optimization)
Architecture layers
Layer 1: Ingress and policy gate
Requests enter through channel adapters and pass rate limiting, tenant policy checks, and auth controls.
Layer 2: PII and safety preprocessing
Sensitive entities are detected and redacted before retrieval and generation when tenant policies enable it.
Layer 3: Retrieval and context assembly
Grounded documents are selected and scored using intent- and confidence-aware routing.
Layer 4: Model routing and generation
Conversation requests route to configured model tiers based on complexity and latency targets.
Layer 5: Validation and escalation
Responses pass compliance checks and can escalate to a human agent on low confidence or safety triggers.
Layer 6: Audit and analytics loop
All critical events are logged to support incident response and weekly optimization cycles.
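The Layer 2 redaction step can be illustrated with a minimal sketch. The regex patterns and `[LABEL]` placeholder format below are illustrative assumptions; the document describes trained entity detection, not pattern matching.

```python
import re

# Illustrative entity patterns; a production PII classifier would use
# trained entity detection rather than regexes.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\+?\d[\d\s()-]{7,}\d"),
    "CARD": re.compile(r"\b(?:\d[ -]*?){13,16}\b"),
}

def redact(text: str, enabled_entities: set[str]) -> str:
    """Replace detected entities with typed placeholders before the
    payload reaches retrieval and generation, per tenant policy."""
    for label, pattern in PII_PATTERNS.items():
        if label in enabled_entities:
            text = pattern.sub(f"[{label}]", text)
    return text

safe = redact("Reach me at jane@example.com or +1 555 010 7788",
              enabled_entities={"EMAIL", "PHONE"})
```

Because redaction is typed rather than blanked out, downstream prompts keep the entity class as context while the raw value never leaves Layer 2.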
Data flow
- User message -> channel adapter -> tenant auth and policy checks
- PII classifier -> redaction policy -> safe payload assembly
- Retriever -> ranked context pack -> prompt builder
- Model router -> generation tier selection -> response candidate
- Response validator -> compliance checks -> escalation if needed
- Audit log + analytics events -> weekly optimization backlog
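The data flow above can be sketched as a sequence of composable stages. The `Turn` type, the stage names, and the audit-trail field are illustrative assumptions, not HoverBot's actual API:

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Turn:
    """Illustrative request state carried through the pipeline."""
    text: str
    context: list[str] = field(default_factory=list)
    response: str = ""
    events: list[str] = field(default_factory=list)

Stage = Callable[[Turn], Turn]

def run_pipeline(turn: Turn, stages: list[Stage]) -> Turn:
    # Each stage mirrors one arrow in the data flow list above.
    for stage in stages:
        turn = stage(turn)
        turn.events.append(stage.__name__)  # audit trail for Layer 6
    return turn

def auth_and_policy(t: Turn) -> Turn: return t   # tenant auth + policy checks
def redact_pii(t: Turn) -> Turn: return t        # safe payload assembly
def retrieve(t: Turn) -> Turn:
    t.context = ["doc-1", "doc-2"]               # ranked context pack
    return t
def generate(t: Turn) -> Turn:
    t.response = f"answer grounded in {len(t.context)} docs"
    return t
def validate(t: Turn) -> Turn: return t          # compliance / escalation gate

result = run_pipeline(Turn("where is my order?"),
                      [auth_and_policy, redact_pii, retrieve, generate, validate])
```

Keeping each stage as a pure `Turn -> Turn` function makes the audit loop trivial: the event list records exactly which gates every request passed.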
Operational targets
| Target | Objective | Notes |
|---|---|---|
| End-user response latency (p95) | < 2.5s | Varies with channel network conditions and retrieval depth |
| Production availability | 99.9% monthly target | SLA commitments available on enterprise contracts |
| Sustained request throughput | 120 req/s per region target | Autoscaling profile tuned per tenant tier |
| Retriever hit rate (top-3) | > 88% | Measured on intent-labeled validation packs refreshed monthly |
| PII masking precision / recall | 0.97 / 0.94 target | Entity-level scoring across multilingual redaction evaluation sets |
| Low-confidence auto escalation rate | 8-15% | Calibrated to preserve quality while containing manual queue load |
Targets are published SLOs and are reviewed during monthly reliability reviews.
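As a worked example, the p95 latency objective from the table can be checked against telemetry with a nearest-rank percentile; the sample values and the percentile method are illustrative assumptions:

```python
import math

def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: smallest value with at least p% of
    samples at or below it."""
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[rank - 1]

# Illustrative per-request latency telemetry, in seconds.
latencies = [0.8, 1.1, 1.2, 0.9, 2.1, 1.4, 1.0, 2.6, 1.3, 1.1]
p95 = percentile(latencies, 95)
slo_met = p95 < 2.5  # against the < 2.5s p95 objective above
```

In this sample window the single 2.6 s outlier lands at p95 and flags an SLO breach, which is exactly the behavior a tail-latency objective is meant to surface.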
Modeling and control notes
- Retrieval-augmented generation is used for grounded domain responses.
- Confidence-based model routing balances latency, cost, and response quality.
- Deterministic fallback and escalation paths are used for sensitive intents.
- Closed-loop review process uses unresolved conversations to improve weekly releases.
Model selection and evaluation
HoverBot routes traffic across model tiers using a policy + confidence strategy. Fast tiers handle deterministic intents and high-confidence retrieval responses, while advanced tiers handle multi-step reasoning, ambiguous requests, and multilingual edge cases.
Model tier architecture
| Tier | Use case | Selection criteria |
|---|---|---|
| Fast tier | Deterministic intents, FAQ, order status, high-confidence retrieval | Confidence > 0.85, single-turn, low ambiguity |
| Advanced tier | Multi-step reasoning, ambiguous queries, multilingual edge cases | Confidence < 0.85, multi-turn, cross-domain context |
| Safety fallback | Policy-flagged intents, PII-adjacent queries, payment/fraud topics | Deterministic rules; no model inference (direct escalation or scripted response) |
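The selection criteria in the table can be expressed as a routing function. The flag names and rule ordering below are an illustrative sketch of the policy + confidence strategy, not the production router:

```python
from enum import Enum

class Tier(Enum):
    FAST = "fast"
    ADVANCED = "advanced"
    SAFETY_FALLBACK = "safety_fallback"

def route(confidence: float, multi_turn: bool, ambiguous: bool,
          policy_flagged: bool) -> Tier:
    """Mirror the tier table: deterministic safety rules are checked
    first, then confidence and ambiguity gate fast vs. advanced."""
    if policy_flagged:
        # Policy-flagged or PII-adjacent intents bypass model inference.
        return Tier.SAFETY_FALLBACK
    if confidence > 0.85 and not multi_turn and not ambiguous:
        return Tier.FAST
    return Tier.ADVANCED
```

Ordering matters: the safety rule is evaluated before any confidence check, so a high-confidence but policy-flagged request can never reach a model tier.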
Data provenance
- Base models are sourced from commercial LLM providers. No customer conversation data is used for base model training.
- Retrieval-augmented generation grounds responses in tenant-scoped knowledge bases uploaded and managed by each customer.
- Tenant-scoped fine-tuning is available as an opt-in feature with explicit data processing agreements. Fine-tuning datasets are isolated per tenant and never shared across accounts.
- Evaluation and benchmark datasets are constructed from anonymized, consent-cleared conversation samples with all PII stripped before use.
Evaluation methodology
- Model families are benchmarked by intent class: transactional, advisory, and escalation-sensitive flows.
- Routing thresholds are tuned using offline evaluation and weekly production replay datasets.
- Quality scoring combines answer correctness, grounded citation coverage, and policy-compliance pass rates.
- Safety evaluation tracks refusal appropriateness, redaction correctness, and escalation decision quality.
- Hallucination rate is monitored using a grounding verification pipeline that compares generated claims against source documents in the knowledge base. Current hallucination rate target: < 5% on grounded intents.
- Safety refusal calibration ensures the model refuses genuinely unsafe requests (target: > 98% refusal rate) while minimizing false refusals on benign queries (target: < 2% false positive rate).
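A minimal version of the grounding-verification idea can be sketched as token-overlap claim support. Production verifiers typically use NLI models or retrieval-based checks, so the tokenizer and the 0.6 overlap threshold here are illustrative assumptions:

```python
def tokens(text: str) -> set[str]:
    return {w.strip(".,").lower() for w in text.split()}

def claim_supported(claim: str, sources: list[str],
                    min_overlap: float = 0.6) -> bool:
    """A claim counts as grounded if enough of its tokens appear
    in at least one source document."""
    claim_toks = tokens(claim)
    return any(
        len(claim_toks & tokens(src)) / len(claim_toks) >= min_overlap
        for src in sources
    )

def hallucination_rate(claims: list[str], sources: list[str]) -> float:
    """Fraction of generated claims with no supporting source."""
    unsupported = [c for c in claims if not claim_supported(c, sources)]
    return len(unsupported) / len(claims)

sources = ["Returns are accepted within 30 days with a receipt."]
claims = ["Returns are accepted within 30 days.",
          "Refunds are issued in 90 days as store credit."]
rate = hallucination_rate(claims, sources)
```

Here the second claim has almost no overlap with the knowledge base, so it is flagged as unsupported and counts toward the hallucination rate tracked against the < 5% target.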
Evaluation packs are refreshed each release cycle using unresolved and low-confidence conversations from the prior week, then replayed across candidate routing configurations before rollout. Full definitions are published in the benchmark methodology white paper.
Performance benchmark snapshot
| Benchmark | Current Range | Measurement Window |
|---|---|---|
| Grounded answer accuracy | 89-93% | Weekly replay set, high-volume support intents |
| Retriever relevance (top-3 context) | 88-92% | Intent-labeled corpus across e-commerce, SaaS, real estate domains |
| PII masking precision / recall | 0.97 / 0.94 | Multilingual entity evaluation packs and redaction replay tests |
| Median / p95 response latency | 1.1s / 2.4s | 30-day rolling production telemetry by region |
| Escalation decision accuracy | 91-95% | Human-review adjudication on low-confidence and sensitive intents |
Full benchmark definitions and sampling assumptions are published in the methodology and feature white papers.
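The entity-level scoring behind the PII masking precision/recall rows can be illustrated as a set comparison of predicted vs. gold entity spans; the `(start, end, label)` span representation is an illustrative assumption:

```python
def entity_prf(gold: set[tuple[int, int, str]],
               pred: set[tuple[int, int, str]]) -> tuple[float, float]:
    """Entity-level scoring: a prediction counts as correct only if
    its (start, end, label) span exactly matches a gold entity."""
    true_pos = len(gold & pred)
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    return precision, recall

# Illustrative annotations: the redactor found the email and phone,
# missed the card number, and falsely flagged a name span.
gold = {(12, 28, "EMAIL"), (33, 45, "PHONE"), (50, 66, "CARD")}
pred = {(12, 28, "EMAIL"), (33, 45, "PHONE"), (70, 75, "NAME")}
precision, recall = entity_prf(gold, pred)
```

Exact-span matching is the strict variant of this metric; partial-overlap credit is a common relaxation in multilingual evaluation packs.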