HoverBot Technical Overview
Architecture, data flow, and service-level objectives for production chatbot deployments.
System architecture at a glance
HoverBot uses a policy-first architecture: requests pass compliance and safety gates before model inference, then flow through validation and analytics loops.
Architecture pipeline
- Layer 1: Ingress (auth, rate limit, policy gate)
- Layer 2: PII / Safety (entity detection, redaction)
- Layer 3: Retrieval (context ranking, prompt build)
- Layer 4: Model Router (tier selection, generation)
- Layer 5: Validation (compliance, escalation)
- Layer 6: Audit Loop (logging, analytics, optimization)
Architecture layers
Layer 1: Ingress and policy gate
Requests enter through channel adapters and pass rate limiting, tenant policy checks, and auth controls.
Layer 2: PII and safety preprocessing
Sensitive entities are detected and redacted before retrieval and generation when tenant policies enable it.
Layer 3: Retrieval and context assembly
Grounded documents are selected and scored using intent- and confidence-aware routing.
Layer 4: Model routing and generation
Conversation requests route to configured model tiers based on complexity and latency targets.
Layer 5: Validation and escalation
Responses pass compliance checks and can escalate to a human agent on low confidence or safety triggers.
Layer 6: Audit and analytics loop
All critical events are logged to support incident response and weekly optimization cycles.
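The Layer 2 redaction step can be illustrated with a minimal sketch. The regex patterns and `[LABEL]` placeholder format below are illustrative assumptions; the document describes trained entity detection, not pattern matching.

```python
import re

# Illustrative entity patterns; a production PII classifier would use
# trained entity detection rather than regexes.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\+?\d[\d\s()-]{7,}\d"),
    "CARD": re.compile(r"\b(?:\d[ -]*?){13,16}\b"),
}

def redact(text: str, enabled_entities: set[str]) -> str:
    """Replace detected entities with typed placeholders before the
    payload reaches retrieval and generation, per tenant policy."""
    for label, pattern in PII_PATTERNS.items():
        if label in enabled_entities:
            text = pattern.sub(f"[{label}]", text)
    return text

safe = redact("Reach me at jane@example.com or +1 555 010 7788",
              enabled_entities={"EMAIL", "PHONE"})
```

Because redaction is typed rather than blanked out, downstream prompts keep the entity class as context while the raw value never leaves Layer 2.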
Data flow
- User message -> channel adapter -> tenant auth and policy checks
- PII classifier -> redaction policy -> safe payload assembly
- Retriever -> ranked context pack -> prompt builder
- Model router -> generation tier selection -> response candidate
- Response validator -> compliance checks -> escalation if needed
- Audit log + analytics events -> weekly optimization backlog
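The data flow above can be sketched as a sequence of composable stages. The `Turn` type, the stage names, and the audit-trail field are illustrative assumptions, not HoverBot's actual API:

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Turn:
    """Illustrative request state carried through the pipeline."""
    text: str
    context: list[str] = field(default_factory=list)
    response: str = ""
    events: list[str] = field(default_factory=list)

Stage = Callable[[Turn], Turn]

def run_pipeline(turn: Turn, stages: list[Stage]) -> Turn:
    # Each stage mirrors one arrow in the data flow list above.
    for stage in stages:
        turn = stage(turn)
        turn.events.append(stage.__name__)  # audit trail for Layer 6
    return turn

def auth_and_policy(t: Turn) -> Turn: return t   # tenant auth + policy checks
def redact_pii(t: Turn) -> Turn: return t        # safe payload assembly
def retrieve(t: Turn) -> Turn:
    t.context = ["doc-1", "doc-2"]               # ranked context pack
    return t
def generate(t: Turn) -> Turn:
    t.response = f"answer grounded in {len(t.context)} docs"
    return t
def validate(t: Turn) -> Turn: return t          # compliance / escalation gate

result = run_pipeline(Turn("where is my order?"),
                      [auth_and_policy, redact_pii, retrieve, generate, validate])
```

Keeping each stage as a pure `Turn -> Turn` function makes the audit loop trivial: the event list records exactly which gates every request passed.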
Operational targets
| Target | Objective | Notes |
|---|---|---|
| End-user response latency (p95) | < 2.5s | Varies with channel network conditions and retrieval depth |
| Production availability | 99.9% monthly target | SLA commitments available on enterprise contracts |
| Sustained request throughput | 120 req/s per region target | Autoscaling profile tuned per tenant tier |
| Retriever hit rate (top-3) | > 88% | Measured on intent-labeled validation packs refreshed monthly |
| PII masking precision / recall | 0.97 / 0.94 target | Entity-level scoring across multilingual redaction evaluation sets |
| Low-confidence auto escalation rate | 8-15% | Calibrated to preserve quality while containing manual queue load |
Targets are published SLOs and are reviewed during monthly reliability reviews.
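As a worked example, the p95 latency objective from the table can be checked against telemetry with a nearest-rank percentile; the sample values and the percentile method are illustrative assumptions:

```python
import math

def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: smallest value with at least p% of
    samples at or below it."""
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[rank - 1]

# Illustrative per-request latency telemetry, in seconds.
latencies = [0.8, 1.1, 1.2, 0.9, 2.1, 1.4, 1.0, 2.6, 1.3, 1.1]
p95 = percentile(latencies, 95)
slo_met = p95 < 2.5  # against the < 2.5s p95 objective above
```

In this sample window the single 2.6 s outlier lands at p95 and flags an SLO breach, which is exactly the behavior a tail-latency objective is meant to surface.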
Modeling and control notes
- Retrieval-augmented generation is used for grounded domain responses.
- Confidence-based model routing balances latency, cost, and response quality.
- Deterministic fallback and escalation paths are used for sensitive intents.
- Closed-loop review process uses unresolved conversations to improve weekly releases.
Model selection and evaluation
HoverBot routes traffic across model tiers using a policy + confidence strategy. Fast tiers handle deterministic intents and high-confidence retrieval responses, while advanced tiers handle multi-step reasoning, ambiguous requests, and multilingual edge cases.
Model tier architecture
| Tier | Use case | Selection criteria |
|---|---|---|
| Fast tier | Deterministic intents, FAQ, order status, high-confidence retrieval | Confidence > 0.85, single-turn, low ambiguity |
| Advanced tier | Multi-step reasoning, ambiguous queries, multilingual edge cases | Confidence < 0.85, multi-turn, cross-domain context |
| Safety fallback | Policy-flagged intents, PII-adjacent queries, payment/fraud topics | Deterministic rules; no model inference (direct escalation or scripted response) |
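The selection criteria in the table can be expressed as a routing function. The flag names and rule ordering below are an illustrative sketch of the policy + confidence strategy, not the production router:

```python
from enum import Enum

class Tier(Enum):
    FAST = "fast"
    ADVANCED = "advanced"
    SAFETY_FALLBACK = "safety_fallback"

def route(confidence: float, multi_turn: bool, ambiguous: bool,
          policy_flagged: bool) -> Tier:
    """Mirror the tier table: deterministic safety rules are checked
    first, then confidence and ambiguity gate fast vs. advanced."""
    if policy_flagged:
        # Policy-flagged or PII-adjacent intents bypass model inference.
        return Tier.SAFETY_FALLBACK
    if confidence > 0.85 and not multi_turn and not ambiguous:
        return Tier.FAST
    return Tier.ADVANCED
```

Ordering matters: the safety rule is evaluated before any confidence check, so a high-confidence but policy-flagged request can never reach a model tier.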
Data provenance
- Base models are sourced from commercial LLM providers. No customer conversation data is used for base model training.
- Retrieval-augmented generation grounds responses in tenant-scoped knowledge bases uploaded and managed by each customer.
- Tenant-scoped fine-tuning is available as an opt-in feature with explicit data processing agreements. Fine-tuning datasets are isolated per tenant and never shared across accounts.
- Evaluation and benchmark datasets are constructed from anonymized, consent-cleared conversation samples with all PII stripped before use.
Evaluation methodology
- Model families are benchmarked by intent class: transactional, advisory, and escalation-sensitive flows.
- Routing thresholds are tuned using offline evaluation and weekly production replay datasets.
- Quality scoring combines answer correctness, grounded citation coverage, and policy-compliance pass rates.
- Safety evaluation tracks refusal appropriateness, redaction correctness, and escalation decision quality.
- Hallucination rate is monitored using a grounding verification pipeline that compares generated claims against source documents in the knowledge base. Current hallucination rate target: < 5% on grounded intents.
- Safety refusal calibration ensures the model refuses genuinely unsafe requests (target: > 98% refusal rate) while minimizing false refusals on benign queries (target: < 2% false positive rate).
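A minimal version of the grounding-verification idea can be sketched as token-overlap claim support. Production verifiers typically use NLI models or retrieval-based checks, so the tokenizer and the 0.6 overlap threshold here are illustrative assumptions:

```python
def tokens(text: str) -> set[str]:
    return {w.strip(".,").lower() for w in text.split()}

def claim_supported(claim: str, sources: list[str],
                    min_overlap: float = 0.6) -> bool:
    """A claim counts as grounded if enough of its tokens appear
    in at least one source document."""
    claim_toks = tokens(claim)
    return any(
        len(claim_toks & tokens(src)) / len(claim_toks) >= min_overlap
        for src in sources
    )

def hallucination_rate(claims: list[str], sources: list[str]) -> float:
    """Fraction of generated claims with no supporting source."""
    unsupported = [c for c in claims if not claim_supported(c, sources)]
    return len(unsupported) / len(claims)

sources = ["Returns are accepted within 30 days with a receipt."]
claims = ["Returns are accepted within 30 days.",
          "Refunds are issued in 90 days as store credit."]
rate = hallucination_rate(claims, sources)
```

Here the second claim has almost no overlap with the knowledge base, so it is flagged as unsupported and counts toward the hallucination rate tracked against the < 5% target.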
Evaluation packs are refreshed each release cycle using unresolved and low-confidence conversations from the prior week, then replayed across candidate routing configurations before rollout. Full definitions are published in the benchmark methodology white paper.
Performance benchmark snapshot
| Benchmark | Current Range | Measurement Window |
|---|---|---|
| Grounded answer accuracy | 89-93% | Weekly replay set, high-volume support intents |
| Retriever relevance (top-3 context) | 88-92% | Intent-labeled corpus across e-commerce, SaaS, real estate domains |
| PII masking precision / recall | 0.97 / 0.94 | Multilingual entity evaluation packs and redaction replay tests |
| Median / p95 response latency | 1.1s / 2.4s | 30-day rolling production telemetry by region |
| Escalation decision accuracy | 91-95% | Human-review adjudication on low-confidence and sensitive intents |
Full benchmark definitions and sampling assumptions are published in the methodology and feature white papers.
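The entity-level scoring behind the PII masking precision/recall rows can be illustrated as a set comparison of predicted vs. gold entity spans; the `(start, end, label)` span representation is an illustrative assumption:

```python
def entity_prf(gold: set[tuple[int, int, str]],
               pred: set[tuple[int, int, str]]) -> tuple[float, float]:
    """Entity-level scoring: a prediction counts as correct only if
    its (start, end, label) span exactly matches a gold entity."""
    true_pos = len(gold & pred)
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    return precision, recall

# Illustrative annotations: the redactor found the email and phone,
# missed the card number, and falsely flagged a name span.
gold = {(12, 28, "EMAIL"), (33, 45, "PHONE"), (50, 66, "CARD")}
pred = {(12, 28, "EMAIL"), (33, 45, "PHONE"), (70, 75, "NAME")}
precision, recall = entity_prf(gold, pred)
```

Exact-span matching is the strict variant of this metric; partial-overlap credit is a common relaxation in multilingual evaluation packs.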