Routing Beats Bigger Models: A Production Architecture

The expensive mistake everyone makes
Teams building chatbots pick one model and route everything through it. "We need the best quality, so we'll use GPT-4o." Then they get the bill. Then they switch to a cheaper model. Then quality drops. Then they wonder if AI chatbots actually work.
The mistake is treating model selection as a single decision instead of a per-request decision. Look at real chatbot traffic:
- 40% is social fluff: "hi", "thanks", "ok", greetings, acknowledgments. No reasoning required. A template response works.
- 30% is simple lookup: "what are your hours", "where is my order", "what's your return policy". One retrieval hit, template response. GPT-4o-mini handles this perfectly.
- 20% is moderate complexity: Multi-step questions, comparisons, recommendations. Needs reasoning but not frontier capability.
- 10% is genuinely hard: Complex synthesis, nuanced judgment, creative problem-solving. This is where frontier models earn their cost.
If you route 100% of traffic through GPT-4o, you're paying frontier prices for "thanks" and "ok". A router that sends each request to the right model cuts cost 70% without touching the hard 10%.
The math that should change your architecture
| Traffic Type | % of Requests | Best Route | Cost/1K requests |
|---|---|---|---|
| Social (hi, thanks) | 40% | Template + tiny model | $0.002 |
| Simple lookup | 30% | RAG + GPT-4o-mini | $0.04 |
| Moderate complexity | 20% | RAG + Claude Sonnet | $0.45 |
| Genuinely hard | 10% | RAG + GPT-4o/Opus | $1.50 |
Cost Comparison
- With routing: ~$0.25/1K requests (weighted average over the mix above)
- GPT-4o everywhere: ~$1.50/1K requests
- Savings: roughly 83%
Numbers assume ~500 tokens/request average. Your mileage varies based on conversation length and complexity distribution.
These numbers are illustrative but directionally correct. The exact savings depend on your traffic distribution and model choices. The principle holds: route to the cheapest model that can handle each request.
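The blended figure is just a weighted average of the rows in the table; a quick sanity check, with the values copied straight from the table and nothing else assumed:

```typescript
// Weighted average over the traffic mix in the table above
const lanes = [
  { share: 0.4, costPer1k: 0.002 }, // social
  { share: 0.3, costPer1k: 0.04 },  // simple lookup
  { share: 0.2, costPer1k: 0.45 },  // moderate complexity
  { share: 0.1, costPer1k: 1.5 },   // genuinely hard
];

const blended = lanes.reduce((sum, l) => sum + l.share * l.costPer1k, 0);
console.log(blended.toFixed(3));                           // 0.253 → ~$0.25 per 1K requests
console.log(`${((1 - blended / 1.5) * 100).toFixed(0)}%`); // 83% vs. GPT-4o everywhere
```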
Current model pricing landscape
Model pricing changes frequently. As of early 2026, here is the landscape for chat-oriented models:
| Model | Input $/1M tokens | Output $/1M tokens | Best For |
|---|---|---|---|
| GPT-4o-mini | $0.15 | $0.60 | Simple lookups, FAQ, high volume |
| Claude 3.5 Haiku | $0.80 | $4.00 | Fast responses, moderate reasoning |
| Claude 3.5 Sonnet | $3.00 | $15.00 | Complex reasoning, nuanced responses |
| GPT-4o | $2.50 | $10.00 | Complex reasoning, broad knowledge |
| Claude 3 Opus | $15.00 | $75.00 | Most complex tasks only |
Prices as of January 2026. Check vendor pricing pages for current rates.
The price spread is enormous. GPT-4o-mini is 17x cheaper than GPT-4o per input token. Claude Haiku is 19x cheaper than Opus. This spread is your leverage.
The three-lane architecture
Every request enters the router and gets classified into one of three lanes:
Lane 1: Safety (always runs first)
Local classifiers check for PII, policy violations, and out-of-scope requests. This lane runs before anything leaves your infrastructure. No tokens spent on external models. No data leakage.
The safety lane handles:
- PII detection and masking (see our PII patterns post)
- Scope checking: is this question in-bounds for this chatbot?
- Policy enforcement: blocked topics, required disclosures
- Rate limiting and abuse detection
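As a rough illustration, here is what the safety check might look like with regex-based PII detection and a simple blocklist. The patterns are illustrative only, and `checkSafety`/`maskPII` are names of our own choosing; in production the classifiers would be trained models, not regexes.

```typescript
interface SafetyResult {
  blocked: boolean;
  piiDetected: boolean;
  reasons: string[];
}

// Illustrative patterns only: production PII detection needs trained classifiers
const PII_PATTERNS = [
  /\b\d{3}-\d{2}-\d{4}\b/,        // US SSN format
  /\b(?:\d[ -]?){13,16}\b/,       // credit-card-like digit runs
  /\b[\w.+-]+@[\w-]+\.[\w.]+\b/,  // email addresses
];

const BLOCKED_TOPICS = [/\bmedical (advice|diagnosis)\b/i, /\blegal advice\b/i];

function checkSafety(message: string): SafetyResult {
  const piiDetected = PII_PATTERNS.some(p => p.test(message));
  const blocked = BLOCKED_TOPICS.some(p => p.test(message));
  const reasons = [
    ...(piiDetected ? ["pii"] : []),
    ...(blocked ? ["blocked_topic"] : []),
  ];
  return { blocked, piiDetected, reasons };
}

function maskPII(message: string): string {
  return PII_PATTERNS.reduce(
    (masked, pattern) => masked.replace(new RegExp(pattern.source, "g"), "[REDACTED]"),
    message
  );
}
```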
Lane 2: Fast (cheap, local, instant)
Social turns, simple acknowledgments, and template responses. A tiny model or even rule-based responses. Latency under 100ms. Cost near zero.
What goes in the fast lane:
- Greetings: "hi", "hello", "hey there"
- Acknowledgments: "thanks", "ok", "got it"
- Simple confirmations: "yes", "no", "sure"
- Closings: "bye", "that's all", "nothing else"
These can be handled with templates or a tiny local model. No need to call GPT-4o for "thanks".
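A sketch of what a rule-based fast lane can look like; the patterns and canned replies are placeholders of our own, not a prescribed set:

```typescript
// Rule-based fast lane: pattern → canned response, no model call at all
const FAST_LANE_TEMPLATES: Array<{ pattern: RegExp; reply: string }> = [
  { pattern: /^(hi|hello|hey( there)?)[!. ]*$/i, reply: "Hi! How can I help you today?" },
  { pattern: /^(thanks|thank you|ok|okay|got it)[!. ]*$/i, reply: "You're welcome! Anything else I can help with?" },
  { pattern: /^(bye|that'?s all|nothing else)[!. ]*$/i, reply: "Thanks for stopping by. Have a great day!" },
  // Bare confirmations ("yes", "no", "sure") depend on the previous turn,
  // so they go through a tiny dialog-state check rather than a static template.
];

function tryFastLane(message: string): string | null {
  const match = FAST_LANE_TEMPLATES.find(t => t.pattern.test(message.trim()));
  return match ? match.reply : null; // null → fall through to the router
}
```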
Lane 3: Heavy (expensive, powerful, necessary)
Complex questions that require synthesis, reasoning, or nuanced judgment. RAG retrieval plus a frontier model. This is where you spend money, so make every request count.
What goes in the heavy lane:
- Multi-step reasoning: "Compare X and Y considering A, B, C"
- Nuanced judgment: "Should I choose option A or B for my situation?"
- Creative synthesis: "Summarize these three documents and identify conflicts"
- Edge cases where confidence is low
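A minimal sketch of the heavy lane, assuming a stubbed `retrieve()` helper standing in for your vector-store query and the OpenAI SDK for the model call; swap in whatever client and model you actually route to:

```typescript
import OpenAI from "openai";

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Stand-in for your vector-store query; replace with your actual retriever
async function retrieve(query: string, opts: { topK: number }): Promise<string[]> {
  return []; // e.g. embed the query, search the index, return the top-k chunk texts
}

async function heavyLane(question: string): Promise<string> {
  const chunks = await retrieve(question, { topK: 6 });

  const completion = await client.chat.completions.create({
    model: "gpt-4o",
    temperature: 0.2,
    messages: [
      {
        role: "system",
        content: "Answer using only the provided context. If the context is insufficient, say so.",
      },
      {
        role: "user",
        content: `Context:\n${chunks.join("\n---\n")}\n\nQuestion: ${question}`,
      },
    ],
  });

  return completion.choices[0].message.content ?? "";
}
```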
The router decision logic
The router scores each request on three dimensions and picks the cheapest lane that can handle it:
Scoring Dimensions
- Complexity (0-1): How hard is this question? Based on message length, entity count, question structure, presence of comparisons or conditionals.
- Sensitivity (0-1): How sensitive is the topic? Financial advice, health, legal topics score higher.
- Confidence (0-1): How sure are we about the classification? Low confidence triggers escalation to a safer lane.
```typescript
// Simplified routing logic
enum Lane { BLOCKED, FAST, MID, HEAVY, HEAVY_GUARDED }

function route(message: string): Lane {
  const safety = checkSafety(message);
  if (safety.blocked) return Lane.BLOCKED;
  if (safety.piiDetected) message = maskPII(message);

  const scores = classify(message);

  // Low complexity + high confidence = fast lane
  if (scores.complexity < 0.3 && scores.confidence > 0.8) {
    return Lane.FAST;
  }
  // High sensitivity = always heavy lane with guardrails
  if (scores.sensitivity > 0.7) {
    return Lane.HEAVY_GUARDED;
  }
  // Moderate complexity = mid-tier model
  if (scores.complexity < 0.6) {
    return Lane.MID;
  }
  // Everything else = heavy lane
  return Lane.HEAVY;
}
```
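The `classify()` call above does the real scoring work. A minimal heuristic sketch based on the signals listed earlier (message length, entity count, question structure, comparisons, conditionals); the weights and thresholds are placeholders of our own, and in practice a small trained classifier would replace this:

```typescript
interface Scores {
  complexity: number;  // 0-1
  sensitivity: number; // 0-1
  confidence: number;  // 0-1
}

const SENSITIVE_TOPICS = /\b(invest|loan|mortgage|diagnos|symptom|lawsuit|contract)\w*/gi;
const COMPARISONS = /\b(compare|versus|vs\.?|better than|trade[- ]?offs?|should i)\b/i;
const CONDITIONALS = /\b(if|unless|depending on|assuming)\b/i;

function classify(message: string): Scores {
  const words = message.trim().split(/\s+/).length;
  const questions = (message.match(/\?/g) ?? []).length;
  // Crude entity signal: capitalized tokens and order-number-style digit runs
  const entities = (message.match(/\b[A-Z][a-z]+\b|#?\d{3,}/g) ?? []).length;

  // Longer messages, multiple questions, comparisons, and conditionals push complexity up
  let complexity = Math.min(1, words / 60);
  if (questions > 1) complexity += 0.2;
  if (entities > 2) complexity += 0.1;
  if (COMPARISONS.test(message)) complexity += 0.2;
  if (CONDITIONALS.test(message)) complexity += 0.1;
  complexity = Math.min(1, complexity);

  // Sensitivity from topic hits; financial, health, and legal terms score higher
  const sensitivity = Math.min(1, (message.match(SENSITIVE_TOPICS) ?? []).length * 0.4);

  // Confidence: highest when the signal is unambiguous (very short or clearly complex)
  const confidence = words < 5 || complexity > 0.8 ? 0.9 : 0.6;

  return { complexity, sensitivity, confidence };
}
```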
Why not just use MCP and let the model decide?
Model Context Protocol is seductive. Give the model tools, let it figure out when to use them. Ship faster, write less code.
In production, MCP creates problems:
- Non-deterministic behavior: The same request can trigger different tool calls depending on prompt variations, model versions, and even load. This makes debugging a forensics exercise.
- Cost unpredictability: A model that decides to call tools can burn through tokens exploring options. You cannot budget what you cannot predict.
- Security surface: Tool parameters are model outputs. Model outputs can be manipulated. Every MCP tool is an injection surface.
- Audit complexity: "Why did the bot call this API?" becomes impossible to answer reliably when the model made the decision implicitly.
Our position: Use MCP for low-stakes exploration and prototyping. Use explicit routing for production. You want deterministic, auditable, predictable systems when real users and real money are involved.
Composite requests: when one message needs multiple lanes
Real user messages are often composite. "What's your return policy, and can you cancel order #4832?" This needs two different processing paths:
- "What's your return policy" → FAQ lookup → fast/mid lane
- "cancel order #4832" → action with parameters → deterministic action lane
Decompose the request, route each segment appropriately, then compose the final response. This adds latency but ensures each part gets the right treatment.
```typescript
// Composite request handling
const segments = decompose(message);
// ["What's your return policy", "can you cancel order #4832"]

const results = await Promise.all(
  segments.map(seg => routeAndProcess(seg))
);
// [FAQ result, Order cancellation result]

const response = compose(results);
// Natural language combining both answers
```
Operational metrics that matter
Traffic Distribution
- Lane mix: What percentage hits each lane? Healthy systems see 60-70% in fast/mid lanes. If most traffic hits heavy, your classifier needs tuning.
- Model mix: Which models are handling traffic? Track this to catch configuration drift.
- Fallback rate: How often does a fast-lane attempt fail and escalate to heavy? Should be under 5%.
Quality Signals
- Resolution rate by lane: Are fast-lane responses actually resolving user needs? If resolution drops, you're routing too aggressively.
- Re-ask rate: How often do users ask the same question again after a fast-lane response? High re-ask = wrong lane.
- Human handoff rate by lane: Which lane is generating the most escalations? Target: heavy lane should have lowest handoff rate.
Cost and Latency
- Cost per request by lane: Track this weekly. If your "cheap" lane is getting expensive, investigate.
- P50/P95 latency by lane: Fast lane should be under 200ms p95. Heavy lane under 3s p95.
- Token efficiency: Tokens per resolved conversation, not per message. This is the metric that matters for cost.
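Every one of these metrics falls out of the same habit: log each routing decision with enough structure to aggregate later. A sketch of the kind of record we have in mind; the field names are illustrative, not a standard schema:

```typescript
interface RoutingDecision {
  requestId: string;
  tenantId: string;
  timestamp: string;         // ISO 8601
  lane: "fast" | "mid" | "heavy" | "heavy_guarded" | "blocked";
  model: string | null;      // null for template responses
  scores: { complexity: number; sensitivity: number; confidence: number };
  reason: string;            // human-readable rule that fired
  fallback: boolean;         // true if escalated from a cheaper lane
  inputTokens: number;
  outputTokens: number;
  costUsd: number;
  latencyMs: number;
  resolved: boolean | null;  // filled in later from feedback or re-ask signals
}
```

With a record like this, lane mix, fallback rate, cost per request, and token efficiency are all simple aggregations over one table.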
Implementation checklist
- ☐ Deploy local classifiers for PII, scope, and intent detection
- ☐ Build template responses for greetings and acknowledgments
- ☐ Set up complexity scoring based on message length, entity count, question structure
- ☐ Configure model tiers: template → tiny → mid → frontier
- ☐ Implement fallback logic: fast → heavy when confidence drops
- ☐ Add request decomposition for composite messages
- ☐ Log every routing decision with scores and reasons
- ☐ Build dashboards for lane mix, cost, and quality metrics
- ☐ Set up A/B testing infrastructure for routing changes
- ☐ Define per-tenant and per-lane token budgets with alerts
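For the last item, here is one way per-tenant, per-lane budgets might look as configuration; the numbers and field names are placeholders, not recommendations:

```typescript
// Per-tenant, per-lane daily token budgets with alert thresholds (placeholder values)
const tokenBudgets = {
  tenantId: "acme-support",
  period: "daily" as const,
  lanes: {
    fast:  { maxTokens: 50_000,    alertAt: 0.8 }, // should stay near zero spend
    mid:   { maxTokens: 2_000_000, alertAt: 0.8 },
    heavy: { maxTokens: 500_000,   alertAt: 0.7 }, // alert earlier on the expensive lane
  },
};

function shouldAlert(used: number, lane: keyof typeof tokenBudgets.lanes): boolean {
  const budget = tokenBudgets.lanes[lane];
  return used >= budget.maxTokens * budget.alertAt;
}
```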
The HoverBot approach
We run a four-layer router:
- Safety layer: Local classifiers for PII masking, scope checking, and policy enforcement. Runs before any external call. Adds 5ms p50.
- Social layer: Template responses for greetings, thanks, acknowledgments. Zero tokens, sub-10ms latency. Handles 35-40% of traffic.
- Knowledge layer: RAG with GPT-4o-mini for straightforward lookups. Most substantive traffic lands here. 150ms p50.
- Reasoning layer: Frontier models for complex synthesis. Reserved for the genuinely hard 8-12%. 800ms p50.
We log every routing decision. We review misroutes weekly. We promote patterns from expensive lanes to cheaper lanes as we identify them. The router gets smarter over time.
We do not use MCP in production chatbot flows. The unpredictability is not worth the convenience.
The opinionated take
The industry is obsessed with bigger models. "GPT-5 will fix everything." "Claude 4 will be so good we won't need routing."
This is wishful thinking. Bigger models cost more. The math does not change. Even if GPT-5 is 10x better, you still do not want to pay frontier prices for "thanks" and "ok".
Three realities the industry ignores:
- Traffic distribution is stable. Most chatbot interactions are simple. This will not change because human communication patterns do not change. Social greetings, simple questions, and acknowledgments will always dominate.
- Price gaps persist. Even as models improve, there will always be cheaper models that handle simple tasks well. The price spread between tiers persists because it reflects real capability differences.
- Latency matters for simple queries. Users asking "what are your hours" do not want to wait 2 seconds for GPT-4o to think about it. A fast path improves UX independent of cost.
Routing is not a hack to save money while we wait for better models. Routing is the architecture. It is how you build systems that are fast, cheap, and good at the same time. It is how you build systems that improve over time by learning which patterns can be handled cheaply.
The teams that understand this are building sustainable AI products. The teams that don't are burning money and wondering why their unit economics never work.
Build the router.