Feature Deep Dive

Guardrails and PII Masking

How to enforce topic boundaries, redact sensitive data, and preserve user trust in customer-facing conversations.

Guardrail Layers

  • Pre-response policy checks for disallowed topics and intents
  • PII detection and redaction before model inference
  • Post-response validation for policy and compliance violations
  • Escalation to human support when a confidence or safety threshold is breached
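
The four layers above can be sketched as a single request handler. This is a minimal illustration, not a reference implementation: the names handle, violates_policy, BLOCKED_TOPICS, CONFIDENCE_THRESHOLD, and HANDOFF_MESSAGE are all hypothetical, the keyword match stands in for a real policy classifier, and the model is assumed to return a reply together with a confidence score between 0 and 1. Refusing on a blocked input and escalating on a blocked output are also assumptions about how the layers might be wired.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Hypothetical policy settings, chosen only for illustration.
BLOCKED_TOPICS = {"medical advice", "legal advice"}
CONFIDENCE_THRESHOLD = 0.6
REFUSAL_MESSAGE = "Sorry, I can't help with that topic."
HANDOFF_MESSAGE = "Let me connect you with a member of our support team."


@dataclass
class Reply:
    text: str
    escalated: bool


def violates_policy(text: str) -> bool:
    """Naive keyword check standing in for a real policy classifier."""
    lowered = text.lower()
    return any(topic in lowered for topic in BLOCKED_TOPICS)


def handle(
    message: str,
    model: Callable[[str], Tuple[str, float]],
    redact: Callable[[str], str],
) -> Reply:
    # Layer 1: pre-response policy check on the user's message.
    if violates_policy(message):
        return Reply(REFUSAL_MESSAGE, escalated=False)

    # Layer 2: redact PII before the text reaches the model.
    answer, confidence = model(redact(message))

    # Layer 3: post-response validation of the model's output.
    if violates_policy(answer):
        return Reply(HANDOFF_MESSAGE, escalated=True)

    # Layer 4: escalate to a human when confidence is below the threshold.
    if confidence < CONFIDENCE_THRESHOLD:
        return Reply(HANDOFF_MESSAGE, escalated=True)

    return Reply(answer, escalated=False)
```

Passing the model and the redaction function in as arguments keeps each layer replaceable and easy to test with stubs.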

PII Masking Pattern

input -> piiClassifier -> redaction -> model -> responseValidator -> output

examples:
- email@example.com -> [EMAIL_REDACTED]
- +1 555 0100 -> [PHONE_REDACTED]
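
The redaction step in the pattern above could look roughly like the following sketch. It assumes a simple regex-based detector in place of piiClassifier; the patterns EMAIL_RE and PHONE_RE and the function redact are hypothetical and deliberately narrow, so a production system would likely need a trained classifier or a much broader rule set.

```python
import re

# Hypothetical patterns for illustration only; they cover the two example
# formats above and will miss many real-world variants.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{6,}\d")


def redact(text: str) -> str:
    """Replace e-mail addresses and phone numbers with placeholder tokens."""
    text = EMAIL_RE.sub("[EMAIL_REDACTED]", text)
    text = PHONE_RE.sub("[PHONE_REDACTED]", text)
    return text


if __name__ == "__main__":
    print(redact("Contact me at email@example.com or +1 555 0100."))
    # prints: Contact me at [EMAIL_REDACTED] or [PHONE_REDACTED].
```

Because the phone pattern matches any long run of digits and separators, it can also mask order numbers or dates, which is one source of the false positives tracked under Operational Metrics.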

Operational Metrics

  • Redaction hit rate by channel and use case
  • False positive and false negative counts from manual review of redactions
  • Safety-triggered escalation frequency
  • Policy drift detected in weekly transcript audits
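
The first two metrics can be computed from a sample of human-reviewed messages. The sketch below assumes a hypothetical ReviewedMessage record in which a reviewer has marked whether each message really contained PII; the field names and both functions are illustrative, and grouping is shown by channel only (a use-case field could be added the same way).

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class ReviewedMessage:
    channel: str         # e.g. "chat" or "email"
    redacted: bool       # did the masking step redact anything?
    contained_pii: bool  # reviewer's judgment of the original text


def redaction_hit_rate(messages: Iterable[ReviewedMessage]) -> Dict[str, float]:
    """Share of messages per channel in which at least one redaction fired."""
    totals: Counter = Counter()
    hits: Counter = Counter()
    for m in messages:
        totals[m.channel] += 1
        hits[m.channel] += int(m.redacted)
    return {channel: hits[channel] / totals[channel] for channel in totals}


def review_outcomes(messages: List[ReviewedMessage]) -> Dict[str, int]:
    """Count redactions with no PII present, and PII that went unredacted."""
    false_positives = sum(m.redacted and not m.contained_pii for m in messages)
    false_negatives = sum(m.contained_pii and not m.redacted for m in messages)
    return {"false_positives": false_positives, "false_negatives": false_negatives}
```

False negatives are usually the more serious outcome here, since they mean sensitive data reached the model unmasked.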