PII Filter Guide for ToolRouter

A team-level toggle that scans every tool call for customer PII. When it's on, the filter blocks PII from leaving your team's tools (an agent can't accidentally send a customer's email into a third-party API) and redacts PII coming back from tools before the agent or your logs see it. When it's off, nothing changes — no scanning, no overhead, no behavior difference.

This doc explains what the filter catches, what it doesn't, how to turn it on, and how it affects performance. It's plain English. Implementation details live in docs/plans/2026-04-23-pii-filter-design.md for engineers who need them.

The problem this solves

Your team uses connected tools — CRM lookups, web search, email tools, scrapers — and your agents drive them. By default, those agents see whatever the tools return: customer names, emails, phone numbers, addresses, account numbers. They also pass whatever you asked for into the tool's input. If an agent looks up a customer in your CRM, the customer's email lands in the LLM's context. If a teammate then asks the agent to "summarize this contact," the LLM provider gets the email too.

Most of the time that's fine — you control the tools, you control the agent, and the data is yours. But three things go wrong without a filter:

Outbound leak: an agent paraphrases a customer's identifying details into a search query or a third-party API call.
Inbound exposure: raw customer fields land in your audit log, brain memory, debug reports, and worker logs as a side effect of routine calls.
Cross-team blast radius: if anyone on the team turns on a debug share, raw PII can travel further than intended.

The filter is defense in depth for accidental leakage. It is not a full data-loss-prevention product, an anonymization guarantee, or a compliance certification. We're explicit about that — see the limitations below.

What it catches

OpenAI's open-source privacy-filter model classifies each piece of text into one of eight categories. The filter handles each category differently depending on whether the text is going INTO a tool (the call's arguments) or coming OUT of a tool (the response):

Category	In arguments	In responses
Email address	block	redact
Phone number	block	redact
Person name	block	redact
Mailing address	block	redact
Account number (cards, IBAN, similar)	block	redact
Secret (API keys, JWTs, tokens)	block	redact
URL	log only	redact
Date	log only	redact

A "block" means the tool call doesn't run; the agent receives a structured error naming which categories it would have leaked. A "redact" means the response gets a placeholder like [REDACTED:private_email] where the customer data was. Either way, raw values never reach your audit log, brain memory, debug reports, or worker progress updates.

Two categories, URL and date, are too noisy to block on input. URLs are common tool inputs (every web-search call has them) and dates are everywhere. We log them when they appear in arguments and redact them only in responses, where they sit in customer-record positions.
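The category table can be read as a lookup from category and direction to an action. The sketch below is one way to express that, not the filter's actual code. The labels private_email, private_phone and private_person appear elsewhere in this guide; the other five labels, and the helper names action_for and redact, are assumptions made for illustration.

```python
# Hypothetical policy table mirroring the category table in this guide.
# Only private_email, private_phone and private_person are labels the guide
# shows; the remaining label names are assumed for this sketch.
POLICY = {
    "private_email":   {"in": "block", "out": "redact"},
    "private_phone":   {"in": "block", "out": "redact"},
    "private_person":  {"in": "block", "out": "redact"},
    "private_address": {"in": "block", "out": "redact"},
    "account_number":  {"in": "block", "out": "redact"},
    "secret":          {"in": "block", "out": "redact"},
    "private_url":     {"in": "log",   "out": "redact"},
    "private_date":    {"in": "log",   "out": "redact"},
}


def action_for(category: str, direction: str) -> str:
    """Return 'block', 'redact' or 'log' for a category and a direction ('in' or 'out')."""
    return POLICY[category][direction]


def redact(text: str, spans: list[tuple[int, int, str]]) -> str:
    """Replace each (start, end, category) span with a [REDACTED:category] marker."""
    # Work from right to left so earlier offsets stay valid after each replacement.
    for start, end, category in sorted(spans, reverse=True):
        text = text[:start] + f"[REDACTED:{category}]" + text[end:]
    return text
```

For example, redact("Contact jane@acme.com today", [(8, 21, "private_email")]) yields "Contact [REDACTED:private_email] today", while action_for("private_url", "in") yields "log" because URLs in arguments are recorded rather than blocked.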

What it doesn't catch

We're upfront about this so you don't get a false sense of security:

Encoded PII (base64, rot13, hex). The model sees encoded text as gibberish.
Paraphrased PII ("jane dot doe at acme dot com," or "the head of customer success at Acme"). If the agent rephrases a name into a description, the description doesn't trip the model.
Names of public figures look identical to customer names. The model can't tell who's a customer and who's a press contact.
Customer IDs that don't follow obvious patterns (cus_abc123, /customers/42) pass through. We do catch a handful of common URL shapes.
Non-Latin scripts are weaker. The model was trained primarily on English-language text.
Text inside images, audio, video, or arbitrary hosted files is not extracted or scanned.

If your threat model requires hard guarantees on any of these, the filter is not enough on its own — you'll want stricter controls at the connector level (read-only API scopes, per-team data isolation in the upstream service, etc.).
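To see why encoded and spelled-out PII slips past a detector, here is a small illustration. A plain email regex stands in for the classifier; that substitution is an assumption made to keep the example runnable, and the real filter uses a model rather than this pattern. The point carries over: once the surface form of the value is gone, there is little left to recognize.

```python
import base64
import re

# A simple email regex stands in for the classifier. Illustration only:
# the real filter uses a model, not this pattern.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

plain = "jane.doe@acme.com"
encoded = base64.b64encode(plain.encode()).decode()  # same data, base64-encoded
spelled_out = "jane dot doe at acme dot com"         # same data, paraphrased

plain_hit = EMAIL_RE.search(plain) is not None          # detector fires
encoded_hit = EMAIL_RE.search(encoded) is not None      # no '@' survives encoding
spelled_hit = EMAIL_RE.search(spelled_out) is not None  # no '@' in the paraphrase
```

Only plain_hit is true here; the encoded and spelled-out forms carry the same address but no longer look like one.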

How to turn it on

The toggle lives at team settings → security. Only team owners and admins can flip it. The change applies within a few seconds — there's no deploy step.

When on, the filter runs against every tool call from anyone on the team:

Native first-party tools (web search, SEO analysis, etc.)
Connected SaaS accounts (your team's OAuth or API-key connections)
External MCP servers your team has wired up
Async / long-running jobs
Streaming responses
Brain writes
Debug-report submissions

There is no per-tool exemption in v1 — it's all-or-nothing for the team. (Per-skill exceptions exist for product-required cases like credential_save, but those are hardcoded; you can't customize the list yet.)

Allowlist for your own team's contacts

When you ask an agent to email a teammate ("draft a note to alex@yourcompany.com"), you don't want the filter blocking your own colleague's address. The filter knows about every verified email and phone on your team — synced automatically from Clerk when members sign up — and lets those through both on the way in and on the way out.

So a tool response that echoes your own email back ("you signed up with blake@yourteam.com") shows up unredacted. A response that contains an outsider's email gets redacted to [REDACTED:private_email]. The check happens against a hashed copy of every member's contact info — we never store cleartext lookup tables.
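A hashed allowlist check of this kind could look roughly like the sketch below. The hashing scheme (SHA-256 with a per-team salt), the lowercase-and-trim normalization, and the names TeamAllowlist, contact_hash and normalize are all assumptions for illustration; the guide only states that the lookup runs against hashed contact info.

```python
import hashlib


def normalize(contact: str) -> str:
    """Lowercase and trim so 'Blake@YourTeam.com ' matches 'blake@yourteam.com'."""
    return contact.strip().lower()


def contact_hash(contact: str, team_salt: str) -> str:
    """Hash a normalized contact with a per-team salt (the salt is an assumption)."""
    return hashlib.sha256((team_salt + normalize(contact)).encode("utf-8")).hexdigest()


class TeamAllowlist:
    """Holds only hashes of verified member contacts, never the cleartext values."""

    def __init__(self, team_salt: str, verified_contacts: list[str]):
        self._salt = team_salt
        self._hashes = {contact_hash(c, team_salt) for c in verified_contacts}

    def is_allowed(self, candidate: str) -> bool:
        """True when the candidate hashes to a known team member contact."""
        return contact_hash(candidate, self._salt) in self._hashes


# Usage: a teammate's address passes, an outsider's does not.
allowlist = TeamAllowlist("example-team-salt", ["blake@yourteam.com"])
teammate_ok = allowlist.is_allowed("Blake@yourteam.com")
outsider_ok = allowlist.is_allowed("customer@example.com")
```

Because the set stores digests only, checking a candidate value never requires a cleartext copy of the team's contact list.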

How agents see it

When the filter blocks a call, the agent gets a structured error:

json

{
  "error": "PII_BLOCKED_IN_ARGS",
  "message": "Tool call blocked: contains team member PII (private_email, private_phone). Remove the personal data and retry."
}

The error names the categories so the agent can decide whether to retry without that field, ask the user to confirm, or pick a different tool.

When the filter redacts a response, the agent sees the inline [REDACTED:private_email] markers AND a top-level envelope describing what happened:

json

{
  "...": "...the rest of the tool's normal response...",
  "_pii_filter": {
    "redacted": true,
    "categories": ["private_email", "private_person"],
    "note": "Some fields in this response were redacted by your team's PII filter. Markers like [REDACTED:private_email] mark where customer data was scrubbed before reaching you. Treat redacted values as opaque placeholders, not real strings."
  }
}

The envelope is the unmistakable signal. Even an agent that's never seen the inline markers before reads the envelope and knows the response was filtered.
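On the client side, an agent wrapper might branch on these signals as in the sketch below. The field names error, message and _pii_filter come from the examples in this section. The function name, the returned shape, and the assumption that PII_FILTER_UNAVAILABLE arrives in the same error field are hypothetical.

```python
def summarize_filter_outcome(result: dict) -> dict:
    """Classify a tool result as 'blocked', 'unavailable', 'redacted' or 'clean'."""
    error = result.get("error")
    if error == "PII_BLOCKED_IN_ARGS":
        # The call never ran; the message names the offending categories.
        return {"status": "blocked", "categories": [], "detail": result.get("message", "")}
    if error == "PII_FILTER_UNAVAILABLE":
        # Assumption: the unavailable error uses the same "error" field.
        return {"status": "unavailable", "categories": [], "detail": result.get("message", "")}

    envelope = result.get("_pii_filter") or {}
    if envelope.get("redacted"):
        return {
            "status": "redacted",
            "categories": list(envelope.get("categories", [])),
            "detail": envelope.get("note", ""),
        }
    return {"status": "clean", "categories": [], "detail": ""}
```

An agent can then treat "blocked" as a cue to retry without the personal data, and "redacted" as a cue to handle the [REDACTED:...] markers as opaque placeholders.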

There's also an X-Pii-Filter HTTP header on REST and JSON-RPC responses for clients that don't parse the body envelope. Its value takes one of three forms: "redacted; categories=private_email,private_person", "blocked; categories=...", or "unavailable".
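A client that relies on the header rather than the body could parse it along these lines. This is a sketch based only on the header values quoted above; the function name is hypothetical and the parser assumes no parameters beyond categories.

```python
def parse_pii_filter_header(value: str) -> tuple[str, list[str]]:
    """Split an X-Pii-Filter value into (status, categories)."""
    status, _, params = value.partition(";")
    categories: list[str] = []
    for param in params.split(";"):
        key, _, raw = param.strip().partition("=")
        if key == "categories" and raw:
            categories = [c.strip() for c in raw.split(",") if c.strip()]
    return status.strip(), categories
```

Given "redacted; categories=private_email,private_person" this returns ("redacted", ["private_email", "private_person"]); given "unavailable" it returns ("unavailable", []).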

Performance impact

When the filter is off, the overhead is negligible. Tool calls run exactly as they did before; the only added work is one cheap config lookup per call (cached for 5 seconds per team).
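The per-team lookup with a 5-second cache can be pictured as a small time-to-live cache. The sketch below is an assumption about the shape of that lookup, not the gateway's code; TeamFlagCache, load_flag and the injectable clock are hypothetical names.

```python
import time
from typing import Callable


class TeamFlagCache:
    """Caches each team's filter toggle for a short time-to-live."""

    def __init__(
        self,
        load_flag: Callable[[str], bool],
        ttl_seconds: float = 5.0,
        clock: Callable[[], float] = time.monotonic,
    ):
        self._load_flag = load_flag      # hypothetical settings lookup
        self._ttl = ttl_seconds
        self._clock = clock              # injectable so the cache is easy to test
        self._entries: dict[str, tuple[float, bool]] = {}

    def is_enabled(self, team_id: str) -> bool:
        now = self._clock()
        cached = self._entries.get(team_id)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]             # fresh entry: no settings lookup
        value = self._load_flag(team_id)
        self._entries[team_id] = (now, value)
        return value
```

A cache like this is also consistent with a toggle change taking effect within a few seconds: a stale entry lives for at most one TTL before the next call reloads it.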

When the filter is on, scanning costs depend on how much text the tool returns. Plain numbers from production hardware:

Small responses (a single record, a confirmation message): adds roughly 1-2 seconds.
Large responses like web-search (100+ pieces of text): adds roughly 10-15 seconds.

The filter is built around two ideas that keep that overhead bounded. First, a fast pre-check throws away pieces of text that obviously can't carry PII (URLs, brand labels, very short strings) before they reach the model; typically 60-70% of the pieces in a response never need scanning. Second, the model runs in parallel across cores: the gateway batches the remaining pieces into chunks of 16 and dispatches up to 4 chunks at once, so 100 pieces of text don't take 100 × per-piece time.
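The pre-check and batching steps can be sketched as follows. The chunk size of 16 and the limit of 4 chunks in flight come from the paragraph above. Everything else is an assumption: the pre-check rules (only URLs and very short strings are modeled), the thread pool standing in for parallel model workers, and the classify_chunk callable, which represents a call to the model.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

CHUNK_SIZE = 16     # pieces per chunk, from the text above
MAX_IN_FLIGHT = 4   # chunks dispatched at once, from the text above
URL_RE = re.compile(r"^https?://\S+$")


def needs_scan(piece: str) -> bool:
    """Cheap pre-check; the exact rules here are assumed for illustration."""
    text = piece.strip()
    return len(text) >= 4 and not URL_RE.match(text)


def chunked(items: list[str], size: int) -> list[list[str]]:
    """Split items into consecutive chunks of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def scan_pieces(
    pieces: list[str],
    classify_chunk: Callable[[list[str]], list[list[str]]],
) -> dict[str, list[str]]:
    """Map each scanned piece to its detected categories; skipped pieces are omitted."""
    to_scan = [p for p in pieces if needs_scan(p)]
    chunks = chunked(to_scan, CHUNK_SIZE)
    with ThreadPoolExecutor(max_workers=MAX_IN_FLIGHT) as pool:
        results = list(pool.map(classify_chunk, chunks))  # order matches chunks
    found: dict[str, list[str]] = {}
    for chunk, labels in zip(chunks, results):
        found.update(zip(chunk, labels))  # identical pieces share one entry
    return found
```

As a rough worked example: if 65 of 100 pieces are skipped by the pre-check, 35 remain, which is 3 chunks of at most 16. With up to 4 chunks in flight, all three run in a single wave.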

If a chunk takes too long under heavy load, the filter falls back to a fast regex check for that chunk only. Email, phone, credit-card, and secret patterns still get caught; names and addresses in the affected chunks degrade to "not detected." The user always gets their response within the request's time budget — the filter never hangs the call.
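The per-chunk fallback might be structured like the sketch below. It is a minimal illustration under stated assumptions: the regex patterns are simplified stand-ins, only email and phone are shown (credit-card and secret patterns would sit alongside them), and classify_chunk again represents the model call.

```python
import re
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

# Simplified stand-in patterns; the filter's real fallback patterns are not
# documented in this guide.
FALLBACK_PATTERNS = {
    "private_email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "private_phone": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def regex_labels(piece: str) -> list[str]:
    """Categories whose fallback pattern matches somewhere in the piece."""
    return [name for name, pattern in FALLBACK_PATTERNS.items() if pattern.search(piece)]


def classify_with_fallback(chunk, classify_chunk, timeout_seconds, pool):
    """Try the model for one chunk; on timeout, use regex for that chunk only."""
    future = pool.submit(classify_chunk, chunk)
    try:
        return future.result(timeout=timeout_seconds)
    except FutureTimeout:
        future.cancel()  # best effort; a running call cannot be interrupted
        # Names and addresses are not covered here, so they degrade to
        # "not detected" for this chunk.
        return [regex_labels(piece) for piece in chunk]
```

With a slow classifier and a short timeout, a chunk such as ["Jane Doe", "mail jane@acme.com"] comes back as [[], ["private_email"]]: the email is still caught, the name is not.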

Cost

Hosting the model adds roughly $20-40 per month per replica on Railway. One replica comfortably serves a team of ~10 active users. We add replicas as adoption grows, not preemptively.

There's no per-call charge tied to the filter. Tool calls are billed exactly as they would be without it.

Failure mode

If the model becomes genuinely unreachable (network failure, deploy in flight, etc.), the filter is fail-closed:

Tool calls return PII_FILTER_UNAVAILABLE instead of dispatching.
Tool responses get replaced with a structured stub explaining the filter is down. The original response is withheld.

This is intentional. A filter that silently lets calls through when the model is down would leave teams thinking they're protected when they aren't. After three consecutive failures the filter trips a circuit breaker and starts returning the unavailable error in under a second instead of waiting the full timeout per call, so an outage of the sidecar that hosts the model degrades quickly to "your tools are blocked" rather than "your tools are slow then blocked." Once the sidecar recovers, the breaker closes on the next successful probe and traffic resumes.
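The breaker behavior described here can be sketched as a small state machine. The threshold of three consecutive failures and the close-on-successful-probe rule come from the text; the class and method names, and the use of an exception to represent PII_FILTER_UNAVAILABLE, are assumptions.

```python
class FilterUnavailable(Exception):
    """Stands in for the PII_FILTER_UNAVAILABLE error."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3):
        self.failure_threshold = failure_threshold
        self.consecutive_failures = 0
        self.open = False

    def call(self, scan, text):
        """Run a scan through the breaker; fail closed on any error."""
        if self.open:
            # Fail fast: do not wait on a service already known to be down.
            raise FilterUnavailable("PII_FILTER_UNAVAILABLE")
        try:
            result = scan(text)
        except Exception as exc:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.failure_threshold:
                self.open = True
            raise FilterUnavailable("PII_FILTER_UNAVAILABLE") from exc
        self.consecutive_failures = 0
        return result

    def probe(self, health_check) -> bool:
        """Close the breaker once a health probe succeeds."""
        try:
            healthy = bool(health_check())
        except Exception:
            healthy = False
        if healthy:
            self.open = False
            self.consecutive_failures = 0
        return healthy
```

Note that every failure path raises rather than returning the unscanned text, which is what makes the design fail-closed: a caller never receives a response the filter did not get to inspect.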

Audit trail

Every blocked call and every redaction writes one row to mcp_audit_log with three columns set:

piiFlagged: true
piiCategories: ['private_email', 'private_person', ...]
piiDirection: 'in' | 'out'

No raw PII values are ever written to the audit log — only the category names. Admins can query the log to see what's been flagged, by which user, against which tool, without seeing the actual customer data that triggered the flag.
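One way to guarantee that only category names reach the log is to drop raw values at the point where the row is built. The sketch below uses the three column names listed above; the function name, the dataclass, and the shape of the detections input are assumptions.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class PiiAuditFields:
    piiFlagged: bool
    piiCategories: list[str]
    piiDirection: str  # 'in' or 'out'


def audit_fields(detections: list[tuple[str, str]], direction: str) -> dict:
    """Build the three PII columns from (category, raw_value) pairs.

    Raw values are discarded here, so they can never reach the audit row.
    """
    if direction not in ("in", "out"):
        raise ValueError("direction must be 'in' or 'out'")
    categories = sorted({category for category, _raw in detections})
    return asdict(PiiAuditFields(bool(categories), categories, direction))
```

Calling audit_fields with two email detections and one name detection for direction "out" produces a row with piiCategories set to ["private_email", "private_person"] and no trace of the underlying values.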

When to use it

The filter is most useful for teams whose agents touch customer data and whose tool calls go through providers that could log the request body — third-party search, automation platforms, anything calling an external LLM. If your team is purely calling internal tools you control end-to-end, the filter adds latency without much marginal safety.

For HumanLeap-class deployments (customer-facing AI driving lots of CRM and email tools): turn it on. For internal-only research tools: probably leave it off. You can flip the toggle in either direction with no deploy and no migration.

Related docs

How Your Team Brain Works: the brain has its own secret-scrubbing layer that runs alongside this filter; with the team toggle on, customer PII is also scrubbed before any page is written.
Integration: how tools are exposed via MCP and REST, and where the filter sits in that flow.