Architecture

ToolRouter is organized so that the registry, not the outer transport, owns the core runtime behavior.

What does the production infrastructure look like?

ToolRouter runs on three platforms: Vercel hosts the Next.js website at toolrouter.com, Railway hosts the API gateway at api.toolrouter.com, and Convex stores API keys, usage records, and account data. Stripe handles all billing — credit balance, metering, invoicing, and credit grants.

Vercel (website)  →  Railway (API gateway)  →  Convex (database)
toolrouter.com       api.toolrouter.com         API keys, usage, accounts
                     ├─ REST API (/v1/*)         ↕
                     ├─ MCP server (/mcp)       Stripe (billing)
                     ├─ Tool execution            Credit balance, metering,
                     └─ Stripe webhooks           invoicing, credit grants

What are the main code layers?

The codebase has six layers: src/core (types, registry, billing, assets, knowledge, rate limiting), src/tools (built-in tool manifests and handlers), src/gateway (MCP server and REST API), src/cli (operator and developer commands), web/ (Next.js website with docs, catalog, and dashboard), and convex/ (cloud database).

  • src/core: types, registry, billing backends, config, keys, ledger, assets, knowledge, rate limiting, circuit breakers, validation
  • src/tools: built-in tool manifests and handlers
  • src/gateway: MCP server (stdio + HTTP) and REST API gateway
  • src/cli: operator and developer commands
  • web/: Next.js website — docs, tool catalog, rankings, billing dashboard
  • convex/: cloud database — API keys, usage records, Stripe customer mapping

What happens when a tool is called?

A tool is defined with defineTool() and registered into ToolRegistry. A caller enters through CLI, MCP, or REST; the transport resolves the target tool and skill, and the registry validates input against the skill's JSON Schema, runs billing and rate-limit checks, executes the handler, normalizes output, and post-processes assets. The transport then formats the result for delivery.

  1. A tool is defined with defineTool() and registered into ToolRegistry.
  2. The registry validates the manifest and stores the handlers.
  3. A caller enters through CLI, MCP, or REST.
  4. The transport resolves the target tool and skill.
  5. Input is validated against the skill JSON Schema.
  6. Optional billing and rate-limit checks run before execution.
  7. The handler executes with a SkillContext.
  8. The registry normalizes output, usage, and error handling.
  9. Asset post-processing converts local file paths into downloadable asset URLs.
  10. The transport formats the result for CLI, REST, or MCP delivery.
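
Steps 1 through 8 can be sketched in TypeScript. This is an illustrative miniature, not the real src/core implementation: the actual defineTool() and ToolRegistry signatures, JSON Schema validation, and SkillContext fields are richer than shown here.

```typescript
// Miniature sketch of the call path; the real registry also handles billing,
// rate limits, usage accounting, and full JSON Schema validation.

interface SkillContext {
  callId: string; // real contexts also carry billing, knowledge, callTool, ...
}

type Handler = (input: Record<string, unknown>, ctx: SkillContext) => unknown;

interface ToolManifest {
  name: string;
  skills: Record<string, { required: string[]; handler: Handler }>;
}

// Step 1: defineTool() is shown as an identity helper; the real one validates.
function defineTool(manifest: ToolManifest): ToolManifest {
  return manifest;
}

let nextCallId = 0;

class ToolRegistry {
  private tools = new Map<string, ToolManifest>();

  // Step 2: validate (minimally) and store the manifest and handlers.
  register(tool: ToolManifest): void {
    if (this.tools.has(tool.name)) throw new Error(`duplicate tool: ${tool.name}`);
    this.tools.set(tool.name, tool);
  }

  // Steps 4-8 collapsed: resolve, validate input, execute, normalize output.
  call(toolName: string, skillName: string, input: Record<string, unknown>) {
    const tool = this.tools.get(toolName);
    if (!tool) return { ok: false as const, error: `unknown tool: ${toolName}` };
    const skill = tool.skills[skillName];
    if (!skill) return { ok: false as const, error: `unknown skill: ${skillName}` };
    for (const key of skill.required) {
      if (!(key in input)) return { ok: false as const, error: `missing input: ${key}` };
    }
    const output = skill.handler(input, { callId: String(++nextCallId) });
    return { ok: true as const, output }; // one result shape for every transport
  }
}

const registry = new ToolRegistry();
registry.register(defineTool({
  name: "echo",
  skills: { say: { required: ["text"], handler: (input) => ({ text: input.text }) } },
}));
```

Every transport then wraps registry.call() rather than reimplementing validation or error handling, which is what keeps the surfaces consistent.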

Why is the registry the center of the system?

Because the registry owns the call path, every surface (CLI, MCP, REST, and web) gets identical runtime behavior, and nothing diverges between entrypoints. Concretely, that means:

  • one error model
  • one billing model
  • one result shape
  • one place for knowledge search
  • one place for tool composition
  • one place for asset rewriting

How does billing work at the infrastructure level?

ToolRouter supports two backends: local SQLite for development and Convex + Stripe for production. In production, users purchase credits via Stripe invoices, a webhook creates credit grants, each call fires a meter event (1 unit = $0.001), and Stripe invoices accrued usage at a $2 threshold.

ToolRouter supports two billing backends:

  • SQLite for local development — API keys, credits, and usage stored in ~/.toolrouter/toolrouter.db
  • Convex + Stripe for production — API keys and usage in Convex, credit balance and metering via Stripe

In production, the flow is:

  1. Users purchase credits via Stripe-hosted invoices (POST /v1/billing/checkout)
  2. On payment, a webhook creates a Stripe credit grant for the purchased amount
  3. Each tool call fires a Stripe meter event (1 unit = $0.001)
  4. Stripe invoices accrued usage at a $2 threshold, deducting from credit grants
  5. Balance checks use Stripe's creditBalanceSummary (accurate within ~$2)
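
The unit arithmetic in step 3 is worth pinning down. A sketch, with invented function names; only the 1-unit-per-$0.001 rate and the $2 threshold come from the flow above:

```typescript
// 1 Stripe meter unit = $0.001, so a $2 invoicing threshold is 2,000 units.
const UNITS_PER_INVOICE = 2_000;

// Convert a call's marked-up cost in dollars to whole meter units, rounding
// up so fractional costs are never under-billed. (The rounding direction is
// an assumption, not documented behavior.)
function costToMeterUnits(costDollars: number): number {
  return Math.ceil(costDollars * 1000);
}
```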

Successful calls record:

  • the tool and skill used
  • call latency
  • raw and marked-up cost
  • whether BYOK was used
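
As a sketch, a usage record might look like the following. The field names and example values are hypothetical; only the four recorded facts come from the list above:

```typescript
// Hypothetical record shape; the real Convex schema may name fields differently.
interface UsageRecord {
  tool: string;          // the tool used
  skill: string;         // the skill used
  latencyMs: number;     // call latency
  rawCostUnits: number;  // raw provider cost, in meter units
  billedUnits: number;   // marked-up cost actually metered
  byok: boolean;         // whether the caller brought their own provider key
}

const example: UsageRecord = {
  tool: "web_search",    // hypothetical tool name
  skill: "search",
  latencyMs: 412,
  rawCostUnits: 3,
  billedUnits: 4,
  byok: false,
};
```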

How does asset delivery work?

The asset layer watches for output keys ending in _path. When a handler returns a local file path, the asset store persists it, enriches the response with *_url and *_asset, and serves files over the gateway. MCP can inline small images for agent clients.

The asset layer watches for output keys that end in _path. When a handler returns a local file path:

  • the asset store persists the file
  • the response is enriched with *_url and *_asset
  • local development serves assets over the gateway
  • MCP can inline small images for agent clients
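
The rewriting step can be sketched as a pure function. The URL scheme, asset-id format, and persist callback below are invented; only the `_path` to `*_url`/`*_asset` convention comes from the text:

```typescript
// Mirror every key ending in _path with *_url and *_asset entries.
// `persist` stands in for the asset store: it saves the file and returns an id.
function rewriteAssets(
  output: Record<string, unknown>,
  baseUrl: string,
  persist: (path: string) => string,
): Record<string, unknown> {
  const enriched: Record<string, unknown> = { ...output };
  for (const [key, value] of Object.entries(output)) {
    if (!key.endsWith("_path") || typeof value !== "string") continue;
    const assetId = persist(value);
    const stem = key.slice(0, -"_path".length);
    enriched[`${stem}_url`] = `${baseUrl}/assets/${assetId}`;
    enriched[`${stem}_asset`] = assetId;
  }
  return enriched;
}
```

Keeping the original `_path` key alongside the new ones means handlers stay oblivious to the asset layer entirely.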

How does the knowledge (RAG) system work?

Tools with a knowledge/ directory use the built-in chunking and embedding pipeline. Handlers call context.knowledge.search(query, topK) at runtime. This lets tools ship domain guidance without forcing callers to read raw markdown.
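
To make the runtime contract concrete, here is a toy stand-in for context.knowledge.search(query, topK). The real pipeline chunks and embeds the knowledge/ markdown; this sketch scores chunks by keyword overlap purely to show the topK shape a handler sees:

```typescript
// Toy retrieval: rank chunks by how many query terms they contain,
// return the topK best matches. Not the real embedding-based search.
interface KnowledgeChunk { text: string; }

function searchKnowledge(chunks: KnowledgeChunk[], query: string, topK: number) {
  const terms = query.toLowerCase().split(/\s+/);
  return chunks
    .map((chunk) => ({
      chunk,
      score: terms.filter((t) => chunk.text.toLowerCase().includes(t)).length,
    }))
    .filter((hit) => hit.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}
```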

How does tool composition work?

Handlers call other tools through context.callTool(toolRef, skill, input). Composition inherits billing and provider keys from the parent call and is bounded by a max depth of 5 levels to prevent infinite loops.
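
A sketch of the depth bound. Only the limit of 5 comes from the text; the context plumbing is simplified:

```typescript
// Each nested callTool() increments depth; past the limit, the call is refused.
const MAX_COMPOSITION_DEPTH = 5;

interface CallContext {
  depth: number;
  // Real contexts also inherit billing and provider keys from the parent call.
}

type ToolFn = (input: unknown, ctx: CallContext) => unknown;

function callTool(tool: ToolFn, input: unknown, parent: CallContext): unknown {
  if (parent.depth >= MAX_COMPOSITION_DEPTH) {
    throw new Error("composition depth exceeded"); // prevents infinite loops
  }
  return tool(input, { depth: parent.depth + 1 });
}
```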

How are external API calls made resilient?

External calls are protected by AbortController-based timeouts (10–30s per provider) and circuit breakers that track consecutive failures. After 3–5 failures, the circuit opens and rejects calls for 30–60s, then enters half-open state to probe recovery.

External API calls (Firecrawl, Serper, iTunes) are protected by two layers:

  • Timeouts — AbortController-based per-request timeouts (10–30s depending on the provider)
  • Circuit breakers — track consecutive failures per provider. After a threshold (3–5 failures), the circuit opens and rejects calls immediately for a cooldown period (30–60s), then enters half-open state to probe recovery
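
The breaker reads naturally as a small state machine. A sketch only: the real per-provider thresholds vary within the ranges above, and the injectable clock here is for testability, not a claim about the implementation:

```typescript
// closed -> open after `threshold` consecutive failures;
// open -> half-open after `cooldownMs`; half-open probe decides the rest.
type BreakerState = "closed" | "open" | "half-open";

class CircuitBreaker {
  private failures = 0;
  private openedAt = 0;
  state: BreakerState = "closed";

  constructor(
    private threshold = 4,        // 3-5 consecutive failures in the text
    private cooldownMs = 45_000,  // 30-60s cooldown in the text
    private now: () => number = Date.now, // injectable clock for testing
  ) {}

  canRequest(): boolean {
    if (this.state === "open" && this.now() - this.openedAt >= this.cooldownMs) {
      this.state = "half-open"; // allow a single probe request through
    }
    return this.state !== "open";
  }

  recordSuccess(): void {
    this.failures = 0;
    this.state = "closed";
  }

  recordFailure(): void {
    this.failures += 1;
    if (this.failures >= this.threshold || this.state === "half-open") {
      this.state = "open";
      this.openedAt = this.now();
    }
  }
}
```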

CORS restricts browser origins to toolrouter.com and localhost ports. Non-browser requests (curl, MCP clients, AI agents) pass through without origin restrictions.
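
The origin check might look like the following sketch. The exact allowed host list (subdomain and dev-port handling) is an assumption; only "toolrouter.com and localhost for browsers, everything else passes without an Origin header" comes from the text:

```typescript
// Requests without an Origin header (curl, MCP clients, AI agents) pass
// through; browser origins are checked against an allowlist.
function isAllowedOrigin(origin: string | undefined): boolean {
  if (origin === undefined) return true; // non-browser clients send no Origin
  try {
    const { hostname } = new URL(origin);
    return (
      hostname === "toolrouter.com" ||
      hostname.endsWith(".toolrouter.com") || // subdomain handling is assumed
      hostname === "localhost" ||
      hostname === "127.0.0.1"
    );
  } catch {
    return false; // malformed Origin header
  }
}
```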

Practical takeaway

If you are debugging behavior, start at the registry and work outward:

  1. manifest definition
  2. registry registration
  3. handler logic
  4. billing and asset post-processing
  5. transport-specific formatting in CLI, REST, or MCP