# Tool Authoring

This is the complete guide to building tools for ToolRouter. Everything you need — from scaffolding to publishing — is here.

## What should I build?

The best new tools do more than wrap a single API. They unlock a workflow agents cannot do natively, sit on top of structured live data or real-world actions, and combine enough official sources that the tool becomes more valuable than generic search.

One strong current example is `contract-opportunities`: a procurement intelligence tool that searches public tenders and award history across official sources like [SAM.gov](https://open.gsa.gov/api/get-opportunities-public-api/), [USAspending](https://api.usaspending.gov/docs/endpoints), [TED](https://docs.ted.europa.eu/api/latest/search.html), [Contracts Finder](https://www.contractsfinder.service.gov.uk/apidocumentation), [Public Contracts Scotland](https://api.publiccontractsscotland.gov.uk/), and [World Bank procurement notices](https://financesone.worldbank.org/d/DS00979). That is a better category than another broad search tool because it exposes live revenue opportunities, buyer history, award evidence, set-asides, and repeatable watchlist workflows.

Another strong category is `marketplace-search`: a region-aware tool for second-hand listings, classifieds, auctions, and online product offers. The right shape is not "scrape eBay, Gumtree, and Craigslist." It is a compliance-aware adapter layer over official or licensed sources like [eBay Browse](https://developer.ebay.com/api-docs/buy/browse/overview.html), [Trade Me General Search](https://developer.trademe.co.nz/api-reference/search-methods/general-search), [Rakuten Ichiba Item Search](https://webservice.rakuten.co.jp/index.php/documentation/ichiba-item-search), and region-specific marketplace partners, with sources like [Craigslist](https://www.craigslist.org/about/terms.of.use/en) and [Gumtree](https://www.gumtree.com/termsofuse) only added through written permission, partner feeds, or user-connected imports because their terms explicitly restrict generic automated scraping.

The value in that category comes from normalized filters and workflows that generic search does not provide: `condition`, `listing_class`, `buying_option`, `local_pickup`, `delivery_country`, seller identity, regional routing, cross-source dedupe, and watchlist snapshots. The detailed proposal lives in [`docs/plans/2026-03-28-marketplace-search-tool.md`](../plans/2026-03-28-marketplace-search-tool.md).

Another strong category is `record-collector`: a no-key collector tool for vinyl, CD, cassette, and box-set releases. The right shape is not "search Discogs and call it done." It is a combined release-and-market workflow built on public structured sources like [Discogs API](https://www.discogs.com/developers/), [MusicBrainz API](https://musicbrainz.org/doc/MusicBrainz_API), and [Cover Art Archive](https://coverartarchive.org/), plus a source-governed record-market search layer across global sale domains like eBay, CDandLP, MusicStack, Mercari, Yahoo Auctions Japan, and regional classifieds where search-index discovery is the only clean no-key path.

The value in that category comes from edition-level fields generic search does not normalize reliably: `catalog_number`, `barcode`, `country`, `format`, `master/release` relationships, collector demand (`want` / `have`), and current supply (`num_for_sale`). Those fields unlock edition matching, collection valuation, and automation-friendly wantlist snapshots. The detailed proposal lives in [`docs/plans/2026-03-29-record-collector-tool.md`](../plans/2026-03-29-record-collector-tool.md).

When evaluating similar ideas, prefer source sets with stable identifiers and reusable structure: `ocid`, notice ids, solicitation numbers, `NAICS`, `PSC`, `CPV`, buyer ids, award ids, and document URLs. Those are the fields that let you build dedupe, cross-source joins, fit scoring, and automations instead of just another list of links.

Capture the detailed execution plan for any serious new tool in a dated file under `docs/plans/` before you start implementing.

## How do I scaffold a new tool?

Run `toolrouter init my-tool my_skill` for a basic tool, `--knowledge` for RAG support, or `--plugin` for a standalone npm package. This creates the file structure with a working manifest and TODO placeholders.

Scaffold a new tool:

```bash
toolrouter init my-tool my_skill           # Basic tool with one skill
toolrouter init my-tool --knowledge        # Tool with RAG knowledge base
toolrouter init my-tool --plugin           # Standalone npm plugin package
```

This creates the file structure and a working manifest with TODO placeholders.

## What file structure should a tool use?

Every tool lives in `src/tools/<tool-name>/` with an `index.ts` (defineTool manifest), a `skills/` directory (one handler file per skill), and an optional `knowledge/` directory for RAG markdown files.

Every built-in tool lives in `src/tools/<tool-name>/`:

```
src/tools/my-tool/
├── index.ts              # defineTool() manifest + skill wiring
├── skills/
│   ├── my-skill.ts       # Handler for my_skill
│   └── other-skill.ts    # Handler for other_skill
└── knowledge/            # Optional — RAG markdown files
    └── domain-guide.md
```

## How do I define a tool?

Call `defineTool()` with the tool's name, displayName, subtitle, description, categories, and skills array. Each skill needs a name, description, input JSON Schema, at least one example, and a handler function. Export the `register` function and `manifest`.

The entire tool is defined in a single `defineTool()` call. This is the source of truth for metadata, skills, schemas, examples, requirements, and handlers. Because this metadata appears in discovery, docs, and agent UIs, it should read like product copy for the top-level fields and precise operational copy for schema fields.

### Metadata tone standard

Treat `displayName`, `subtitle`, `description`, `examples[].description`, and `returns` as user-facing product copy. Treat `instructions` as agent-facing operational copy. Treat input property, output property, and requirement descriptions as operational copy. They serve different jobs and should sound different.

Every tool has four text layers, like an App Store listing. These are **hard caps** enforced by `toolrouter validate --strict` — every character must earn its place.

| Field | Audience | Min | Max | Purpose |
|-------|----------|-----|-----|---------|
| `displayName` | Everyone | 3 | 50 | The tool's name |
| `subtitle` | Everyone | 10 | 80 | Short tagline for search results and tool cards |
| `description` | Users | 50 | 300 | What the tool does, who it's for, why it's useful |
| `instructions` | Agents | 50 | 600 | Which skills to use when, workflow order, key tips |

Skill-level caps:

| Field | Min | Max | Purpose |
|-------|-----|-----|---------|
| Skill `description` | 20 | 300 | What this skill does and when to use it |
| Skill `returns` | 10 | 200 | Brief summary of what the skill returns |
| Input param `description` | — | 150 | What the param is and valid values |
| Example `description` | — | 100 | Realistic one-line request |

**Why hard caps?** Every token in a discover response costs the agent context window. A 300-char description is ~75 tokens. A 1500-char description is ~375 tokens — 5x the cost for diminishing returns. Agents skim; they don't read essays. Make every word count.

#### Subtitles

The subtitle is the first thing users and agents see in search results. Think App Store subtitle:

- One line that captures what the tool does
- Focus on the primary capability or value
- No periods, no filler words
- Examples: "Live crypto prices, market data, and trends", "Audit pages and sites for ranking issues"

#### Tool descriptions

The description is the full pitch on the tool detail page:

- 2-4 sentences explaining what the tool does, who it's for, and what value it provides
- Lead with the main value or job to be done
- Mention the most common use cases in plain English
- Prefer what the tool helps someone achieve over how it is implemented
- Do not embed workflow steps, orchestration rules, or long feature inventories
- Do not mention underlying API providers (e.g. don't say "powered by Serper" or "uses ElevenLabs")
- Do not mention pricing, cost, or API key details (e.g. don't say "all skills are free", "no API key needed", "no authentication required", "no credits charged"). The platform handles pricing and auth — descriptions should focus purely on what the tool does and why it's useful.

Good formula: `Help users do X so they can Y. Covers A, B, and C use cases.`

#### Instructions

Instructions are agent-facing only — they don't appear on the website. They help agents get the most out of the tool:

- Which skills to use for which tasks
- Recommended workflows and skill chaining order
- Important parameters to set and when
- Tips for getting the best results
- Caveats or limitations the agent should know about
- Can be substantial (multiple paragraphs) — agents benefit from detail
- Do not mention pricing or API keys here either — no "all skills are free", "no key needed", etc.

Good formula: `Start with <skill> for <use case>. Chain <skill_a> → <skill_b> for <workflow>. Tips: ...`

#### Skill descriptions

Skill descriptions should be more action-oriented than tool descriptions:

- Describe the specific action and when to use it.
- Stay user-facing and readable, but be slightly more precise than the tool summary.
- Avoid raw implementation detail like "returns an array of objects" unless it changes how the skill should be used.

#### Other description fields

- `returns`: summarize the outcome in human terms, not every key in the output schema.
- `examples[].description`: write like a realistic request a user or agent would make.
- Input property descriptions, output property descriptions, and requirement descriptions should stay precise and technical enough to remove ambiguity.

#### Quick examples

| Field | Avoid | Prefer |
|-------|-------|--------|
| Subtitle | "Uses AI to generate images from text" | "AI image generation and editing across 20+ models" |
| Tool description | "Generate and edit images using 50+ AI models..." | "Create or edit images with AI for ads, concepts, product shots, brand visuals, and creative experiments. Supports text-to-image, image-to-image, inpainting, and background removal across multiple styles." |
| Instructions | "Call generate_image with a prompt" | "Start with generate_image for text-to-image. Use edit_image for modifications to existing images. For best results, use descriptive prompts with style keywords. Chain with image-ops for resizing." |
| Skill description | "Search Google News and return an array of news articles..." | "Find recent news stories on a topic when you need timely coverage and source links." |
| `returns` | "Array of image results with title, imageUrl..." | "Image results with titles, links, thumbnails, and source information." |

### Minimal example

```typescript
import { defineTool } from '../../core/define-tool.js';
import { mySkillHandler } from './skills/my-skill.js';

const { register, manifest } = defineTool({
  name: 'my-tool',
  displayName: 'My Tool',
  subtitle: 'Track brand mentions across the web in real time',
  description:
    'Monitor what people are saying about any brand, product, or topic across news, social media, and forums. ' +
    'Ideal for PR teams, marketers, and founders who need to stay on top of coverage and sentiment.',
  instructions:
    'Start with my_skill to get recent mentions for a brand name. ' +
    'Use broad terms for discovery, exact brand names for monitoring. ' +
    'Chain with sentiment analysis tools for deeper insight.',
  categories: ['analytics'],
  changelog: [{ date: '2026-03-22', changes: ['Initial release'] }],
  skills: [{
    name: 'my_skill',
    displayName: 'My Skill',
    description: 'List recent brand mentions for a search term so you can review what people are saying.',
    input: {
      type: 'object',
      properties: {
        param: { type: 'string', description: 'Brand name or search term to look up' },
      },
      required: ['param'],
    },
    examples: [
      { description: 'Find recent mentions for ToolRouter', input: { param: 'ToolRouter' } },
    ],
    returns: 'Recent mentions with the key details needed for review',
    handler: mySkillHandler,
  }],
});

export { register as registerMyTool, manifest };
```

### Full example (real tool)

Here's a condensed version of `magic-screenshots` showing all features:

```typescript
import { defineTool } from '../../core/define-tool.js';
import { captureHandler } from './skills/capture.js';
import { annotateHandler } from './skills/annotate.js';

const { register, manifest } = defineTool({
  name: 'magic-screenshots',
  displayName: 'Magic Screenshots',
  subtitle: 'Capture and annotate polished app store screenshots',
  description:
    'Capture, annotate, and export polished app store screenshots for launches, ASO refreshes, and creative reviews. ' +
    'Supports iPhone, iPad, and desktop viewports with full-page capture and annotation overlays.',
  instructions:
    'Start with capture to take a screenshot of any URL at a specific device size. ' +
    'Then use annotate to add callouts, highlights, or device frames. ' +
    'For App Store submissions, use iphone-15-pro device preset and avoid full_page mode.',
  categories: ['media', 'marketing'],
  changelog: [{ date: '2026-03-22', changes: ['Initial release'] }],

  // Suggested execution order for agents
  workflow: ['capture', 'annotate'],

  // Secrets/credentials this tool needs
  requirements: [
    {
      name: 'fal',
      type: 'secret',
      displayName: 'fal.ai API Key',
      description: 'API key for fal.ai model inference',
      required: true,
      acquireUrl: 'https://fal.ai/dashboard/keys',
      envFallback: 'FAL_KEY',
    },
  ],

  // Optional metadata shown publicly — only include fields you want on the tool page
  homepage: 'https://toolrouter.com/tools/magic-screenshots',

  skills: [
    {
      name: 'capture',
      displayName: 'Capture Screenshot',
      description:
        'Capture a page at a specific device size so you can create store-ready screenshots quickly.',
      contentType: 'image',
      input: {
        type: 'object',
        properties: {
          url: { type: 'string', description: 'URL to capture' },
          device: {
            type: 'string',
            description: 'Device viewport preset',
            enum: ['iphone-15-pro', 'ipad-pro', 'desktop'],
            default: 'iphone-15-pro',
          },
          full_page: {
            type: 'boolean',
            description: 'Capture full scrollable page',
            default: false,
          },
        },
        required: ['url'],
      },
      output: {
        type: 'object',
        properties: {
          image_path: { type: 'string', description: 'Path to saved PNG' },
          device: { type: 'string', description: 'Device preset used' },
        },
      },
      returns: 'Captured screenshot path plus the device metadata used for the export',
      annotations: {
        readOnlyHint: true,
        destructiveHint: false,
        idempotentHint: true,
        openWorldHint: true,
      },
      examples: [
        {
          description: 'Capture an iPhone screenshot of a landing page',
          input: { url: 'https://example.com', device: 'iphone-15-pro' },
        },
        {
          description: 'Capture a full-page desktop screenshot',
          input: { url: 'https://example.com', device: 'desktop', full_page: true },
        },
      ],
      handler: captureHandler,
    },
    // ... more skills
  ],
});

export { register as registerMagicScreenshots, manifest };
```

`author`, `repository`, and `license` are public-facing metadata. If you set them, the tool page can render labels like author, source, and license. Leave them unset unless you intentionally want that metadata shown publicly. Do not copy first-party placeholders like `Humanleap`, repo URLs, or `UNLICENSED` into new tools by default.

## defineTool() reference

### Tool-level fields

| Field | Type | Required | Rules |
|-------|------|----------|-------|
| `name` | string | Yes | kebab-case, e.g. `magic-screenshots` |
| `displayName` | string | Yes | Min 3 chars. The tool's title |
| `subtitle` | string | Yes | 10-35 chars. Short tagline for search results and cards |
| `description` | string | Yes | Min 50 chars. Detailed user-facing description for the tool page |
| `instructions` | string | Yes | Min 50 chars. Agent-facing guide — best practices, skill selection, chaining tips |
| `categories` | string[] | Yes | Min 1, from allowed list (see below) |
| `changelog` | ChangelogEntry[] | Yes | Min 1 entry. Version auto-computed from length |
| `skills` | SkillConfig[] | Yes | Min 1 skill |
| `workflow` | string[] | No | Suggested skill execution order |
| `requirements` | ToolRequirement[] | No | Secrets and credentials |
| `author` | object | No | Public author info (name, url, email). Leave unset unless you want an author line on the tool page |
| `homepage` | string | No | Valid URL |
| `repository` | string | No | Public source URL. Leave unset unless you want a Source link on the tool page |
| `license` | string | No | SPDX identifier. Leave unset unless you want a public license label on the tool page |
| `icon` | string | No | Valid URL to square image |
| `knowledge` | KnowledgeToolConfig | No | RAG config |
| `currency` | string | No | 3-char code, defaults to `USD` |

### Allowed categories

```
data, media, search, marketing, development, communication,
analytics, productivity, ai, finance, security, infrastructure
```

### Skill-level fields

| Field | Type | Required | Rules |
|-------|------|----------|-------|
| `name` | string | Yes | snake_case, e.g. `analyze_page` |
| `displayName` | string | Yes | Min 3 chars |
| `description` | string | Yes | Min 30 chars. Action-oriented summary of what the skill helps do |
| `input` | JsonSchema | Yes | Every property must have `description` |
| `output` | JsonSchema | No | Recommended for publishability |
| `handler` | SkillHandler | Yes | Async function |
| `examples` | SkillExample[] | Yes | Min 1 (2+ recommended) |
| `returns` | string | No | Min 10 chars if provided. Describe the outcome, not every field |
| `contentType` | `'json' \| 'image'` | No | Defaults to `json` |
| `annotations` | ToolAnnotations | No | Defaults applied automatically (see below) |
| `execution` | ExecutionProfile | No | Shortcut — placed inside `annotations.execution` |

### Annotations

Annotations tell agents how to treat the tool:

| Annotation | Default | Meaning |
|------------|---------|---------|
| `readOnlyHint` | `false` | Tool does not modify anything |
| `destructiveHint` | `false` | Tool may delete/overwrite data |
| `idempotentHint` | `true` | Same input → same effect (safe to retry) |
| `openWorldHint` | `false` | Tool interacts with external services |

Set `readOnlyHint: true` for read-only tools (search, lookup). Set `openWorldHint: true` for tools that call external APIs or browse the web.

### Examples

Examples are not decorative. They are:

- Executed by `toolrouter test` as integration tests
- Shown in docs, discovery, and agent-facing pages
- Used by agents to understand input format
- Validated against the input schema at registration

Each example must:

- Have a `description` (min 5 chars)
- Only use properties declared in the input schema
- Provide all `required` properties

```typescript
examples: [
  {
    description: 'Analyze SEO for a landing page',
    input: { url: 'https://example.com', depth: 2 },
  },
],
```

## Global market support

Tools should work for users worldwide. If your tool shows prices, listings, or merchants from an upstream API that supports regional data, expose `market` and `currency` input params on every skill that returns prices.

### Standard pattern

Define shared params once and spread into each skill's `input.properties`:

```typescript
const MARKET_PARAMS = {
  market: {
    type: 'string' as const,
    description: 'Country code for localized prices (default "us"). E.g. "gb", "fr", "de", "jp", "au", "sg", "hk", "kr", "br", "za", "ae", "in"',
    default: 'us',
  },
  currency: {
    type: 'string' as const,
    description: 'Currency code for prices (default "USD"). E.g. "GBP", "EUR", "JPY", "AUD", "SGD", "HKD", "KRW", "BRL", "ZAR", "AED", "INR"',
    default: 'USD',
  },
};

// In each skill's input:
input: {
  type: 'object',
  properties: {
    query: { type: 'string', description: '...' },
    ...MARKET_PARAMS,
  },
},
```

In the handler, read and pass through to the upstream API:

```typescript
const market = (input.market as string) ?? 'us';
const currency = (input.currency as string) ?? 'USD';
// Pass to your API client
const data = await myApi.search({ ...params, country_code: market, currency_code: currency });
```

### Rules

- **Default to `us`/`USD`** — most upstream APIs default to US. Don't break existing behavior.
- **`market` = where the user shops** — affects which merchants, prices, and availability they see.
- **`country` = where the item is from** — this is a content filter (e.g. French wines), not the same as market.
- **Keep both params on every skill that shows prices** — agents should be able to set market once and have it propagate.
- **Add market context in instructions** — tell agents: "If user is non-US, set market/currency".
- **Use the real currency in table format** — `${w.currency} ${w.price}` not `$${w.price}`.
- **Don't say "in USD" in price descriptions** — say "in local currency" since it depends on market/currency.

### Reference

See `src/tools/wine-collector/` for the full pattern — defines `MARKET_PARAMS` once and spreads into 4 skills.

## Pricing model

Every skill call costs `max($0.005, raw_cost)`:

| Scenario | raw_cost | User pays | Example |
|----------|----------|-----------|---------|
| **Our infra** (Playwright, Remotion, RapidAPI) | `0` | **$0.005** | SEO scans, screenshots, video renders |
| **Provider API** (fal.ai, OpenRouter, ElevenLabs, etc.) | actual cost | **rawCost** | `$2.80` for a Kling video, `$0.04` for a fal.ai image |

- If your skill calls a **paid provider API**, return `usage: { raw_cost }` with the actual cost from the provider.
- If your skill runs on **our infrastructure only** (no paid API), return `raw_cost: 0` — the $0.005 minimum kicks in.
- Use `context.getRate(provider, metric)` for dynamic rates — never hardcode dollar amounts.
- There is no markup on provider costs. Revenue comes from the 5.5% purchase fee on credit top-ups.

## How do I write a skill handler?

Handlers are async functions receiving `(input, context)`. Access input values with `input.param as string`, log progress with `context.log()`, get credentials with `context.get('openai')`, and return either a plain object or `{ output, usage: { raw_cost } }` for metered calls.

Handlers are async functions that receive input and a context object:

```typescript
import type { SkillHandler } from '../../../core/types.js';
import { resolveKey, requireKeyOrUnavailable } from '../../../core/resolve-key.js';

export const mySkillHandler: SkillHandler = async (input, context) => {
  const url = input.url as string;

  // Check for required API key — returns clean "unavailable" if missing
  const unavailable = requireKeyOrUnavailable(context, 'openai', 'OPENAI_API_KEY');
  if (unavailable) return unavailable;

  const apiKey = resolveKey(context, 'openai', 'OPENAI_API_KEY')!;

  // Log progress (goes to stderr in stdio mode)
  context.log(`Processing ${url}`);

  // Do work...
  const result = await fetchSomething(url, apiKey);

  // Return output — plain object or HandlerResult with usage
  return {
    output: { data: result },
    usage: { raw_cost: 0.002 },
  };
};
```

### Handler return values

A handler can return either:

**Plain object** — when there's no AI cost:

```typescript
return { title: 'Example', score: 95 };
```

**HandlerResult** — when there's a cost to report:

```typescript
return {
  output: { title: 'Example', score: 95 },
  usage: {
    raw_cost: 0.003,  // AI provider cost in tool's currency (USD)
    details: { model: 'gpt-4o', tokens: 1500 },  // Optional metadata
  },
};
```

**Pricing model:** The caller pays `max($0.005, raw_cost)`. If your handler returns `raw_cost: 0` (our infra, no provider API), the user is charged a flat $0.005. If your handler returns a provider cost (e.g. `raw_cost: 2.80` for a fal.ai video), the user pays that amount. There is no markup — revenue comes from the 5.5% purchase fee on credit top-ups.

### Handler context API

The `context` object provides:

| Property | Type | Description |
|----------|------|-------------|
| `context.toolName` | string | Current tool name |
| `context.skillName` | string | Current skill name |
| `context.callId` | string | Unique call ID (use for temp filenames) |
| `context.log(msg)` | function | Operational logging (stderr in stdio mode) |
| `context.get?.(name)` | function or undefined | Get a saved secret/credential by name (use `?.` to guard) |
| `context.getRate(provider, metric)` | function | Get live pricing rate (Convex override > static default > 0) |
| `context.getProviderKey(name)` | function | *Deprecated* — use `get()` instead |
| `context.callTool(ref, skill, input)` | function | Call another tool (composition, max 6 levels deep) |
| `context.knowledge` | object or undefined | RAG search if knowledge/ exists |

### Error handling

**Missing API keys — skip, don't throw.** If a key is not configured, the tool should gracefully skip that data source or return an "unavailable" response. Never throw errors about missing keys — agents should not see credential setup instructions.

```typescript
import { resolveKey, requireKeyOrUnavailable } from '../../../core/resolve-key.js';

// Single-source handler — entire skill needs one key:
const unavailable = requireKeyOrUnavailable(context, 'openai', 'OPENAI_API_KEY');
if (unavailable) return unavailable;  // Clean { available: false } response

// Multi-source handler — skip unavailable sources, use what's available:
const rapidapiKey = resolveKey(context, 'rapidapi', 'RAPIDAPI_KEY');
if (rapidapiKey) {
  results.push(await fetchFromRapidApi(rapidapiKey, ...));
}
```

For external service failures (not missing keys), throw with context:

```typescript
try {
  const res = await fetch(url);
  if (!res.ok) throw new Error(`HTTP ${res.status}`);
  return await res.json();
} catch (err) {
  const message = err instanceof Error ? err.message : String(err);
  throw new Error(`Failed to fetch "${url}": ${message}`);
}
```

### Asset delivery

If your handler produces files (screenshots, exports, generated images), return output keys ending in `_path`. Files must be under `tmpdir()` or `~/.toolrouter/assets/` — paths outside these directories are silently ignored.

```typescript
import { writeFile } from 'fs/promises';
import { tmpdir } from 'os';
import { join } from 'path';

// Write to allowed directory, use callId for unique names
const filePath = join(tmpdir(), `screenshot-${context.callId}.png`);
await writeFile(filePath, buffer);

return {
  output: {
    image_path: filePath,   // _path suffix triggers auto-upload
    format: 'png',
  },
  usage: { raw_cost: 0.01 },
};
```

The gateway automatically:

1. Uploads the file to the asset store
2. Adds `image_url` and `image_asset` to the response
3. For MCP clients: inlines images < 1MB as base64

## File Outputs

Handlers produce files by downloading them locally and returning `*_path` keys. The gateway picks up any output key ending in `_path`, uploads it to the asset store, and adds `*_url`, `*_asset`, and `*_page` keys to the response automatically. Agents get a stable, shareable URL — no S3 SDK or upload code needed in your handler.

### Basic pattern

Download the file → write to `tmpdir()` → return the path:

```typescript
import { writeFile } from 'fs/promises';
import { tmpdir } from 'os';
import { join } from 'path';

export const exportHandler: SkillHandler = async (input, context) => {
  const buffer = await generateDocument(input.content as string);

  const filePath = join(tmpdir(), `report-${context.callId}.pdf`);
  await writeFile(filePath, buffer);

  return {
    output: {
      document_path: filePath,       // _path → triggers auto-upload
      document_url_direct: someUrl,  // ephemeral provider CDN URL (capture fallback)
      page_count: 12,
    },
    usage: { raw_cost: 0 },
  };
};
```

After the gateway processes this, the agent receives:

```json
{
  "document_url": "https://api.toolrouter.com/v1/assets/ast_abc123",
  "document_asset": "ast_abc123",
  "document_page": "https://toolrouter.com/a/ast_abc123",
  "document_url_direct": "https://cdn.provider.com/...",
  "page_count": 12
}
```

### What happens automatically

| Output key | Gateway adds |
|------------|-------------|
| `image_path` | `image_url`, `image_asset`, `image_page` |
| `document_path` | `document_url`, `document_asset`, `document_page` |
| `video_path` | `video_url`, `video_asset`, `video_page` |
| `audio_path` | `audio_url`, `audio_asset`, `audio_page` |
| `*_path` (any prefix) | `*_url`, `*_asset`, `*_page` |

Files must live under `tmpdir()` or `~/.toolrouter/assets/` — paths outside these directories are ignored. Use `context.callId` in filenames to avoid collisions across concurrent calls.

### The `*_url_direct` convention

`document_url_direct` (or `image_url_direct`, `video_url_direct`, etc.) is the **ephemeral provider CDN URL** — included as a capture fallback in case the local download fails before upload. It is **not** a permanent render URL. The gateway uploads the local file and produces the stable `*_url`; the `*_url_direct` is just a safety net for debugging.

Never use `*_url_direct` for permanent on-demand render URLs (e.g. a QR code endpoint that generates on request). For those, omit `_path` and return only the URL — or use the capture opt-out below.

### Multi-file outputs

Return numbered suffixes for multiple files:

```typescript
return {
  output: {
    image_path: paths[0],
    image_2_path: paths[1],
    image_3_path: paths[2],
    // Gateway produces: image_url, image_2_url, image_3_url, etc.
  },
  usage: { raw_cost: 0.03 },
};
```

### Capture opt-out

If a URL is a **permanent render endpoint** that does not need uploading, add a `{prefix}_capture: false` flag alongside it:

```typescript
return {
  output: {
    qr_url: 'https://api.qrserver.com/v1/create-qr-code/?data=...',
    qr_capture: false,  // tells gateway: don't download+upload this URL
  },
};
```

Without the opt-out, the gateway would attempt to upload everything it finds, which is wasteful for stable render-on-demand URLs.

### Supported file extensions

The gateway recognises these extensions for auto-upload:

| Type | Extensions |
|------|-----------|
| Images | `.png`, `.jpg`, `.jpeg`, `.gif`, `.webp`, `.svg`, `.avif` |
| Video | `.mp4`, `.webm`, `.mov` |
| Audio | `.mp3`, `.wav`, `.ogg`, `.flac` |
| Documents | `.pdf`, `.zip`, `.pptx`, `.docx`, `.xlsx`, `.csv` |

Files with unrecognised extensions are still uploaded; content type is inferred from the extension or falls back to `application/octet-stream`.

## Interactive Outputs (MCP Apps)

When your tool returns data to an MCP client like Claude, ChatGPT, or VS Code, it normally appears as plain text. MCP Apps let you render that data as an interactive UI — sortable tables, image galleries, charts, maps — directly inside the conversation. The user sees a card they can interact with instead of a wall of JSON.

### How it works

There are two layers: the **MCP Apps standard** (how hosts discover and render interactive UI) and **ToolRouter's format system** (how your handler declares which renderer to use).

**MCP Apps layer** (standard protocol):

1. The gateway's `tools/list` declares `_meta.ui.resourceUri: 'ui://toolrouter/app'` on `use_tool` and `get_job_result` — this tells the host these tools have interactive UI
2. When the host calls one of these tools, it fetches the `ui://toolrouter/app` HTML resource via `resources/read` and mounts it in a sandboxed iframe
3. For supported inline outputs, the tool result includes both `content` (a concise transcript fallback for the model and text-only hosts) and `structuredContent` (structured data for the UI — never enters model context)
4. The iframe receives the result via `ui/notifications/tool-result` and renders `structuredContent` when it is present

**ToolRouter format layer** (your handler code):

1. Your handler returns output with a `format` key declaring which UI renderer to use
2. For supported inline formats, the gateway includes the rich output as `structuredContent` on the tool result
3. The iframe app checks `structuredContent.format` and dispatches to the matching renderer (table, gallery, report, etc.)
4. Tools without `format`, or with formats intentionally hidden inline like `detail`, fall back to the host's normal transcript/accordion instead of a ToolRouter card. Use one of the supported inline formats when you want visible MCP app UI.

Your handler code barely changes — you add `format` and normalize data into `format_data`:

```typescript
// Before — plain JSON output
return {
  output: { items: results, total: results.length },
  usage: { raw_cost: 0.001 },
};

// After — same data, now renders as an interactive table
return {
  output: {
    format: 'table',
    format_data: {
      items: results.map(r => ({ title: r.name, url: r.link, snippet: r.description })),
      total: totalCount,
      page: 1,
    },
    // Keep original fields too — agents still read them
    items: results,
    total: results.length,
  },
  usage: { raw_cost: 0.001 },
};
```

The `format_data` key is the canonical data envelope — it matches the exact interface the format renderer expects. The original output fields stay alongside it for agents that read JSON directly.

ToolRouter intentionally does **not** render `detail` inline in the MCP app anymore. Generic key-value payloads tend to be noisy internal tool state rather than genuinely user-facing UI, so hosts should handle them in their normal tool-result accordions/transcripts instead.

### Available formats

| Format | Renders as | Use when your output has |
|--------|-----------|------------------------|
| `table` | Sortable, filterable rows with pagination | `items[]` — array of objects with consistent keys |
| `gallery` | Image grid with lightbox and download | `images[]` — array of `{ url, title? }` |
| `report` | Scored sections with pass/warn/fail pills | `score?`, `sections[]` with `items[]` |
| `detail` | No inline MCP app card by default | Flat or nested object with named fields when you want a clean text fallback or host-native accordion instead of a ToolRouter card |
| `compare` | Side-by-side columns | `items[]` — 2-4 objects to compare |
| `timeline` | Vertical event stream with status dots | `events[]` — `{ time, title, status? }` |
| `chart` | Line, bar, or pie chart with tooltips | `type`, `labels[]`, `datasets[]` |
| `map` | Map with pin markers and info cards | `pins[]` — `{ lat, lng, title }` |
| `media` | Audio or video player with download | `url`, `type: 'audio'|'video'` |
| `document` | File card with preview and download | `url`, `filename`, `type: 'pdf'|'docx'|...` |
| `list` | Numbered items with title, description, link | `items[]` — `{ title, description?, url? }` |

### Data shapes per format

Each format has a strict `format_data` interface defined in `src/core/format-types.ts`. The renderer only reads `format_data` — never tool-specific output keys.

**Table** — the most common format:

```typescript
return {
  output: {
    format: 'table',
    format_data: {
      items: [
        { title: 'Result 1', url: 'https://...', score: 95 },
        { title: 'Result 2', url: 'https://...', score: 87 },
      ],
      total: 142,
      page: 1,
      per_page: 20,
    },
  },
};
```

**Gallery** — for image-producing tools:

```typescript
return {
  output: {
    format: 'gallery',
    format_data: {
      images: [
        { url: output.image_url, title: 'Generated image', page: output.image_page },
      ],
    },
  },
};
```

**Report** — for audit and analysis tools:

```typescript
return {
  output: {
    format: 'report',
    format_data: {
      score: 72,
      sections: [
        {
          title: 'Meta Tags',
          status: 'pass',
          items: [
            { label: 'Title tag', value: 'Present', status: 'pass' },
            { label: 'Meta description', value: 'Missing', status: 'fail' },
          ],
        },
      ],
    },
  },
};
```

**Chart** — for time series and metrics:

```typescript
return {
  output: {
    format: 'chart',
    format_data: {
      type: 'line',
      labels: ['Jan', 'Feb', 'Mar', 'Apr'],
      datasets: [{ label: 'Revenue', data: [120, 145, 162, 190] }],
    },
  },
};
```

**List** — for search results, ranked items, numbered lists with rich per-item content:

```typescript
return {
  output: {
    format: 'list',
    format_data: {
      items: [
        { title: 'Result One', description: 'Brief summary of the first result', url: 'https://...' },
        { title: 'Result Two', description: 'Brief summary of the second result', detail: 'Extra context' },
      ],
    },
  },
};
```

### Handling large datasets

Never dump hundreds of items into one output. Use pagination — return one page at a time:

```typescript
export const searchHandler: SkillHandler = async (input, context) => {
  const page = (input.page as number) ?? 1;
  const perPage = 20;

  const allResults = await fetchResults(input.query as string);
  const pageResults = allResults.slice((page - 1) * perPage, page * perPage);

  return {
    output: {
      format: 'table',
      format_data: {
        items: pageResults,
        total: allResults.length,
        page,
        per_page: perPage,
      },
    },
  };
};
```

The table renderer shows page buttons at the bottom. Currently pagination is **client-side only** — it highlights the selected page button but displays the same items returned in the current result. For large datasets, return one page at a time and accept a `page` parameter so the agent can request subsequent pages on behalf of the user. Server-driven pagination via `app.callServerTool()` is planned but not yet wired up.

### Interactive features

The MCP App cards have these built-in interactions (no handler code needed):

- **Expand to fullscreen**: every card has an expand button (top-right corner) that calls `app.requestDisplayMode({ mode: 'fullscreen' })`
- **Footer links**: if you include a `footer_link` URL in your format_data, a "View" link appears in the card footer and opens via `app.openLink()`
- **Map pin clicks**: clicking a pin updates the model's context with the selected location
- **Human-readable labels**: all `snake_case` keys are automatically converted to "Title Case" labels

The UI deliberately avoids non-functional buttons. No "Download" or "Variations" buttons appear unless they're wired to real server-side actions. If your tool produces a shareable asset URL (via the asset pipeline), include it as a `page` field in your format_data and it becomes the footer link.

### How MCP Apps rendering works (protocol level)

The interactive UI is powered by the [MCP Apps extension](https://blog.modelcontextprotocol.io/posts/2026-01-26-mcp-apps/) (`io.modelcontextprotocol/ui`). Here's what happens under the hood:

1. The gateway's `tools/list` response includes `_meta.ui: { resourceUri: 'ui://toolrouter/app', visibility: ['model', 'app'] }` on `use_tool` and `get_job_result`
2. The gateway's `resources/read` response serves a single-file HTML bundle (`FORMAT_BUNDLE_HTML`) with MIME type `text/html;profile=mcp-app` and CSP metadata
3. When a host like Claude calls `use_tool`, it detects the `_meta.ui` and fetches the HTML resource
4. The host renders the HTML in a sandboxed iframe and forwards the tool result via `ui/notifications/tool-result`
5. When a tool's output contains a supported inline `format`, the gateway includes the rich UI payload as `structuredContent` on the `CallToolResult` — the host passes this directly to the iframe without it entering model context
6. In normal `response_format: "concise"` mode, the visible `content` fallback is reduced to a compact summary so hosts do not surface giant query or result payloads alongside the app card. Use `response_format: "detailed"` only when you explicitly want the raw JSON transcript.
7. The HTML app (built with `@modelcontextprotocol/ext-apps` `App` class) checks `structuredContent` first, falls back to parsing JSON from text content blocks, then dispatches to the matching renderer

The HTML bundle includes:
- **10 inline format renderers** — table, gallery, report, compare, timeline, chart, map, media, document, list
- **Host theme integration** — `applyDocumentTheme()`, `applyHostStyleVariables()`, `applyHostFonts()` from the ext-apps SDK
- **Auto-resize** — the `App` class uses `ResizeObserver` + `sendSizeChanged()` to tell the host the exact content height
- **CSP metadata** — `resourceDomains` allows images/media from provider CDNs (fal.media, replicate, S3, CloudFront, etc.)
- **Loading states** — `ontoolinput` shows a lightweight loading skeleton while the tool runs. Pending and running async job states intentionally render no inline ToolRouter card; the host's native tool accordion carries that progress instead
- **Cancellation** — `ontoolcancelled` shows a clean "Cancelled" card
- **Dark mode** — uses spec-defined `--color-*` CSS variables that automatically resolve differently in `[data-theme="dark"]` (not `@media prefers-color-scheme` which doesn't work in sandboxed iframes)
- **Full McpUiStyleVariableKey coverage** — all 78 spec CSS variables defined with sensible defaults (colors, typography, radii, shadows, rings) so host-injected themes apply automatically via `applyHostStyleVariables()`

The HTML bundle source lives in `packages/mcp-apps/src/`. To rebuild after changes: `cd packages/mcp-apps && npm run build`, then `node scripts/embed-format-bundle.ts` to embed it into the gateway.

### Design principles

The MCP App UI is designed for **non-technical users**. Key rules:

- **No JSON ever** — if data can't be rendered cleanly, it doesn't show
- **No dead buttons** — buttons only appear when they're wired to real actions
- **Human-readable labels** — `snake_case` keys become "Title Case" automatically
- **Clean error states** — errors show plain text messages, not `{"error":{"code":"..."}}`
- **Null/empty values hidden** — fields with null, undefined, or empty values are silently skipped
- **Gateway metadata stripped** — internal keys like `review_hint`, `request_id`, `usage`, `_meta` never appear in the UI

### Why `format` is required

Without a `format` key, the gateway does **not** include `structuredContent` on the tool result. Some formats, such as `detail`, are also intentionally excluded from inline MCP app rendering even though they still use `format_data` for transcript shaping. If you want a visible ToolRouter card, use one of the supported inline formats and include `format` + `format_data`.

Clients that don't support MCP Apps (e.g. older MCP clients, stdio-only setups) ignore the `_meta.ui` field and get the same JSON output they always did. The data is always there in the text content block regardless of whether the UI renders.

### Lazy loading heavy dependencies

Load expensive dependencies on first use, not at import time. This keeps startup fast:

```typescript
// Good — lazy load
export const captureHandler: SkillHandler = async (input, context) => {
  const { chromium } = await import('playwright');
  // ...
};

// Bad — loaded at import time
import { chromium } from 'playwright';
```

## How do I declare API key requirements?

Add a `requirements` array to `defineTool()` with entries specifying `name` (globally unique), `type` ('secret' or 'credential'), `displayName`, `description`, `acquireUrl`, and `envFallback`. Handlers access them via `context.get('name')`.

If your tool needs API keys or identifiers, declare them in `requirements`:

```typescript
requirements: [
  {
    name: 'openai',                          // Globally unique key
    type: 'secret',                          // 'secret' = masked, 'credential' = plain text
    displayName: 'OpenAI API Key',
    description: 'API key for GPT model access (min 10 chars)',
    required: true,
    acquireUrl: 'https://platform.openai.com/api-keys',
    envFallback: 'OPENAI_API_KEY',
  },
],
```

**Resolution order** (handler calls `context.get('openai')`):

1. Per-request header: `X-Provider-Key-OpenAI: sk-...`
2. User's saved value (Convex cloud)
3. Config file (`~/.toolrouter/config.json`)
4. Environment variable (`process.env.OPENAI_API_KEY`)

**Important**: requirement names are globally unique. If two tools both declare `name: 'openai'`, they must have identical `displayName`, `description`, `acquireUrl`, and `envFallback`. The validator checks this.

## How do I add knowledge to a tool?

Create a `knowledge/` directory with `.md` files and enable it in `defineTool()` with `knowledge: {}`. Handlers search it with `context.knowledge?.search(query, topK)`. Knowledge is indexed lazily using local embeddings — no API key needed.

Add a `knowledge/` directory with `.md` files to give your handler searchable domain knowledge:

```
src/tools/my-tool/
├── knowledge/
│   ├── best-practices.md
│   └── common-patterns.md
├── index.ts
└── skills/
```

Enable in `defineTool()`:

```typescript
knowledge: {
  dir: 'knowledge',       // Default: 'knowledge'
  maxChunkSize: 1500,     // Default: 1500 chars per chunk
  overlap: 200,           // Default: 200 chars overlap
  defaultTopK: 5,         // Default: 5 results
},
```

Use in handlers:

```typescript
export const analyzeHandler: SkillHandler = async (input, context) => {
  // Search knowledge base
  const docs = await context.knowledge?.search('meta tag best practices', 3);

  // docs = [{ content, heading, score, source }]
  const guidance = docs?.map(d => d.content).join('\n') ?? '';

  // Use guidance in your analysis...
};
```

Knowledge is indexed lazily on first search using local embeddings (HuggingFace, no API key needed).

For built-in tools, set `__knowledgePath` to a pre-resolved absolute path (tsup bundles can't resolve relative paths at runtime):

```typescript
import { resolve, dirname } from 'path';
import { fileURLToPath } from 'url';

const __knowledgePath = resolve(
  dirname(fileURLToPath(import.meta.url)),
  '../../src/tools/my-tool/knowledge'
);
```

## How do I call other tools from a handler?

Use `context.callTool(toolRef, skill, input)` to call another tool. Billing and provider keys are inherited from the parent call. Max composition depth is 6 levels (depth 0–5).

A handler can call other tools using `context.callTool()`. Billing and provider keys are inherited from the parent call:

```typescript
export const fullAuditHandler: SkillHandler = async (input, context) => {
  const url = input.url as string;

  // Call another tool's skill
  const seoResult = await context.callTool(
    'seo',
    'analyze_page',
    { url },
  );

  const screenshotResult = await context.callTool(
    'magic-screenshots',
    'capture',
    { url, device: 'desktop' },
  );

  return {
    seo: seoResult.output,
    screenshot: screenshotResult.output,
  };
};
```

Max composition depth: 6 levels (depth 0–5). Deeper calls are rejected.

## Naming conventions

| What | Convention | Example |
|------|-----------|---------|
| Tool name | kebab-case | `magic-screenshots` |
| Skill name | snake_case | `analyze_page` |
| Skill filename | kebab-case | `skills/analyze-page.ts` |
| Handler variable | camelCase + Handler | `analyzePageHandler` |
| Register export | register + PascalCase | `registerMagicScreenshots` |
| MCP tool name | tool\_\_skill | `seo__analyze_page` |
| Tool directory | kebab-case | `src/tools/magic-screenshots/` |

## How do I register a tool?

Import the register function and call it inside `createRegistry()` in `src/tools/index.ts`. The registry validates the manifest and stores the handlers at startup.

Add your tool to `src/tools/index.ts`:

```typescript
import { registerMyTool } from './my-tool/index.js';

export async function createRegistry(): Promise<ToolRegistry> {
  const registry = new ToolRegistry();

  registerMyTool(registry);
  // ... other tools

  return registry;
}
```

## How do I validate and test a tool?

Run `toolrouter validate` for manifest checks (naming, schemas, examples) or `toolrouter validate --strict` to treat warnings as errors. Run `toolrouter test my-tool` to execute every example as an integration test with pass/fail reporting.

The validator enforces all the rules in this guide. Run it often:

```bash
toolrouter validate                # Warnings only
toolrouter validate --strict       # Warnings become errors
```

### What `validate` checks

**Errors** (always fail):

- Tool name not kebab-case
- Skill name not snake_case
- Missing or invalid `subtitle` (must be 10-80 chars)
- `description` too short (tool: 50 chars, skill: 30 chars)
- Missing or too-short `instructions` (min 50 chars, aim for ~600)
- Missing `changelog` (min 1 entry required)
- Invalid category
- No skills defined
- No examples for a skill
- Input property missing `description`
- Example uses undeclared property
- Example missing required property
- Duplicate skill names
- Workflow references non-existent skill
- Requirement `description` too short (min 10 chars)
- Requirement name conflicts across tools

**Warnings** (`--strict` makes these errors):

- Missing `author` (warning only — do not add placeholder public metadata just to silence it)
- Skills with fewer than 2 examples
- Skills without `outputSchema`
- Knowledge directory missing or empty

### What `test` checks

```bash
toolrouter test my-tool              # Test all skills
toolrouter test my-tool --timeout 60 # Custom timeout (seconds)
```

The test runner:

1. Executes every example in every skill
2. Validates output against `outputSchema` (in strict mode)
3. Reports pass/fail and latency per example
4. Default timeout: 30 seconds per example

## Input schema best practices

### Every property needs a description

This is **enforced by the validator**. Agents use descriptions to form correct inputs:

```typescript
// Good
properties: {
  url: { type: 'string', description: 'Full URL to analyze including protocol' },
  depth: { type: 'integer', description: 'Crawl depth (1 = single page, 2+ = follow links)', default: 1 },
}

// Bad — will fail validation
properties: {
  url: { type: 'string' },
  depth: { type: 'integer' },
}
```

### Use enums for constrained values

```typescript
device: {
  type: 'string',
  description: 'Device viewport preset',
  enum: ['iphone-15-pro', 'ipad-pro', 'desktop'],
  default: 'iphone-15-pro',
}
```

### Declare defaults

```typescript
format: {
  type: 'string',
  description: 'Output format',
  enum: ['png', 'jpg', 'webp'],
  default: 'png',
}
```

### Keep required properties minimal

Only mark properties as `required` if the skill genuinely cannot function without them. More optional properties = more flexible for agents.

## Execution profiles (async skills)

For skills that take more than a few seconds, declare an execution profile. The gateway uses `estimatedSeconds` to decide sync vs async routing (>= 30s routes async):

```typescript
skills: [{
  name: 'crawl_site',
  displayName: 'Crawl Site',
  description: 'Crawl an entire website and return structured data for all pages found',
  // ... input, handler, examples ...
  execution: {
    estimatedSeconds: 60,     // Gateway routes async if >= 30
    timeoutSeconds: 300,      // Max wait. Defaults to 2x estimatedSeconds
    mode: 'io',               // 'io' = network polling, 'cpu' = Playwright/ffmpeg
  },
}],
```

The `execution` field is a shortcut — it gets placed inside `annotations.execution` automatically. You do not need to nest it manually.

## Dynamic pricing with context.getRate()

Never hardcode provider costs. Use `context.getRate()` to resolve live pricing that can be updated at runtime via Convex without redeployment:

```typescript
import { resolveKey, requireKeyOrUnavailable } from '../../../core/resolve-key.js';

export const scrapeHandler: SkillHandler = async (input, context) => {
  const unavailable = requireKeyOrUnavailable(context, 'firecrawl', 'FIRECRAWL_API_KEY');
  if (unavailable) return unavailable;

  const apiKey = resolveKey(context, 'firecrawl', 'FIRECRAWL_API_KEY')!;
  const result = await firecrawlFetch(input.url as string, apiKey);

  // Resolve rate dynamically — Convex override > static default > 0
  const ratePerCredit = context.getRate('firecrawl', 'credit');
  const rawCost = (result.creditsUsed ?? 1) * ratePerCredit;

  return {
    output: result.data,
    usage: { raw_cost: rawCost },
  };
};
```

Static rates are defined in `src/core/provider-pricing.ts` as fallbacks. If your tool uses a new provider, add a static rate there.

## Shared clients and handler factories

### Shared API client

When multiple tools call the same API, extract a shared client into `src/tools/_shared/`:

```typescript
// src/tools/_shared/my-api-client.ts
import type { SkillContext } from '../../core/types.js';

export const MY_API_CREDENTIAL = {
  name: 'my_api',
  type: 'secret' as const,
  displayName: 'My API Key',
  description: 'API key for My Service (get it at my-api.com/keys)',
  required: true,
  acquireUrl: 'https://my-api.com/keys',
  envFallback: 'MY_API_KEY',
};

export async function myApiFetch(
  path: string,
  params: Record<string, unknown>,
  apiKey: string,
) {
  const res = await fetch(`https://api.my-service.com${path}`, {
    method: 'POST',
    headers: { 'Authorization': `Bearer ${apiKey}`, 'Content-Type': 'application/json' },
    body: JSON.stringify(params),
  });
  if (!res.ok) throw new Error(`My API error: HTTP ${res.status}`);

  return { data: await res.json(), raw_cost: 0.001 };
}
// Handler does the key check before calling the client:
// const unavailable = requireKeyOrUnavailable(context, 'my_api', 'MY_API_KEY');
// if (unavailable) return unavailable;
// const key = resolveKey(context, 'my_api', 'MY_API_KEY')!;
// const result = await myApiFetch('/search', params, key);
```

Then each tool imports the shared client and credential definition.

> **Adding an upstream provider?** If your client needs circuit breakers, `/tmp` asset downloads, model registry entries, and dispatch routing (e.g. image/video providers), see [Adding Providers](/docs/adding-providers) for the full pattern.

### Handler factory

When a tool has many skills that differ only by an API endpoint, use a factory:

```typescript
// All 11 web-search skills generated from a single factory
function createSearchHandler(endpoint: string): SkillHandler {
  return async (input, context) => {
    const result = await serperSearch(endpoint, input, context);
    return { output: result.data, usage: { raw_cost: result.raw_cost } };
  };
}

skills: [
  { name: 'search', handler: createSearchHandler('/search'), ... },
  { name: 'news_search', handler: createSearchHandler('/news'), ... },
  { name: 'image_search', handler: createSearchHandler('/images'), ... },
],
```

## Stateful session pattern

For multi-step workflows where skills share state (e.g., pentest), return a `session_id` from the first skill and accept it in subsequent skills:

```typescript
// skills/recon.ts — creates the session
export const reconHandler: SkillHandler = async (input, context) => {
  const session = createSession(input.target as string);
  return { session_id: session.id, findings: session.recon };
};

// skills/scan.ts — continues the session
export const scanHandler: SkillHandler = async (input, context) => {
  const session = getSession(input.session_id as string);
  if (!session) throw new Error('Invalid session_id — run recon first');
  // ... scan using session state
};
```

Use the `workflow` array to hint the correct order: `workflow: ['recon', 'scan_vulnerabilities', 'generate_report']`.

## Workflow hints

The `workflow` array suggests an execution order to agents:

```typescript
workflow: ['capture', 'restyle', 'annotate', 'export'],
```

This is a hint, not enforced. Agents can call skills in any order.

## How do I add my tool to the E2E test suite?

After registering your tool, add it to the test infrastructure so all three tiers (manifests, billing, handler execution) cover it automatically.

### Step 1: Register in test registry

Add your register function to `tests/helpers/registry-factory.ts`:

```typescript
import { registerMyTool } from '../../src/tools/my-tool/index.js';

const ALL_REGISTERS = [
  // ... existing registers
  registerMyTool,
];
```

This gives you Tier 1 (manifest validation) and Tier 2 (billing pipeline) for free.

### Step 2: Add MSW handlers

If your tool calls external APIs, add mock handlers in `tests/helpers/msw-handlers.ts`:

```typescript
// Mock My API responses
http.get('https://api.my-service.com/*', ({ request }) => {
  const url = new URL(request.url);
  if (url.pathname.includes('/search')) {
    return HttpResponse.json({ results: [{ title: 'Mock result' }] });
  }
  return HttpResponse.json({ status: 'ok' });
}),
```

Place handlers **before** the catch-all at the bottom. For multi-step flows (submit + poll), create separate handlers.

### Step 3: Add mock provider keys

If your tool declares `requirements`, add entries to `MOCK_PROVIDER_KEYS` in `tests/e2e/tools.test.ts`:

```typescript
const MOCK_PROVIDER_KEYS: Record<string, string> = {
  // ... existing keys
  my_api: 'test-my-api-key',
};
```

### Step 4: Skip if needed

If your tool depends on Playwright, CLI binaries, system DNS, or requires real files, add it to `SKIP_EXECUTION`:

```typescript
const SKIP_EXECUTION = new Set([
  // ... existing skips
  'my-tool',
]);
```

### Step 5: Run the suite

```bash
npm run test:e2e
```

Your tool is now covered by all three tiers.

## How do I create a plugin?

Run `toolrouter init my-tool --plugin` to create a standalone npm package. The package exports a `register` function. Users install plugins with `toolrouter plugins add my-tool-plugin`. Plugin tools get the same registry, billing, assets, and composition as built-in tools.

For tools distributed as separate npm packages:

```bash
toolrouter init my-tool --plugin
```

This creates a standalone package with its own `package.json`. The package must export a `register` function:

```typescript
export { register } from './index.js';
```

Users install plugins:

```bash
toolrouter plugins add my-tool-plugin
```

Plugin tools get the same registry, billing, assets, and composition as built-in tools.

## Checklist

Before shipping a tool:

**Manifest & schema:**
- [ ] `toolrouter validate --strict` passes
- [ ] `toolrouter test <tool>` passes
- [ ] `subtitle` is 10-80 chars — punchy App Store tagline
- [ ] `description` is 50+ chars — detailed user-facing pitch (2-4 sentences)
- [ ] `instructions` is 50+ chars (aim for ~600) — agent guide with best practices and skill tips
- [ ] `changelog` has at least 1 entry
- [ ] Every skill has at least 2 examples
- [ ] Every input property has a `description`
- [ ] `outputSchema` defined for all skills
- [ ] `returns` field describes what the skill gives back
- [ ] `requirements` declared for any external API keys

**Handler quality:**
- [ ] Heavy dependencies lazy-loaded (not imported at top level)
- [ ] Missing keys use `requireKeyOrUnavailable()` / `resolveKey()` — never throw on missing keys
- [ ] File outputs use `_path` suffix for auto-upload
- [ ] Cost reported via `usage: { raw_cost }` for paid APIs
- [ ] Pricing uses `context.getRate()` instead of hardcoded values
- [ ] `execution` profile set for skills taking > 5 seconds
- [ ] **Output includes `format` + `format_data`** — required for MCP App rendering; without it the host shows raw JSON (see format interfaces in `src/core/format-types.ts`)
- [ ] `format_data` matches the canonical interface for the declared format (TableData, GalleryData, etc.)
- [ ] Large result sets paginated via `page` parameter (max ~20 items per response)

**Test suite wired:**
- [ ] Registered in `tests/helpers/registry-factory.ts`
- [ ] MSW handlers added for external APIs in `tests/helpers/msw-handlers.ts`
- [ ] Mock provider keys added in `tests/e2e/tools.test.ts` (if tool has requirements)
- [ ] `npm run test:e2e` passes

**Registry:**
- [ ] Registered in `src/tools/index.ts`
- [ ] `npm run build` succeeds

**App icon:**
- [ ] Icon definition added to `scripts/generate-icons.mjs` (subject, material, gradient, lighting)
- [ ] Icon generated via ToolRouter MCP `generate-image` tool (see Step 1 below)
- [ ] PNG (1024x1024) and WebP (512x512) exist in `web/public/icons/`

**Going live:**
- [ ] Icon generated and committed (must happen before approve)
- [ ] Committed and pushed to `main` (triggers Railway + Vercel deploy)
- [ ] Tool approved in production Convex (see "How do I push a tool live?" below)
- [ ] Verified on `https://api.toolrouter.com/v1/tools/<tool-name>`
- [ ] Verified on `https://toolrouter.com/tools/<tool-name>` (may take up to 60s for ISR)

## How do I push a tool live?

Registering a tool in code and pushing to `main` is not enough. ToolRouter has a **tool approval system** — new tools are hidden from the public API until explicitly approved in production Convex. This prevents unfinished tools from appearing on the website or being discovered by agents.

### Step 1: Generate an app icon

Every tool needs an icon before going live. Icons must be generated and committed before approving the tool.

**1a. Add the icon definition** to `scripts/generate-icons.mjs`:

```javascript
'my-tool': {
  subject: {
    object: '3D [single object description]',
    material: '[ceramic, metal, glass, etc.]',
  },
  background: {
    gradient: ['[top color]', '[bottom color]'],
  },
  lighting: '[soft top-down | dramatic rim | warm golden | cool ambient]',
},
```

See the `generate-tool-icons` skill for icon design rules (single 3D object, real materials, gradient background, no text/clutter).

**1b. Generate the icon** using the ToolRouter MCP `generate-image` tool (we dogfood our own product):

```
Use the ToolRouter MCP's generate-image tool with the text_to_image skill.
Billing context: use the HumanLeap billing context.

Prompt: Build the prompt from the icon definition — edge-to-edge gradient,
single centered 3D object with the specified material, lighting style,
no text/labels/badges. Square 1:1 aspect ratio.
```

**1c. Save the output** as PNG (1024x1024) and WebP (512x512) in `web/public/icons/`:
- `web/public/icons/<tool-name>.png`
- `web/public/icons/<tool-name>.webp`

Use `sharp` to resize/convert if the generated image needs processing.

### Step 2: Push to main

Commit your changes (including the icon files) and push. This triggers auto-deploys on both Railway (API) and Vercel (website):

```bash
git push origin main
```

Railway deploys the API gateway with the new tool registered internally. Vercel rebuilds the website. Both take 1–3 minutes.

### Step 3: Approve the tool in production Convex

New tools must be approved before they appear in `GET /v1/tools`, the website, or MCP discovery.

**Using `toolrouter approve` CLI (recommended):**

The CLI needs `CONVEX_URL` and `TOOLROUTER_CONVEX_SERVER_SECRET` pointing at production. These are on the Railway ToolRouter service. Pull them and run:

```bash
# Get production credentials from Railway (ToolRouter service must be linked)
railway service link ToolRouter

# Approve a single tool
CONVEX_URL="https://jovial-pika-231.eu-west-1.convex.cloud" \
TOOLROUTER_CONVEX_SERVER_SECRET="$(railway variables --json | node -e "console.log(JSON.parse(require('fs').readFileSync('/dev/stdin','utf8')).TOOLROUTER_CONVEX_SERVER_SECRET)")" \
node dist/bin/toolrouter.js approve my-tool

# Approve ALL tools at once
CONVEX_URL="https://jovial-pika-231.eu-west-1.convex.cloud" \
TOOLROUTER_CONVEX_SERVER_SECRET="$(railway variables --json | node -e "console.log(JSON.parse(require('fs').readFileSync('/dev/stdin','utf8')).TOOLROUTER_CONVEX_SERVER_SECRET)")" \
node dist/bin/toolrouter.js approve --all
```

**Alternative: Direct Convex function call:**

```bash
# Approve a single tool
CONVEX_DEPLOYMENT=prod:jovial-pika-231 npx convex run --no-push \
  toolApprovals:setApproval '{"toolName": "my-tool", "approved": true, "updatedBy": "yourname"}'

# Approve multiple tools at once
CONVEX_DEPLOYMENT=prod:jovial-pika-231 npx convex run --no-push \
  toolApprovals:bulkSetApproval '{"toolNames": ["tool-a", "tool-b"], "approved": true, "updatedBy": "yourname"}'
```

> **Important:** `.env.local` points to the **dev** Convex instance (`acoustic-antelope-876`), NOT production. The `toolrouter approve` CLI and `npx convex run` both need explicit production targeting — never rely on `.env.local` for production operations.

### Step 4: Wait for the approval refresh

The API gateway refreshes its approved tool list from Convex every **60 seconds**. After approving, wait up to 60 seconds for the tool to appear in the public API.

### Step 5: Verify

```bash
# Check the API serves the tool
curl -s https://api.toolrouter.com/v1/tools/my-tool | python3 -m json.tool

# Check the website (ISR revalidates every 60s)
curl -sI https://toolrouter.com/tools/my-tool | head -1
# Should return: HTTP/2 200
```

### Step 6: Test via ToolRouter MCP

The final verification is calling the tool through the MCP package — the same way real agents use it:

```bash
# If not already configured, add ToolRouter MCP:
claude mcp add toolrouter -- npx -y toolrouter-mcp

# Then in Claude Code, use the discover and use_tool MCP tools to:
# 1. discover("my-tool") — verify it appears in results
# 2. use_tool({ tool: "my-tool", skill: "my_skill", input: { ... } }) — verify it executes
```

This confirms the full pipeline: agent → MCP → gateway → handler → response.

### How approval works internally

- Tools register into the `ToolRegistry` at startup regardless of approval status
- `registry.listTools()` filters by the approved set (loaded from Convex `tool_approvals` table)
- `registry.listToolsUnfiltered()` returns all tools (used by admin endpoints)
- The gateway calls `refreshApprovals()` every 60 seconds to sync the in-memory set
- Health endpoint (`/health`) reports the **approved** tool count
- If the approval loader is not configured (e.g. local dev with no Convex), all tools are visible

### Checking current approvals

```bash
# List all currently approved tools
CONVEX_DEPLOYMENT=prod:jovial-pika-231 npx convex run --no-push toolApprovals:listApproved

# Revoke approval (hides tool from public API)
CONVEX_DEPLOYMENT=prod:jovial-pika-231 npx convex run --no-push \
  toolApprovals:setApproval '{"toolName": "my-tool", "approved": false, "updatedBy": "yourname"}'
```

## How do I add an icon for my tool?

Every tool needs an icon. Icons are 3D photorealistic objects on gradient backgrounds, generated using the ToolRouter MCP `generate-image` tool (we dogfood our own product).

### Icon config format

Add your tool to `TOOL_ICONS` in `scripts/generate-icons.mjs`:

```javascript
'my-tool': {
  subject: {
    object: '3D [single object description]',
    material: '[what it is made of — ceramic, metal, glass, etc.]',
  },
  background: {
    gradient: ['[top color]', '[bottom color]'],
  },
  lighting: '[soft top-down | dramatic rim | warm golden | cool ambient]',
},
```

### Design rules

1. **ONE object only** — not "magnifying glass with globe and documents." Just the magnifying glass.
2. **3D with real materials** — describe what it's made of: ceramic, chrome, leather, glass, wood.
3. **Unique gradient** — check existing tools in your category. Don't duplicate colors.
4. **No accessories** — no floating secondary elements, sparkles, or decorative extras.

### Category color guidelines

| Category | Gradient Range |
|----------|---------------|
| Search/Web | Blues, cyans, teals |
| Security | Dark reds, crimsons, blacks |
| Finance | Teals, golds, navys |
| Media/Audio | Purples, violets, magentas |
| Data/Reference | Greens, limes, soft blues |
| Social | Pinks, oranges, corals |
| Marketing | Electric blues, purples |
| Weather/Nature | Sky blues, golds, oranges |
| Utility | Charcoals, slates, dark grays |

### Generate the icon

Use the ToolRouter MCP `generate-image` tool with the `text_to_image` skill (HumanLeap billing context). Build the prompt from the icon config above:

> Edge-to-edge [gradient colors] gradient background. Single centered 3D [object] made of [material]. [Lighting style] lighting. No text, labels, badges, or decorative extras. Square composition.

Save the output as:
- `web/public/icons/<tool-name>.png` (1024x1024)
- `web/public/icons/<tool-name>.webp` (512x512, quality 85)

Use `sharp` to resize/convert if needed. The icon **must** be committed before approving the tool.

## How metadata is surfaced

Your tool's metadata appears in different contexts with different levels of detail:

| Context | What's shown |
|---------|-------------|
| **Search results (6+ matches)** | `name` + `subtitle` + skill names |
| **Compact results (2-5 matches)** | `name` + `subtitle` + skills with input params |
| **Exact match / tool detail** | `subtitle` + `description` + `instructions` + full skills with schemas and examples |
| **Website tool card** | Icon + `displayName` + `subtitle` + star rating |
| **Website tool page** | Icon + `displayName` + `subtitle` + `description` + all skills |
| **REST API `GET /v1/tools/:tool`** | Full manifest including all fields |

The `subtitle` is the most-seen field — it appears in every context. Write it as if it's the only thing agents and users will read before deciding to click.

The `instructions` field is agent-only — it never appears on the website. It's shown when an agent queries a specific tool by name via `discover`. This is your chance to give agents a detailed guide.

## Read next

- [Adding Providers](/docs/adding-providers) for integrating a new upstream API provider (client, models, dispatch, billing)
- [Architecture](/docs/architecture) for the runtime execution path
- [CLI](/docs/cli) for validation, testing, and plugin workflows
- [Integration](/docs/integration) for how tools are exposed via MCP and REST