Model Selection

Every tool that outputs media (images, video, audio) follows the same model selection pattern. The agent always has the final say on which model to use.

Three-tier hierarchy

Model selection follows a strict priority order:

  1. Agent override — the agent passes model: "some-key" in the skill input. Always wins.
  2. Tool default — the tool sets a preferred model for its use case (e.g. ugc-video prefers seedance-2.0 for lip-sync support). Used when the agent doesn't specify a model.
  3. Platform default — the model with isDefault: true in the registry for that media type. Used when neither the agent nor the tool specifies a model.

Agent passes model?  →  yes  →  use that model
                     →  no   →  Tool has a default?  →  yes  →  use tool default
                                                     →  no   →  use platform default (isDefault: true)
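
For illustration, the fallback chain reduces to a nullish-coalescing lookup. The helper names in this sketch (getToolDefault, getPlatformDefault) are illustrative, not the actual registry API:

typescript
// Sketch only: helper names are illustrative, not the real registry API.
declare function getToolDefault(toolName: string): string | undefined;
declare function getPlatformDefault(mediaType: 'image' | 'video'): string; // entry with isDefault: true

function resolveModel(
  agentModel: string | undefined, // tier 1: agent override from skill input
  toolName: string,               // tier 2: the tool's preferred model, if any
  mediaType: 'image' | 'video',   // tier 3: platform default for the media type
): string {
  return agentModel ?? getToolDefault(toolName) ?? getPlatformDefault(mediaType);
}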

Current platform defaults:

  • Image: nano-banana-2 (Gemini 3.1 Flash via fal)
  • Video: seedance-2.0 (ByteDance Seedance 2.0 via fal)
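
A registry entry for a platform default might look roughly like this. Only provider and isDefault are named elsewhere in this doc; the other fields are assumptions:

typescript
// Illustrative shape only; the real entries live in src/core/model-registry.ts and may differ.
const imageDefault = {
  key: 'nano-banana-2',
  label: 'Gemini 3.1 Flash', // hosted via fal
  mediaType: 'image',
  provider: 'fal',  // decides which API client handles generation
  isDefault: true,  // marks this as the platform default for images
};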

How agents discover models

Tools declare modelMediaType in defineTool():

typescript
defineTool({
  name: 'my-tool',
  modelMediaType: 'video', // auto-injects list_models skill
  // ...
});

This auto-injects a list_models skill that queries the central model registry plus live provider catalogs (fal, prodia, google, openrouter). Agents call list_models to see all available models with pricing, capabilities, and provider info before choosing one.
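
The response shape of list_models isn't pinned down here, but conceptually each entry pairs a selectable model key with the metadata an agent needs to choose. The interface below is purely illustrative:

typescript
// Purely illustrative: the actual list_models output shape is not specified in this doc.
interface ListedModel {
  key: string;  // pass this back as `model` in the skill input to override the default
  provider: 'fal' | 'prodia' | 'google' | 'openrouter';
  mediaType: 'image' | 'video' | 'audio';
  pricing: string;         // pricing info surfaced to the agent
  capabilities: string[];  // e.g. lip-sync support
}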

Shared generation functions

src/tools/shared/generate-media.ts provides two functions that handle all provider routing:

typescript
import { generateImage, generateVideo } from '../shared/generate-media.js';

// Image — model determines provider automatically
const image = await generateImage('a cute cat', context, {
  model: 'nano-banana-2',  // or any model key from list_models
  aspectRatio: '16:9',
});

// Video — same pattern
const video = await generateVideo('ocean waves', context, {
  model: 'kling-3.0',      // or any model key
  imageUrl: 'https://...',  // optional — enables image-to-video
  duration: 5,
  enableAudio: true,
});

These functions:

  1. Resolve the model via the registry (static + live catalogs)
  2. Detect the provider from the model's metadata
  3. Resolve the API key via resolveKey(context, provider)
  4. Dispatch to the correct provider client (fal, prodia, google, openrouter)
  5. Return a uniform result shape regardless of provider
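
As a rough sketch, the internals chain those five steps together like this. Apart from resolveKey(context, provider), every helper and client name here is an assumption for illustration:

typescript
// Rough sketch of the five steps; client names and signatures are assumptions.
declare function resolveModelEntry(key?: string): { key: string; provider: string }; // step 1
declare function resolveKey(context: unknown, provider: string): Promise<string>;    // step 3
declare const falClient: { generate(args: object): Promise<object> };
declare const prodiaClient: { generate(args: object): Promise<object> };
declare function toUniformResult(raw: object): { url: string }; // step 5: uniform shape

async function generateSketch(prompt: string, context: unknown, opts: { model?: string }) {
  const entry = resolveModelEntry(opts.model);              // 1. registry: static + live catalogs
  const apiKey = await resolveKey(context, entry.provider); // 2 and 3: provider metadata, then key
  switch (entry.provider) {                                 // 4. dispatch to the provider client
    case 'fal':
      return toUniformResult(await falClient.generate({ entry, prompt, apiKey }));
    case 'prodia':
      return toUniformResult(await prodiaClient.generate({ entry, prompt, apiKey }));
    // google and openrouter follow the same pattern
    default:
      throw new Error(`Unknown provider: ${entry.provider}`);
  }
}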

All tools that generate generic images or video should use these shared functions instead of calling provider clients directly. This ensures every model the agent discovers via list_models actually works when selected.

When NOT to use shared functions

Specialised tools that use model-specific capabilities with no equivalent on other providers:

  • video-edit: Kling O1 video-to-video/edit, restyle, motion-control
  • video-upscale: Model-specific upscaling (bytedance, topaz, seedvr2)
  • face-swap: Uses a specific face-swap model endpoint
  • background-removal: Uses a specific segmentation model
  • virtual-tryon: Uses a specific try-on model

These tools are correctly locked to their specific models. Don't migrate them to the shared function.

Adding model support to a new tool

  1. Set modelMediaType in defineTool() (gets list_models for free)
  2. Accept model as an optional string input in your skill schema
  3. Call generateImage() or generateVideo() with opts.model set to the input value
  4. The shared function handles everything else — provider routing, key resolution, cost

typescript
// In your skill handler:
const result = await generateVideo(prompt, context, {
  model: input.model as string | undefined,  // agent override or undefined
  imageUrl: input.image_url as string,
  duration: 5,
});

If the agent passes a model, it's used. If not, the shared function falls back to the platform default (seedance-2.0 for video, nano-banana-2 for image).

Provider routing internals

The model registry (src/core/model-registry.ts) stores every model with its provider endpoints. When a model key is resolved, the provider field determines which API client to call:

  Provider     Image           Video
  fal          Yes (default)   Yes (default)
  prodia       Yes             Yes
  google       Yes             Yes
  openrouter   Yes             —

Higgsfield and Phota models are hosted on fal.ai and route through the fal path automatically.

Live provider catalogs (fal, openrouter) auto-discover new models at runtime — models added by providers appear without code changes.
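
One way to picture the static-plus-live merge (the fetch helpers and the collision rule below are assumptions, not documented behavior):

typescript
// Illustrative merge of the static registry with live provider catalogs; names are assumed.
interface ModelEntry { key: string; provider: string; mediaType: string; isDefault?: boolean }

declare function staticModels(): ModelEntry[];                    // src/core/model-registry.ts
declare function fetchFalCatalog(): Promise<ModelEntry[]>;        // live discovery
declare function fetchOpenrouterCatalog(): Promise<ModelEntry[]>; // live discovery

async function allModels(): Promise<ModelEntry[]> {
  const live = (await Promise.all([fetchFalCatalog(), fetchOpenrouterCatalog()])).flat();
  const byKey = new Map<string, ModelEntry>();
  // Assumed rule: static entries overwrite live ones, so curated metadata wins on collisions.
  for (const m of [...live, ...staticModels()]) byKey.set(m.key, m);
  return [...byKey.values()];
}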

Coverage

The model selection UX applies to every output modality:

  Modality   modelMediaType   Shared function              Status
  Image      'image'          generateImage()              30+ tools
  Video      'video'          generateVideo()              14+ tools
  Audio      'audio'          — (direct ElevenLabs API)    7 tools

Audio models are registered in src/core/models/audio.ts with capabilities: text-to-speech, speech-to-speech, text-to-music, text-to-sfx, speech-to-text. All 7 audio tools accept an input.model override and have list_models auto-injected.
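
A hypothetical audio entry, just to show where the capability tags sit (the key and fields are invented for illustration):

typescript
// Hypothetical entry, illustrative only; see src/core/models/audio.ts for the real shape.
const exampleAudioModel = {
  key: 'example-tts',  // invented key, not a real model
  mediaType: 'audio',
  capabilities: ['text-to-speech'],  // one of the five capabilities listed above
};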