Every tool that outputs media (images, video, text) follows the same model selection pattern. The agent always has the final say on which model to use.
## Three-tier hierarchy
Model selection follows a strict priority order:
- **Agent override** — the agent passes `model: "some-key"` in the skill input. Always wins.
- **Tool default** — the tool sets a preferred model for its use case (e.g. ugc-video prefers `seedance-2.0` for lip-sync support). Used when the agent doesn't specify a model.
- **Platform default** — the model with `isDefault: true` in the registry for that media type. Used when neither the agent nor the tool specifies a model.
```
Agent passes model? → yes → use that model
                    → no  → Tool has a default? → yes → use tool default
                                                → no  → use platform default (isDefault: true)
```

Current platform defaults:

- Image: `nano-banana-2` (Gemini 3.1 Flash via fal)
- Video: `seedance-2.0` (ByteDance Seedance 2.0 via fal)
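The three-tier fallback can be sketched in a few lines. Note this is an illustrative sketch: the helper name `resolveModel` and the `PLATFORM_DEFAULTS` map are not the actual registry API, but the defaults themselves come from the list above.

```ts
// Illustrative sketch of the three-tier priority — not the real registry code.
type MediaType = 'image' | 'video';

const PLATFORM_DEFAULTS: Record<MediaType, string> = {
  image: 'nano-banana-2', // Gemini 3.1 Flash via fal
  video: 'seedance-2.0',  // ByteDance Seedance 2.0 via fal
};

function resolveModel(
  mediaType: MediaType,
  agentModel?: string,  // tier 1: agent override — always wins
  toolDefault?: string, // tier 2: tool-level default
): string {
  return agentModel ?? toolDefault ?? PLATFORM_DEFAULTS[mediaType]; // tier 3
}
```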
## How agents discover models
Tools declare `modelMediaType` in `defineTool()`:

```ts
defineTool({
  name: 'my-tool',
  modelMediaType: 'video', // auto-injects list_models skill
  // ...
});
```

This auto-injects a `list_models` skill that queries the central model registry plus live provider catalogs (fal, prodia, google, openrouter). Agents call `list_models` to see all available models with pricing, capabilities, and provider info before choosing one.
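The exact entry shape returned by `list_models` isn't spelled out here beyond pricing, capabilities, and provider info, but a plausible shape (the `ModelEntry` interface and `platformDefault` helper below are assumptions, not the real types) looks like this:

```ts
// Hypothetical list_models entry — real fields may differ, but the doc
// guarantees pricing, capabilities, and provider info are present.
interface ModelEntry {
  key: string;                                          // e.g. 'seedance-2.0'
  mediaType: 'image' | 'video' | 'audio';
  provider: 'fal' | 'prodia' | 'google' | 'openrouter';
  isDefault?: boolean;                                  // platform default flag
  pricing?: string;
  capabilities?: string[];
}

// An agent (or a test) can locate the platform default for a media type:
function platformDefault(
  models: ModelEntry[],
  mediaType: ModelEntry['mediaType'],
): ModelEntry | undefined {
  return models.find((m) => m.mediaType === mediaType && m.isDefault);
}
```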
## Shared generation functions
`src/tools/shared/generate-media.ts` provides two functions that handle all provider routing:
```ts
import { generateImage, generateVideo } from '../shared/generate-media.js';

// Image — model determines provider automatically
const image = await generateImage('a cute cat', context, {
  model: 'nano-banana-2', // or any model key from list_models
  aspectRatio: '16:9',
});

// Video — same pattern
const video = await generateVideo('ocean waves', context, {
  model: 'kling-3.0', // or any model key
  imageUrl: 'https://...', // optional — enables image-to-video
  duration: 5,
  enableAudio: true,
});
```

These functions:
- Resolve the model via the registry (static + live catalogs)
- Detect the provider from the model's metadata
- Resolve the API key via `resolveKey(context, provider)`
- Dispatch to the correct provider client (fal, prodia, google, openrouter)
- Return a uniform result shape regardless of provider
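The first two steps amount to a lookup against the registry. The registry literal and `resolveProvider` helper below are illustrative stand-ins, not the real implementation in `src/core/model-registry.ts`:

```ts
// Illustrative registry slice — the real registry lives in src/core/model-registry.ts.
type Provider = 'fal' | 'prodia' | 'google' | 'openrouter';

interface ModelMeta {
  key: string;
  provider: Provider;
}

const registry: Record<string, ModelMeta> = {
  'nano-banana-2': { key: 'nano-banana-2', provider: 'fal' },
  'seedance-2.0': { key: 'seedance-2.0', provider: 'fal' },
};

// Steps 1-2: resolve the model key, then read the provider off its metadata.
function resolveProvider(modelKey: string): Provider {
  const meta = registry[modelKey];
  if (!meta) throw new Error(`Unknown model: ${modelKey}`);
  return meta.provider;
}
```

With the provider in hand, key resolution and client dispatch follow mechanically, which is why the uniform result shape can be guaranteed regardless of provider.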
All tools that generate generic images or video should use these shared functions instead of calling provider clients directly. This ensures every model the agent discovers via `list_models` actually works when selected.
## When NOT to use shared functions
Specialised tools that use model-specific capabilities with no equivalent on other providers:
- video-edit: Kling O1 video-to-video/edit, restyle, motion-control
- video-upscale: Model-specific upscaling (bytedance, topaz, seedvr2)
- face-swap: Uses a specific face-swap model endpoint
- background-removal: Uses a specific segmentation model
- virtual-tryon: Uses a specific try-on model
These tools are correctly locked to their specific models. Don't migrate them to the shared function.
## Adding model support to a new tool
- Set `modelMediaType` in `defineTool()` (gets `list_models` for free)
- Accept `model` as an optional string input in your skill schema
- Call `generateImage()` or `generateVideo()` with `opts.model` set to the input value
- The shared function handles everything else — provider routing, key resolution, cost
```ts
// In your skill handler:
const result = await generateVideo(prompt, context, {
  model: input.model as string | undefined, // agent override or undefined
  imageUrl: input.image_url as string,
  duration: 5,
});
```

If the agent passes a model, it's used. If not, the shared function falls back to its default (`seedance-2.0` for video, `nano-banana-2` for image).
## Provider routing internals
The model registry (`src/core/model-registry.ts`) stores every model with its provider endpoints. When a model key is resolved, the `provider` field determines which API client to call:
| Provider | Image | Video |
|---|---|---|
| fal | Yes (default) | Yes (default) |
| prodia | Yes | Yes |
| google | Yes | Yes |
| openrouter | — | Yes |
Higgsfield and Phota models are hosted on fal.ai and route through the fal path automatically.
Live provider catalogs (fal, openrouter) auto-discover new models at runtime — models added by providers appear without code changes.
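One plausible way the static registry and live catalogs could combine is a simple keyed merge. This is a sketch: the `mergeCatalogs` helper and the precedence rule shown (static entries win on key collisions) are assumptions, not documented behaviour.

```ts
interface CatalogModel {
  key: string;
  provider: 'fal' | 'prodia' | 'google' | 'openrouter';
}

// Hypothetical merge: live-discovered models fill in around the static
// registry, so new provider models appear without code changes.
// Assumed precedence (static wins on collisions) is illustrative only.
function mergeCatalogs(
  staticModels: Record<string, CatalogModel>,
  liveModels: CatalogModel[],
): Record<string, CatalogModel> {
  const merged: Record<string, CatalogModel> = { ...staticModels };
  for (const m of liveModels) {
    if (!(m.key in merged)) merged[m.key] = m;
  }
  return merged;
}
```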
## Coverage
The model selection UX applies to every output modality:
| Modality | modelMediaType | Shared function | Status |
|---|---|---|---|
| Image | 'image' | generateImage() | 30+ tools |
| Video | 'video' | generateVideo() | 14+ tools |
| Audio | 'audio' | — (direct ElevenLabs API) | 7 tools |
Audio models are registered in `src/core/models/audio.ts` with capabilities: text-to-speech, speech-to-speech, text-to-music, text-to-sfx, speech-to-text. All 7 audio tools accept an `input.model` override and have `list_models` auto-injected.
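The audio capability list above maps naturally onto a union type. The capability names come straight from this doc; the `AudioModel` shape and `modelsWithCapability` helper are illustrative, not the actual contents of `src/core/models/audio.ts`.

```ts
// Capability names are from the doc; the surrounding types are assumed.
type AudioCapability =
  | 'text-to-speech'
  | 'speech-to-speech'
  | 'text-to-music'
  | 'text-to-sfx'
  | 'speech-to-text';

interface AudioModel {
  key: string;
  capabilities: AudioCapability[];
}

// E.g. an agent narrowing the catalog to music-capable models before choosing one.
function modelsWithCapability(
  models: AudioModel[],
  cap: AudioCapability,
): AudioModel[] {
  return models.filter((m) => m.capabilities.includes(cap));
}
```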