Create authentic UGC-style videos and photos from a product description. Researches audience, writes scripts, generates AI scenes and clips across 16 formats — from product showcase and unboxing to lifestyle stills and flat lays. For marketers, DTC brands, and agencies.
Step 1: Research audience pain points, language, triggers, and objections. Async (1-3 min). Do NOT skip. Pass full output to generate_creative.
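How the research output feeds step 2 can be sketched as below. This is a minimal illustration only: the `build_*` helpers, field names, and tool-name strings are assumptions, not the real API — the one grounded rule it encodes is that the FULL research output must be passed forward, never a summary.

```python
# Hypothetical sketch of chaining step 1 into step 2.
# Helper names and payload fields are illustrative assumptions.

def build_research_request(product_description: str) -> dict:
    """Step 1 payload: audience pain points, language, triggers, objections."""
    return {"tool": "research_audience", "product_description": product_description}

def build_creative_request(research_output: dict, fmt: str = "product_demo") -> dict:
    """Step 2 payload: pass the FULL research output, unabridged."""
    return {
        "tool": "generate_creative",
        "research": research_output,  # full step-1 output, never a summary
        "format": fmt,
    }

req = build_research_request("Reusable silicone food bags")
# ...run the async research tool (1-3 min), then pass its whole output forward:
research_output = {"pain_points": ["plastic waste"], "triggers": ["eco guilt"]}
creative_req = build_creative_request(research_output)
```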
Generate UGC video hooks using the 3-variable framework (Angle + Aesthetic + Action). Uses RAG from the UGC playbook to apply proven hook patterns targeting a 60%+ 3-second view rate. ⏱ Takes ~5 seconds. Step 2 of the UGC workflow — requires research output from step 1.
Generate authentic UGC scripts using the But/Therefore zigzag structure. Creates script variations for each hook, with mini-hooks at key drop-off timestamps. Targets a ~20-second duration with a conversational, non-salesy tone. ⏱ Takes ~5-10 seconds per hook. Step 3 — requires hooks from step 2.
Generate visual scene descriptions for each script, optimized for AI image/video generation. Set "format" for format-specific direction. Takes ~5s per script. Step 4 — requires scripts from step 3.
Generate hooks, scripts, and scene descriptions in one call (~8s). Set "format" for format-specific output (e.g. product_demo, grwm, unboxing, lifestyle_still). Photo formats return compositions instead of hooks/scripts. Step 2 — pass FULL research output from step 1.
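The video/photo format split above can be made concrete with a small sketch. The format names are the 16 templates this tool ships; the helper itself is illustrative, not part of the API.

```python
# The 11 video and 5 photo format templates, as listed in the changelog.
VIDEO_FORMATS = {
    "talking_to_camera", "product_demo", "product_showcase", "grwm",
    "unboxing", "before_after", "problem_solution", "storytime",
    "pov", "voiceover_broll", "reaction",
}
PHOTO_FORMATS = {
    "lifestyle_still", "product_flat_lay", "in_use_shot",
    "aesthetic_moment", "mirror_selfie",
}

def expected_output_keys(fmt: str) -> list[str]:
    """Photo formats return compositions; video formats return hooks/scripts/scenes."""
    if fmt in PHOTO_FORMATS:
        return ["compositions"]
    if fmt in VIDEO_FORMATS:
        return ["hooks", "scripts", "scenes"]
    raise ValueError(f"unknown format: {fmt}")
```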
Create a reusable voice profile for consistent voice across clips. Provide a voice_sample_url or auto-match from persona + accent. Returns a voice_id for generate_videos. Call after generate_creative.
Step 3: Generate first-frame images from scenes. ~30s async. Pass existing_frames to skip frames that already have an image_url. Do NOT re-call if you already have frames — pass them as existing_frames instead. Pass persona_image_url for face consistency.
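The existing_frames guard can be sketched as a simple split. The exact frame shape is an assumption; the grounded rule is that any frame already carrying an image_url should be passed back rather than regenerated.

```python
def split_frames(frames: list[dict]) -> tuple[list[dict], list[dict]]:
    """Separate frames that already have an image_url (pass as existing_frames)
    from those that still need generation. Frame dict shape is assumed."""
    done = [f for f in frames if f.get("image_url")]
    todo = [f for f in frames if not f.get("image_url")]
    return done, todo

frames = [
    {"scene": 1, "image_url": "https://cdn.example/f1.png"},
    {"scene": 2},  # still needs generation
]
existing, pending = split_frames(frames)
```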
Step 5: Generate video clips from first frames using Seedance 2.0 with native lip-sync. Takes 2-10 MINUTES — do NOT retrigger. Pass elevenlabs_voice_id for consistent voice across clips. Use check_video for pending clips.
Check on a pending video that was still generating when generate_videos returned. Pass the fal_request_id and model_id from the pending_videos array. Returns the video URL if ready, or current status if still generating. You can check back any time.
Check on a pending image that was still generating when generate_frames timed out. Pass the fal_request_id from pending_frames. Returns image URL if ready, or status if still generating.
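Both check skills follow the same polling shape, which can be sketched generically. The result keys ("status", "url") and the injectable `check` callable are assumptions for illustration; the real tools take fal_request_id (and model_id for videos).

```python
import time

def poll_pending(check, request_id: str, interval_s: float = 30.0,
                 max_tries: int = 20):
    """Generic poll loop in the style of check_video / check_image.
    `check` is any callable returning a dict; result keys are assumed."""
    for _ in range(max_tries):
        result = check(request_id)
        if result.get("status") == "ready":
            return result["url"]
        time.sleep(interval_s)
    return None  # still generating after max_tries; check back later
```

Because generation can take minutes, a long interval with a generous try budget beats tight retriggering.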
Re-generate a single frame without re-running the full pipeline. Accepts optional revision notes for targeted edits. Use when a specific frame needs adjustment — much faster than regenerating all frames.
Composite the real product into frames. IMPORTANT: inline chat images CANNOT be passed as URLs. Direct the user to toolrouter.com/dashboard/files to upload their image and copy the hosted URL; they can reuse it anytime. Alternatively, use web-screenshot on the product page.
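An agent can pre-screen the product image URL before calling the tool. This check is a sketch of the rule above: inline chat images arrive as data: URIs, which cannot be passed, while hosted http(s) URLs can.

```python
def is_hosted_url(url: str) -> bool:
    """True only for http(s) URLs (e.g. one copied from
    toolrouter.com/dashboard/files); inline chat images arrive
    as data: URIs and must be rejected."""
    return url.startswith(("http://", "https://"))
```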
Stitch video clips into one final video with text overlays and transitions. Last step after generate_videos. Takes ~30-120s. Async — poll with get_job_result.
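The changelog documents a max of 20 clips per assemble_final call, so longer clip lists need batching. A minimal sketch, assuming clips are passed as a flat list (the parameter shape is an assumption):

```python
def chunk_clips(clips: list[str], max_per_call: int = 20) -> list[list[str]]:
    """Split a clip list into batches of at most 20, the documented
    per-call limit for assemble_final."""
    return [clips[i:i + max_per_call] for i in range(0, len(clips), max_per_call)]
```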
List available models for this tool, sorted by popularity. Returns provider details and pricing.
- Style references now supported — pass a style reference for consistent visual direction across UGC content
- Personas from your library are now auto-discovered and injected — pass persona_file_id instead of manual persona fields
- Improved error messages for person-driven formats — agents now get clear guidance to ask the user for appearance details
- BREAKING: write_scripts now always returns full script dialogue (script_body, hook_text, cta, mini_hooks) — removed table format that stripped content in concise mode
- BREAKING: generate_scenes now always returns full scene data (shot_list, setting, lighting, mood, camera_angle, wardrobe) — removed table format
- BREAKING: generate_videos now returns full videos/pending_videos data — removed media format that stripped content in concise mode
- generate_videos: added duration awareness — clips capped to match script target duration instead of generating one per frame
- generate_videos: video clips are now downloaded to temp storage for reliable asset-pipeline upload (video_N_path keys)
- write_scripts + generate_creative: improved CTA generation — natural sentences, not bare brand names
- Updated tool instructions with recommended workflow, voice profile guidance, and URL handling
- assemble_final: captions now default to false — users add captions natively on social platforms
- assemble_final: documented max 20 clips per call limit in skill schema
- Instructions now mention regenerate_frame for fixing weak hero frames before expensive video gen
- generate_frames: timeout now returns pending_frames with fal_request_id instead of failing — images can be retrieved later via check_image
- Added check_image skill for polling pending image jobs (mirrors check_video for videos)
- generate_frames: added existing_frames param to skip frames that already have image_url — prevents redundant regeneration
- composite_product: output key renamed from composited_frames to frames for consistency with generate_videos input
- generate_videos: increased estimatedSeconds to 300 and timeoutSeconds to 900 to prevent premature retrigger
- Updated instructions to warn agents against re-calling generate_frames and retriggering generate_videos
- Added 16 UGC format templates: 11 video (talking_to_camera, product_demo, product_showcase, grwm, unboxing, before_after, problem_solution, storytime, pov, voiceover_broll, reaction) + 5 photo (lifestyle_still, product_flat_lay, in_use_shot, aesthetic_moment, mirror_selfie)
- generate_creative now accepts "format" param for format-specific creative direction and scene generation
- Photo formats produce compositions instead of hooks/scripts, mapped to scenes for generate_frames compatibility
- Added "aspect_ratio" param for custom aspect ratios independent of platform presets
- generate_scenes now accepts "format" param for format-specific video scene direction
- Added create_voice_profile skill for consistent voice across video clips
- Wired voice_id into generate_videos for synced lip movement across all clips
- Added subtitle, expanded description, and agent instructions
- Enhanced realism prompts across all image and video generation skills
- Added platform-aware variants (tiktok, instagram_reels, youtube_shorts, instagram_stories, twitter)
- Added wardrobe reference image support for clothing consistency
- Added composite_product skill for real product placement via nano-banana-2 edit
- Added regenerate_frame skill for scene-level iteration
- Added assemble_final skill for video assembly via Remotion render server
- Initial release