
Audio Generation

Audio Generation is a dedicated text-to-speech workspace. Type a line, pick a voice, tune the sliders, generate. Use it for voiceover, narration previews, character dialogue, or any audio that is not already covered by scene-by-scene generation in Scene Mode.

How to reach it

From the app top bar, choose Create → Audio Gen. If you are inside a storyboard, use the Audio Gen chip in the storyboard subnav instead.

The workspace

Three regions:

Composer
The top panel. Text input, voice settings, and a Generate button.
Voice browser
Right sidebar. Searchable list of every voice across providers with favorites, presets, and filters.
History grid
The rest of the page. Every clip you've generated, newest first, with inline playback.

Voice providers

Three providers, ranked by quality:

Provider                | Tier         | Strength
ElevenLabs              | Professional | Highest quality. Full expressive control.
Kokoro (via Replicate)  | Premium      | High quality. Simpler controls.
Google Cloud TTS        | Free         | 60+ neural voices across 20+ languages.

Picking a voice

The voice browser lists every available voice. Each row shows the voice name, provider, gender, language, accent, and style. Filter by provider or language. Click a voice to load it into the composer.

A star button on each row adds a voice to Favorites. Favorites live at the top of the browser for quick access.
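
As a rough mental model of the browser's behavior, the filtering and favorites-first ordering described above can be sketched in a few lines. Everything here is hypothetical: the `Voice` class, the `browse` function, and the placeholder voice names are illustrative and are not part of the app or any provider's API.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass(frozen=True)
class Voice:
    """One row in the voice browser (fields mirror the columns listed above)."""
    name: str
    provider: str
    gender: str
    language: str
    accent: str
    style: str


def browse(
    voices: Iterable[Voice],
    favorites: Set[str],
    provider: Optional[str] = None,
    language: Optional[str] = None,
) -> list:
    """Filter by provider and/or language, then list favorites first."""
    matches = [
        v for v in voices
        if (provider is None or v.provider == provider)
        and (language is None or v.language == language)
    ]
    # sorted() is stable and False sorts before True, so favorites move to
    # the top while every other voice keeps its original relative order.
    return sorted(matches, key=lambda v: v.name not in favorites)


# Placeholder data, not real voice names.
voices = [
    Voice("Voice A", "ElevenLabs", "female", "en", "US", "narration"),
    Voice("Voice B", "Google Cloud TTS", "male", "en", "GB", "neutral"),
    Voice("Voice C", "ElevenLabs", "male", "fr", "FR", "warm"),
]
english = browse(voices, favorites={"Voice B"}, language="en")
```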

Voice settings

ElevenLabs exposes the full expressive knob set:

Speed
0.5x to 2x. 1x is natural.
Stability
0-100. Higher is more consistent. Lower gives more expressive variation (with drift risk).
Similarity boost
0-100. How strictly to hew to the base voice's identity. Higher is safer.
Style
0-100. Emotional expressiveness. Zero is documentary-neutral. Higher is dramatic.

Google and Kokoro expose a smaller set (primarily speed). Sliders are hidden when they do not apply to the current voice.
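
If you keep slider values in your own notes or scripts, a small validated record helps catch out-of-range numbers early. The sketch below only encodes the ranges listed above for the ElevenLabs sliders. The `VoiceSettings` class and its field names are assumptions made for illustration, not the app's internal format or the ElevenLabs API, and the example values are arbitrary.

```python
from dataclasses import dataclass

# The three sliders documented as 0-100.
PERCENT_FIELDS = ("stability", "similarity_boost", "style")


@dataclass(frozen=True)
class VoiceSettings:
    """Hypothetical container for the ElevenLabs slider values."""
    speed: float            # 0.5 to 2.0, where 1.0 is natural
    stability: int          # higher is more consistent
    similarity_boost: int   # higher stays closer to the base voice
    style: int              # 0 is neutral, higher is more dramatic

    def __post_init__(self) -> None:
        if not 0.5 <= self.speed <= 2.0:
            raise ValueError(f"speed must be between 0.5 and 2.0, got {self.speed}")
        for name in PERCENT_FIELDS:
            value = getattr(self, name)
            if not 0 <= value <= 100:
                raise ValueError(f"{name} must be between 0 and 100, got {value}")


# Illustrative values only; the docs do not state defaults.
narrator = VoiceSettings(speed=1.0, stability=70, similarity_boost=80, style=0)
```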

Voice presets

Save a voice together with its slider settings as a named preset, then reuse it across projects. Presets are scoped to your account and are not visible to other users.

Presets cover recurring needs: a specific documentary narrator for every essay, a warm memoir voice for intimate stories, an energetic explainer voice for tutorials. Saved presets appear at the top of the voice browser alongside favorites.

Generating a clip

  1. Type the text into the composer. Up to 5,000 characters per clip.
  2. Pick a voice.
  3. Tune sliders if needed.
  4. Click Generate. The request is synchronous (1-3 seconds typical).

The new clip lands at the top of the history grid with a waveform preview and inline playback.

Clip history

Each tile in the history grid shows:

  • The first 60 characters of the text.
  • Duration.
  • Voice name.
  • Waveform preview (click to play).

The tile's menu lets you regenerate (with the original text and settings preloaded), download, add to a storyboard, or delete.

Adding a clip to a storyboard

The Add to Storyboard item in a clip's menu opens a picker listing your storyboards and their scenes. Choose a target. The clip is copied into that storyboard's Library as an audio asset with origin = uploaded, ready to drag onto the timeline.

Cost

A flat rate of 20 credits per 1,000 characters, rounded up, minimum 1 credit. Applies to every provider. The generate button shows the estimated cost before you confirm.
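
To budget a script before opening the app, the flat rate can be approximated offline. This is a minimal sketch under two assumptions that the text above does not spell out: that "rounded up" means rounding to the next whole credit, and that characters are counted the way Python's `len` counts them. The `estimate_credits` function is hypothetical; the estimate shown on the Generate button is the authoritative number.

```python
CREDITS_PER_1000_CHARS = 20  # flat rate stated above
MIN_CREDITS = 1              # minimum charge stated above


def estimate_credits(text: str) -> int:
    """Estimate the credit cost of one clip (assumes whole-credit rounding)."""
    chars = len(text)
    # Integer ceiling of chars * 20 / 1000, avoiding floating-point error.
    credits = (chars * CREDITS_PER_1000_CHARS + 999) // 1000
    return max(MIN_CREDITS, credits)


short_line = estimate_credits("a" * 50)     # 50 chars  -> 1 credit
one_page = estimate_credits("a" * 1000)     # 1,000 chars -> 20 credits
full_clip = estimate_credits("a" * 5000)    # 5,000 chars -> 100 credits
```

Under these assumptions a maximum-length clip of 5,000 characters costs 100 credits, and anything up to 50 characters costs the 1-credit minimum.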

Use Audio Gen to pre-roll narration before committing to a full storyboard. Generate your script as a single clip, play it back, and time the delivery. If the pacing is off, edit the script and regenerate. This keeps iteration cheap before you move on to expensive image and video work.

Limits

  • 5,000 characters per clip. Split long narration into multiple clips.
  • No voice cloning. You cannot upload a reference voice to match.
  • Language detection is automatic but imperfect. Write in the language you want the voice to speak.
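
For narration longer than the 5,000-character limit, one way to prepare clips is to split the script at sentence boundaries before pasting each piece into the composer. The helper below is a sketch, not a feature of the app: `split_narration` is a hypothetical name, and the sentence detection is a simple heuristic (punctuation followed by whitespace) that will misjudge abbreviations such as "Dr." in some scripts.

```python
import re

MAX_CHARS = 5000  # per-clip limit stated above


def split_narration(script: str, limit: int = MAX_CHARS) -> list:
    """Split a script into chunks of at most `limit` characters."""
    # Break after ., ! or ? when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", script.strip())
    clips = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        # A single sentence longer than the limit is hard-wrapped.
        while len(sentence) > limit:
            if current:
                clips.append(current)
                current = ""
            clips.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            # The sentence does not fit, so close the current clip.
            clips.append(current)
            current = sentence
    if current:
        clips.append(current)
    return clips


# A tiny limit makes the behavior easy to see.
parts = split_narration("One. Two. Three.", limit=10)
```

With `limit=10`, the sample script yields two clips, "One. Two." and "Three.", because adding the third sentence would exceed the limit.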

Where to go next

Storyboard
Scene
Replace a shot, or insert a new one