
Fawna Audio

Audio is the Veo-family tier whose headline feature is native synced audio. Dialogue, music, and ambient sound are all generated in sync with the video from the same prompt. It is the closest Fawna gets to a full scene in one generation.

Tier label: Audio
Engine: Google Veo 3.1 Fast with audio mode
Price: From 20 credits per second
Aspect ratios: 16:9, 9:16
Resolutions: 720p, 1080p
Durations: 4, 6, 8 seconds
Audio: Always on, natively synced
Character refs: First-frame image only (I2V mode)
Style ref: Not supported
Keyframes: First and last frame
Negative prompt: Not supported
Magic Prompt: Off
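
The specs above can be checked before a generation is submitted. The following is a minimal Python sketch, not a Fawna API: the function names and the argument names (`aspect_ratio`, `resolution`, `duration_s`) are hypothetical and chosen only for illustration. The credit figure is a lower bound, because the price is listed as "from" 20 credits per second.

```python
# Hypothetical pre-flight check; names are illustrative, not a documented Fawna API.
ASPECT_RATIOS = {"16:9", "9:16"}
RESOLUTIONS = {"720p", "1080p"}
DURATIONS_S = {4, 6, 8}
MIN_CREDITS_PER_SECOND = 20  # listed as "From 20 credits per second"


def check_settings(aspect_ratio: str, resolution: str, duration_s: int) -> list:
    """Return a list of problems; an empty list means the settings match the specs."""
    problems = []
    if aspect_ratio not in ASPECT_RATIOS:
        problems.append(f"unsupported aspect ratio: {aspect_ratio}")
    if resolution not in RESOLUTIONS:
        problems.append(f"unsupported resolution: {resolution}")
    if duration_s not in DURATIONS_S:
        problems.append(f"unsupported duration: {duration_s}s")
    return problems


def minimum_credits(duration_s: int) -> int:
    """Lower bound on cost; only a starting price is documented."""
    return MIN_CREDITS_PER_SECOND * duration_s


print(check_settings("16:9", "1080p", 8))  # []
print(minimum_credits(8))                  # 160
```

At the starting price, the three supported durations cost at least 80, 120, and 160 credits.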

When to pick Audio

  • A character speaks a line and you want it synced to mouth motion.
  • You want ambient atmosphere baked into the clip rather than added in post.
  • You want a short music cue that lands with the on-screen action.
  • You need a one-shot scene with sound for quick drafts or social content.

Directing the audio

Audio separates what is said from what is heard. Dialogue goes in double quotes, and three inline tags cover the rest:

  • Dialogue (no tag): Put the line in double quotes and mention who says it.
  • SFX: Sound effects tied to on-screen actions.
  • Ambient: Background atmosphere.
  • Music: Score or musical cues.
Example: Dialogue with ambient and SFX
Medium close-up: a bearded fisherman in his fifties stands on a
dock at sunrise, looks at the camera, and says quietly, "We're
leaving with the tide." Handheld, slight sway. 50mm lens, soft
overcast light, muted blue-grey palette. SFX: wooden dock creak,
rope slap. Ambient: gentle surf, distant gull cries.
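
If prompts are assembled in code, the tag layout used in the example above can be produced by a small helper. This is a sketch under assumptions: `build_audio_prompt` is a hypothetical name, and the tag order (SFX, then Ambient, then Music) simply mirrors the example rather than any documented requirement.

```python
from typing import Sequence


def build_audio_prompt(
    visual: str,
    sfx: Sequence[str] = (),
    ambient: Sequence[str] = (),
    music: Sequence[str] = (),
) -> str:
    """Append SFX:, Ambient:, and Music: sentences to a visual description.

    The visual description should already contain any quoted dialogue.
    """
    parts = [visual.strip()]
    for tag, cues in (("SFX", sfx), ("Ambient", ambient), ("Music", music)):
        if cues:  # skip tags with no cues so the prompt stays short
            parts.append(f"{tag}: {', '.join(cues)}.")
    return " ".join(parts)


prompt = build_audio_prompt(
    "Medium close-up: a bearded fisherman stands on a dock at sunrise "
    'and says quietly, "We\'re leaving with the tide."',
    sfx=["wooden dock creak", "rope slap"],
    ambient=["gentle surf", "distant gull cries"],
)
print(prompt)
```

Tags with no cues are omitted entirely, so a dialogue-only shot passes through unchanged.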

Voice direction

You can shape the voice with brief modifiers: young woman, elderly man, soft voice, whisper, shouting, sarcastic, accented. The model will do its best to match them. This is not precise voice cloning, and it cannot match a specific person's voice.

Example: Shaped voice
A young woman with a quiet, determined voice says, "Not today."
Medium shot.

Strengths

  • True lip-sync. The mouth motion matches the audio.
  • Naturalistic ambient beds. Ocean, forest, city, rain all sound placed and convincing.
  • Same Veo photoreal quality as the Film family.
  • Short durations keep audio coherent (longer clips risk audio drift).

Where it struggles

  • Long monologues. Keep lines under ~10 words each. Multi-sentence dialogue gets compressed or cropped.
  • Music composition is serviceable but not production-grade. Use a real music track for hero pieces.
  • Voice cloning is not supported. Do not expect a specific actor's voice.
  • Three or more speakers in a single clip confuse the audio assignment. One or two speakers max.
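
The dialogue limits above can be checked mechanically before a prompt is submitted. The sketch below is illustrative only: `lint_dialogue` is a hypothetical helper, the 10-word threshold treats the "~10 words" guidance as a hard cutoff, and the speaker count is supplied by the caller because inferring speakers from free text is unreliable.

```python
import re

MAX_WORDS_PER_LINE = 10  # "under ~10 words" is approximate guidance
MAX_SPEAKERS = 2         # three or more speakers confuse audio assignment


def quoted_lines(prompt: str) -> list:
    """Return the text inside each pair of straight double quotes."""
    return re.findall(r'"([^"]*)"', prompt)


def lint_dialogue(prompt: str, speaker_count: int) -> list:
    """Return warnings for dialogue that is likely to be cut short or misassigned."""
    warnings = []
    for line in quoted_lines(prompt):
        word_count = len(line.split())
        if word_count >= MAX_WORDS_PER_LINE:
            warnings.append(f"line has {word_count} words, aim for under {MAX_WORDS_PER_LINE}: {line!r}")
    if speaker_count > MAX_SPEAKERS:
        warnings.append(f"{speaker_count} speakers, keep to {MAX_SPEAKERS} or fewer")
    return warnings


print(lint_dialogue('A fisherman says quietly, "We\'re leaving with the tide."', 1))  # []
```

Only straight double quotes are recognized here; curly quotes would need an extra pattern.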

Tips

  • Short lines, clear delivery. "We're leaving with the tide." reads better than a paragraph.
  • Layer tags. Dialogue plus SFX plus Ambient in one prompt gives the richest result.
  • Keep duration to 4-6 seconds for dialogue-heavy shots. Longer clips drift.
  • Give each speaker their own attributed sentence: She says, "Ready?" He replies, "Always." is clearer than a run-on.
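
The speaker-separation tip can also be applied in code by giving each line its own attributed sentence. A minimal sketch, assuming a hypothetical `format_exchange` helper that takes (speaker, verb, line) tuples:

```python
def format_exchange(turns) -> str:
    """Render (speaker, verb, line) tuples as one attributed sentence per turn."""
    return " ".join(f'{speaker} {verb}, "{line}"' for speaker, verb, line in turns)


exchange = format_exchange([
    ("She", "says", "Ready?"),
    ("He", "replies", "Always."),
])
print(exchange)  # She says, "Ready?" He replies, "Always."
```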
