Writing a great video prompt
Video models reward specificity. Vague prompts produce vague output. The biggest single lever for quality is treating the prompt like a shot list: subject, action, scene, camera, style. Nothing more, nothing less.
The five-part formula
Structure every prompt as Subject, Action, Scene, Camera, Style. Keep the order. Models pay disproportionate attention to the first clause, so put your subject there. Keep the total length between 60 and 120 words. Shorter prompts under-direct the model; longer ones dilute its attention.
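If you assemble prompts in code, the formula maps onto a small data structure. The sketch below is one possible way to do that in Python. `ShotPrompt` and its methods are made-up names for illustration, not part of any video tool's API. It joins the five parts in the fixed order and checks the 60 to 120 word window.

```python
from dataclasses import dataclass

# Word window taken from the guide. All names here are illustrative.
MIN_WORDS, MAX_WORDS = 60, 120


@dataclass
class ShotPrompt:
    subject: str
    action: str
    scene: str
    camera: str
    style: str

    def render(self) -> str:
        # Join the parts in the fixed order: subject, action, scene, camera, style.
        parts = [self.subject, self.action, self.scene, self.camera, self.style]
        return " ".join(p.strip() for p in parts if p.strip())

    def word_count(self) -> int:
        # Whitespace-separated words; punctuation stays attached to its word.
        return len(self.render().split())

    def length_ok(self) -> bool:
        return MIN_WORDS <= self.word_count() <= MAX_WORDS
```

Keeping the parts as separate fields makes it hard to forget one, and `render()` guarantees the subject always opens the prompt.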
Camera language that works
Models have seen millions of clips labelled with film terminology. They respond well to named techniques and poorly to metaphor. Use these as a starter vocabulary:
- Shot size: extreme wide, wide, medium, medium close-up, close-up, extreme close-up.
- Angle: low angle, eye level, high angle, overhead, Dutch tilt.
- Movement: static, handheld, dolly in, dolly out, crane up, crane down, tracking shot, arc around subject, slow pan left, whip pan, orbit.
- Lens: 24mm, 35mm, 50mm, 85mm, 135mm, macro, tilt-shift, anamorphic.
- Depth: shallow depth of field, deep focus, rack focus, bokeh background.
One camera verb per prompt. Stacking several moves in a single clause ("dolly in while panning left and craning up") tends to produce visual mush. Pick the move that matters most and let the other motions happen naturally.
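A rough way to catch stacked moves before you submit is to scan the prompt for movement verbs. This is a minimal sketch: `MOVE_PATTERNS` and `camera_moves` are hypothetical names, the stem list is incomplete, and the matching is naive (a "frying pan" would count as a pan).

```python
import re

# Stems for some of the movement terms in the starter vocabulary.
# Illustrative and incomplete; extend it for your own prompts.
MOVE_PATTERNS = {
    "dolly": r"\bdoll(?:y|ies|ying)\b",
    "pan": r"\bpan(?:s|ning)?\b",
    "crane": r"\bcran(?:e|es|ing)\b",
    "tracking": r"\btrack(?:s|ing)\b",
    "arc": r"\barc(?:s|ing)?\b",
    "orbit": r"\borbit(?:s|ing)?\b",
}


def camera_moves(prompt: str) -> list[str]:
    """Return the names of the movement stems that appear in the prompt."""
    text = prompt.lower()
    return [name for name, pattern in MOVE_PATTERNS.items()
            if re.search(pattern, text)]


def has_single_move(prompt: str) -> bool:
    return len(camera_moves(prompt)) == 1
```

If `camera_moves` returns more than one name, decide which move carries the shot and delete the rest.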
Examples that land
Strong: Wide shot: a weathered fisherman in his sixties, thick grey beard, yellow oilskin coat, mends a net on the deck of a small trawler at dawn. The boat rolls gently on calm grey water. Low sun breaks through coastal fog behind him. Slow dolly forward. 50mm lens, shallow depth of field, desaturated color palette, soft film grain.
Weak: An old man doing fisherman stuff, atmospheric, beautiful lighting, cinematic masterpiece, 4k ultra hd epic.
Style cues that matter
Four lightweight tags will do more than a paragraph of adjectives: a lens, a lighting quality, a color palette, and a film texture. Example style tail: 50mm lens, warm golden hour side light, earthy palette of rust and sage, 16mm film grain.
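If you generate prompts programmatically, the four tags can be enforced rather than remembered. The helper below is a hypothetical sketch (`style_tail` is not a function of any real tool): it refuses to build a tail when a tag is missing and otherwise joins the tags in the order lens, light, palette, texture.

```python
def style_tail(lens: str, light: str, palette: str, texture: str) -> str:
    """Build a comma-separated style tail from the four tags."""
    tags = {"lens": lens, "light": light, "palette": palette, "texture": texture}
    # Reject empty or whitespace-only tags so none is silently dropped.
    missing = [name for name, value in tags.items() if not value.strip()]
    if missing:
        raise ValueError("missing style tags: " + ", ".join(missing))
    return ", ".join(value.strip() for value in tags.values())


tail = style_tail(
    "50mm lens",
    "warm golden hour side light",
    "earthy palette of rust and sage",
    "16mm film grain",
)
```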
Avoid "cinematic masterpiece", "epic", "award-winning", "4K ultra HD", "masterpiece quality". These phrases have been absorbed as noise during training: they either do nothing or bias the model toward oversaturated, generic output.
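A simple phrase scan can flag these before a prompt goes out. This is a sketch under the assumption that a fixed list is good enough. `FILLER_PHRASES` holds only the phrases named above, and `find_filler` is an illustrative name.

```python
import re

# The phrases named in the guide, lowercased. Extend as needed.
FILLER_PHRASES = [
    "cinematic masterpiece",
    "masterpiece quality",
    "award-winning",
    "4k ultra hd",
    "epic",
]


def find_filler(prompt: str) -> list[str]:
    """Return the filler phrases found in the prompt, in list order."""
    text = prompt.lower()
    found = []
    for phrase in FILLER_PHRASES:
        # Match whole phrases only, so "epic" does not fire inside "epicenter".
        if re.search(r"(?<!\w)" + re.escape(phrase) + r"(?!\w)", text):
            found.append(phrase)
    return found
```

An empty result means none of the listed phrases is present. It does not mean the prompt is free of other padding.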
Describing motion
Motion is easy to over-specify. One clear action beats three vague ones. "A woman raises a coffee cup to her lips, exhaling steam" is better than "A woman is moving around gracefully, interacting with her environment in a natural way". Concrete, single-subject actions with a clear start and end read best.
What to leave out
- Do not write stage directions addressed to the model ("now zoom in", "then cut to"). Models do not interpret editing instructions; they try to render them literally.
- Do not describe sound if the model is silent. On tiers like Drama it only wastes tokens.
- Do not stack three synonymous adjectives ("sad melancholic mournful"). Pick one.
- Do not use negation ("no text, no logos, no watermark"). Use the dedicated Negative Prompt field for that. Negation inside the main prompt inverts unpredictably.
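The negation rule can be applied mechanically: pull "no X" items out of the main prompt and hand them over separately. The function below is a rough sketch, not a robust parser. `split_negations` is a hypothetical name, it only recognises clauses of the form "no X" ending at a comma, semicolon, full stop, or the end of the text, and how the second value reaches the Negative Prompt field depends on your tool.

```python
import re

# "no X" up to the next comma, semicolon, full stop, or end of text.
NEGATION = re.compile(r"\bno\s+([a-z][a-z\- ]*?)\s*(?=[,.;]|$)", re.IGNORECASE)


def split_negations(prompt: str) -> tuple[str, list[str]]:
    """Return (prompt without negations, items for the Negative Prompt field)."""
    negatives = [m.group(1).strip().lower() for m in NEGATION.finditer(prompt)]
    cleaned = NEGATION.sub("", prompt)
    # Tidy the punctuation left behind by the removed clauses.
    cleaned = re.sub(r"\s*,(?:\s*,)+", ",", cleaned)  # collapse ", , ," to ","
    cleaned = re.sub(r",\s*\.", ".", cleaned)          # ", ." becomes "."
    cleaned = re.sub(r"\s{2,}", " ", cleaned).strip(" ,")
    return cleaned, negatives
```

Review the result by hand: a sentence such as "there is no wind" would also be caught, which is rarely what you want.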
Quick checklist
- Is my subject in the first ten words?
- Is there exactly one camera move?
- Am I between 60 and 120 words?
- Have I said lens, light, palette, texture?
- Have I removed every "cinematic", "epic", "masterpiece"?
Where to go next
- Using references: attach images to lock subject or style.
- Character consistency: keep the same person across shots.