Writing a great image prompt
Image prompts are shorter than video prompts but the discipline is the same: concrete nouns, named visual terminology, no filler adjectives. Illustrate and Compose share the same prompt style.
Start with "A photo of..."
If you want photorealistic output, start the prompt literally with the words "A photo of". Models trained on large image corpora learn "a photo of" as a strong visual prior for photographic lighting, grain, and natural compositions. Skipping it tends to produce illustration-leaning results even when you ask for realism elsewhere.
For stylized output, name the style at the start: "An oil painting of...", "A pencil sketch of...", "A charcoal study of...", "A watercolor illustration of...".
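If you generate prompts from a script, the opening-words rule is easy to enforce in one place. The sketch below is a minimal illustration: the function name `with_style_prefix` and the `STYLE_PREFIXES` table are hypothetical helpers, not part of Illustrate or Compose. The prefixes themselves are the ones listed above.

```python
# Hypothetical helper: names here are illustrative, not a product API.
STYLE_PREFIXES = {
    "photo": "A photo of",
    "oil painting": "An oil painting of",
    "pencil sketch": "A pencil sketch of",
    "charcoal study": "A charcoal study of",
    "watercolor": "A watercolor illustration of",
}


def with_style_prefix(style: str, subject: str) -> str:
    """Return the subject with the medium named at the very start of the prompt."""
    prefix = STYLE_PREFIXES[style]  # raises KeyError for a style not in the table
    return f"{prefix} {subject.strip()}"


print(with_style_prefix("photo", "a cast-iron skillet on a dark wooden table"))
```

Keeping the prefix in a lookup table means a typo such as "A photograph showing" cannot slip in when prompts are built in bulk.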
The four-part formula
Structure an image prompt in four parts, in this order: subject, scene, lens and lighting, style tags. Aim for 30 to 80 words in total.
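One way to keep to the formula is to write the four parts separately and join them, then check the length. This is a rough sketch under stated assumptions: `build_prompt` and `word_count_ok` are hypothetical helper names, and "words" are counted by splitting on whitespace, which may differ from how a model tokenizes text.

```python
def build_prompt(subject: str, scene: str, lens_and_lighting: str, style_tags: str) -> str:
    """Join the four parts in formula order, one sentence per non-empty part."""
    parts = [subject, scene, lens_and_lighting, style_tags]
    sentences = [p.strip().rstrip(".") + "." for p in parts if p.strip()]
    return " ".join(sentences)


def word_count_ok(prompt: str, low: int = 30, high: int = 80) -> bool:
    """Check the 30 to 80 word guideline using a simple whitespace split."""
    return low <= len(prompt.split()) <= high


prompt = build_prompt(
    subject=("A photo of a woman in her late twenties, long dark braid, "
             "freckles across her nose, wearing a mustard knit cardigan "
             "over a white t-shirt"),
    scene=("Seated on a wooden bench in a sunlit greenhouse, "
           "trailing ivy in the background"),
    lens_and_lighting=("Golden hour side light through glass, "
                       "85mm portrait lens, shallow depth of field"),
    style_tags="Warm natural color palette",
)
print(prompt)
print(word_count_ok(prompt))
```

Writing the parts as separate arguments makes it obvious when one is missing, which is harder to spot in a single block of text.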
Worked examples
A photo of a cast-iron skillet on a dark wooden table, seared sourdough bread torn open beside it, steam rising, a sprig of rosemary and a pat of melting butter on the bread. Overhead soft window light from the left, deep shadows on the right. 50mm macro lens, shallow depth of field, natural color palette, subtle film grain.
A photo of a woman in her late twenties, long dark braid, freckles across her nose, wearing a mustard knit cardigan over a white t-shirt. Seated on a wooden bench in a sunlit greenhouse, trailing ivy in the background. Golden hour side light through glass, 85mm portrait lens, shallow depth of field, warm natural color palette.
A watercolor illustration of a small red fishing village clinging to cliffs above a calm harbor at dusk, lights flickering in the cottages, a wooden dock and two rowboats in the foreground. Soft wet-on-wet technique, muted ochre and slate palette, visible paper grain, hand-painted linework.
Lighting vocabulary
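The direction of the light matters more than any adjective describing it. These terms produce predictable results:
- Side light: light from one side, dramatic shadow on the other.
- Rim light: light from behind and to one side, bright outline along the subject's edge.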
- Soft overhead window light: diffused daylight, low contrast.
- Golden hour backlight: warm low sun behind subject, halo glow.
- Blue hour: cool twilight, artificial lights starting to turn on.
- Studio key light with fill: controlled portrait lighting.
- Single candle source: warm point light, heavy falloff, chiaroscuro.
Lens and depth
Focal length shapes the image more than any other technical tag. A quick map:
- 24mm wide: environment, slight distortion, everything in focus.
- 35mm: natural, documentary, context plus subject.
- 50mm: close to the human eye's perspective, general purpose.
- 85mm portrait: flattering faces, separated background.
- 135mm telephoto: compressed perspective, creamy bokeh.
- Macro: extreme close-up, texture study.
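The map above can be kept as a small lookup table, so that scripted prompts pick a lens tag from the intent of the shot rather than from memory. This is only a sketch: the intent keys, the `LENS_TAGS` name, and the choice of 50mm as the fallback are assumptions made for illustration.

```python
# Hypothetical lookup table: the intent keys are illustrative labels,
# the lens tags are the ones from the list above.
LENS_TAGS = {
    "environment": "24mm wide",
    "documentary": "35mm",
    "general": "50mm",
    "portrait": "85mm portrait",
    "compressed": "135mm telephoto",
    "texture": "macro",
}


def lens_tag(intent: str) -> str:
    """Return the lens tag for an intent, falling back to 50mm (an assumed default)."""
    return LENS_TAGS.get(intent, "50mm")


print(lens_tag("portrait"))
```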
Aspect and framing
Pick the aspect ratio first. The model composes differently for each. Hero images for web usually look best at 16:9 or 3:4. Stories and vertical video thumbnails use 9:16. Square 1:1 is the safest default for social.
Illustrate supports 1:1, 4:5, 3:4, 16:9, and 9:16. Compose supports 1:1, 16:9, and 9:16.
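Because the two tools accept different ratios, it helps to check the ratio before sending a request. The sketch below encodes the lists above. The names `SUPPORTED_ASPECTS` and `is_supported` are hypothetical, and the tool names are used only as dictionary keys, not as real API identifiers.

```python
# Supported ratios as listed above; the structure and names are illustrative.
SUPPORTED_ASPECTS = {
    "Illustrate": {"1:1", "4:5", "3:4", "16:9", "9:16"},
    "Compose": {"1:1", "16:9", "9:16"},
}


def is_supported(tool: str, aspect: str) -> bool:
    """Return True if the tool accepts the aspect ratio; unknown tools accept none."""
    return aspect in SUPPORTED_ASPECTS.get(tool, set())


print(is_supported("Illustrate", "3:4"))
print(is_supported("Compose", "3:4"))
```

Note that 3:4 and 4:5 are available only in Illustrate, so a 3:4 hero image cannot be produced in Compose without changing the ratio.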
Writer's block on a prompt? Toggle Magic Prompt on. It rewrites your short brief into a full prompt following the formula above. See Magic Prompt for details.
What to avoid
- Stacked empty adjectives ("beautiful stunning gorgeous amazing"). They are noise.
- Negation inside the prompt ("no watermark, no text"). Use the negative prompt field.
- Resolution claims ("4K ultra HD 8K"). They correlate with low-quality training data and bias toward oversaturated output.
- More than three distinct subjects in one image. Models merge or drop features past that count.
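The first three items can be caught with a rough text check before a prompt is submitted. This is a minimal sketch, not a complete linter: `lint_prompt` is a hypothetical helper, the adjective list contains only the examples given above, "stacked" is interpreted as two or more filler adjectives in a row, and the negation pattern is a simple heuristic that will miss some phrasings. Counting distinct subjects needs real language understanding, so that rule is not checked here.

```python
import re

# Filler adjectives from the example above; extend the set as needed.
EMPTY_ADJECTIVES = {"beautiful", "stunning", "gorgeous", "amazing"}
RESOLUTION_RE = re.compile(r"\b(?:4k|8k|ultra\s*hd)\b", re.IGNORECASE)
NEGATION_RE = re.compile(r"\bno\s+\w+", re.IGNORECASE)


def lint_prompt(prompt: str) -> list[str]:
    """Return a list of warnings for patterns the guide says to avoid."""
    warnings = []

    # Stacked adjectives: two or more filler adjectives in a row (an assumption).
    run = 0
    for word in re.findall(r"[a-z']+", prompt.lower()):
        run = run + 1 if word in EMPTY_ADJECTIVES else 0
        if run >= 2:
            warnings.append("stacked empty adjectives")
            break

    if NEGATION_RE.search(prompt):
        warnings.append("negation in prompt; use the negative prompt field")

    if RESOLUTION_RE.search(prompt):
        warnings.append("resolution claim")

    return warnings


print(lint_prompt("A photo of a beautiful stunning lake, no watermark, 8K"))
```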
Where to go next
- If you need to keep the same face across many images, read Character consistency.
- For reference-driven generation, read Using references.