Wan 2.5 Video Generation

Wan 2.5 Prompting Guide

Wan 2.5 is best prompted like a short film shot, not a still-image caption. Because many implementations support audio, prompts should often describe not just visuals and motion, but also timing, performance, camera behavior, and sound.

Best Overall Formula

Subject + Environment + Action + Camera + Lighting + Style + Audio + Constraints
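
As a minimal sketch of the formula, the helper below joins prompt components in the order shown above. The function name `assemble_prompt` and the dictionary keys are illustrative assumptions, not part of any Wan 2.5 tooling; the code only builds a string and does not call a model.

```python
# Component order taken from the formula above.
FORMULA_ORDER = [
    "subject", "environment", "action", "camera",
    "lighting", "style", "audio", "constraints",
]

def assemble_prompt(components: dict[str, str]) -> str:
    """Join the supplied components in formula order, skipping empty ones."""
    unknown = set(components) - set(FORMULA_ORDER)
    if unknown:
        raise ValueError(f"unknown components: {sorted(unknown)}")
    parts = []
    for key in FORMULA_ORDER:
        text = components.get(key, "").strip().rstrip(".")
        if text:
            # Capitalize only the first character; leave the rest untouched.
            parts.append(text[0].upper() + text[1:] + ".")
    return " ".join(parts)

# Components may be supplied in any order; the output follows the formula.
prompt = assemble_prompt({
    "audio": "distant alarm beeps",
    "subject": "a lone astronaut",
    "action": "walks slowly down the corridor",
    "environment": "dim abandoned spacecraft corridor",
    "constraints": "no other characters",
})
print(prompt)
```

Keeping the order in one list means a missing component is simply skipped, while a misspelled key fails loudly instead of being silently dropped.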

Why Wan 2.5 Feels Different

Wan 2.5 often works best when prompts read like film direction notes, because the model may support lip sync, native audio, and uploaded audio, and may follow detailed prompts more closely.

Single Biggest Rule

Keep one shot idea per clip.

Best Order to Write

Who + where + what happens + camera + light + style + sound + negatives

Prompt Anatomy

1) Subject

Describe the subject in a way that affects generation: wardrobe, age range, posture, hair, expression, emotional state.

2) Environment

Be specific enough to anchor the shot: indoors or outdoors, time of day, weather, props, cleanliness, crowd level.

3) Action

Use visible motion. Prefer concrete actions over abstract feelings.

4) Camera

Specify framing, angle, and movement instead of letting the model improvise.

Useful Lighting Language

  • soft morning window light
  • harsh fluorescent office lighting
  • warm tungsten bulbs
  • moody moonlight
  • neon edge lighting
  • overcast daylight
  • golden-hour backlight
  • candlelit darkness

Useful Style Lanes

  • photoreal cinematic
  • documentary realism
  • glossy fashion ad
  • gritty handheld street footage
  • dreamlike fantasy
  • anime action
  • vintage 1970s film
  • music video aesthetic
  • luxury editorial

Audio Matters in Wan 2.5

Because many Wan 2.5 implementations support audio, sound should be prompted intentionally when relevant.

Useful audio instruction types:

  • ambient only
  • no dialogue
  • whispered dialogue
  • clear voiceover
  • distant traffic
  • thunder rumble
  • soft applause
  • birds and wind
  • nightclub bass
  • footsteps on concrete
  • crackling fire
  • radio static

Text-to-Video

For T2V, there is no reference image, so the prompt must do more of the worldbuilding.

A lone astronaut walks slowly through a dim abandoned spacecraft corridor, illuminated by flickering emergency lights and drifting sparks. The camera tracks backward in front of him in a smooth slow dolly shot. Dust floats in zero gravity. Cinematic sci-fi realism, metallic reflections, tense near-silence broken by distant alarm beeps, no other characters.

Image-to-Video

For I2V, focus more on what moves, what stays fixed, camera behavior, atmosphere, audio, and what must be preserved from the reference image.

She remains centered and retains the same facial features, hair, outfit, and composition as the reference image. She blinks, smiles softly, and slowly turns toward the window. The curtains move slightly in a light breeze. Gentle camera push-in, warm afternoon sunlight, quiet room tone, no extra people, no text overlay, no major pose change.

Motion Guidance

Low Motion

  • subtle breathing
  • blinking
  • slight smile
  • hair moving in breeze
  • gentle head turn

Medium Motion

  • walking slowly
  • turning and looking back
  • opening a door
  • sitting down
  • raising a hand

High Motion

  • sprinting
  • fighting
  • crowd chaos
  • explosions
  • rapid camera shake

Camera Movement

One camera move is usually enough.

  • static
  • slow push-in
  • slow pull-back
  • gentle orbit
  • smooth lateral tracking
  • handheld follow
  • overhead descent
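
One way to apply the one-move guideline is a quick check on a draft prompt before submitting it. The sketch below is a rough keyword heuristic under stated assumptions: the regular expressions and the name `camera_moves_in` are hypothetical, and the matching will miss phrasings that are not in the list above.

```python
import re

# Camera moves copied from the list above. "static" is left out because it
# is the absence of a move. The patterns are a rough heuristic, not a parser.
CAMERA_MOVES = {
    "push-in": r"\bpush[- ]in\b",
    "pull-back": r"\bpull[- ]back\b",
    "orbit": r"\borbits?\b",
    "tracking": r"\btrack(?:s|ing)?\b",
    "handheld": r"\bhandheld\b",
    "overhead descent": r"\boverhead descent\b",
}

def camera_moves_in(prompt: str) -> list[str]:
    """Return the names of the camera moves mentioned in the prompt."""
    text = prompt.lower()
    return [name for name, pattern in CAMERA_MOVES.items()
            if re.search(pattern, text)]

draft = "Slow push-in on her face, then a gentle orbit with handheld follow."
moves = camera_moves_in(draft)
if len(moves) > 1:
    print(f"Consider keeping one camera move; found: {', '.join(moves)}")
```

A draft that triggers the warning is usually easier to fix by splitting it into separate clips than by trimming words.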

Dialogue and Lip Sync

A woman faces camera and says, “I knew you’d come back,” with calm, restrained emotion. Her lip movements stay precise and natural, with subtle blinking and a slight breath before the line. Quiet hallway ambience, no music.

Useful Negatives

  • no extra people
  • no text overlay
  • no watermark
  • no abrupt cuts
  • no shaky camera
  • no deformed hands
  • no duplicated limbs
  • no flickering face
  • no background morphing
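
Negatives are easy to duplicate when prompts are assembled from several sources. The following sketch appends only the negatives that a prompt does not already contain; `with_negatives` is a hypothetical helper, and the plain substring check is an assumption that will not catch rephrased constraints.

```python
def with_negatives(prompt: str, negatives: list[str]) -> str:
    """Append negatives that are not already present in the prompt."""
    lowered = prompt.lower()
    extra: list[str] = []
    for neg in negatives:
        key = neg.strip().lower()
        # Skip blanks, repeats, and negatives the prompt already states.
        if key and key not in extra and key not in lowered:
            extra.append(key)
    if not extra:
        return prompt
    tail = ", ".join(extra).capitalize() + "."
    base = prompt.strip().rstrip(".")
    return f"{base}. {tail}" if base else tail

print(with_negatives(
    "A woman walks through a market.",
    ["no text overlay", "No watermark", "no text overlay"],
))
```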

Strong Master Template

[Subject] in [environment].
[Action sequence in 1–3 visible beats].
[Shot size / angle / camera move].
[Lighting + atmosphere].
[Visual style].
[Audio behavior: voice / ambience / music / silence].
[Constraints / negatives].
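
The template above can be filled programmatically. This is a minimal sketch using Python's standard `string.Template`; the slot names and the `fill_template` helper are illustrative assumptions, and the example values are made up from the lists earlier in this guide.

```python
from string import Template

# The seven template lines above, with $-style placeholders for each slot.
MASTER_TEMPLATE = Template(
    "$subject in $environment.\n"
    "$action.\n"
    "$camera.\n"
    "$lighting.\n"
    "$style.\n"
    "$audio.\n"
    "$constraints."
)

def fill_template(**slots: str) -> str:
    # Template.substitute raises KeyError when a slot is missing, which
    # catches incomplete prompts before they are used anywhere.
    return MASTER_TEMPLATE.substitute(**slots)

shot = fill_template(
    subject="A woman in a red coat",
    environment="a rain-soaked alley at night",
    action="She stops, looks over her shoulder, then walks on",
    camera="Medium shot, eye level, slow push-in",
    lighting="Neon edge lighting with wet reflections",
    style="Photoreal cinematic",
    audio="Footsteps on concrete, distant traffic, no dialogue",
    constraints="No extra people, no text overlay, no abrupt cuts",
)
print(shot)
```

Requiring every slot is a deliberate choice here: it forces an explicit decision about audio and constraints rather than leaving them to the model.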

Bottom Line

Clear subject + clear location + one visible action + one camera move + one lighting concept + one style lane + intentional audio + explicit constraints.