Wan 2.5 Video Generation
Wan 2.5 Prompting Guide
Wan 2.5 is best prompted like a short film shot, not a still-image caption. Because many implementations support audio, prompts should often describe not just visuals and motion, but also timing, performance, camera behavior, and sound.
Best Overall Formula
Subject + Environment + Action + Camera + Lighting + Style + Audio + Constraints
Why Wan 2.5 Feels Different
Wan 2.5 often works best when prompts read like film direction notes. Depending on the implementation, the model may support lip sync, native audio generation, and uploaded audio, and it may follow detailed prompts more closely.
Single Biggest Rule
Keep one shot idea per clip.
Best Order to Write
Who + where + what happens + camera + light + style + sound + negatives
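As a minimal sketch of this ordering, the Python snippet below assembles a prompt from the eight components. The `ShotPrompt` class and its field names are hypothetical helpers for illustration, not part of any Wan 2.5 API; the code only builds a string and does not call a model.

```python
from dataclasses import dataclass, fields


@dataclass
class ShotPrompt:
    """Hypothetical container; field order mirrors the writing order above."""
    subject: str
    environment: str
    action: str
    camera: str = ""
    lighting: str = ""
    style: str = ""
    audio: str = ""
    negatives: str = ""

    def render(self) -> str:
        # Emit one sentence per non-empty component, in declaration order.
        parts = [getattr(self, f.name).strip().rstrip(".") for f in fields(self)]
        return " ".join(p + "." for p in parts if p)


prompt = ShotPrompt(
    subject="A woman in her 30s in a red raincoat, tired expression",
    environment="Empty train platform at night, light rain",
    action="She turns and looks back over her shoulder",
    camera="Medium shot, slow push-in",
    lighting="Harsh fluorescent lighting",
    style="Photoreal cinematic",
    audio="Distant traffic, no dialogue",
    negatives="No extra people, no text overlay",
)
print(prompt.render())
```

Keeping each component in its own field makes it easy to see which part of the formula is missing before a prompt is sent.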
Prompt Anatomy
1) Subject
Describe the subject in a way that affects generation: wardrobe, age range, posture, hair, expression, emotional state.
2) Environment
Be specific enough to anchor the shot: indoors or outdoors, time of day, weather, props, cleanliness, crowd level.
3) Action
Use visible motion. Prefer concrete actions over abstract feelings: for example, "she wipes rain from her face and exhales" rather than "she feels exhausted".
4) Camera
Specify framing, angle, and movement instead of letting the model improvise.
Useful Lighting Language
- soft morning window light
- harsh fluorescent office lighting
- warm tungsten bulbs
- moody moonlight
- neon edge lighting
- overcast daylight
- golden-hour backlight
- candlelit darkness
Useful Style Lanes
- photoreal cinematic
- documentary realism
- glossy fashion ad
- gritty handheld street footage
- dreamlike fantasy
- anime action
- vintage 1970s film
- music video aesthetic
- luxury editorial
Audio Matters in Wan 2.5
Because many Wan 2.5 implementations support audio, sound should be prompted intentionally when relevant.
Useful audio instruction types:
- ambient only
- no dialogue
- whispered dialogue
- clear voiceover
- distant traffic
- thunder rumble
- soft applause
- birds and wind
- nightclub bass
- footsteps on concrete
- crackling fire
- radio static
Text-to-Video
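For text-to-video (T2V), there is no reference image, so the prompt must do more worldbuilding: describe the subject, environment, lighting, and style explicitly.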
Image-to-Video
For image-to-video (I2V), the source image already defines the look, so focus on what moves, what stays fixed, camera behavior, atmosphere, audio, and which details of the source image must be preserved.
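One way to keep an I2V prompt focused on motion and preservation is to build it from those pieces explicitly. The sketch below is illustrative only: `build_i2v_prompt` and its labels ("Motion", "Keep unchanged", and so on) are assumptions made for this example, not a format required by Wan 2.5.

```python
def build_i2v_prompt(moves, stays_fixed, camera="", atmosphere="", audio=""):
    """Assemble an image-to-video prompt from motion and preservation notes."""
    parts = []
    if moves:
        parts.append("Motion: " + ", ".join(moves))
    if stays_fixed:
        parts.append("Keep unchanged: " + ", ".join(stays_fixed))
    # Optional components are added only when provided.
    for label, value in (("Camera", camera), ("Atmosphere", atmosphere), ("Audio", audio)):
        if value:
            parts.append(f"{label}: {value}")
    return ". ".join(parts) + "." if parts else ""


print(build_i2v_prompt(
    moves=["hair moving in breeze", "blinking"],
    stays_fixed=["face", "wardrobe"],
    camera="static",
    audio="birds and wind",
))
```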
Motion Guidance
Low Motion
- subtle breathing
- blinking
- slight smile
- hair moving in breeze
- gentle head turn
Medium Motion
- walking slowly
- turning and looking back
- opening a door
- sitting down
- raising a hand
High Motion
- sprinting
- fighting
- crowd chaos
- explosions
- rapid camera shake
Camera Movement
One camera move is usually enough.
- static
- slow push-in
- slow pull-back
- gentle orbit
- smooth lateral tracking
- handheld follow
- overhead descent
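A quick check for the one-move guideline can be sketched as below. The helper names are hypothetical, and the matching is plain substring search over the camera portion of a prompt only, since a word like "static" can also appear in audio instructions such as "radio static".

```python
# Camera moves taken from the list above.
CAMERA_MOVES = [
    "static",
    "slow push-in",
    "slow pull-back",
    "gentle orbit",
    "smooth lateral tracking",
    "handheld follow",
    "overhead descent",
]


def find_camera_moves(camera_text: str) -> list:
    """Return the listed camera moves mentioned in the camera description."""
    text = camera_text.lower()
    return [move for move in CAMERA_MOVES if move in text]


def has_single_camera_move(camera_text: str) -> bool:
    """True when the description names at most one listed move."""
    return len(find_camera_moves(camera_text)) <= 1


print(find_camera_moves("Medium shot, slow push-in"))
print(has_single_camera_move("Gentle orbit, then overhead descent"))
```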
Dialogue and Lip Sync
Useful Negatives
- no extra people
- no text overlay
- no watermark
- no abrupt cuts
- no shaky camera
- no deformed hands
- no duplicated limbs
- no flickering face
- no background morphing
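When negatives are collected from several places, a small helper can normalize and deduplicate them before they are appended to a prompt. This is a sketch under the assumption that negatives are passed as plain phrases; `negatives_clause` is a hypothetical name, not part of any Wan 2.5 tooling.

```python
def negatives_clause(negatives):
    """Join negatives into one clause, dropping duplicates and keeping order."""
    seen = set()
    ordered = []
    for raw in negatives:
        item = raw.strip().lower()
        # Accept entries with or without a leading "no ".
        if item.startswith("no "):
            item = item[3:]
        if item and item not in seen:
            seen.add(item)
            ordered.append(item)
    return ", ".join("no " + item for item in ordered)


print(negatives_clause(["no watermark", "No text overlay", "watermark"]))
```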
Strong Master Template
[Subject: who, wardrobe, expression] in [environment: where, time of day].
[Action sequence in 1–3 visible beats].
[Shot size / angle / camera move].
[Lighting + atmosphere].
[Visual style].
[Audio behavior: voice / ambience / music / silence].
[Constraints / negatives].
Bottom Line
Clear subject + clear location + one visible action + one camera move + one lighting concept + one style lane + intentional audio + explicit constraints.