By Higgsfield, October 13th
This guide will teach you how to provide the right instructions to unlock the power of Sora 2.
❇️ Sora 2 excels at:
Transforming concise, director-level instructions into short, believable video clips with remarkable continuity
It intuitively understands and executes clear directions for framing, lighting, & complex subject action, delivering clips that feel consistent and professionally crafted
These are the fundamental principles for getting predictable, high-quality results.
Shot Composition
For maximum consistency, detail each shot like a storyboard by specifying its framing, depth of field, lighting, palette, and action.
Motion & Timing
Achieve the clearest motion by pairing one specific camera move with one distinct subject action, using strong verbs and defined timing.
Dialogue Handling
Place short lines of dialogue in a dedicated “Dialogue” block to ensure the most accurate lip-sync and verbatim delivery.
Stylistic Control
Establish the overall visual style upfront and then use concrete, descriptive nouns and verbs to ensure stylistic consistency.
Multi-Shot Sequencing
When creating a multi-shot sequence, define each shot as a distinct block with its own unique setup, action, and lighting to maintain maximum control.
At its core, Sora 2 operates on a powerful reasoning model.
This means it doesn't just follow instructions; it interprets them creatively. Anything you don't specify, the model will invent for you.
Prompt length is a strategic choice. Short prompts invite creative partnership; long prompts give you precise directorial control. Knowing when to use each is key to professional results.
❇️ The most effective short prompts follow a simple rule: declare the format and style upfront.
By defining the format, a simple description yields a rich, coherent result.
Two approaches to short prompts:
Descriptive Prompt (Painting a Picture)
These prompts describe a scene, a character, or a situation, letting the AI creatively interpret the specific details and camera work.
“A rapper who writes and records music, but gets booed off the scene at the end.”
Directive Prompt (Giving a Command)
These prompts give a direct command to the AI, often including specific actions, cuts, or a sequence of events.
“Make a UGC-style reaction video: a person tastes a new drink, three fast cuts—open, sip, smile.”
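The "declare the format and style upfront" rule can be sketched as a small string helper. This is only an illustration: `short_prompt` and its parameters are hypothetical names, not part of any Sora 2 API, and the result is plain text to paste into the prompt field.

```python
def short_prompt(format_and_style: str, description: str) -> str:
    """Build a short prompt that states format and style before the scene.

    Hypothetical helper for illustration; it only formats text.
    """
    # Drop trailing punctuation so the separator below is not doubled.
    format_and_style = format_and_style.strip().rstrip(".:")
    description = description.strip()
    if not format_and_style or not description:
        raise ValueError("both format/style and description are required")
    return f"{format_and_style}: {description}"


prompt = short_prompt(
    "UGC-style reaction video",
    "a person tastes a new drink, three fast cuts: open, sip, smile.",
)
print(prompt)
```

Keeping the format declaration as a separate argument makes it hard to forget, which is the point of the rule.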
When your creative vision is specific and you need maximum control, a structured and detailed prompt is the best approach.
This guide covers the two primary methods for high-control prompting:
Production Brief 📃 | Time-coded Shotlist ⏱️ |
For atmospheric scenes, precise composition, rhythm and sound. | For fast-paced sequences, exact pacing, transitions and audio hits. |
The Production Brief is ideal for creating a single, rich, and emotionally resonant scene. Include only the blocks relevant to your specific video.
Format & tone | Specify the genre - cinematic ad, UGC reaction, music video, or mini-scene. |
Main subject(s) | Briefly describe the main characters. |
Wardrobe and props | List clothing and key objects visible in the frame. |
Location & framing | Define shot size and angle; anchor foreground, mid-ground, and background elements to maintain spatial continuity. |
Lighting & palette | Name the light sources and their direction, and list a few color anchors that must stay consistent. |
Continuity rules | Define the weather and time of day that must remain consistent across cuts. |
Actions & camera beats | Outline 1–3 timed beats; pair each beat with one camera movement and one subject action. |
Montage plan | Specify cuts (jump, match), inserts, transitions (whip-pan, flash-frame), and pacing (e.g., "three fast inserts, then 0.5-second hold"). |
Dialogue (if any) | Include short, labeled lines of speech. |
Sound & foley | List specific micro-sounds (peel, snap, pour, shoe squeak) plus ambient audio. |
Finish | Note film grain, halation, LUT intent, and desired final frame. |
Prompt Example:
Format & Tone
Cinematic mini-scene — emotional realism with a soft romantic rhythm and atmospheric intimacy. Tone: nostalgic, tender, immersive.
Main Subject(s)
A young couple standing close under one umbrella in the rain — their chemistry quiet but electric, eyes locked, hesitant smiles.
Wardrobe and Props
She wears a beige trench coat, pearl earrings, and carries a transparent umbrella; he wears a navy jacket, white shirt, and wristwatch reflecting streetlight. Props: umbrella, takeaway coffee cup gently steaming.
Location & Framing
Rain-soaked cobblestone street at dusk outside a softly glowing café.
Foreground: falling raindrops and bokeh reflections.
Midground: the couple framed beneath the umbrella.
Background: café sign glowing amber, blurred city silhouettes.
Camera alternates between gentle dolly-ins, over-shoulder close-ups, and slow ¾ circular arcs to preserve emotional depth.
Lighting & Palette
Warm café light spilling onto cool blue-gray rain.
Light sources: diffused streetlight key from camera left, amber window backlight.
Color anchors: blush pink, amber gold, navy blue, cool gray, and ivory skin tones.
Soft diffusion lens and wet reflections maintain continuity.
Actions & Camera Beats (0–12 s)
0–4 s — Wide shot: camera slowly pushes in through rain toward the couple; she adjusts the umbrella, faint smile.
4–8 s — Medium shot: he reaches for her hand; droplets cascade down joined fingers; camera drifts laterally, catching the reflection of neon light across their faces.
8–12 s — Close-up: their foreheads gently meet; camera rises slightly, focusing on their breath mixing in the rain-haze before fading into soft blur.
Montage Plan
Three inserts: (raindrop hitting umbrella → fingertip touch → smile).
Smooth match cuts guided by piano rhythm; final 0.5-second emotional hold before fade-out.
Transitions use natural lens flares from passing car headlights.
Dialogue
Whisper (female): “Stay a little longer.”
He exhales softly, smiling.
Sound & Foley
Soft rainfall, muffled footsteps on wet cobblestone, umbrella fabric tension, faint breath, distant café hum, and soft piano underscore with subtle reverb.
Finish
Light film grain, warm halation on highlights, gentle chromatic bloom around neon reflections.
LUT intent: vintage romance with balanced teal–amber contrast.
Poster frame: their hands clasped beneath the umbrella, neon reflections rippling across the puddled ground like living light.
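The Production Brief layout above can be assembled programmatically. The sketch below is a minimal illustration, assuming you keep each block as a plain string; `BRIEF_BLOCKS` and `build_brief` are hypothetical names, and nothing here calls Sora 2 itself. It renders only the blocks you fill in, in the order the guide lists them.

```python
# Block order follows the Production Brief table in this guide.
BRIEF_BLOCKS = [
    "Format & Tone",
    "Main Subject(s)",
    "Wardrobe and Props",
    "Location & Framing",
    "Lighting & Palette",
    "Continuity Rules",
    "Actions & Camera Beats",
    "Montage Plan",
    "Dialogue",
    "Sound & Foley",
    "Finish",
]


def build_brief(blocks: dict[str, str]) -> str:
    """Render the filled-in blocks in guide order, skipping empty ones."""
    unknown = set(blocks) - set(BRIEF_BLOCKS)
    if unknown:
        raise KeyError(f"unknown block(s): {sorted(unknown)}")
    sections = []
    for name in BRIEF_BLOCKS:
        text = blocks.get(name, "").strip()
        if text:  # only relevant blocks make it into the prompt
            sections.append(f"{name}\n{text}")
    return "\n\n".join(sections)


brief = build_brief({
    "Dialogue": 'Whisper (female): "Stay a little longer."',
    "Format & Tone": "Cinematic mini-scene, nostalgic and tender.",
    "Finish": "",  # left empty, so it is omitted
})
print(brief)
```

Rejecting unknown block names catches typos early, so a misspelled heading does not silently drop part of the brief.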
The Time-coded Shotlist is best for fast-paced commercials, ads, or montages where syncing action to a specific rhythm is key.
Header | An [DURATION]-second [FORMAT] with [TRANSITION STYLE] and [CAMERA STYLE]. |
[0–A s] — OPEN | [SHOT SIZE and ANGLE] on [SUBJECT] doing [ACTION] + [CAMERA BEHAVIOR & PACE] + [CUT / TRANSITION] + [AUDIO] |
[A–B s] — TRANSITION / INSERT | [ACTION] + [CAMERA BEHAVIOR] + [CUT / TRANSITION] + [AUDIO] |
[B–C s] — RUN / DEVELOPMENT | [ACTION] + [CAMERA BEHAVIOR] + [CUT / TRANSITION] + [AUDIO] |
[C–D s] — IMPACT / REVEAL | [ACTION / VFX] + [CAMERA BEHAVIOR] + [CUT / TRANSITION] + [AUDIO] |
[D–E s] — OUTRO / BUTTON | [HERO ANGLE / REACTION / FINAL GESTURE] + [CAMERA LOCK-OFF] + [FINAL IMAGE DESCRIPTION] |
VISUAL CUES | Duration guidance: 4 s ≈ 2–3 beats · 8 s ≈ 5 beats · 12 s ≈ 6–7 beats. |
Prompt Example:
An 8-second ultra-cinematic video with seamless transitions and dynamic camera motion.
[0–2s]: Extreme close-up of a woman’s eye — ultra-detailed iris, reflections of light, camera slowly dollies in toward the pupil. Soft ambient hum builds tension.
[2–3s]: The camera flies into the pupil — smooth CG transition as the iris morphs into a mechanical world: gears, oil, and valves moving in slow motion, like entering the inside of a powerful engine.
[3–5s]: FPV-style flight through the engine interior — sparks, moving pistons, rushing sound design of roaring combustion. The camera races between metallic tunnels, heading toward a glowing valve gate.
[5–6s]: The camera bursts through the valve — a blinding flash of light transitions instantly to…
[6–8s]: …a dark underground parking lot, filled with smoke and flashing lights. Two sports cars drift in perfect synchronization, tires screeching, sparks flying. The camera flies between them, close to the ground, capturing reflections of neon lights on the concrete.
Final beat — quick whip-pan and hard cut on engine rev sound.
4K cinematic realism, dynamic FPV camera movement, seamless CG transition (eye → engine → parking lot), warm-to-cool color palette, dramatic lighting, slow motion + motion blur, atmospheric smoke, high-octane tone.
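A time-coded shotlist is easy to get wrong by leaving a gap or overlap between beats. The sketch below checks that the beats cover the clip from 0 to its full duration and then renders them in the `[start–end s]` style used above. It is an illustration only: `Beat`, `render_shotlist`, `beat_count_ok`, and `BEAT_GUIDANCE` are hypothetical names, and the beat-count table simply restates the duration guidance from the template.

```python
from dataclasses import dataclass

# Guidance from the template table: duration (s) -> (min, max) beats.
BEAT_GUIDANCE = {4: (2, 3), 8: (5, 5), 12: (6, 7)}


@dataclass
class Beat:
    start: float  # seconds
    end: float    # seconds
    text: str     # shot, action, camera, transition, audio


def render_shotlist(header: str, beats: list[Beat], duration: float) -> str:
    """Check that beats tile 0..duration without gaps or overlaps, then render."""
    if not beats:
        raise ValueError("at least one beat is required")
    cursor = 0.0
    for beat in beats:
        # Exact comparison is fine for whole or half seconds typed by hand.
        if beat.start != cursor or beat.end <= beat.start:
            raise ValueError(f"beat {beat.start}-{beat.end} s breaks the timeline")
        cursor = beat.end
    if cursor != duration:
        raise ValueError(f"beats end at {cursor} s, expected {duration} s")
    lines = [header]
    lines += [f"[{b.start:g}–{b.end:g}s]: {b.text}" for b in beats]
    return "\n".join(lines)


def beat_count_ok(duration: int, n_beats: int) -> bool:
    """True if n_beats is within the guide's range; unlisted durations pass."""
    low, high = BEAT_GUIDANCE.get(duration, (n_beats, n_beats))
    return low <= n_beats <= high


beats = [
    Beat(0, 2, "Extreme close-up of an eye; slow dolly in. Ambient hum."),
    Beat(2, 3, "Camera flies into the pupil; iris morphs into an engine."),
    Beat(3, 5, "FPV flight through the engine interior."),
    Beat(5, 6, "Camera bursts through the valve; flash of light."),
    Beat(6, 8, "Underground parking lot; two cars drift in sync."),
]
print(render_shotlist("An 8-second cinematic video.", beats, duration=8))
print(beat_count_ok(8, len(beats)))
```

The beat-count check is advisory: it returns a boolean rather than raising, since the guide presents those numbers as approximate.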
Everything you generate should be original and non-infringing. Don't imitate protected IP: characters, logos, or packaging trade dress. Don't use real people's likeness without permission, and avoid making claims you can't substantiate.
No realistic faces in image-to-video.
Use stylized characters (anime, toon, painterly) instead of real people.
No lookalike prompts.
Avoid wording that targets a specific person or celebrity, even indirectly (“like X actor”).
No minors in risky setups or suggestive contexts.
Avoid medical, political, or crisis exploitation tropes; if unavoidable, use neutral, factual framing and disclaimers.
Shot-driven instruction following (like storyboarding), motion clarity, brief synchronized dialogue handling, and stylistic steerability.
Precisely define the key cinematic elements: shot framing, depth-of-field (DOF), beat-by-beat action, and the scene's lighting scheme and color palette.
Use a short prompt for fast, creative, low-spec results, or when you want the model to choose for you.
Use a long prompt when you require granular control over composition, timing, palette, audio, or finishing.
For maximum clarity and impact, tie one camera move to one character action per beat. Drive the scene forward by using strong, specific verbs instead of vague ones.
To establish clear visual depth and composition, define the key elements that anchor the foreground, midground, and background within your Location & Framing description.
Sora 2 has a reasoning model and will creatively expand anything you omit.