Gemini Omni Prompt Guide — Write Prompts That Work (2026)

Why Gemini Omni Prompts Work Differently

Most text-to-video models treat the prompt as a scene description: you describe what you want and the model generates it once. Gemini Omni is designed differently. It treats the prompt as a creative brief — a starting point you can continue to refine through conversation.

You do not need to front-load everything.

Describe the core scene, generate, then use follow-up turns to adjust the camera angle, change a character, move the action to a different environment, or modify the lighting. Each turn preserves what worked and changes only what you asked for.

Your references are part of the prompt.

When working with images, video clips, or audio files, name them in your prompt using @image1, @video1, or @audio1. The model uses those references exactly where you point them.

Gemini Omni understands world knowledge.

You do not need to describe how gravity works, how steam behaves, or how firelight moves. Gemini Omni draws on real-world physics and science knowledge automatically. Describe the scene; the model fills in the physical behaviour.

The Gemini Omni Prompt Formula

A strong Gemini Omni prompt answers four questions:

What is the subject?

Who or what is in the frame

What is the action?

Motion, gesture, event — what is happening

What is the environment?

Setting, time of day, weather, atmosphere

How should it look?

Camera angle, style, lighting, mood

Basic formula

[Subject] + [action] + [environment/setting] + [camera/style/mood]

Example applying the formula:

"A barista pours steamed milk into an espresso shot in slow motion. Close-up shot, top-down angle. Warm amber café lighting, shallow depth of field, cinematic."

You do not need to use all four elements every time. For simple scenes, subject + action is enough. Add environment and camera details when you want more control over the output.
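
If you assemble prompts programmatically, the formula can be sketched as a small helper. This is only an illustration: `build_prompt` and its parameter names are hypothetical, not part of any Gemini Omni tooling. The function produces a plain string that you would paste or send as the prompt.

```python
def build_prompt(subject, action, environment=None, look=None):
    """Join the formula elements into one prompt string.

    subject and action are required. environment and look are optional,
    mirroring the note that simple scenes need only subject + action.
    (Hypothetical helper for illustration only.)
    """
    core = f"{subject.strip()} {action.strip()}".rstrip(".") + "."
    extras = [
        part.strip().rstrip(".") + "."
        for part in (environment, look)
        if part and part.strip()
    ]
    return " ".join([core, *extras])


prompt = build_prompt(
    subject="A barista",
    action="pours steamed milk into an espresso shot in slow motion",
    environment="Warm amber café lighting",
    look="Close-up shot, top-down angle, shallow depth of field, cinematic",
)
print(prompt)
```

Calling `build_prompt("A cat", "jumps")` returns just the subject and action as one sentence, which matches the simple-scene case.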

Camera and Style Keywords That Work in Gemini Omni

The Gemini Omni Video Generator responds well to standard cinematography language. Using these terms gives the model a precise visual vocabulary to work from.

Camera movement

Keyword	What it does
Static shot	No camera movement — clean, controlled
Slow push in	Camera moves gradually toward the subject
Dolly zoom	Camera moves toward or away from the subject while the lens zooms the opposite way; disorienting depth effect
Tracking shot	Camera follows the subject
Overhead / top-down	Bird's-eye perspective
Low angle	Camera below subject eye line — adds dominance
Over-the-shoulder	Classic conversational or POV framing
Handheld	Slight movement, naturalistic feel
Drone shot	Aerial, wide establishing

Shot framing

Keyword	What it does
Extreme close-up	Detail — texture, eyes, small objects
Close-up	Face or single object fills frame
Medium shot	Subject from waist up
Wide shot	Full subject with environment context
Establishing shot	Broad environment, subject small or absent

Style and mood

Keyword	What it does
Cinematic	Film-grade colour grading, natural motion blur
Photorealistic	As close to real footage as possible
Shallow depth of field	Subject sharp, background blurred
High contrast	Deep shadows, bright highlights
Soft natural lighting	Diffused, flattering, no harsh shadows
Golden hour	Warm amber tones, long shadows
Neon-lit	Saturated, urban, nighttime colour palette
Wes Anderson style	Symmetric framing, pastel palette, deadpan
Stop motion	Frame-by-frame aesthetic, slight jerkiness
Claymation	Everything looks made of clay, physical texture

Pace and motion

Keyword	What it does
Slow motion	Action slowed dramatically
Time-lapse	Motion sped up — clouds, crowds, flowers
Continuous smooth shot	No cuts, single flowing take
Jump cut	Abrupt cut within the same scene; rapid edit feel

Text-to-Video Prompt Examples

These prompts work from a text description alone — no reference files required.

Product / commercial

"A sleek black smartphone rotates slowly on a dark reflective surface. Studio lighting, soft rim light from behind. 4K look, product shot aesthetic. Clean and minimal."

"A bottle of olive oil being poured in slow motion onto a white plate. Golden oil catching studio light. Close-up, top-down angle. Premium food photography style."

"A pair of white sneakers on a wooden floor. Camera slowly circles the shoes. Natural daylight from a window. Clean, editorial style."

Lifestyle / social

"A woman walks through a sunlit forest path in autumn. Leaves falling gently around her. Tracking shot from behind, slightly wide. Warm golden tones, serene mood."

"A cozy living room at night. A fireplace burning, a book open on the coffee table, soft lamp light. Gentle slow push in toward the fire. Calm and peaceful."

"Two friends laughing over coffee at an outdoor café in summer. Handheld shot, candid feel. Warm afternoon light, shallow depth of field."

Nature / landscape

"A marble rolling fast along a chain-reaction track. Continuous smooth shot following the marble. Real-world physics — momentum, gravity, and contact sound included."

"Time-lapse of a thunderstorm approaching over an open field at dusk. Wide shot, low angle. Dramatic sky, dark clouds building."

"A drone shot flying slowly over a misty mountain valley at sunrise. Golden light breaking through clouds. Camera gradually ascending to reveal a hidden lake."

Educational / explainer

"A claymation explainer of how a seed becomes a plant. Everything is made of clay. Stop-motion aesthetic, accurate growth stages, no narration, 10 seconds."

"Words appearing one at a time on screen — DID, YOU, KNOW, THIS, MODEL, CAN, RENDER, TEXT — each word in a different animated style, paced to a rhythm."

Stylized / creative

"10-second anime action sequence. A character in dark robes draws a sword that emits crackling blue energy. Camera follows the arc of the slash. Dramatic speed lines."

"A Wes Anderson-style hotel lobby. Perfectly symmetrical wide shot. Pastel pink walls, geometric patterns. A bellhop in a red uniform stands dead center. Centered tracking shot pulling backward."

"A person walking down a city street at night. As they walk, the world around them gradually transforms into a retro-futuristic aesthetic — neon signs, chrome surfaces, flying vehicles in the distance."

Image-to-Video Prompt Examples

Upload an image as a reference and describe how to animate it. Gemini Omni preserves the composition and visual identity of your image while generating motion.

Animating a still photo

"@image1 — animate this product photo with a slow 360-degree rotation. Keep the lighting consistent with the original. Studio background, no motion blur."

"@image1 — the person in this photo is standing on a cliff. Add wind moving through their hair and clothes. Keep everything else static. Cinematic."

"@image1 — this is a painting of a forest. Bring it to life: leaves rustling, light shifting through the canopy, birds moving in the background. Stay close to the original colour palette."

Style transfer from a reference image

"A barista making coffee in a café. Use @image1 as the visual style reference — match the colour grading, lighting mood, and film grain from that image."

"A busy street market at sunrise. Apply the watercolour illustration style from @image1 to the whole scene."

Placing a product or character into a scene

"Generate a video of a busy city street at night. Place the product from @image1 on a café table in the foreground. The camera starts wide and slowly pushes in toward the product. Neon ambient light."

"A dancer performing on a stage. The costume should match the design in @image1 exactly — preserve all details including colour, texture, and pattern."

Multi-Input Prompt Examples — @image1, @video1, @audio1

Gemini Omni's @-tagging system lets you name your reference files directly in the prompt. This is what separates a vague multi-file request from a precise creative brief.
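
Before sending a multi-file prompt, it can help to check that every uploaded file is actually named in the text. The sketch below assumes tags follow the `@image1`, `@video1`, `@audio1` pattern shown in this guide. The function name `untagged_references` and the list of uploaded tags are hypothetical; nothing here calls a real Gemini Omni API.

```python
import re

# Matches tags in the style used throughout this guide, e.g. @image1, @video2.
TAG_PATTERN = re.compile(r"@(?:image|video|audio)\d+")


def untagged_references(prompt, uploaded_tags):
    """Return the uploaded reference tags that the prompt never mentions."""
    used = set(TAG_PATTERN.findall(prompt))
    return [tag for tag in uploaded_tags if tag not in used]


uploads = ["@image1", "@audio1"]  # hypothetical uploaded files

vague = "Use the image and the audio to make a video"
precise = (
    "Create a walk cycle of the character from @image1. "
    "The style should shift in sync with the beat from @audio1."
)

print(untagged_references(vague, uploads))
print(untagged_references(precise, uploads))
```

The vague prompt leaves both uploads unreferenced, while the precise one leaves none.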

Character image + motion reference

"Generate a front-facing full-body walk cycle of the character from @image1. Follow the movement rhythm and style from @video1. Stay consistent with the character's proportions, costume, and colour scheme throughout."

"The person in @image1 is walking through the environment shown in @video1. Match the lighting of the environment. The person's appearance should not change between frames."

Video + audio

"Take @video1 — a walking video on a city street — and imagine the world gradually transforming into a retro-futuristic aesthetic as I walk. Generate background music matching the mood from @audio1. 10 seconds."

"Apply the sound design from @audio1 to the action in @video1. The sound should be precisely synchronized to the motion — every impact and movement should have an audio match."

Image + video + audio combined

"Create a front-facing walk cycle of @image1 character. The visual style should shift rapidly between different artistic styles during the walk, synchronized to the beat of @audio1. Start from realistic cinema style."

"A flock of birds flying in the sky loosely forms the shape of the animal from @image1. Their wings flap in sync with the rhythm of @audio1 before they dissipate into clouds."

Chat Editing Prompts — How to Refine a Generated Clip

After generating a clip, continue the conversation to change specific elements. Each instruction should state clearly what to change.

The key principle: Be specific about what changes. The model preserves everything you do not mention, so name what stays only when an edit could plausibly affect it.
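
One way to think about a follow-up turn is as a required change plus an optional keep clause. The helper below is a minimal sketch of that idea; `edit_instruction` is a hypothetical name, and the output is ordinary text for the chat, not a special command syntax.

```python
def edit_instruction(change, keep=None):
    """Compose one follow-up turn: a required change, plus an optional keep clause.

    (Hypothetical helper for illustration only.)
    """
    instruction = change.strip().rstrip(".") + "."
    if keep and keep.strip():
        instruction += f" Keep {keep.strip().rstrip('.')} exactly the same."
    return instruction


print(edit_instruction("Add rain to the scene"))
print(
    edit_instruction(
        "Transport the character to a beach at sunset",
        keep="the character's appearance and action",
    )
)
```

Leaving `keep` empty relies on the model's default of preserving whatever you do not mention.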

Camera and angle

"Change the camera angle to be over the shoulder."

"Pull the camera back to a wide shot."

"Add a slow push-in toward the subject."

"Switch to a low-angle shot looking up at the character."

Environment and setting

"Transport the character to a beach at sunset. Keep the character's appearance and action exactly the same."

"Change the background to a modern glass office building."

"Make it night instead of day. Add city lights in the background."

Character and subject

"Make the violin invisible but keep the musician's posture and emotion."

"Change the spaceship to a watch."

"Remove the person in the background."

"Change the character's costume to a red dress."

Style and mood

"Apply a warm golden-hour colour grade to the whole clip."

"Add rain to the scene."

"Make it look like the 1970s — add film grain, slightly faded colours."

"Convert the style to anime."

Audio

"Add the sound the animal makes when the finger touches it."

"Replace the ambient sound with café noise — quiet conversation and coffee machine sounds."

"Add a dramatic orchestral swell when the character turns around."

Ready-to-Use Prompts by Use Case

Social media content

"A close-up of hands assembling a burger layer by layer in slow motion. Overhead shot. Warm restaurant lighting. No narration — let the visuals and ambient sound tell the story."

"Coffee being poured into a glass of ice in slow motion. The liquid swirls slowly. Moody, dark background with one strong beam of light."

Product advertising

"A running shoe on a track. The camera starts at ground level and slowly rises as the shoe lifts off in mid-stride. Motion blur on the background, shoe sharp. Sportswear brand aesthetic."

"A luxury watch on a dark velvet surface. A single spotlight creates a reflection. The camera slowly orbits the watch. Elegant, minimal, no text."

Explainer and education

"A skeuomorphic stop-motion explainer of how the hippocampus works. No seahorses. A compelling voiceover describes the memory-formation process. No text on screen. 10 seconds."

"Show the protein folding process as a claymation. Everything is made of clay. No hands visible. Stop-motion aesthetic. Scientifically accurate."

Real estate and architecture

"A modern open-plan living room at golden hour. Natural light streaming through floor-to-ceiling windows. A slow, smooth dolly shot moving from the kitchen toward the living area. Architecture photography style."

App and SaaS demos

"Show a clean interface where a user enters a prompt, clicks generate, and sees a polished video result. Smooth UI motion, subtle zooms, clean transitions. The style should feel simple, modern, and trustworthy."

Prompt Mistakes to Avoid

Overloading the first prompt

Gemini Omni supports iterative editing. You do not need to describe every detail upfront. Start with the core scene — subject, action, setting — then refine camera and style through follow-up turns.

Instead of:

"A woman with brown hair wearing a blue coat walks through a rainy Paris street at night with neon reflections on wet cobblestones and a melancholy mood with shallow depth of field and a tracking shot from behind at medium distance"

Try:

"A woman walks through a rainy Paris street at night. Tracking shot from behind." → Then follow up: "Add shallow depth of field and neon reflections on the wet cobblestones."

Contradictory camera instructions

Avoid combining camera movements or framings that conflict. A phrase such as "static handheld shot" or "wide close-up" gives the model incompatible directions.

Fix: Pick one clear camera instruction per turn.
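
A rough pre-flight check for this mistake can be written as a keyword scan. The sketch below uses only the two conflicting pairs named above; the pair list and the function name `find_conflicts` are illustrative assumptions. It does plain substring matching, so treat a hit as a prompt to re-read the text, since a prompt that describes a deliberate move from wide to close-up would also be flagged.

```python
# Pairs of terms that contradict each other when applied to the same shot.
# Illustrative list based on the two examples in this guide; extend as needed.
CONFLICTING_TERMS = [
    ("static", "handheld"),
    ("wide", "close-up"),
]


def find_conflicts(prompt):
    """Return every conflicting pair whose two terms both appear in the prompt."""
    text = prompt.lower()
    return [
        pair
        for pair in CONFLICTING_TERMS
        if all(term in text for term in pair)
    ]


print(find_conflicts("Static handheld shot of a rainy street"))
print(find_conflicts("Wide shot, low angle. Dramatic sky."))
```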

Vague style descriptors

"Make it look cool" or "cinematic vibes" gives the model little to work with. Replace vague mood words with specific visual references.

Instead of:

"cinematic vibes"

Try:

"shallow depth of field, natural window light, muted colour grade"

Forgetting @-tags with multiple files

If you upload multiple files but don't @-tag them, the model has to guess which reference applies where. Always name your assets.

Instead of:

"Use the image and the audio to make a video"

Try:

"Create a walk cycle of the character from @image1. The style should shift in sync with the beat from @audio1."

Gemini Omni Prompt Guide — FAQ

How long should a Gemini Omni prompt be?

For text-to-video, 20–80 words covers most use cases. More detail gives more control, but you do not need to describe everything upfront — use follow-up turns to add specificity. Prompts up to several hundred words work for complex multi-reference briefs.
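
If you generate prompts in bulk, a simple word count can flag outliers against the 20–80 word range mentioned above. This is a sketch, not a rule enforced by the model: `length_advice` and its messages are hypothetical, and short prompts are still fine for simple scenes.

```python
def length_advice(prompt, low=20, high=80):
    """Return (word_count, advice) relative to a typical word range.

    The default bounds come from the 20 to 80 word guidance in this FAQ.
    """
    words = len(prompt.split())
    if words < low:
        return words, "short: consider adding environment or camera detail"
    if words > high:
        return words, "long: consider moving detail into follow-up turns"
    return words, "within the typical range"


count, advice = length_advice(
    "A woman walks through a rainy Paris street at night. Tracking shot from behind."
)
print(count, advice)
```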

Does Gemini Omni understand cinematic terminology?

Yes. Terms like "dolly zoom," "tracking shot," "shallow depth of field," "golden hour," and specific style references (Wes Anderson, claymation, anime) work reliably. Using standard cinematography vocabulary gives the model clearer direction than describing the outcome in plain language.

Can I use @image1 with Image-to-Video mode?

Yes. In Reference mode on GeminiOmniHub, upload your file and reference it as @image1, @video1, or @audio1 in your prompt. You can use up to 5 reference images in a single generation.
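
The five-image limit can be checked before submitting a prompt. The sketch below counts distinct `@imageN` tags in the prompt text as a stand-in for the number of uploaded images, which is an assumption: the limit applies to what you upload, and the function names here are hypothetical.

```python
import re

MAX_REFERENCE_IMAGES = 5  # limit stated in the FAQ answer above


def count_image_references(prompt):
    """Count the distinct @imageN tags that appear in the prompt."""
    return len(set(re.findall(r"@image\d+", prompt)))


def within_image_limit(prompt):
    """True when the prompt references no more than the allowed number of images."""
    return count_image_references(prompt) <= MAX_REFERENCE_IMAGES


print(within_image_limit("Place the product from @image1 next to the logo from @image2."))
```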

What happens if my prompt produces unexpected results?

Use chat editing to correct it. Describe specifically what needs to change: "Change the background to a forest" or "Remove the text overlay." The model will patch the element without regenerating the full clip.

Does audio generation require a specific prompt instruction?

By default, Gemini Omni generates audio alongside video when the Generate audio toggle is enabled. You can also specify audio in your prompt: "Add the sound of rain on windows" or "Include ambient café noise." For precise audio control, upload an @audio1 reference.

Can I reference a specific film or director's visual style?

Yes. Gemini Omni has broad world knowledge that includes filmmaking styles. References like "Wes Anderson symmetry," "Christopher Nolan practical lighting," or "1970s film grain" work as style guides. Be specific about what visual elements you want from that reference, not just the name.

Try your prompts on GeminiOmniHub

10 Free Credits — No Card Required

Put these prompts to work immediately. New accounts get 10 free credits on signup — no credit card, no subscription.

No credit card required · No subscription · 18+ only