How to use Kling O3 Standard Reference Video to Video on RunDiffusion

Kling O3 Standard Reference Video to Video in RunDiffusion helps you generate tightly guided video results by combining a source video with reference images and subject-specific element images. This gives you much more control than prompt-only video generation, especially when you want to preserve motion, maintain consistency, or steer the final look with specific visual inputs.

In this tutorial, you will learn how each input works, how to prepare your files, and how to get better results from short controlled generations. The goal is simple: use the source clip for structure, then layer in visual guidance only where it improves the output.

Launch Kling O3 Standard Reference Video to Video

Start by opening the tool directly.

Try Kling O3 Standard Reference Video to Video

You can also access it manually:

Log in to RunDiffusion

Open RunDiffusion's Runnit Platform

Click on Launchpad on the left sidebar

Click on Tools

Search for Kling O3 and click on Kling O3 Std Ref. Vid to Vid

Kling O3 Std Ref. Vid to Vid tool card in Runnit — Kling O3 Std Ref. Vid to Vid in the RunDiffusion tool library.

You can also click here to go directly to the tool:

Kling o3 Standard Reference Video to Video

What Kling O3 Standard Reference Video to Video Does

Kling O3 Standard Reference Video to Video is designed for controlled video generation. Instead of asking the model to invent motion from scratch, you provide a reference video that shapes the movement, pacing, and shot structure of the final result.

You can then add two other layers of guidance:

Reference Images for general visual direction
Element Images for subject-specific consistency

This setup is useful when you want to:

Preserve motion from an existing clip
Keep a featured subject more consistent
Guide color, composition, or frame styling
Transform a clip while keeping its original timing
Retain the input audio when needed

Concept illustration of controlled video generation guided by reference motion and visual styling — Concept art showing how guided video generation can evolve a source motion path into a more stylized final result.

Understand the Main Controls

The interface includes these key fields:

Prompt
Reference Video
Keep Original Audio
Reference Images
Element 1
Element 2
Video Duration
Aspect Ratio

Each control influences a different part of the result. The strongest outputs usually come from using each section with a clear purpose instead of loading every possible input at once.

Top section of the Kling O3 Std Ref. Vid to Vid settings panel in Runnit — The top of the settings panel shows the prompt box and the first control blocks.

Reference Video Requirements

Your uploaded reference video must meet these limits:

.mp4 or .mov format (other video file types are not supported)
3 to 10 seconds long
720px to 2160px resolution
Maximum 200MB

These limits matter because the model depends on the source clip as the structural guide. A clean, short, well-framed video usually produces more predictable results than a long or visually messy clip.
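If you prepare many clips, it can help to check them against these limits before uploading. The sketch below is only a local pre-check, not part of RunDiffusion: ClipInfo and check_reference_clip are hypothetical names, the metadata is assumed to have been read already with whatever tool you prefer, "200MB" is read as 200 × 1024 × 1024 bytes, and the resolution rule is assumed to apply to both width and height.

```python
import os
from dataclasses import dataclass

# Limits copied from the requirements list above.
ALLOWED_EXTENSIONS = {".mp4", ".mov"}
MIN_SECONDS, MAX_SECONDS = 3.0, 10.0
MIN_PX, MAX_PX = 720, 2160
MAX_BYTES = 200 * 1024 * 1024  # assumption: "200MB" interpreted as 200 MiB


@dataclass
class ClipInfo:
    """Hypothetical container for metadata you have already read from the file."""
    filename: str
    duration_seconds: float
    width: int
    height: int
    size_bytes: int


def check_reference_clip(clip: ClipInfo) -> list[str]:
    """Return a list of problems; an empty list means the clip passes these checks."""
    problems = []
    extension = os.path.splitext(clip.filename)[1].lower()
    if extension not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported file type: {extension or 'none'}")
    if not MIN_SECONDS <= clip.duration_seconds <= MAX_SECONDS:
        problems.append("duration must be 3 to 10 seconds")
    # Assumption: both width and height must fall inside the 720-2160 range.
    if not all(MIN_PX <= side <= MAX_PX for side in (clip.width, clip.height)):
        problems.append("width and height must be 720px to 2160px")
    if clip.size_bytes > MAX_BYTES:
        problems.append("file must be 200MB or smaller")
    return problems


if __name__ == "__main__":
    clip = ClipInfo("walk.mp4", 5.0, 1920, 1080, 50_000_000)
    print(check_reference_clip(clip) or "clip looks OK")
```

Returning a list of problems rather than a single pass/fail value makes it easier to fix everything in one re-export.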

Best Practices for Your Source Clip

Choose a reference video that has:

One clear action or camera move
Stable framing
Good lighting
Minimal cuts
Limited motion blur

A strong input video gives the model a cleaner motion blueprint to follow.

Step 1: Write the Prompt

Start with a direct prompt that explains what should happen in the final clip. Keep it focused on the result, not on vague style words.

A good prompt usually includes:

The subject
The action
The look or mood
What should stay consistent
How the references should be used

The clearer the job of each reference, the easier it is for the model to follow your direction.

Creative planning scene for writing prompts and organizing references for AI video generation — Prompt planning works best when your motion goals, references, and visual direction are aligned before you generate.

Step 2: Upload the Reference Video

In the Reference Video section, upload your main source clip. You can select one from the left panel, upload a new file, or pull it from your library.

This is the most important input because it influences:

Motion
Camera movement
Sequence pacing
Framing cues
Overall timing

If your source video is shaky, cluttered, or visually confusing, the generation can inherit those problems. Start with the cleanest clip you have.

Reference Video upload area in Kling O3 Std Ref. Vid to Vid — The Reference Video area is the main structural input for your generation.

Step 3: Choose Whether to Keep Original Audio

The Keep Original Audio toggle lets you preserve the audio track from the uploaded reference video.

Turn it on when:

The original audio already fits the clip
You need ambient sound, synced effects, or dialogue
You want the transformed video to stay aligned to the source sound

Turn it off when:

You plan to replace the audio later
The visual transformation no longer matches the source sound
The input audio is not useful for the final output

If you generate multiple variations and need to manage exports, the article on bulk download in Runnit can help streamline your output workflow.

Step 4: Add Reference Images for General Visual Control

The Reference Images section lets you upload up to 2 general reference images. These are flexible visual guides that can shape the overall result without locking the model to one exact subject identity.

You can refer to them in your prompt as:

@Image1
@Image2

What Reference Images Are Good For

Use them to guide:

Composition
Color palette
Mood
Lighting style
Starting-frame direction
Broad visual influence

Example Prompt Uses

“Use @Image1 as the starting frame.”
“Match the soft lighting and interior palette of @Image2.”
“Adopt the composition of @Image1 and the color treatment of @Image2.”

These are especially helpful when you are building toward a specific look instead of just describing that look in text. The approach is similar to the visual planning covered in the article on the mood board workflow in Runnit.
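Because references are addressed by name in the prompt, a quick local check can catch a prompt that mentions a reference you have not uploaded. This is a minimal sketch under assumptions: unresolved_tokens is a hypothetical helper, and it only looks for tokens written exactly in the @Image1 / @Element1 style used in this tutorial. It says nothing about how the tool itself parses prompts.

```python
import re

# Matches tokens such as @Image1, @Image2, @Element1, @Element2.
TOKEN_PATTERN = re.compile(r"@(Image|Element)(\d+)")


def unresolved_tokens(prompt: str, image_count: int, element_count: int) -> list[str]:
    """Return tokens in the prompt that point at a reference slot you did not fill."""
    available = {"Image": image_count, "Element": element_count}
    missing = []
    for kind, number in TOKEN_PATTERN.findall(prompt):
        # A token is valid only if its number is within the uploaded count.
        if not 1 <= int(number) <= available[kind]:
            missing.append(f"@{kind}{number}")
    return missing


if __name__ == "__main__":
    prompt = "Adopt the composition of @Image1 and the color treatment of @Image2."
    print(unresolved_tokens(prompt, image_count=1, element_count=0))
```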

Image References section in Kling O3 Std Ref. Vid to Vid — Use Image References to guide the overall look, palette, and composition.

Step 5: Use Element Images for Subject Consistency

Element images are more specific than general image references. They are meant to help the model understand a particular subject that should remain recognizable in the video.

This could be:

A person
A character
A product
A vehicle
A prop
A furniture item
A building feature

Important Rule for Element Image Inputs

You can define up to 2 elements and refer to them in your prompt as:

@Element1
@Element2

If you upload a primary image for an element, you also need to provide one or two additional images of that same element from different perspectives.

This matters because the model needs more than one view to understand the subject’s shape and appearance. A single image often is not enough for reliable consistency.
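The rule above (a primary image plus one or two additional views) is easy to get wrong when you assemble files by hand. The following sketch shows one way to check an element pack locally before uploading. check_element_pack is a hypothetical helper and the filenames are placeholders; it cannot tell whether the images really show the same subject from different angles, only whether the counts are right.

```python
from typing import Optional


def check_element_pack(primary: Optional[str], additional: list[str]) -> list[str]:
    """Return problems with one element slot; an empty list means the counts look right."""
    if primary is None:
        # An unused element slot is fine as long as nothing else was attached to it.
        if additional:
            return ["additional images were given without a primary image"]
        return []
    problems = []
    if not 1 <= len(additional) <= 2:
        problems.append("a primary image needs one or two additional views")
    # Reusing the same file adds no new perspective on the subject.
    if len({primary, *additional}) != len(additional) + 1:
        problems.append("the same file appears more than once")
    return problems


if __name__ == "__main__":
    print(check_element_pack("watch_front.png", ["watch_side.png", "watch_back.png"]))
    print(check_element_pack("watch_front.png", []))
```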

Best Practices for Element Images

Use images that are:

Clear and high resolution
Well lit
Visually consistent with one another
Focused on the same exact subject
Taken from different angles

Avoid cluttered backgrounds, heavy filters, and mismatched photos. When element images conflict with each other, the output tends to drift.

Element 1 Reference section in Kling O3 Std Ref. Vid to Vid — Element references help the model hold onto a specific subject across the clip.

Step 6: Set Video Duration

Use the Video Duration field to choose a target output length from the options offered in the interface.

For most testing, shorter is better. Short clips are easier to control and let you validate your prompt and references before spending more credits on longer runs.

A practical approach is:

Start with a shorter duration
Test your prompt and reference setup
Refine weak areas
Only then generate longer versions

This workflow usually saves time and reduces wasted iterations.

Step 7: Set the Aspect Ratio

The Aspect Ratio menu controls the output frame shape. In many cases, Auto is the easiest place to start because it lets the tool follow the shape of the source material.

Choose a specific ratio when you already know the final destination of the video, such as:

Landscape for presentations or long-form YouTube content
Vertical for mobile-first delivery, including Shorts and many social media platforms
Square for social placements and presentations that need centered framing

If you are unsure, test with Auto first and only force a manual ratio when your delivery format requires it.

How to Choose Between Reference Images and Element Images

This is one of the most important parts of using Kling O3 Std Ref. Vid to Vid well.

Use Reference Images When You Need

General art direction
Color guidance
Composition examples
Mood or lighting influence
Starting-frame inspiration

Use Element Images When You Need

A specific subject to stay recognizable
Better consistency for one featured object or character
More reliable subject appearance across the clip

A useful way to think about it is this: reference images shape the scene, while element images help preserve the subject.

A Simple Starter Workflow

This sequence works well for most first tests:

Upload a clean 3 to 6 second source video
Decide whether to keep the original audio
Add one reference image for overall style
Add element images only if a specific subject must stay consistent
Write a prompt that clearly says what the model should preserve and what it should change
Run a shorter test first
Review the result and adjust one variable at a time

That last point matters. Change only one part of the setup between runs so you know which change improved the result.
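If you keep notes on each run, the one-variable-at-a-time habit can be checked mechanically. The sketch below compares two run records and lists the settings that differ. The dictionary keys are hypothetical labels for your own notes, not fields exposed by the tool.

```python
def changed_fields(previous: dict, current: dict) -> list[str]:
    """List the settings whose values differ between two recorded runs."""
    keys = sorted(set(previous) | set(current))
    return [key for key in keys if previous.get(key) != current.get(key)]


def is_controlled_change(previous: dict, current: dict) -> bool:
    """True when exactly one setting changed, so the result can be attributed to it."""
    return len(changed_fields(previous, current)) == 1


if __name__ == "__main__":
    run_a = {"prompt": "watch on pedestal", "reference_image": "a.png", "duration": 5}
    run_b = dict(run_a, reference_image="b.png")
    print(changed_fields(run_a, run_b), is_controlled_change(run_a, run_b))
```

When more than one field shows up in the list, consider splitting the change into separate test runs.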

Prompting Tips for Better Results

Prompting works best when you explain the role of the references instead of assuming the model will infer everything.

Useful patterns include:

“Preserve the camera motion from the reference video”
“Use @Image1 as the visual anchor”
“Match the lighting and color palette of @Image2”
“Keep @Element1 consistent across the full clip”
“Use the input video as the timing guide”

Example Prompt Template

You can use a structure like this:

[Subject] + [action] + [style/look] + [what to preserve] + [how references should be used]

Example:

“Create a polished showroom video of the same watch from the reference clip, preserve movement and timing, use @Image1 for lighting direction, and keep @Element1 consistent throughout.”
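If you reuse the template often, a small helper can assemble the five parts in the same order every time. This is only an illustrative sketch: build_prompt is a hypothetical function, and the comma-separated sentence it produces is one reasonable phrasing, not a format the tool requires.

```python
def build_prompt(subject: str, action: str, look: str,
                 preserve: str, reference_use: str) -> str:
    """Join the template parts in order, skipping any part left empty."""
    # Subject and action read naturally as one opening phrase.
    opening = " ".join(part.strip() for part in (subject, action) if part.strip())
    pieces = [opening, look.strip(), preserve.strip(), reference_use.strip()]
    return ", ".join(piece for piece in pieces if piece) + "."


if __name__ == "__main__":
    print(build_prompt(
        subject="A silver watch",
        action="rotating on a showroom pedestal",
        look="polished studio look",
        preserve="preserve movement and timing from the reference video",
        reference_use="use @Image1 for lighting direction and keep @Element1 consistent",
    ))
```

Keeping the parts separate also makes it simple to change one of them between test runs while leaving the rest untouched.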

Click here to try Kling O3

Common Mistakes to Avoid

Conflicting Inputs

Do not upload a source video that suggests one scene, then add references that push toward a completely different result. Align your prompt, video, and image guidance.

Weak Element Packs

If your element images show different products, angles with poor visibility, or inconsistent lighting, the model will struggle to understand what should remain stable.

Overcomplicated First Runs

Start simple. One source video plus one strong image reference is often better than trying to control everything on the first pass.

Busy or Low-Quality Source Clips

Fast edits, shaky footage, and heavy motion blur reduce the usefulness of the reference video.

Troubleshooting Weak Results

The Subject Keeps Changing

Try adding stronger element references and make sure all images show the same subject clearly from different angles.

The Look Is Not Following the Images

Mention @Image1 or @Image2 directly in the prompt and make the instruction specific, such as lighting, palette, or composition.

The Motion Feels Wrong

Swap in a cleaner source clip with simpler movement and test a shorter generation first.

The Output Feels Inconsistent

Reduce the number of instructions. A smaller number of aligned references often performs better than many mixed signals.

Before-and-after troubleshooting concept for inconsistent AI video generation results — When references conflict, results drift. Cleaner, more aligned inputs usually produce a more stable output.

Final Tips for More Reliable Kling O3 Results

Kling O3 Standard Reference Video to Video works best when you treat it as a guided video workflow, not just a text prompt box. The source video gives the model structure. Reference images guide the overall look. Element images improve consistency for a featured subject.

Start with the strongest single visual idea, keep the prompt clear, and expand only after the base setup is working. That usually leads to cleaner results and more repeatable outputs inside RunDiffusion's Runnit platform.

Try Kling o3 Standard Reference Video to Video
