How to Prepare a Dataset for Model Training on RunDiffusion

Preparing a dataset for model training is one of the most important steps in getting good results from your model. Whether you’re using Kohya, Flux Gym, or the Runnit Trainer, your dataset must be clean and well structured.

This guide walks through the full process of preparing your dataset correctly so your LoRA model trains efficiently and performs reliably.

Define Your Training Goal

Before curating your dataset, know what you're teaching the model:

Identity (Face): 20 to 40 high-quality images with varied angles and expressions.
Outfit/Clothing: 30 to 50 images showcasing the clothing item. Keep the subject centered.
Object or Logo: 10 to 25 images with clean framing and minimal occlusion.
Artistic Style: 20 to 30 images with diverse content but consistent style.

These are starting guidelines, not rules. Strong results depend more on quality and consistency than on volume.
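If you want a quick sanity check on dataset size, the ranges above can be written down as a small lookup. This is only a sketch: the goal labels ("identity", "outfit", "object", "style") are made-up keys for illustration, and the ranges are simply the starting guidelines from this section.

```python
# Suggested image counts per training goal, copied from the guidelines above.
# The dictionary keys are hypothetical labels, not names used by any trainer.
RECOMMENDED_COUNTS = {
    "identity": (20, 40),
    "outfit": (30, 50),
    "object": (10, 25),
    "style": (20, 30),
}


def check_image_count(goal: str, count: int) -> str:
    """Return 'below', 'within', or 'above' relative to the suggested range."""
    low, high = RECOMMENDED_COUNTS[goal]
    if count < low:
        return "below"
    if count > high:
        return "above"
    return "within"
```

A result of "below" or "above" is a prompt to look again at the dataset, not a failure, since quality matters more than the count.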

Match the Image Resolution to Your Trainer

Choose your image size based on which RunDiffusion service you're using:

SDXL: Use 1024×1024 images
Flux: Use 512×512 images
Runnit Trainer: Use 512×512 images

Resize all images to the correct dimensions before uploading, and avoid mixing resolutions within a dataset. The exception is the Runnit Trainer, which resizes images for you automatically.
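When resizing by hand, a common approach is to crop the largest centered square first and then scale it to the target size, so the image is not stretched. The sketch below only does the arithmetic. The trainer keys are made-up labels, and the actual cropping and resizing would be done with an image editor or an imaging library of your choice.

```python
# Target edge length in pixels per trainer, taken from the list above.
# The keys are hypothetical labels chosen for this example.
TARGET_SIZE = {"sdxl": 1024, "flux": 512, "runnit": 512}


def center_crop_box(width: int, height: int) -> tuple[int, int, int, int]:
    """Return (left, top, right, bottom) of the largest centered square."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return (left, top, left + side, top + side)
```

For a 1600×1200 photo this gives a 1200×1200 box starting 200 pixels from the left edge. A centered crop can cut off a subject that sits near the edge of the frame, so check the results by eye.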

Maintain Subject Focus

Your dataset should represent a single concept:

Only one person, object, or artistic style per dataset
No mixed-character images
Keep the subject clearly visible and unobstructed
Avoid images with overlays, stickers, or watermarks

Clarity and consistency of the subject will make your LoRA more effective and more responsive to prompts.

Use Varied Backgrounds for Better Generalization

Older training advice suggested a plain background, but modern LoRA training benefits from environmental variety. Background variation teaches the model to isolate the subject rather than memorize a fixed environment.

Use:

Indoor and outdoor backgrounds
Real-world environments (rooms, streets, nature)
Moderate variety in lighting direction and scene depth

Avoid:

Overly busy or cluttered scenes
Repeating the exact same backdrop
Scenes where the subject blends into the background

As long as the subject is clearly visible, a varied background will enhance your LoRA’s flexibility.

Ensure High Image Quality

Every image in your dataset should be:

Sharp and in focus
Well-lit with neutral or soft lighting
Free from compression artifacts or filters
Cropped properly, without borders or UI overlays
Not stretched, pixelated, or watermarked

Avoid screenshots, filtered selfies, or low-resolution uploads.

Include Controlled Variation

Introduce some variation to help the model generalize:

Different angles (front, side, three-quarter)
Slight changes in pose or expression
Moderate lighting shifts

Avoid:

Wildly inconsistent scenes
Drastic costume or subject changes
Mixing art styles unless the style itself is the target

The goal is controlled diversity, not randomness.

Remove Duplicates and Irrelevant Images

Before uploading:

Delete duplicate or near-duplicate images
Remove blurry or off-topic content
Avoid meme templates and collages
Make sure the subject remains centered and recognizable across all images

It is better to have 25 good images than 75 inconsistent ones.
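Exact duplicates are easy to find automatically by comparing file hashes. The following is a minimal sketch using only the Python standard library. The function name is made up for this example, and it only catches byte-for-byte copies: near-duplicates such as resized or re-saved versions still need a visual check.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_exact_duplicates(folder: str) -> list[list[str]]:
    """Group the files in `folder` whose contents are byte-for-byte identical."""
    groups = defaultdict(list)
    for path in sorted(Path(folder).iterdir()):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            groups[digest].append(path.name)
    # Keep only hashes shared by more than one file.
    return [names for names in groups.values() if len(names) > 1]
```

Each inner list holds the names of files that are identical, so you can keep one from each group and delete the rest.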

Use Captions When Required

Captions are required for some training methods and unnecessary for others. Here's what you need:

Kohya: Captions are required
Flux Gym: Captions are required
Runnit Trainer: Captions are not needed

How to Caption

Paired .txt Files
Each image has a matching .txt file with the same base name, and the caption is saved inside that file.
Example:

image_01.jpg
image_01.txt

The caption inside image_01.txt might read: "a photo of Anna23 in a red coat, standing, midshot"
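Before uploading to a trainer that requires captions, it can help to confirm that every image has its paired .txt file. Here is a minimal sketch using the Python standard library. The function name and the set of image extensions are assumptions made for this example, so adjust them to the formats you actually use.

```python
from pathlib import Path

# Assumed set of image extensions; extend it if your dataset uses others.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp"}


def images_missing_captions(folder: str) -> list[str]:
    """Return the names of images in `folder` that have no matching .txt file."""
    missing = []
    for path in sorted(Path(folder).iterdir()):
        is_image = path.suffix.lower() in IMAGE_EXTENSIONS
        if is_image and not path.with_suffix(".txt").exists():
            missing.append(path.name)
    return missing
```

An empty list means every image found has a caption file beside it.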

What Makes a Good SDXL Caption?

SDXL uses a richer and more compositional prompt vocabulary than previous models like SD1.5. That means your captions should include:

A clear subject token or custom string
Scene or composition context (e.g. portrait, full-body, close-up)
Lighting or environmental cues
Optional style or camera descriptors
Natural language, not keyword soup

Example: Proper SDXL Caption

portrait of bb393, soft lighting, shallow depth of field, realistic skin texture, high-resolution photo, neutral background

More SDXL Caption Examples

Style Training Example:
illustration of Leonard, a knight in a fantasy forest, glowing light effects, intricate armor design, painterly style

Fashion Training Example:
full-body photo of a woman wearing a red trench coat, standing on a city street, golden hour lighting, 85mm lens

Sci-Fi Character Example:
hyper-realistic close-up of Alf39, robotic eye, dramatic shadows, studio lighting, ultra-sharp focus

SDXL Captioning Tips

Be descriptive, but not excessive
Avoid vague terms; be specific
Stay consistent with your subject token across all captions
Don’t include text that won’t be part of your final prompting vocabulary
Write each caption like it could be used directly in a generation prompt
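Keeping the subject token identical across captions is easy to get wrong by a single character. The sketch below flags captions that do not contain the exact token. It assumes you have already read your caption files into a dictionary of file name to caption text, and the function name is made up for this example.

```python
import re


def captions_missing_token(captions: dict[str, str], token: str) -> list[str]:
    """Return the names of captions that do not contain `token` as a whole word.

    The match is case-sensitive on purpose, so 'BB393' does not count as 'bb393'.
    """
    pattern = re.compile(rf"\b{re.escape(token)}\b")
    return sorted(name for name, text in captions.items() if not pattern.search(text))
```

Any file name returned here has a caption with a missing, misspelled, or differently capitalized token.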

What Makes a Good Flux Caption?

Flux models excel when trained with natural-language captions that read like real-world generation requests, not tag lists and not overly compressed phrases. Captions should describe what is in the image as you would prompt it, using clear, complete phrases.

Flux is trained on full-sentence data, so your captions should be longer, descriptive, and visually grounded.

Flux Captioning Style

Natural phrasing, not tokenized keyword lists
Usually 12–30 words
Full concepts: subject, setting, action, style (if relevant)
Avoid excessive camera jargon (unlike SDXL)
Use sentence-like or phrase-like structure
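The usual length range above is simple to check with a word count. This is a rough sketch with made-up function names. It splits on whitespace, so a hyphenated term such as "close-up" counts as one word, and the 12 to 30 range is a guideline rather than a hard limit.

```python
def caption_word_count(caption: str) -> int:
    """Count whitespace-separated words in a caption."""
    return len(caption.split())


def is_typical_flux_length(caption: str, low: int = 12, high: int = 30) -> bool:
    """Return True if the caption falls inside the usual 12 to 30 word range."""
    return low <= caption_word_count(caption) <= high
```

Captions that fall outside the range are worth a second look, but a longer caption that stays focused on what is visible can still be fine.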

Examples of Flux Captions

Character LoRA (identity example):
a close-up portrait of person_xyz with short black hair and glasses, smiling gently, wearing a green sweater, standing indoors in front of a white bookshelf filled with books, soft natural lighting from the left

Fashion LoRA (outfit-focused):
a full-body photo of a woman wearing a bright yellow raincoat, dark blue jeans, and black ankle boots, walking down a rainy city sidewalk with blurred street lights in the background, captured in the early evening with soft reflections on the pavement

Stylized Portrait LoRA:
a fantasy-style illustration of a warrior with silver armor and glowing blue eyes, standing in a snowy forest at dusk, surrounded by falling snowflakes, with dramatic lighting casting long shadows, concept art, digital painting style

Object LoRA (product or prop):
a top-down photo of a white ceramic coffee mug with a curved handle, sitting on a rustic wooden table next to an open hardcover book and a pair of reading glasses, morning sunlight streaming in from a nearby window

Tips for Writing Strong Flux Captions

Focus on what you see, not abstract tags
Describe the subject’s appearance, pose, and scene naturally
Use modifiers like in front of, standing on, wearing, next to
Maintain a consistent structure across your dataset
Include a unique name or token (e.g., person_xyz) if you’re training a specific concept

What to Avoid

Token chains or comma lists (e.g., man, suit, city, walking)
Redundant or unnatural phrasing
Overuse of SDXL-style tokens like 85mm lens or studio lighting, unless visually relevant
Writing full novels; keep it focused and clear
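Token chains can be spotted with a simple heuristic: split the caption on commas and see whether most of the pieces are only one or two words long. The sketch below is an illustration only. The function name and both thresholds are arbitrary choices, and a flagged caption should be reviewed by a person rather than rejected automatically.

```python
def looks_like_tag_list(caption: str, max_words_per_part: int = 2) -> bool:
    """Heuristically detect captions that read like comma-separated tags."""
    parts = [part.strip() for part in caption.split(",") if part.strip()]
    if len(parts) < 3:
        # Too few comma-separated pieces to call it a list.
        return False
    short = sum(1 for part in parts if len(part.split()) <= max_words_per_part)
    # Flag the caption when more than half of its pieces are very short.
    return short / len(parts) > 0.5
```

With these thresholds, "man, suit, city, walking" is flagged, while a caption made of a few longer descriptive phrases is not.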

Clean File Naming and Structure

Organize your dataset with readable, consistent filenames. Use lowercase letters, underscores, and sequential numbers. While this is not required, it is good practice.

Good examples:
woman_red_dress_01.jpg
woman_red_dress_01.txt

woman_red_dress_02.jpg
woman_red_dress_02.txt
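If your files come straight from a camera or a download folder, a small script can work out names in this pattern. The sketch below only builds a mapping from old names to new names and does not touch any files. The function name is made up for this example, and it assumes each image and its caption already share the same base name.

```python
from pathlib import Path


def plan_renames(filenames: list[str], prefix: str) -> dict[str, str]:
    """Map each file name to '<prefix>_<NN><ext>' with a lowercase extension.

    Files that share a base name (an image and its caption) get the same number.
    """
    stems = sorted({Path(name).stem for name in filenames})
    number = {stem: index for index, stem in enumerate(stems, start=1)}
    return {
        name: f"{prefix}_{number[Path(name).stem]:02d}{Path(name).suffix.lower()}"
        for name in filenames
    }
```

Review the mapping before renaming anything. Two images with the same base name but different extensions would receive the same number, so their captions would need to be sorted out by hand.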