Preparing a dataset for model training is one of the most important steps in getting good results from your model. Whether you’re using Kohya, Flux Gym, or the Runnit Trainer, your dataset must be clean and well structured.
This guide walks through the full process of preparing your dataset correctly so your LoRA model trains efficiently and performs reliably.
Define Your Training Goal
Before curating your dataset, know what you're teaching the model:
Identity (Face): 20 to 40 high-quality images with varied angles and expressions.
Outfit/Clothing: 30 to 50 images showcasing the clothing item. Keep the subject centered.
Object or Logo: 10 to 25 images with clean framing and minimal occlusion.
Artistic Style: 20 to 30 images with diverse content but consistent style.
These are starting guidelines, not rules. Strong results depend more on quality and consistency than volume.
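The ranges above can be kept as a small lookup for a quick sanity check before training. This is a minimal sketch: the goal labels and the function name `check_image_count` are illustrative and are not settings in any trainer.

```python
# Suggested image counts (min, max) taken from the guidelines above.
# The dictionary keys are illustrative labels, not trainer options.
SUGGESTED_COUNTS = {
    "identity": (20, 40),
    "outfit": (30, 50),
    "object": (10, 25),
    "style": (20, 30),
}


def check_image_count(goal: str, count: int) -> str:
    """Return 'below', 'within', or 'above' relative to the suggested range."""
    low, high = SUGGESTED_COUNTS[goal]
    if count < low:
        return "below"
    if count > high:
        return "above"
    return "within"


print(check_image_count("identity", 25))  # within
```

A result of "below" or "above" is only a prompt to review the dataset, since quality matters more than the count.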
Match the Image Resolution to Your Trainer
Choose your image size based on which RunDiffusion service you're using:
SDXL: Use 1024×1024 images
Flux: Use 512×512 images
Runnit Trainer: Use 512×512 images
Resize all images to the correct dimensions before uploading, and avoid mixing resolutions within a dataset. The exception is the Runnit Trainer, which resizes images for you automatically.
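If you resize images yourself, one common approach is to crop the largest centered square and then scale it to the target size. The sketch below only computes the crop box and flags images that are too small. Center cropping is an assumption here, not something the trainers require, and the labels in `TARGET_SIZES` are illustrative.

```python
# Target square sizes from the list above; the keys are illustrative labels.
TARGET_SIZES = {"sdxl": 1024, "flux": 512, "runnit": 512}


def center_crop_box(width: int, height: int) -> tuple[int, int, int, int]:
    """Return (left, top, right, bottom) of the largest centered square."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return (left, top, left + side, top + side)


def needs_upscaling(width: int, height: int, trainer: str) -> bool:
    """True if the square crop would be smaller than the target size."""
    return min(width, height) < TARGET_SIZES[trainer]


print(center_crop_box(1920, 1080))          # (420, 0, 1500, 1080)
print(needs_upscaling(1920, 1080, "sdxl"))  # False
print(needs_upscaling(800, 600, "sdxl"))    # True
```

The box uses the same (left, top, right, bottom) order that Pillow's `Image.crop` accepts, so it can be passed there before calling `Image.resize`. A centered crop can cut off an off-center subject, so check the results by eye.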
Maintain Subject Focus
Your dataset should represent a single concept:
- Only one person, object, or artistic style per dataset
- No mixed-character images
- Keep the subject clearly visible and unobstructed
- Avoid images with overlays, stickers, or watermarks
Clarity and consistency of the subject will make your LoRA more effective and more responsive to prompts.
Use Varied Backgrounds for Better Generalization
Unlike older training advice that suggested a plain background, modern LoRA training benefits from environmental variety. Background variation teaches the model to isolate the subject, rather than memorizing a fixed environment.
Use:
- Indoor and outdoor backgrounds
- Real-world environments (rooms, streets, nature)
- Moderate variety in lighting direction and scene depth
Avoid:
- Overly busy or cluttered scenes
- Repeating the exact same backdrop
- Scenes where the subject blends into the background
As long as the subject is clearly visible, a varied background will enhance your LoRA’s flexibility.
Ensure High Image Quality
Every image in your dataset should be:
- Sharp and in focus
- Well-lit with neutral or soft lighting
- Free from compression artifacts or filters
- Cropped properly, without borders or UI overlays
- Not stretched, pixelated, or watermarked
Avoid screenshots, filtered selfies, or low-resolution uploads.
Include Controlled Variation
Introduce some variation to help the model generalize:
- Different angles (front, side, three-quarter)
- Slight changes in pose or expression
- Moderate lighting shifts
Avoid:
- Wildly inconsistent scenes
- Drastic costume or subject changes
- Mixing art styles unless the style itself is the target
The goal is controlled diversity, not randomness.
Remove Duplicates and Irrelevant Images
Before uploading:
- Delete duplicate or near-duplicate images
- Remove blurry or off-topic content
- Avoid meme templates and collages
- Make sure the subject remains centered and recognizable across all images
Better to have 25 good images than 75 inconsistent ones.
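Exact duplicates can be found automatically by hashing file contents. The sketch below groups files that are byte-for-byte identical. It will not catch near-duplicates such as resized or recompressed copies, which still need a visual check. The function names are illustrative.

```python
import hashlib
from pathlib import Path


def group_exact_duplicates(files: dict[str, bytes]) -> list[list[str]]:
    """Group file names whose contents are byte-for-byte identical."""
    by_digest: dict[str, list[str]] = {}
    for name, data in files.items():
        digest = hashlib.sha256(data).hexdigest()
        by_digest.setdefault(digest, []).append(name)
    # Only groups with more than one file are duplicates.
    return [sorted(names) for names in by_digest.values() if len(names) > 1]


def load_images(folder: str) -> dict[str, bytes]:
    """Read every .jpg, .jpeg, and .png file in a folder into memory."""
    exts = {".jpg", ".jpeg", ".png"}
    return {
        p.name: p.read_bytes()
        for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in exts
    }
```

For a dataset folder, `group_exact_duplicates(load_images("my_dataset"))` returns the groups to review, where `my_dataset` is a placeholder path. Keep one file from each group and delete the rest.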
Use Captions When Required
Some training methods require captions and others do not. Here's what you need:
Kohya: Captions are required
Flux Gym: Captions are required
Runnit Trainer: Captions are not needed
How to Caption
Paired .txt Files
Each image has a matching .txt file with a clear caption saved inside it.
Example:
- image_01.jpg
- image_01.txt
The caption inside image_01.txt might read: "a photo of Anna23 in a red coat, standing, midshot"
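Before uploading a captioned dataset, it helps to confirm that every image has a matching .txt file and that no caption is left without an image. A minimal sketch that works on a list of file names follows. The extension set and the function name `find_unpaired` are assumptions for illustration.

```python
from pathlib import PurePath

# Image extensions to look for; adjust to match your dataset.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".webp"}


def find_unpaired(filenames: list[str]) -> tuple[list[str], list[str]]:
    """Return (images without a caption, captions without an image) by stem."""
    images = {
        PurePath(f).stem for f in filenames
        if PurePath(f).suffix.lower() in IMAGE_EXTS
    }
    captions = {
        PurePath(f).stem for f in filenames
        if PurePath(f).suffix.lower() == ".txt"
    }
    return sorted(images - captions), sorted(captions - images)


names = ["image_01.jpg", "image_01.txt", "image_02.jpg", "notes.txt"]
print(find_unpaired(names))  # (['image_02'], ['notes'])
```

The list of names can come from `os.listdir` on the dataset folder. Both returned lists should be empty before you start training.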
What Makes a Good SDXL Caption?
SDXL uses a richer and more compositional prompt vocabulary than previous models like SD1.5. That means your captions should include:
- A clear subject token or custom string
- Scene or composition context (e.g. portrait, full-body, close-up)
- Lighting or environmental cues
- Optional style or camera descriptors
- Natural language, not keyword soup
Example: Proper SDXL Caption
portrait of bb393, soft lighting, shallow depth of field, realistic skin texture, high-resolution photo, neutral background
More SDXL Caption Examples
Style Training Example: illustration of Leonard a knight in a fantasy forest, glowing light effects, intricate armor design, painterly style
Fashion Training Example: full-body photo of a woman wearing a red trench coat, standing on a city street, golden hour lighting, 85mm lens
Sci-Fi Character Example: hyper-realistic close-up of Alf39, robotic eye, dramatic shadows, studio lighting, ultra-sharp focus
SDXL Captioning Tips
- Be descriptive, but not excessive
- Avoid vague terms; be specific
- Stay consistent with your subject token across all captions
- Don’t include text that won’t be part of your final prompting vocabulary
- Write each caption like it could be used directly in a generation prompt
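Subject token consistency is easy to check by script. The sketch below lists caption files that do not contain the exact token. Matching is case-sensitive and whole-word, which is a deliberately strict assumption, and the function name is illustrative.

```python
import re


def captions_missing_token(captions: dict[str, str], token: str) -> list[str]:
    """Return names of captions that do not contain the exact subject token."""
    # Whole-word, case-sensitive match so "bb39" or "BB393" do not count as "bb393".
    pattern = re.compile(rf"(?<!\w){re.escape(token)}(?!\w)")
    return sorted(
        name for name, text in captions.items() if not pattern.search(text)
    )


captions = {
    "image_01.txt": "portrait of bb393, soft lighting",
    "image_02.txt": "portrait of bb39, neutral background",
    "image_03.txt": "close-up of BB393, studio lighting",
}
print(captions_missing_token(captions, "bb393"))
# ['image_02.txt', 'image_03.txt']
```

Any file in the result either has a typo in the token or is missing it, and should be corrected before training.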
What Makes a Good Flux Caption?
Flux models excel when trained with natural-language captions that read like real-world generation requests, not tag lists or overly compressed phrases. Captions should describe what is in the image as you'd prompt it, using clear, complete phrases.
Flux is trained on full-sentence data, so your captions should be longer, descriptive, and visually grounded.
Flux Captioning Style
- Natural phrasing, not tokenized keyword lists
- Usually 12–30 words
- Full concepts: subject, setting, action, style (if relevant)
- Avoid excessive camera jargon (unlike SDXL)
- Use sentence-like or phrase-like structure
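The style points above can be turned into a rough caption check. This sketch flags captions outside the usual 12 to 30 word range and captions that look like comma-separated tag lists. The tag-list test is a crude heuristic chosen for illustration (three or more comma-separated parts averaging two words or fewer), so treat its output as a hint rather than a verdict.

```python
def flux_caption_issues(
    caption: str, min_words: int = 12, max_words: int = 30
) -> list[str]:
    """Return a list of soft warnings for a Flux-style caption."""
    issues = []
    word_count = len(caption.split())
    if word_count < min_words:
        issues.append(f"too short ({word_count} words)")
    elif word_count > max_words:
        issues.append(f"too long ({word_count} words)")

    # Heuristic: many short comma-separated parts suggest a keyword list.
    parts = [p.split() for p in caption.split(",") if p.strip()]
    if len(parts) >= 3 and sum(len(p) for p in parts) / len(parts) <= 2:
        issues.append("looks like a tag list")
    return issues


print(flux_caption_issues("man, suit, city, walking"))
# ['too short (4 words)', 'looks like a tag list']
```

Because the word range is described as usual rather than strict, a descriptive caption that runs a little over 30 words is not necessarily a problem.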
Examples of Flux Captions
Character LoRA (identity example): a close-up portrait of person_xyz with short black hair and glasses smiling gently wearing a green sweater standing indoors in front of a white bookshelf filled with books soft natural lighting from the left
Fashion LoRA (outfit-focused): a full-body photo of a woman wearing a bright yellow raincoat dark blue jeans and black ankle boots walking down a rainy city sidewalk with blurred street lights in the background captured in the early evening with soft reflections on the pavement
Stylized Portrait LoRA: a fantasy style illustration of a warrior with silver armor and glowing blue eyes standing in a snowy forest at dusk surrounded by falling snowflakes with dramatic lighting casting long shadows concept art, digital painting style
Object LoRA (product or prop): a top-down photo of a white ceramic coffee mug with a curved handle sitting on a rustic wooden table next to an open hardcover book and a pair of reading glasses morning sunlight streaming in from a nearby window
Tips for Writing Strong Flux Captions
- Focus on what you see, not abstract tags
- Describe the subject’s appearance, pose, and scene naturally
- Use modifiers like "in front of", "standing on", "wearing", "next to"
- Maintain a consistent structure across your dataset
- Include a unique name or token (e.g., person_xyz) if you’re training a specific concept
What to Avoid
- Token chains or comma lists (e.g., "man, suit, city, walking")
- Redundant or unnatural phrasing
- Overuse of SDXL-style tokens like "85mm lens" or "studio lighting", unless visually relevant
- Writing full novels; keep it focused and clear
Clean File Naming and Structure
Organize your dataset with readable, consistent filenames. Use lowercase letters, underscores, and sequential numbers. While this is not required, it is good practice.
Good examples:
woman_red_dress_01.jpg
woman_red_dress_01.txt
woman_red_dress_02.jpg
woman_red_dress_02.txt
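A naming scheme like the one above can be generated rather than typed by hand. The sketch below builds a mapping from existing image names to lowercase, underscore-separated, numbered names. It only computes the new names and renames nothing on disk, and the function name `sequential_names` is illustrative.

```python
import re
from pathlib import PurePath


def sequential_names(filenames: list[str], prefix: str) -> dict[str, str]:
    """Map each image name to '<prefix>_NN.<ext>' in sorted order."""
    # Lowercase the prefix and collapse anything else into underscores.
    slug = re.sub(r"[^a-z0-9]+", "_", prefix.lower()).strip("_")
    # Use at least two digits, more if the dataset is large.
    width = max(2, len(str(len(filenames))))
    return {
        old: f"{slug}_{i:0{width}d}{PurePath(old).suffix.lower()}"
        for i, old in enumerate(sorted(filenames), start=1)
    }


mapping = sequential_names(["IMG_2041.JPG", "DSC 17.png"], "Woman Red Dress")
print(mapping["DSC 17.png"])    # woman_red_dress_01.png
print(mapping["IMG_2041.JPG"])  # woman_red_dress_02.jpg
```

When captions are used, rename each .txt file to the same new stem as its image so the pairs stay matched.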