Hello and welcome to Prompting 101! Today we will discuss techniques to help you become a proficient prompter in Stable Diffusion.
Stable Diffusion is a cutting-edge image synthesis technology that employs a fusion of AI models and advanced image generation techniques. By providing a text prompt as input, the system generates remarkably realistic images that align with the description, enabling users to craft bespoke visuals tailored to diverse applications.
The effectiveness of the prompt significantly impacts the image quality generated by Stable Diffusion. A carefully crafted prompt can guide the AI model to produce desired results, while a vague or unclear prompt may yield unexpected or unsatisfactory outcomes. As such, mastering the art of crafting effective prompts is pivotal in maximizing the potential of Stable Diffusion technology.
An effective Stable Diffusion prompt should be:
- Clear and specific: Provide detailed descriptions of the subject and scene to help the AI model generate accurate images.
- Concise: Use brief language and avoid unnecessary words that may confuse the model or dilute the intended meaning.
- Relevant: Include relevant keywords and phrases related to the subject and scene.
- Unambiguous: Avoid ambiguous words or phrases that can have multiple interpretations.
To control the variation in the images generated by Stable Diffusion, you can:
- Add more detail to your prompt: By providing more specific descriptions, you can narrow down the possible interpretations of your prompt and reduce the variation in the generated images.
- Limit the number of keywords: Using fewer keywords can help to focus the AI model on a smaller set of possibilities, reducing the variation in the generated images.
Managing Association Effects in Stable Diffusion
Association effects arise when certain attributes or elements are highly correlated in the AI model's comprehension, leading to unintended outcomes in the generated images. To effectively manage association effects:
- Be mindful of common associations: For instance, attributes like ethnicity and eye color may be strongly linked in the model's understanding. Consider this when crafting prompts and plan accordingly to avoid unintended results.
- Exercise caution with celebrity or artist names: These names can carry implicit associations with specific poses, outfits, or styles, which may inadvertently influence the generated images. Use them judiciously and be aware of potential biases.
- Test and adjust prompts: Thoroughly evaluate your prompts to identify any unintended association effects. If needed, make necessary adjustments to the prompts to achieve the desired image results without unintended biases.
Example prompt:
Prompting is nothing short of an art and requires a lot of research and trial and error. However, you generally want to lay out your prompt as follows.
(Subject) This is your main objective. "A vast scenery of an unexplored planet", "A fluffy white rabbit", "1967 Ford Mustang crossing the checkered flag".
A common mistake is not writing enough about the subjects. This includes adding supporting details that help describe the "subject". Each descriptor will be separated with a comma and space.
Example - A fluffy white rabbit, white fur, eating a carrot, standing beneath a tree
(Medium) This is the material used to make artwork. Some examples are illustration, oil painting, 3D rendering, and photography. Medium has a strong effect because one keyword alone can dramatically change the style.
Example - A fluffy white rabbit, white fur, eating a carrot, standing beneath a tree, Close up portrait, photorealism
(Style) refers to the artistic style of the image. Examples include impressionist, surrealist, pop art, etc.
Example - A fluffy white rabbit, white fur, eating a carrot, standing beneath a tree, close up portrait, Modern art
(Resolution/supporting descriptors/lighting) are the second-to-last details we want to add to our prompt. They include all the supporting details that really bring your image to life!
Example - A fluffy white rabbit, white fur, eating a carrot, standing beneath a tree, close up portrait, Modern art, highly detailed, highly accurate, masterpiece, unreal engine, powerful imagery, studio lighting, 8k UHD, HDR
(Artist names) are strong modifiers. They allow you to dial in the exact style using a particular artist as a reference. It is also common to use multiple artist names to blend their styles.
Example - A fluffy white rabbit, white fur, eating a carrot, standing beneath a tree, close up portrait, Modern art, highly detailed, highly accurate, masterpiece, unreal engine, powerful imagery, studio lighting, 8k UHD, HDR by example artist
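The layering order above (subject, medium, style, supporting descriptors, artist names) can be sketched as a small helper. This is only an illustration: the function name `build_prompt` and its parameters are hypothetical, and joining several artist names with "and" is an assumption rather than a rule of Stable Diffusion.

```python
def build_prompt(subject, details=(), medium=None, style=None,
                 descriptors=(), artists=()):
    """Assemble a prompt in the order: subject, medium, style, descriptors, artists."""
    # Subject first, followed by its supporting details.
    parts = [subject, *details]
    if medium:
        parts.append(medium)
    if style:
        parts.append(style)
    # Resolution, lighting and other supporting descriptors come next.
    parts.extend(descriptors)
    # Each descriptor is separated with a comma and a space.
    prompt = ", ".join(parts)
    if artists:
        # Assumption: multiple artist names are blended with "and".
        prompt += " by " + " and ".join(artists)
    return prompt


print(build_prompt(
    "A fluffy white rabbit",
    details=["white fur", "eating a carrot", "standing beneath a tree"],
    medium="close up portrait",
    style="Modern art",
    descriptors=["highly detailed", "studio lighting", "8k UHD", "HDR"],
    artists=["example artist"],
))
```

Keeping each layer as a separate argument makes it easy to change one layer (for example the medium) while holding the rest of the prompt fixed.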
Prompt Matrix
When using multiple prompts in Stable Diffusion, separate them using the "|" character. The system will generate images for every possible combination of prompts. For example, if you input "Deserted town in the midwest|digital painting|studio lighting" as prompts, four combinations of prompts are possible:
- "Deserted town in the midwest"
- "Deserted town in the midwest, digital painting"
- "Deserted town in the midwest, studio lighting"
- "Deserted town in the midwest, digital painting, studio lighting"
This allows you to experiment with different prompts and combinations to generate a diverse range of images that align with your creative vision in Stable Diffusion.
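As a rough sketch of how the four combinations listed above come about, the snippet below treats the first "|"-separated part as the base prompt and every later part as optional. The function name `prompt_matrix` is hypothetical, and this only mimics the combinations; it is not the web UI's own code.

```python
from itertools import combinations


def prompt_matrix(text):
    """Return every combination of a base prompt with its optional '|' parts."""
    base, *options = [part.strip() for part in text.split("|")]
    prompts = []
    # Pick 0, 1, 2, ... optional parts, keeping their original order.
    for count in range(len(options) + 1):
        for chosen in combinations(options, count):
            prompts.append(", ".join([base, *chosen]))
    return prompts


for prompt in prompt_matrix(
    "Deserted town in the midwest|digital painting|studio lighting"
):
    print(prompt)
```

With n optional parts the matrix holds 2^n prompts, so the number of generated images grows quickly as parts are added.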
BREAK keyword
Adding a BREAK keyword (must be uppercase) fills the current chunk with padding characters. Any text added after the BREAK keyword will start a new chunk.
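The effect of BREAK can be pictured with a simplified sketch. It makes several assumptions that the text above does not state: whitespace-separated words stand in for real tokens, the chunk size of 75 and the "_" padding marker are illustrative values, and the function name `chunks_with_break` is hypothetical.

```python
import re


def chunks_with_break(prompt, chunk_size=75, pad="_"):
    """Split a prompt into fixed-size chunks, padding and restarting at BREAK."""
    chunks = []
    # Only the uppercase keyword BREAK, as a whole word, ends a chunk.
    for segment in re.split(r"\bBREAK\b", prompt):
        words = segment.split()  # assumption: words approximate tokens
        for start in range(0, len(words), chunk_size):
            chunk = words[start:start + chunk_size]
            # Fill the remainder of the chunk with padding.
            chunks.append(chunk + [pad] * (chunk_size - len(chunk)))
    return chunks


# A tiny chunk size keeps the output readable.
for chunk in chunks_with_break("a red car BREAK blue sky", chunk_size=4):
    print(chunk)
```

In this sketch the text before BREAK and the text after it never share a chunk, which is the behavior the keyword is described as providing.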
Attention/emphasis
Employing parentheses () in the prompt amplifies the model's focus on the enclosed words, while square brackets [] diminish it. It is also possible to combine multiple modifiers for enhanced customization:
Examples:
- "(Sunny day) at the beach"
- "A [small] puppy playing in the park"
- "(Gloomy weather) in a forest"
- "An [enormous] mountain peak"
- "(Peaceful) sunset over the lake"
By strategically incorporating parentheses and square brackets, you can fine-tune the model's attention and emphasis on specific elements in your prompts for more precise image generation in Stable Diffusion.
Examples:
- a (word): increase attention to word by a factor of 1.1
- a ((word)): increase attention to word by a factor of 1.21 (= 1.1 * 1.1)
- a [word]: decrease attention to word by a factor of 1.1
- a (word:1.5): increase attention to word by a factor of 1.5
- a (word:0.25): decrease attention to word by a factor of 4 (= 1 / 0.25)
- a \(word\): use literal () characters in prompt
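The arithmetic behind these factors can be checked with a short sketch. It handles a single bracketed token only and is not the web UI's actual prompt parser; the function name `attention_multiplier` is hypothetical.

```python
import re


def attention_multiplier(token):
    """Return (word, multiplier) for one token such as '((word))' or '(word:1.5)'."""
    # Explicit weight: (word:1.5) sets the multiplier directly.
    explicit = re.fullmatch(r"\((.+):(\d+(?:\.\d+)?)\)", token)
    if explicit:
        return explicit.group(1), float(explicit.group(2))
    multiplier = 1.0
    while True:
        if token.startswith("(") and token.endswith(")"):
            multiplier *= 1.1  # each pair of parentheses multiplies by 1.1
        elif token.startswith("[") and token.endswith("]"):
            multiplier /= 1.1  # each pair of square brackets divides by 1.1
        else:
            # Plain or escaped text such as \(word\) is returned unchanged.
            return token, multiplier
        token = token[1:-1]


for example in ["(word)", "((word))", "[word]", "(word:1.5)", "(word:0.25)"]:
    word, factor = attention_multiplier(example)
    print(f"{example}: attention to '{word}' multiplied by {factor:.3f}")
```

Nested brackets compound multiplicatively, which is why two pairs of parentheses give 1.1 * 1.1 = 1.21 rather than 1.2.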
Styles
Press the "Save prompt as style" button to write your current prompt to styles.csv, the file that holds your collection of styles. A dropdown to the right of the prompt lets you choose any previously saved style and automatically append it to your input. To delete a style, manually delete it from styles.csv and restart the program.
If you use the special string {prompt} in your style, it will substitute anything currently in the prompt into that position, rather than appending the style to your prompt.
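The two behaviors (appending versus substituting at {prompt}) can be sketched as follows. The function name `apply_style` is hypothetical, and joining with a comma and a space when appending is an assumption made for this example.

```python
def apply_style(prompt, style):
    """Combine the current prompt with a saved style string."""
    if "{prompt}" in style:
        # The style decides where the current prompt goes.
        return style.replace("{prompt}", prompt)
    # Assumption: without the placeholder, the style is appended after a comma.
    return f"{prompt}, {style}"


print(apply_style("a fluffy white rabbit", "studio lighting, 8k UHD"))
print(apply_style("a fluffy white rabbit", "oil painting of {prompt}, baroque"))
```

The placeholder form is useful when the style needs to wrap the prompt, for example by placing a medium in front of the subject.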
Alternating Words
Convenient syntax for swapping the prompt on every other step.
[man|duck]
On step 1, the prompt is "man". On step 2, it is "duck". On step 3, it is "man" again, and so on.
Because the prompt alternates throughout sampling, the result is a morphed image that is half man, half duck.
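The step-by-step swapping described above amounts to cycling through the listed words. A minimal sketch, assuming steps are counted from 1 and using a hypothetical function name `alternating_word`:

```python
def alternating_word(words, step):
    """Return the word that is active on a given sampling step (1-based)."""
    return words[(step - 1) % len(words)]


for step in range(1, 7):
    print(step, alternating_word(["man", "duck"], step))
```

The modulo makes the same sketch work for more than two alternatives, cycling through them in order.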
Hires. fix
A convenience option to partially render your image at a lower resolution, upscale it, and then add details at a high resolution. By default, txt2img makes horrible images at very high resolutions; this option avoids that by taking the composition from the small picture. It is enabled by checking the "Hires. fix" checkbox on the txt2img page.
The small picture is rendered at whatever resolution you set using the width/height sliders. The large picture's dimensions are controlled by three sliders: the "Scale by" multiplier (Hires upscale), "Resize width to" and/or "Resize height to" (Hires resize).
- If "Resize width to" and "Resize height to" are 0, "Scale by" is used.
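- If "Resize width to" is 0, the target width is calculated from "Resize height to" and the original width and height, keeping the aspect ratio.
- If "Resize height to" is 0, the target height is calculated from "Resize width to" and the original width and height, keeping the aspect ratio.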
- If both "Resize width to" and "Resize height to" are non-zero, image is upscaled to be at least those dimensions, and some parts are cropped.
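The slider rules above can be sketched as a small function. It assumes that a slider left at 0 means the corresponding dimension is derived from the other slider while preserving the small picture's aspect ratio, and that sizes are rounded down to whole pixels; the function name `hires_target_size` and its parameter names are hypothetical.

```python
def hires_target_size(width, height, scale_by, resize_width=0, resize_height=0):
    """Return the (width, height) of the large picture for the given sliders."""
    if resize_width == 0 and resize_height == 0:
        # Neither resize slider is set: use the "Scale by" multiplier.
        return int(width * scale_by), int(height * scale_by)
    if resize_width == 0:
        # Only the height is given: derive the width from the aspect ratio.
        return resize_height * width // height, resize_height
    if resize_height == 0:
        # Only the width is given: derive the height from the aspect ratio.
        return resize_width, resize_width * height // width
    # Both are given: the image is upscaled to cover this size, then cropped to it.
    return resize_width, resize_height


print(hires_target_size(512, 768, scale_by=2))
print(hires_target_size(512, 768, scale_by=2, resize_height=1536))
print(hires_target_size(512, 768, scale_by=2, resize_width=1000, resize_height=1000))
```

In the last call the requested 1000 x 1000 shape does not match the 512 x 768 aspect ratio, which is the case where some parts of the image are cropped.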
Face Restoration
In the picture editing feature of Stable Diffusion, you have the option to enhance faces using either GFPGAN or CodeFormer. Each tab contains a checkbox that enables face restoration, and there is also a separate tab dedicated solely to face restoration on any picture. In this tab, you can adjust the visibility of the effect using a slider. Additionally, you can choose between GFPGAN and CodeFormer as the preferred method in the settings of Stable Diffusion. This flexibility allows you to selectively and effectively improve the appearance of faces in pictures according to your preferences and desired level of enhancement.