How to Train a Stable Diffusion Model with DreamBooth on RunPod
To follow this guide on training a Stable Diffusion model with DreamBooth, you should have already completed the RunPod setup, as detailed in the previous guide.
DreamBooth training is a way of refining Stable Diffusion models with your own data, making them not just generative but personalized: they learn to respond to text prompts with images of your subject. By running DreamBooth in your RunPod environment, you put its computational power to work on learning your images. Each cell in the Jupyter notebook guides you through one stage of the process: downloading a model, configuring a session, and training to fine-tune the model.
Boot Up RunPod and Get Started
Assuming your RunPod setup is complete, you can access it via the Pods menu, conveniently located under the Manage section. Within this menu, you should find the "RunPod Fast Stable Diffusion" Pod.
Clicking the Play button opens Jupyter Notebook, which serves as the gateway to downloading and configuring Stable Diffusion. Both are essential steps for establishing your DreamBooth training environment.
Press the Play button to start the Pod
Upon launching Jupyter Notebook, navigate to the "RNDP-Dreambooth-v1.ipynb" file and click it to open the dedicated notebook for Stable Diffusion. The notebook is made up of distinct cells, each serving a specific purpose.
A cell is either a documentation cell or a code cell. Code cells are distinguished by their light grey background and the square [ ] brackets on the left.
To run a cell, you have two options: click on the cell and press Shift+Enter, or use the Play button located above the notebook. Executing a code cell runs the Python code it contains.
For first-time users, some of these cells need to be executed to download and prepare your notebook for the upcoming tasks. While documentation cells primarily display text, code cells carry out essential functions.
Understanding Each Cell
In this guide, we will take a deliberate approach and execute one cell at a time to ensure a thorough understanding of the process.
Executing this cell installs all the necessary dependencies required for the subsequent notebooks.
Shift+Enter to Execute
Download the Model:
Executing the cell in this section downloads the model you will fine-tune. You have three options for choosing that model:
Using a path from HuggingFace
Using a model link (e.g., from Civitai.com)
Providing nothing, in which case the cell defaults to the base Stable Diffusion 1.5 model
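The fallback behaviour described above can be sketched as a small function. This is only an illustration: the function name, the argument names, and the precedence between a HuggingFace path and a model link are assumptions rather than the notebook's actual code, and the base-model label is a placeholder string.

```python
from typing import Tuple

# Placeholder label for the default model; not a real path or URL.
BASE_MODEL = "Stable Diffusion 1.5 (base)"


def choose_model_source(hf_path: str = "", model_link: str = "") -> Tuple[str, str]:
    """Return (source_kind, location) for the model to download.

    Assumed precedence: HuggingFace path, then model link, then base model.
    """
    if hf_path.strip():
        return ("huggingface", hf_path.strip())
    if model_link.strip():
        return ("link", model_link.strip())
    # Nothing provided: fall back to the base SD 1.5 model.
    return ("base", BASE_MODEL)


print(choose_model_source(model_link="https://civitai.com/api/download/models/165888"))
print(choose_model_source())
```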
Using a fine-tuned model from Civitai.com
This step is optional but a popular one. If you prefer utilizing a fine-tuned model on Civitai instead of the SD1.5 base model, you will need to provide a direct link to the model. You can skip this step if you intend to use the base model.
Copy the Model URL
For this exercise, I'm using the epiCPhotoGasm model. For more details on this model, Click Here.
To obtain the necessary download link, right-click the model's download button and choose "Copy Link Address". The copied URL will appear in this extended format:
However, the extended format of this link may cause errors when entered into the RunPod notebook. To avoid this, trim the link to retain only the bolded section, which is the direct model download link and is all that is required.
Trimmed link below:
Model_link = "https://civitai.com/api/download/models/165888"
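If you would rather not trim the link by hand, a few lines of Python can do it. This sketch assumes the extra part of the copied link is a query string (everything after a "?"); the extended URL below is made up for illustration and is not the real link from the model page.

```python
from urllib.parse import urlsplit, urlunsplit


def trim_model_link(url: str) -> str:
    """Drop the query string and fragment, keeping scheme, host, and path."""
    parts = urlsplit(url.strip())
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


# Hypothetical extended link; the query parameters are invented for this example.
extended = "https://civitai.com/api/download/models/165888?type=Model&format=SafeTensor"
Model_link = trim_model_link(extended)
print(Model_link)
```

A link that is already trimmed passes through unchanged, so it is safe to run the function on either form.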
Execute the cell by pressing Shift+Enter (or clicking the play button), and the model will be downloaded accordingly.
Create/Load a Session:
Here, you must change the value of "Session_Name=" to something specific to your fine-tuned model. Choose a name that reminds you of what this model was trained on and its purpose.
A suggested naming convention is to include:
The Stable Diffusion version
Descriptor (e.g., "No CAPTIONS").
Example: Session_Name= "Name Here"
I want a subject and token that are easy to remember and recognize. I've opted for the talented Korean actress Go Youn-Jung and assigned the token name "gynjng", so my session name is Session_Name= "sdv15_gynjng512_gsm".
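One way to keep session names consistent is to build them from their parts. The pattern below is my reading of the example name above (version, then token and resolution, then a short descriptor); the function and argument names are hypothetical and not part of the notebook.

```python
def build_session_name(sd_version: str, token: str, resolution: int, descriptor: str) -> str:
    """Join the naming parts in the same order as the example session name."""
    return f"{sd_version}_{token}{resolution}_{descriptor}"


Session_Name = build_session_name("sdv15", "gynjng", 512, "gsm")
print(Session_Name)  # sdv15_gynjng512_gsm
```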
After executing this cell, you'll see the "Choose Image" and "Upload" buttons. This is where you upload the images you prepared for training from your local file system. Preparing these images was covered in detail in a previous guide, which you can reference if you haven't already (Link Here). Once you've uploaded your images, verify that the upload succeeded.
After executing this cell, you can upload your reference images.
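To verify the upload, you can count the image files in the session's image folder from a spare code cell. This is a minimal sketch: the guide does not state where the notebook stores uploaded images, so the folder is passed in as an argument, and the list of extensions is an assumption.

```python
from pathlib import Path
from typing import List

# Assumed set of image extensions; adjust to match your files.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp"}


def list_uploaded_images(folder: str) -> List[str]:
    """Return the sorted names of image files found directly inside folder."""
    return sorted(
        p.name
        for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )


def check_upload(folder: str, expected_count: int) -> bool:
    """Report whether the folder holds exactly the expected number of images."""
    found = list_uploaded_images(folder)
    print(f"Found {len(found)} of {expected_count} expected images")
    return len(found) == expected_count
```

For the dataset used in this guide, you would call check_upload with your session's image folder and an expected count of 64.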
Fine-tuning SD with DreamBooth + Captions
When providing the model with a set of images, it may not explicitly understand what to focus on, potentially leading to unintended learning outcomes. Therefore, it's advisable to include a diverse range of backgrounds and, in the case of people, images with varying clothing styles.
However, even with such diversity, the model might pick up on other artifacts within the images.
For instance, if your intention is for the model to exclusively learn a person's facial likeness, you can incorporate captions in the images to provide guidance. This concept is applicable across various subjects, not just limited to people. Whether you're training a model for objects or any other imagery, these considerations remain equally important.
Manual Caption and Concept Captioning can be a bit complex, warranting a separate guide for a more in-depth understanding. However, here's a brief overview of their functions.
For a comprehensive guide on Captions, Click Here.
Manual Caption: This cell deals with image captioning, an essential component of the training process that we will cover separately.
Concept Caption: An advanced topic that we'll explore in another session.
Executing this cell involves utilizing Dreambooth to create a fine-tuned model based on your input images. While there are several configuration parameters available, it's advisable to begin by experimenting with just a few of them.
Resume_Training= False: One important parameter is "Resume Training," which enables you to continue training on an existing model. However, it's recommended to keep this setting as "false" initially, especially if you're still getting acquainted with the process.
Below, there are parameters related to training different components of Stable Diffusion, namely Unet and Text Encoder. Each of these components has associated values for training steps and learning rate.
For the learning rate, it's generally recommended to stick with the default values, as they have proven effective in many scenarios. However, when it comes to the number of Unet Training Steps, it's advised to calculate this based on the number of images in your training dataset. A common guideline is to use approximately 100 steps for each image in your training set.
I am using 64 images of Go Youn-Jung cropped to 512x512. Download it here.
(64 images x 100 Unet Training Steps per image = 6,400 Unet Training Steps.)
Keep in mind that the number of training steps is proportional to training time, so for larger datasets, you may opt for 60-80 steps per image rather than 100.
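The rule of thumb above is simple multiplication, and a small helper makes it easy to try different values. The function name is hypothetical, and the 200-image dataset is an invented example of a "larger" set.

```python
def unet_training_steps(num_images: int, steps_per_image: int = 100) -> int:
    """Apply the steps-per-image guideline to a dataset size."""
    if num_images <= 0 or steps_per_image <= 0:
        raise ValueError("num_images and steps_per_image must be positive")
    return num_images * steps_per_image


# The dataset used in this guide: 64 images at 100 steps each.
print(unet_training_steps(64))       # 6400

# A hypothetical larger dataset at the reduced 60-80 steps per image.
print(unet_training_steps(200, 60))  # 12000
print(unet_training_steps(200, 80))  # 16000
```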
Additionally, remember that longer training doesn't always guarantee better results, as there's typically an optimal duration for training. These are general guidelines, so experimentation is key to finding what works best for your specific case.
Text Encoder Training Steps: A common choice is 350. While there may not be extensive documentation on some of these parameters, this value has proven effective in various use cases. It's worth noting that some references suggest setting Text Encoder Training Steps to roughly 30 percent of the number of Unet Training Steps. However, starting with 350 as a value is a good point of reference.
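To see how the two suggestions compare, the sketch below computes the 30 percent figure next to the fixed value of 350. The function name is hypothetical; integer arithmetic is used so the result is a whole number of steps.

```python
FIXED_STARTING_POINT = 350  # the common choice mentioned above


def text_encoder_steps_from_ratio(unet_steps: int, percent: int = 30) -> int:
    """Return roughly `percent` percent of the Unet steps, rounded down."""
    return unet_steps * percent // 100


ratio_based = text_encoder_steps_from_ratio(6400)
print(ratio_based)           # 1920
print(FIXED_STARTING_POINT)  # 350
```

For 6,400 Unet steps the two suggestions are far apart, which is a good reason to start from 350 and adjust only after looking at results.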
Save Your Training Model Incrementally
Save_Checkpoint_Every_n_Steps= False: This feature periodically saves your model based on the number of Unet training steps completed.
If you enable this setting (set it to "True"), it automatically saves a fine-tuned version of the model at the interval you designate, for example every 500 steps. You end up with multiple models, each trained for a different duration, which lets you compare them and assess their performance.
However, it's important to note that each of these saved models will consume approximately 2 gigabytes of storage space. Given the limited disk space available on RunPod, this is a consideration worth keeping in mind.
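Before enabling incremental saves, it helps to estimate how much disk space they will use. This is a rough sketch using the figures above: it assumes one checkpoint at every multiple of the interval and about 2 GB per checkpoint, and it does not count the final model. The function name is hypothetical.

```python
from typing import Tuple


def checkpoint_storage_gb(
    unet_steps: int, save_every: int = 500, gb_per_checkpoint: float = 2.0
) -> Tuple[int, float]:
    """Return (number of intermediate checkpoints, approximate total size in GB)."""
    count = unet_steps // save_every
    return count, count * gb_per_checkpoint


count, total_gb = checkpoint_storage_gb(6400)
print(count, total_gb)  # 12 24.0
```

With 6,400 steps and a 500-step interval, that is about 24 GB of checkpoints, so a longer interval may be the safer choice on a small volume.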
In summary, based on 64 images, your settings should be:
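Collected in one place, the values discussed in this section look roughly like this. Resume_Training and Save_Checkpoint_Every_n_Steps are spelled as they appear earlier in this guide; the other two key names are my guess at the notebook's spelling, so match them against the fields you actually see in the cell. Learning rates are left at their defaults and omitted.

```python
num_images = 64

settings = {
    "Resume_Training": False,
    "UNet_Training_Steps": num_images * 100,   # assumed field name; 100 steps per image
    "Text_Encoder_Training_Steps": 350,        # assumed field name
    "Save_Checkpoint_Every_n_Steps": False,
}

for name, value in settings.items():
    print(f"{name} = {value}")
```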
As for "External Caption," we will touch on this aspect later in the process.
Press Shift+Enter to Execute and Train
Shift+Enter to Execute. Or press the Play button.
Once you execute the training cell, it starts with the text encoder training and then proceeds to the Unet training phase. This process typically takes a few minutes, and upon completion, you'll receive a message in the console indicating that the model has been successfully created.
At this point, you can navigate to the workspace:
Fast-Dreambooth folder > Sessions Folder > Your Fine-Tuned Model Folder
Here, you will find your fine-tuned model ready for use.
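If you prefer to locate the model from a code cell, the folder structure above can be walked with pathlib. This is a sketch under assumptions: the workspace root is passed in because its exact location is not stated here, and the model file is assumed to end in .ckpt or .safetensors.

```python
from pathlib import Path
from typing import List

# Assumed model file extensions.
MODEL_EXTENSIONS = {".ckpt", ".safetensors"}


def find_session_models(workspace: str, session_name: str) -> List[Path]:
    """List model files in <workspace>/Fast-Dreambooth/Sessions/<session_name>."""
    session_dir = Path(workspace) / "Fast-Dreambooth" / "Sessions" / session_name
    if not session_dir.is_dir():
        return []
    return sorted(
        p
        for p in session_dir.iterdir()
        if p.is_file() and p.suffix.lower() in MODEL_EXTENSIONS
    )
```

An empty list means either the session name is misspelled or training has not finished writing the model yet.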
You can simply right-click on the model and download it to your local system, making it available for use in your Stable Diffusion WebUI.
Right Click to Download Fine-Tuned Model
Test Your Trained Model
In the "Test the Trained Model" cell, simply press Shift+Enter to execute it, which launches the Automatic1111 Stable Diffusion WebUI. From there, you can thoroughly test the model using the trained token.
Once you've finished testing and are satisfied with the results, download the fine-tuned model and place it in the "models > Stable-diffusion" folder of your local Stable Diffusion WebUI.
Launch WebUI by Clicking the Link
Testing with Automatic1111 WebUI
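On your local machine, placing the downloaded model in the "models > Stable-diffusion" folder can be scripted. This is a minimal sketch: the function name is hypothetical, and both paths are whatever applies on your system.

```python
import shutil
from pathlib import Path


def install_model(model_file: str, webui_root: str) -> Path:
    """Copy a downloaded model into <webui_root>/models/Stable-diffusion."""
    target_dir = Path(webui_root) / "models" / "Stable-diffusion"
    target_dir.mkdir(parents=True, exist_ok=True)
    # copy2 keeps file timestamps and returns the path of the new copy.
    return Path(shutil.copy2(model_file, target_dir))
```

The original download is left in place, so you can delete it yourself once the WebUI lists the model.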
Upload The Trained Model to Hugging Face
I will come back to this later.
Free Up Space
In this final cell, you have the option to free up space by tidying up some of the assets you've generated. Deleting them manually through the folders isn't possible in the Notebook and will result in a failure message.
However, you can use the code cell designed for freeing up space to accomplish this task. Upon execution, it will present you with a list of sessions, allowing you to specify which sessions you wish to remove from your workspace.
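Before choosing which sessions to remove, it can help to see how much space each one takes. The sketch below only reads sizes and deletes nothing; deletion should still go through the notebook's own cell. The function name is hypothetical, and the Sessions folder path is passed in by you.

```python
from pathlib import Path
from typing import Dict


def session_sizes_gb(sessions_dir: str) -> Dict[str, float]:
    """Map each session folder name to its total size in gigabytes."""
    sizes: Dict[str, float] = {}
    for session in sorted(Path(sessions_dir).iterdir()):
        if session.is_dir():
            total_bytes = sum(
                f.stat().st_size for f in session.rglob("*") if f.is_file()
            )
            sizes[session.name] = total_bytes / 1024**3
    return sizes
```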
We've successfully demonstrated the process of fine-tuning a Stable Diffusion model using a limited set of images, in this case, featuring Go Youn-Jung. As a result, our model is now capable of generating images resembling her likeness. This achievement highlights the remarkable versatility of Stable Diffusion, which can create images of various subjects and objects worldwide.
While Stable Diffusion excels at generating a wide array of images, it may struggle to capture the likeness of something deeply personal to you, and crafting a text prompt that produces those specific images may prove challenging.
What sets DreamBooth fine-tuning apart is its ability to generate images of highly personalized or previously unknown subjects. Whether you want to replicate other individuals, objects, landscapes, or anything unique to you, the fine-tuned model offers a level of personalization that extends beyond Stable Diffusion's general capabilities.