In our prior guide, we learned how to set up DreamBooth for training and even began the training process. However, one critical aspect we didn't explore is how to use captions to instruct Stable Diffusion on what to focus on in the images.
Training Stable Diffusion with DreamBooth + Captions
When fine-tuning in DreamBooth, captions play an important role in informing Stable Diffusion about what we aim to achieve. Captions help guide the model's learning process, especially when refining specific aspects of an image. When you provide a set of images to the model, it doesn't inherently understand what to focus on or what you intend to teach it.
Without clear guidance, the model may start learning unintended details from the images. For instance, if you're training a model to capture the likeness of a person's face, it's essential to ensure that the model understands this specific objective. Captions become valuable in such cases, as they serve as explicit instructions for the model. They help convey what elements are significant and what should be emphasized during the training process.
For instance, if all the images of a person feature them wearing a jacket, you can add captions indicating this fact. The captions can specify that the person is wearing a jacket, emphasizing this aspect of their appearance. Similarly, if you want to capture variations like straight hair or a ponytail, mentioning these details in the captions can guide the model.
However, it's equally important to exercise discretion when using captions. For aspects that are part of a person's regular appearance, such as eyeglasses for someone who always wears them, captions may not be necessary. In such cases, it's better to focus on unique or specific characteristics that you want the model to learn.
These principles extend beyond images of people. Whether you're training a model to generate images of objects, landscapes, or any other subject matter, captions help ensure that the model's learning aligns with your intentions.
With this understanding, let's learn how to effectively use DreamBooth Captions in training our model. We'll start by utilizing the model listed below, or you can continue from where we left off in the previous guide.
Since this exercise primarily focuses on the use of captions, I've conducted thorough training experiments both with and without captions. Surprisingly, I've observed that not using captions often yields superior results, especially when training on images of people; it enhances the realism of the generated content. I find that captions are necessary for fine-tuning, but not for teaching a style or a subject. In the case of this guide, we're fine-tuning.
If you're interested in fine-tuning with captions and wish to continue learning, you can explore their nuances and potential benefits in more detail below.
Learn How to Use Manual Captions to Shape Go Younjung into Your Ideal AI Girlfriend
1. Download the Model:
For this demonstration, we'll be working with CyberRealistic V3.3 as our base model. You can access it via the following link: Model_Link = "https://civitai.com/api/download/models/138176"
2. Download Reference Images:
Download Reference Images Here to get started.
From this point onward, our focus will be on the various images featuring Go Younjung, or for Alchemy of Souls fans, Naksu!
3. Activate Runpod:
Go to your Runpod and access your list of Pods. Open the Pod you created during your previous training session using the RNPD-Dreamboothv1.ipynb notebook. You can refer to the previous guide for more information: Previous Guide Here.
(Keep in mind that if you stop your Pod sessions, you might need to re-execute the cells at the top of the notebook from the beginning.)
4. Execute Dependencies and Download Model
Once you've re-executed the Dependencies and Downloaded the Model (Either Click the Play Button or Press Shift+Enter to Execute), navigate down to the Create/Load a Session section. Since our aim is to train a new model, select a name that holds personal meaning and is easy to recall.
In my case, I'll choose "sd15-gynjng-CR33-styles," which is a shortened version of "StableDiffusion1.5-Goyounjung-CyberRealistic V3.3-styles."
Session_Name = "sd15-gynjng-CR33-styles"
5. Upload Your Images:
Execute the Instance Images cell, upload your images, and then click on the Upload button.
6. Write Your Captions:
Execute the Manual Captioning cell to begin adding captions to your images.
Here's how it functions:
When you select an image file, it will be displayed in the box on the far right, allowing you to input a caption in the text box. For each image, describe its contents in the caption. It's crucial to include your unique token for subject identification within each caption. In our case, the unique token for Go Younjung photos is "gynjng," so ensure that it appears in every description.
After composing captions for all images, remember to click the save button.
Write your caption with the unique token, and press Save.
If every image of Go Younjung showed her wearing a jacket, I could add captions explicitly stating her attire, such as "gynjng wearing a jacket." Similarly, if her hairstyle varied between straight hair and a ponytail, I'd mention this in the captions to provide clarity.
For instance, consider a scenario where a person usually wears eyeglasses. In this case, there's no need to include a mention of the glasses in the caption because it's a typical feature of their appearance. However, if the individual doesn't typically wear glasses, you can select an image where they aren't wearing glasses and indicate "No Glasses" in the caption. This ensures that when you generate an image, it accurately reflects whether the person is wearing glasses or not.
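If you prefer to prepare captions outside the notebook UI, each caption is just a plain .txt file that shares its image's filename. Here's a minimal sketch of that convention; the folder name and example captions are hypothetical, but the token matches this guide:

```python
from pathlib import Path

# Hypothetical local folder for caption files; adjust to your setup.
caption_dir = Path("captions")
caption_dir.mkdir(exist_ok=True)

token = "gynjng"  # the unique token used throughout this guide

# Map each image filename to a caption; every caption must contain the token.
captions = {
    "photo_01.jpg": f"{token} wearing a jacket, straight hair",
    "photo_02.jpg": f"{token} with a ponytail, smiling",
}

for image_name, caption in captions.items():
    # Caption file shares the image's stem: photo_01.jpg -> photo_01.txt
    (caption_dir / Path(image_name).with_suffix(".txt").name).write_text(caption)
```

Preparing captions this way can be faster for large datasets, but the notebook's captioning cell accomplishes the same thing interactively.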
7. Once you've written your captions, you can confirm their presence by accessing the captions folder. Navigate to Fast-Dreambooth > Sessions > sd15-gynjng-CR33-styles (or whatever your session name was) > Captions. Here, you'll find all the .txt files that were generated when you wrote the captions for each image during this process. Click on a few of these files to ensure that the captions you entered are present.
Verify that Captions are Saved.
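You can also spot-check the saved captions programmatically instead of clicking through each file. A small sketch, assuming the session path from this guide, that reads every caption file and warns if any is missing the token:

```python
from pathlib import Path

def check_captions(caption_dir, token):
    """Return {filename: caption} and warn about any caption missing the token."""
    results = {}
    for txt in sorted(Path(caption_dir).glob("*.txt")):
        caption = txt.read_text().strip()
        results[txt.name] = caption
        if token not in caption:
            print(f"WARNING: {txt.name} is missing the token '{token}'")
    return results

# Session caption folder as used in this guide; adjust the session name to yours.
check_captions("Fast-Dreambooth/Sessions/sd15-gynjng-CR33-styles/captions", "gynjng")
```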
8. In the DreamBooth cell, adjust the settings based on your dataset. For instance, if you're using 81 images, you would set the Unet Training Steps to 8100. Modify the Text Encoder Training Steps to 350. Also, change External_Captions from false to True. This tells the training process to utilize the caption files.
With these adjustments made, execute this cell to initiate the training process, which should take several minutes.
Settings based on 81 images. (81 images x 100 steps)
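The rule of thumb above (roughly 100 UNet steps per training image) is easy to compute for any dataset size. A quick sketch using this guide's numbers:

```python
def dreambooth_unet_steps(num_images, steps_per_image=100):
    """Rule of thumb from this guide: UNet training steps scale with dataset size."""
    return num_images * steps_per_image

unet_steps = dreambooth_unet_steps(81)  # 81 images -> 8100 UNet training steps
text_encoder_steps = 350                # kept fixed in this guide

print(unet_steps, text_encoder_steps)  # → 8100 350
```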
After training completion, you can navigate to your Sessions folder to verify that your model has been successfully created.
Navigate to Fast-Dreambooth > Sessions > sd15-gynjng-CR33-styles. As a reminder, you can always download the trained model to use locally on your PC as well.
External_Captions = True to train with captions.
9. Test the Trained Model:
Re-execute the 'Test the Trained Model' cell to generate a new link by pressing Shift+Enter.
Click on this link to access the Stable Diffusion WebUI, where you can test your model.
Remember that the trained token is "gynjng," so be sure to include it in your prompt to generate images of Go Younjung. Trained token: gynjng
Click on the link to open the Stable Diffusion WebUI
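In the WebUI, the prompt is just a string you type, so forgetting the token is easy. If you ever script your generation workflow, a tiny hypothetical helper like this can guarantee the trained token is always present:

```python
TOKEN = "gynjng"  # the trained token from this guide

def build_prompt(description, token=TOKEN):
    """Prepend the trained token so every prompt targets the fine-tuned subject."""
    if token in description:
        return description
    return f"{token}, {description}"

print(build_prompt("portrait photo, soft lighting, looking at camera"))
# → gynjng, portrait photo, soft lighting, looking at camera
```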
Is it possible to achieve desired results solely by adjusting your prompts without incorporating captions into the training process?
The answer is yes, to some extent.
Instead of utilizing captions, you can modify your prompts, for example, by adding "not smiling," and this approach will yield results to a certain degree. However, it's generally more effective to take a precise approach by curating your image selection and leveraging captions to provide clearer guidance during the training process.
In essence, using captions allows you to guide the model with greater precision, helping it focus on the specific aspects you intend to capture while training. Captions provide explicit instructions, enabling the model to understand what to emphasize and what to disregard. This can be especially beneficial in situations where you want the model to ignore unwanted artifacts or details in the training images.
Consider a scenario where you have a substantial number of images of a person, with over half of them featuring the person wearing eyeglasses. If your goal is to have the model learn the person's likeness without associating it with eyeglasses, adding captions becomes invaluable. By indicating in the captions that the person is wearing eyeglasses in certain images, you can steer the model toward developing a likeness that excludes this specific attribute.
It's worth noting that starting with a diverse range of high-quality images is essential. Begin by training a baseline model without captions to assess its performance. Then, as you fine-tune your model and require more precise control, consider introducing image captions to enhance the guidance provided during the training process. Ultimately, the choice between using prompts or captions depends on your specific objectives and the level of detail and control you seek in your AI model.