After going through some of the playbooks, I wanted to experiment further with fine-tuning on the DGX Spark. I came across SimpleTuner, and what was supposed to be a quick test turned into many hours of getting it to work: the usual architecture friction (amd64 images vs. the Spark's arm64 hardware) and getting OpenCV to build against CUDA 13.0.
Since I had already spent the hours fixing the problems I ran into, I packaged everything into a Docker-based workflow so others can reuse it. You can find the repo here: https://github.com/provos/dgx-spark-fine-tuning-workflow
It includes tools for downloading regularization images, captioning images, fine-tuning, and running inference.
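As a rough sketch, a Docker-based workflow like this is usually driven with the standard `docker` commands below; the image tag and mount paths are placeholders I made up, not the repo's actual names, so check its README for the real invocation:

```shell
# Build the image from the repo's Dockerfile (tag name is a placeholder).
docker build -t dgx-spark-finetune .

# Run with GPU access, mounting a local dataset directory into the container.
# --gpus all requires the NVIDIA Container Toolkit on the host.
docker run --rm --gpus all \
  -v "$PWD/data:/workspace/data" \
  dgx-spark-finetune
```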
Hope this saves someone some time :)
PS: With my current settings (2000 steps, LoRA rank 256, Prodigy optimizer, gradient accumulation of 2), training takes about 10 hours. I noticed the official NVIDIA Dreambooth example runs in about 4 hours but uses a gradient accumulation of 6. I'm not sure what explains the discrepancy.
2000 steps with gradient accumulation of 2 took about 10 hours. The Dreambooth script from one of the NVIDIA example playbooks took roughly 4 hours with gradient accumulation of 6; I don't know what the equivalent step count would be. That said, here is an image at step 1500 for my cat/dog example. Earlier steps already looked pretty good too, so if you do 1000 steps with gradient accumulation of 1, you might get decent results in a quarter of the time, i.e. about 2.5 hours.
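To make that estimate concrete, here is a quick back-of-the-envelope calculation. It is only a sketch: it assumes wall-clock time scales linearly with steps × gradient accumulation (i.e. with the total number of micro-batches processed), which ignores per-step overhead.

```python
def estimated_hours(steps, grad_accum, ref_hours=10.0, ref_steps=2000, ref_accum=2):
    """Scale a reference run (2000 steps at grad accum 2, ~10 hours)
    linearly by the total number of micro-batches processed."""
    return ref_hours * (steps * grad_accum) / (ref_steps * ref_accum)

# 1000 steps at gradient accumulation 1 is a quarter of the micro-batches,
# so roughly a quarter of the time.
print(estimated_hours(1000, 1))  # 2.5
```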
Thank you @provos, I really appreciate your work. Could you share your inference performance: how long does it take to generate one image? Thank you!