[SUPPORT] Workbench Example Project: SDXL Customization

@edwli, thank you for your great help. I tried changing the requirements.txt file, but it does not contain numpy, so why did you say to downgrade?
After I added numpy==1.23.1 to requirements.txt and rebuilt the project, the following errors occurred:

#19 74.74 Requirement already satisfied: tornado>=6.1 in /usr/local/lib/python3.10/dist-packages (from notebook==6.4.10->-r /opt/project/build/requirements.txt (line 8)) (6.3.3)
#19 74.74 Requirement already satisfied: pyzmq>=17 in /usr/local/lib/python3.10/dist-packages (from notebook==6.4.10->-r /opt/project/build/requirements.txt (line 8)) (25.1.1)
#19 74.74 Requirement already satisfied: argon2-cffi in /usr/local/lib/python3.10/dist-packages (from notebook==6.4.10->-r /opt/project/build/requirements.txt (line 8)) (23.1.0)
#19 74.74 Requirement already satisfied: jupyter-core>=4.6.1 in /usr/local/lib/python3.10/dist-packages (from notebook==6.4.10->-r /opt/project/build/requirements.txt (line 8)) (5.3.1)
#19 74.74 Requirement already satisfied: jupyter-client>=5.3.4 in /usr/local/lib/python3.10/dist-packages (from notebook==6.4.10->-r /opt/project/build/requirements.txt (line 8)) (8.3.1)
#19 74.74 Requirement already satisfied: ipython-genutils in /usr/local/lib/python3.10/dist-packages (from notebook==6.4.10->-r /opt/project/build/requirements.txt (line 8)) (0.2.0)
#19 74.74 Requirement already satisfied: nbformat in /usr/local/lib/python3.10/dist-packages (from notebook==6.4.10->-r /opt/project/build/requirements.txt (line 8)) (5.9.2)
#19 74.74 Requirement already satisfied: nbconvert>=5 in /usr/local/lib/python3.10/dist-packages (from notebook==6.4.10->-r /opt/project/build/requirements.txt (line 8)) (7.8.0)
#19 74.75 Requirement already satisfied: nest-asyncio>=1.5 in /usr/local/lib/python3.10/dist-packages (from notebook==6.4.10->-r /opt/project/build/requirements.txt (line 8)) (1.5.7)
#19 74.75 Requirement already satisfied: ipykernel in /usr/local/lib/python3.10/dist-packages (from notebook==6.4.10->-r /opt/project/build/requirements.txt (line 8)) (6.25.2)
#19 74.75 Requirement already satisfied: Send2Trash>=1.8.0 in /usr/local/lib/python3.10/dist-packages (from notebook==6.4.10->-r /opt/project/build/requirements.txt (line 8)) (1.8.2)
#19 74.75 Requirement already satisfied: terminado>=0.8.3 in /usr/local/lib/python3.10/dist-packages (from notebook==6.4.10->-r /opt/project/build/requirements.txt (line 8)) (0.17.1)
#19 74.75 Requirement already satisfied: prometheus-client in /usr/local/lib/python3.10/dist-packages (from notebook==6.4.10->-r /opt/project/build/requirements.txt (line 8)) (0.17.1)
#19 77.50 Collecting torch>=1.10.0 (from accelerate==1.0.1->-r /opt/project/build/requirements.txt (line 5))
#19 77.50 Obtaining dependency information for torch>=1.10.0 from https://files.pythonhosted.org/packages/6d/13/b5e8bacd980b2195f8a1741ce11cbb9146568607795d5e4ff510dcff1064/torch-2.1.0-cp310-cp310-manylinux1_x86_64.whl.metadata
#19 77.76 Downloading torch-2.1.0-cp310-cp310-manylinux1_x86_64.whl.metadata (25 kB)
#19 77.81 INFO: pip is looking at multiple versions of scipy to determine which version is compatible with other requirements. This could take a while.
#19 77.82 ERROR: Cannot install -r /opt/project/build/requirements.txt (line 11), -r /opt/project/build/requirements.txt (line 2), -r /opt/project/build/requirements.txt (line 3), -r /opt/project/build/requirements.txt (line 5), -r /opt/project/build/requirements.txt (line 9) and numpy==1.23.1 because these package versions have conflicting dependencies.
#19 77.82
#19 77.82 The conflict is caused by:
#19 77.82     The user requested numpy==1.23.1
#19 77.82     transformers 4.40.2 depends on numpy>=1.17
#19 77.82     diffusers 0.28.0 depends on numpy
#19 77.82     accelerate 1.0.1 depends on numpy<3.0.0 and >=1.17
#19 77.82     torchvision 0.16.0 depends on numpy
#19 77.82     scipy 1.14.1 depends on numpy<2.3 and >=1.23.5
#19 77.82
#19 77.82 To fix this you could try to:
#19 77.82 1. loosen the range of package versions you've specified
#19 77.82 2. remove package versions to allow pip attempt to solve the dependency conflict
#19 77.82
#19 77.82 ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-dependency-conflicts
#19 81.66
#19 81.66 [notice] A new release of pip is available: 23.2.1 -> 25.0
#19 81.66 [notice] To update, run: python -m pip install --upgrade pip
#19 ERROR: process "/bin/bash -c pip install --user -r /opt/project/build/requirements.txt" did not complete successfully: exit code: 1

[14/17] RUN pip install --user -r /opt/project/build/requirements.txt:
77.82 scipy 1.14.1 depends on numpy<2.3 and >=1.23.5
77.82
77.82 To fix this you could try to:
77.82 1. loosen the range of package versions you’ve specified
77.82 2. remove package versions to allow pip attempt to solve the dependency conflict
77.82
77.82 ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-dependency-conflicts
81.66
81.66 [notice] A new release of pip is available: 23.2.1 -> 25.0
81.66 [notice] To update, run: python -m pip install --upgrade pip

Containerfile:47

  46 |
  47 | >>> RUN pip install --user \
  48 | >>>     -r /opt/project/build/requirements.txt
  49 |

ERROR: failed to solve: process "/bin/bash -c pip install --user -r /opt/project/build/requirements.txt" did not complete successfully: exit code: 1
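
For reference, the resolver output above already isolates the clash: scipy 1.14.1 requires numpy>=1.23.5 and <2.3, so the exact pin numpy==1.23.1 can never be satisfied alongside it. A minimal sketch of a pin that fits every constraint pip printed (illustrative only, not the project's actual requirements.txt; the <2.0 upper bound is an assumption to stay safe with the older compiled packages in the stack):

# numpy pin compatible with the constraints in the log above:
# scipy wants >=1.23.5,<2.3; accelerate wants >=1.17,<3.0
numpy>=1.23.5,<2.0

Equivalently, removing the numpy line entirely and letting the resolver pick a version is what suggestion 2 in the error message amounts to.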

So sorry for the delay. Got access to the repo again and just pushed an update that should resolve the issue. Version bump across the board.

5/7/2025: Fix build issue on Mac
6/9/2025: Bug fixes, pip package version updates
6/12/2025: Fixed issue on Mac

Running on a Jetson AGX Orin 64GB. When attempting to run:

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
)
pipe.to("cuda")

I get the following error:

File ~/.local/lib/python3.10/site-packages/torch/cuda/__init__.py:310, in _lazy_init()
    305         raise RuntimeError(
    306             "Cannot re-initialize CUDA in forked subprocess. To use CUDA with "
    307             "multiprocessing, you must use the 'spawn' start method"
    308         )
    309     if not hasattr(torch._C, "_cuda_getDeviceCount"):
--> 310         raise AssertionError("Torch not compiled with CUDA enabled")
    311     if _cudart is None:
    312         raise AssertionError(
    313             "libcudart functions unavailable. It looks like you have a broken build?"
    314         )

AssertionError: Torch not compiled with CUDA enabled

I’m going to try a different container image to see if I get better luck, and will report back.
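
In the meantime, a quick check can confirm whether the torch wheel inside the container has CUDA compiled in at all; on Jetson, an aarch64 torch wheel pulled from PyPI is typically CPU-only, and a CUDA-enabled build generally has to come from NVIDIA's JetPack wheels. A generic diagnostic sketch:

import torch

# A CUDA-enabled build reports a toolkit version here; a CPU-only
# wheel prints None and then False.
print(torch.__version__)
print(torch.version.cuda)
print(torch.cuda.is_available())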

Hi Robert,

Is this a Workbench-related issue, or are you doing this manually?

See here: What is NVIDIA AI Workbench? — NVIDIA AI Workbench User Guide

If it's not Workbench-related, let's find you the proper topic.

This was in workbench, but I haven’t had the chance to get back to it since my last reply.

Has anyone gotten this example to work on a DGX Spark? I got past a huggingface_hub runtime version conflict (it somehow transitively pulled in 1.0.1, which was too new for diffusers), but now I am getting some sort of fp16/fp32 conflict:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[4], line 2
      1 prompt = "toy jensen in space"
----> 2 image = pipe(prompt=prompt).images[0]
      4 image

File /usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py:124, in context_decorator.<locals>.decorate_context(*args, **kwargs)
    120 @functools.wraps(func)
    121 def decorate_context(*args, **kwargs):
    122     # pyrefly: ignore [bad-context-manager]
    123     with ctx_factory():
--> 124         return func(*args, **kwargs)

File /workspace/diffusers/src/diffusers/pipelines/stable_diffusion_xl/pipeline_stable_diffusion_xl.py:1292, in StableDiffusionXLPipeline.__call__(self, prompt, prompt_2, height, width, num_inference_steps, timesteps, sigmas, denoising_end, guidance_scale, negative_prompt, negative_prompt_2, num_images_per_prompt, eta, generator, latents, prompt_embeds, negative_prompt_embeds, pooled_prompt_embeds, negative_pooled_prompt_embeds, ip_adapter_image, ip_adapter_image_embeds, output_type, return_dict, cross_attention_kwargs, guidance_rescale, original_size, crops_coords_top_left, target_size, negative_original_size, negative_crops_coords_top_left, negative_target_size, clip_skip, callback_on_step_end, callback_on_step_end_tensor_inputs, **kwargs)
   1289 else:
   1290     latents = latents / self.vae.config.scaling_factor
-> 1292 image = self.vae.decode(latents, return_dict=False)[0]
   1294 # cast back to fp16 if needed
   1295 if needs_upcasting:

File /workspace/diffusers/src/diffusers/utils/accelerate_utils.py:46, in apply_forward_hook.<locals>.wrapper(self, *args, **kwargs)
     44 if hasattr(self, "_hf_hook") and hasattr(self._hf_hook, "pre_forward"):
     45     self._hf_hook.pre_forward(self)
---> 46 return method(self, *args, **kwargs)

File /workspace/diffusers/src/diffusers/models/autoencoders/autoencoder_kl.py:294, in AutoencoderKL.decode(self, z, return_dict, generator)
    292     decoded = torch.cat(decoded_slices)
    293 else:
--> 294     decoded = self._decode(z).sample
    296 if not return_dict:
    297     return (decoded,)

File /workspace/diffusers/src/diffusers/models/autoencoders/autoencoder_kl.py:265, in AutoencoderKL._decode(self, z, return_dict)
    262 if self.post_quant_conv is not None:
    263     z = self.post_quant_conv(z)
--> 265 dec = self.decoder(z)
    267 if not return_dict:
    268     return (dec,)

File /usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1783, in Module._wrapped_call_impl(self, *args, **kwargs)
   1781     return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1782 else:
-> 1783     return self._call_impl(*args, **kwargs)

File /usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1794, in Module._call_impl(self, *args, **kwargs)
   1789 # If we don't have any hooks, we want to skip the rest of the logic in
   1790 # this function, and just call forward.
   1791 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1792         or _global_backward_pre_hooks or _global_backward_hooks
   1793         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1794     return forward_call(*args, **kwargs)
   1796 result = None
   1797 called_always_called_hooks = set()

File /workspace/diffusers/src/diffusers/models/autoencoders/vae.py:302, in Decoder.forward(self, sample, latent_embeds)
    300     # up
    301     for up_block in self.up_blocks:
--> 302         sample = up_block(sample, latent_embeds)
    304 # post-process
    305 if latent_embeds is None:

File /usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1783, in Module._wrapped_call_impl(self, *args, **kwargs)
   1781     return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1782 else:
-> 1783     return self._call_impl(*args, **kwargs)

File /usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1794, in Module._call_impl(self, *args, **kwargs)
   1789 # If we don't have any hooks, we want to skip the rest of the logic in
   1790 # this function, and just call forward.
   1791 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1792         or _global_backward_pre_hooks or _global_backward_hooks
   1793         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1794     return forward_call(*args, **kwargs)
   1796 result = None
   1797 called_always_called_hooks = set()

File /workspace/diffusers/src/diffusers/models/unets/unet_2d_blocks.py:2639, in UpDecoderBlock2D.forward(self, hidden_states, temb)
   2637 def forward(self, hidden_states: torch.Tensor, temb: Optional[torch.Tensor] = None) -> torch.Tensor:
   2638     for resnet in self.resnets:
-> 2639         hidden_states = resnet(hidden_states, temb=temb)
   2641     if self.upsamplers is not None:
   2642         for upsampler in self.upsamplers:

File /usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1783, in Module._wrapped_call_impl(self, *args, **kwargs)
   1781     return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1782 else:
-> 1783     return self._call_impl(*args, **kwargs)

File /usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1794, in Module._call_impl(self, *args, **kwargs)
   1789 # If we don't have any hooks, we want to skip the rest of the logic in
   1790 # this function, and just call forward.
   1791 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1792         or _global_backward_pre_hooks or _global_backward_hooks
   1793         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1794     return forward_call(*args, **kwargs)
   1796 result = None
   1797 called_always_called_hooks = set()

File /workspace/diffusers/src/diffusers/models/resnet.py:327, in ResnetBlock2D.forward(self, input_tensor, temb, *args, **kwargs)
    323     deprecate("scale", "1.0.0", deprecation_message)
    325 hidden_states = input_tensor
--> 327 hidden_states = self.norm1(hidden_states)
    328 hidden_states = self.nonlinearity(hidden_states)
    330 if self.upsample is not None:
    331     # upsample_nearest_nhwc fails with large batch sizes. see https://github.com/huggingface/diffusers/issues/984

File /usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1783, in Module._wrapped_call_impl(self, *args, **kwargs)
   1781     return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1782 else:
-> 1783     return self._call_impl(*args, **kwargs)

File /usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1794, in Module._call_impl(self, *args, **kwargs)
   1789 # If we don't have any hooks, we want to skip the rest of the logic in
   1790 # this function, and just call forward.
   1791 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1792         or _global_backward_pre_hooks or _global_backward_hooks
   1793         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1794     return forward_call(*args, **kwargs)
   1796 result = None
   1797 called_always_called_hooks = set()

File /usr/local/lib/python3.10/dist-packages/torch/nn/modules/normalization.py:325, in GroupNorm.forward(self, input)
    324 def forward(self, input: Tensor) -> Tensor:
--> 325     return F.group_norm(input, self.num_groups, self.weight, self.bias, self.eps)

File /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:2985, in group_norm(input, num_groups, weight, bias, eps)
   2978     raise RuntimeError(
   2979         f"Expected at least 2 dimensions for input tensor but received {input.dim()}"
   2980     )
   2981 _verify_batch_size(
   2982     [input.size(0) * input.size(1) // num_groups, num_groups]
   2983     + list(input.size()[2:])
   2984 )
-> 2985 return torch.group_norm(
   2986     input, num_groups, weight, bias, eps, torch.backends.cudnn.enabled
   2987 )

RuntimeError: expected scalar type Half but found Float

I have been unsuccessful.
I tried several variants of fp16/fp32 conversion and casting, to no avail.
Thinking an updated base image might help, I forked it and have been correcting the versions of the required libraries.

See GitHub - jacksodj/workbench-example-sdxl-customization (dgx-spark branch) if you want to see what I was doing there.

I am currently stuck: nvwb build consumes all of the memory on the Spark and then crashes.

So, I am very interested in whether there is a path to happiness here.
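
On the Half/Float mismatch specifically: the traceback fails inside the VAE decode, and a commonly suggested workaround for SDXL pipelines running in fp16 is to swap in the community fp16-stable VAE instead of relying on the pipeline's fp32 upcasting. A sketch, untested on DGX Spark (madebyollin/sdxl-vae-fp16-fix is a public Hugging Face checkpoint):

import torch
from diffusers import AutoencoderKL

# Replace the pipeline's VAE with a build that stays numerically
# stable in fp16, so the decode step never mixes dtypes.
pipe.vae = AutoencoderKL.from_pretrained(
    "madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16
).to("cuda")

image = pipe(prompt="toy jensen in space").images[0]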

Yeah, I see. Using yours, the same thing happened to me, during "Building wheel for flash-attn (setup.py): started". I've bumped my swap up to 128GB to see if it can get through it, but it's surprising that it needs to recompile flash attention; that seems like a strange thing not to have a binary aarch64 wheel for…
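
If the build keeps dying during flash-attn compilation, one knob worth trying before resorting to huge swap (taken from flash-attn's own install notes, not anything Spark-specific): ninja parallelizes the CUDA compile aggressively, and MAX_JOBS caps the number of concurrent jobs.

# Cap parallel ninja jobs so the flash-attn source build does not
# exhaust RAM during compilation; 4 is an arbitrary example value.
MAX_JOBS=4 pip install flash-attn --no-build-isolation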

Hi @simon.thornington @joe173 ,

Thanks for bringing this to our attention. I just pushed a fix that bumps the package versions in the project.

I recommend re-cloning the project and trying again. I've verified that it is running properly on my end, but please reach out if you run into further issues.

Apologies for the inconvenience, and thank you for your patience.

Fantastic, thanks! If you wouldn't mind, could you describe how you arrived at this fix? Why is it necessary to pip install something in postBuild rather than as a pip dependency in the regular container requirements? And where did that git hash come from? Thanks!

To get the fine-tuning working with WebP images, I had to install the apt package libwebp-devel and the pip package pillow==10.4.0, I believe.
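
For anyone double-checking that step, Pillow ships a feature probe that reports whether the installed build was compiled with WebP support (standard Pillow API):

from PIL import features

# Prints True only if this Pillow build can read and write WebP images.
print(features.check("webp"))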