Has anyone successfully run SGLang[diffusion] on Spark? On my first try I built a Docker image from scratch and installed SGLang[diffusion] from source, but the server would not even start. On my second try I pulled sglang:dev-arm64 from Docker Hub, but it failed after receiving the HTTP request.
The existing "SGLang for Inference" playbook needs some modification in order to serve diffusion models such as Z-Image-Turbo:
docker pull lmsysorg/sglang:dev-arm64
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=$HF_TOKEN" \
  --ipc=host \
  lmsysorg/sglang:dev-arm64 \
  sglang serve --model-path Tongyi-MAI/Z-Image-Turbo --port 30000
After the server started, I sent this request:
curl http://localhost:30000/v1/images/generations \
  -o >(jq -r '.data[0].b64_json' | base64 --decode > example.png) \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Tongyi-MAI/Z-Image-Turbo",
    "prompt": "A cute baby sea otter",
    "n": 1,
    "size": "1024x1024",
    "response_format": "b64_json"
  }'
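For reference, the same request can be made from Python with only the standard library. This is a hedged sketch assuming the server exposes the OpenAI-style /v1/images/generations endpoint used above; the helper names (decode_b64_image, generate_image) are my own, not part of SGLang:

```python
import base64
import json
import urllib.request


def decode_b64_image(b64_json: str) -> bytes:
    """Decode the base64-encoded image payload from the response body."""
    return base64.b64decode(b64_json)


def generate_image(url: str = "http://localhost:30000/v1/images/generations") -> bytes:
    """POST an image-generation request and return the raw PNG bytes."""
    payload = {
        "model": "Tongyi-MAI/Z-Image-Turbo",
        "prompt": "A cute baby sea otter",
        "n": 1,
        "size": "1024x1024",
        "response_format": "b64_json",
    }
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return decode_b64_image(body["data"][0]["b64_json"])


if __name__ == "__main__":
    with open("example.png", "wb") as f:
        f.write(generate_image())
```

This avoids the bash process substitution (`-o >(...)`), which silently does nothing in shells other than bash/zsh.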
The SGLang server was not able to complete the request and displayed the following message:
=====================================================================
[2026-01-26 14:43:19] INFO: Uvicorn running on http://0.0.0.0:30000 (Press CTRL+C to quit)
[01-26 14:43:26] Sampling params:
width: 1024
height: 1024
num_frames: 1
prompt: A cute baby sea otter
neg_prompt: None
seed: 1024
infer_steps: 9
num_outputs_per_prompt: 1
guidance_scale: 0.0
embedded_guidance_scale: 6.0
n_tokens: None
flow_shift: None
image_path: None
save_output: True
output_file_path: outputs/81a5cd28-4792-4554-abb9-fbd5fb835ec7.jpg
[01-26 14:43:26] Running pipeline stages: ['input_validation_stage', 'prompt_encoding_stage_primary', 'conditioning_stage', 'timestep_preparation_stage', 'latent_preparation_stage', 'denoising_stage', 'decoding_stage']
[01-26 14:43:26] [InputValidationStage] started...
[01-26 14:43:26] [InputValidationStage] finished in 0.0001 seconds
[01-26 14:43:26] [TextEncodingStage] started...
[01-26 14:43:27] [TextEncodingStage] finished in 0.9637 seconds
[01-26 14:43:27] [ConditioningStage] started...
[01-26 14:43:27] [ConditioningStage] finished in 0.0000 seconds
[01-26 14:43:27] [TimestepPreparationStage] started...
[01-26 14:43:27] [TimestepPreparationStage] finished in 0.0009 seconds
[01-26 14:43:27] [LatentPreparationStage] started...
[01-26 14:43:27] [LatentPreparationStage] finished in 0.0051 seconds
[01-26 14:43:27] [DenoisingStage] started...
0%| | 0/9 [00:00<?, ?it/s]
[01-26 14:43:27] [DenoisingStage] Error during execution after 315.6838 ms: RMSNorm failed with error code no kernel image is available for execution on the device
Traceback (most recent call last):
  File "/sgl-workspace/sglang/python/sglang/multimodal_gen/runtime/pipelines_core/stages/base.py", line 200, in __call__
    result = self.forward(batch, server_args)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/multimodal_gen/runtime/pipelines_core/stages/denoising.py", line 1020, in forward
    noise_pred = self._predict_noise_with_cfg(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/multimodal_gen/runtime/pipelines_core/stages/denoising.py", line 1265, in _predict_noise_with_cfg
    noise_pred_cond = self._predict_noise(
                      ^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/multimodal_gen/runtime/pipelines_core/stages/denoising.py", line 1211, in _predict_noise
    return current_model(
           ^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1786, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/multimodal_gen/runtime/models/dits/zimage.py", line 620, in forward
    x = layer(x, x_freqs_cis, adaln_input)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1786, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/multimodal_gen/runtime/models/dits/zimage.py", line 262, in forward
    self.attention_norm1(x) * scale_msa,
    ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1786, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/multimodal_gen/runtime/layers/custom_op.py", line 29, in forward
    return self._forward_method(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/multimodal_gen/runtime/layers/layernorm.py", line 88, in forward_cuda
    out = rmsnorm(x, self.weight.data, self.variance_epsilon)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/sgl_kernel/elementwise.py", line 45, in rmsnorm
    torch.ops.sgl_kernel.rmsnorm.default(out, input, weight, eps, enable_pdl)
  File "/usr/local/lib/python3.12/dist-packages/torch/_ops.py", line 841, in __call__
    return self._op(*args, **kwargs)