The motivation was simple: large image generation models running at their original (unquantized) precision can be painfully slow on DGX Spark.
vitoom-nunchaku is optimized specifically for DGX Spark, with the goal of accelerating image inference and reducing VRAM usage.
Flux.2 Klein 9B Inference Benchmark
Environment: DGX Spark
Model: Flux.2 Klein 9B
Steps: 8
| Configuration | Load Time | Inference Speed | Inference Time | Peak VRAM | Transformer VRAM | Text Encoder VRAM |
|---|---|---|---|---|---|---|
| fp16, no pretouch | 249.748s | 1.25s/it | 10s | 37.14GB | 16.91GB | 15.26GB |
| fp16, pretouch | 15.999s | 1.25s/it | 10s | 37.14GB | 16.91GB | 15.26GB |
| Nunchaku quantized transformer, pretouch | 15.999s | 1.82it/s | 4s | 25.61GB | 5.40GB | 15.26GB |
| Nunchaku quantized transformer + text encoder, pretouch | 15.999s | 1.83it/s | 4s | 15.21GB | 5.40GB | 4.86GB |
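The table mixes two rate units: s/it for the fp16 rows and it/s for the Nunchaku rows, the convention used by progress bars such as tqdm, which switch to s/it once a step takes longer than a second. The minimal sketch below uses only the table's numbers to normalize both to seconds per step and multiply by the 8 steps. It assumes the per-step rate is the only contributor to inference time, so the estimates are approximate (about 4.4s for the Nunchaku rows versus the reported 4s). The function names are illustrative.

```python
# Throughput figures copied from the benchmark table above (8 steps each).
STEPS = 8

# (configuration, value, unit) exactly as reported; units differ between rows.
REPORTED = [
    ("fp16, no pretouch", 1.25, "s/it"),
    ("fp16, pretouch", 1.25, "s/it"),
    ("Nunchaku transformer, pretouch", 1.82, "it/s"),
    ("Nunchaku transformer + text encoder, pretouch", 1.83, "it/s"),
]


def seconds_per_step(value: float, unit: str) -> float:
    """Convert a progress-bar style rate to seconds per denoising step."""
    if unit == "s/it":
        return value
    if unit == "it/s":
        return 1.0 / value
    raise ValueError(f"unknown unit: {unit!r}")


def normalized_rows(steps: int = STEPS) -> list[tuple[str, float, float]]:
    """Return (config, seconds per step, estimated time for `steps` steps)."""
    rows = []
    for name, value, unit in REPORTED:
        sps = seconds_per_step(value, unit)
        # Simplifying assumption: total time = per-step time * number of steps.
        rows.append((name, sps, sps * steps))
    return rows


if __name__ == "__main__":
    for name, sps, total in normalized_rows():
        print(f"{name:48s} {sps:.3f} s/step  ~{total:.1f} s for {STEPS} steps")
```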
Summary
Enabling pretouch significantly reduces model loading time on DGX Spark. For the fp16 model, load time drops from 249.748s to 15.999s, about a 15.6x speedup. It does not change inference speed or VRAM usage.
Using the Nunchaku quantized transformer improves inference performance substantially. Per-step speed rises from 1.25s/it to 1.82it/s (about 0.55s/it), and end-to-end inference time for 8 steps drops from 10s to 4s, a 2.5x overall speedup. Peak VRAM decreases from 37.14GB to 25.61GB, a reduction of 11.53GB (about 31%).
Adding the Nunchaku quantized text encoder further reduces memory usage. Peak VRAM drops to 15.21GB, which is 21.93GB less than fp16, or about a 59% reduction. Inference speed remains roughly the same as transformer-only quantization, at around 1.83it/s.
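The summary figures can be rechecked directly from the table. This Python sketch hard-codes the table values (no new measurements) and recomputes the load-time speedup, the end-to-end speedup, and the peak-VRAM reductions; the constant and function names are illustrative.

```python
# Values copied from the Flux.2 Klein 9B benchmark table (DGX Spark, 8 steps).
LOAD_NO_PRETOUCH_S = 249.748
LOAD_PRETOUCH_S = 15.999

INFER_FP16_S = 10.0
INFER_NUNCHAKU_S = 4.0

PEAK_VRAM_GB = {
    "fp16": 37.14,
    "nunchaku_transformer": 25.61,
    "nunchaku_transformer_and_text_encoder": 15.21,
}


def speedup(before: float, after: float) -> float:
    """How many times faster `after` is than `before` (same units)."""
    return before / after


def reduction(before: float, after: float) -> tuple[float, float]:
    """Absolute and relative (percent) decrease from `before` to `after`."""
    delta = before - after
    return delta, 100.0 * delta / before


if __name__ == "__main__":
    print(f"pretouch load speedup: {speedup(LOAD_NO_PRETOUCH_S, LOAD_PRETOUCH_S):.1f}x")
    print(f"end-to-end inference speedup: {speedup(INFER_FP16_S, INFER_NUNCHAKU_S):.1f}x")
    base = PEAK_VRAM_GB["fp16"]
    for name in ("nunchaku_transformer", "nunchaku_transformer_and_text_encoder"):
        delta, pct = reduction(base, PEAK_VRAM_GB[name])
        print(f"{name}: -{delta:.2f} GB ({pct:.0f}%)")
```

Running it prints a 15.6x load speedup, a 2.5x inference speedup, and peak-VRAM reductions of 11.53GB (31%) and 21.93GB (59%) relative to fp16.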
Nunchaku wheel:
- Hugging Face repo: tonera/vitoom-nunchaku
- File: nunchaku-1.3.0.dev20260622+cu13.0torch2.11-cp311-cp311-linux_aarch64.whl
Quantized text encoder:
- Hugging Face repo: tonera/Qwen3-text-Nunchaku
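A minimal sketch for fetching the two artifacts above, assuming the wheel sits at the root of the tonera/vitoom-nunchaku repo under exactly that filename and that `huggingface_hub` is installed. It first checks the wheel's tags against the running interpreter, since the filename targets CPython 3.11 (`cp311`) on `linux_aarch64`, the DGX Spark CPU platform. The local version label `+cu13.0torch2.11` appears to indicate a build against CUDA 13.0 and PyTorch 2.11, so a matching PyTorch should presumably be in place first. How the quantized text encoder is wired into a pipeline is not described here, so the sketch only downloads it. The helper names are hypothetical.

```python
import platform
import subprocess
import sys

WHEEL_REPO = "tonera/vitoom-nunchaku"
# Assumption: the wheel is stored at the repo root under this exact name.
WHEEL_FILE = "nunchaku-1.3.0.dev20260622+cu13.0torch2.11-cp311-cp311-linux_aarch64.whl"
TEXT_ENCODER_REPO = "tonera/Qwen3-text-Nunchaku"


def parse_wheel_filename(filename: str) -> dict[str, str]:
    """Split a wheel name into its standard fields (no build tag expected)."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) != 5:
        raise ValueError(f"unexpected wheel name: {filename}")
    name, version, python_tag, abi_tag, platform_tag = parts
    return {"name": name, "version": version, "python": python_tag,
            "abi": abi_tag, "platform": platform_tag}


def matches_this_interpreter(tags: dict[str, str]) -> bool:
    """Rough check of CPython version and OS/arch; pip does the real check."""
    if platform.python_implementation() != "CPython":
        return False
    py_tag = f"cp{sys.version_info.major}{sys.version_info.minor}"
    platform_tag = f"{sys.platform}_{platform.machine()}"  # e.g. linux_aarch64
    return tags["python"] == py_tag and tags["platform"] == platform_tag


def main() -> None:
    tags = parse_wheel_filename(WHEEL_FILE)
    if not matches_this_interpreter(tags):
        sys.exit(f"wheel targets {tags['python']}/{tags['platform']}; "
                 "this interpreter does not match")
    # Imported here so the helpers above work without huggingface_hub.
    from huggingface_hub import hf_hub_download, snapshot_download

    wheel_path = hf_hub_download(repo_id=WHEEL_REPO, filename=WHEEL_FILE)
    subprocess.run([sys.executable, "-m", "pip", "install", wheel_path], check=True)

    encoder_dir = snapshot_download(repo_id=TEXT_ENCODER_REPO)
    print("quantized text encoder files in:", encoder_dir)


if __name__ == "__main__":
    main()
```

The tag check is deliberately coarse; it exists only to fail fast with a readable message before a multi-gigabyte download on the wrong machine.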
Vitoom Nunchaku for DGX Spark
This version of vitoom-nunchaku is optimized specifically for DGX Spark. It is designed to accelerate image inference and reduce VRAM usage.
In addition to Flux.2 Klein 9B, it also supports the following image generation models:
- tonera/Qwen-Image-2512-Lightning-Nunchaku
- tonera/Chroma1-HD-SVDQ
- tonera/Qwen-Image-Edit-2511-Lightning-Nunchaku
- tonera/FLUX.2-klein-9b-kv-Nunchaku
- tonera/FLUX.2-klein-4B-Nunchaku
- Z-Image-Turbo
- Flux series
- SDXL series
You can also use my previously open-sourced Vitoom project to build your own local DGX Spark AI workstation: Vitoom: Browser-first multimodal AIGC + AI Agent for DGX Spark / RTX Spark. Beyond optimized image inference, it supports custom local models for text, video, and audio.