Ostris' AI Toolkit on DGX Spark

Just letting everyone know, my PR for DGX OS support in AI Toolkit has been merged by Ostris:

It contains the instructions and changes needed to get it running on the Spark and other DGX OS devices. I'd consider the support 'initial': it works, but there are areas that could be improved. I'm hoping NVIDIA might be able to supply a DGX Spark to Ostris himself, as that would be the best way to ensure good support for the platform going forward. AI Toolkit is one of the most popular fine-tuners out there for image and video models, so it makes sense to make sure it is well supported on the DGX Spark; just something for the team at NVIDIA to think about.


Great news, I've been waiting for this.

I was able to get AI Toolkit to run on my DGX Spark using the instructions linked on the GitHub page. The only issue was a conflict between dgx_requirements.txt and requirements.txt: scipy==1.16.0 in the former versus scipy==1.12.0 in the latter. I deleted the entry in requirements.txt and everything installed.
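For anyone hitting the same conflict, the fix described above amounts to deleting the stale pin before installing. Here's a minimal sketch of that idea using throwaway files (the file contents are illustrative, not the project's real requirement lists):

```shell
# Reproduce a duplicate-pin situation in a scratch directory, then apply
# the fix: delete the scipy line from requirements.txt so only the pin
# in dgx_requirements.txt remains. Contents are illustrative only.
cd "$(mktemp -d)"
printf 'torch\nscipy==1.12.0\n' > requirements.txt
printf 'scipy==1.16.0\n'        > dgx_requirements.txt
sed -i '/^scipy==/d' requirements.txt        # drop the conflicting pin
cat requirements.txt dgx_requirements.txt    # merged view: one scipy pin
```

After the `sed`, a `pip install -r dgx_requirements.txt` in the real checkout no longer sees two competing scipy versions.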

Also, it is easier to install nodejs and npm via apt (`sudo apt install nodejs npm`).

For ZImage LoRA training, it runs at 5.3 s/it with 20 images. The dashboard reports 34.4 GB of RAM in use.

Thanks for your work on getting AI Toolkit to run on DGX Spark. Much appreciated.


Looks like the scipy entry was added to requirements.txt yesterday. I’ll have a chat to Ostris about it. We may need to introduce a shared base requirements file that’s included by the other requirements files.

Alternatively, we could move all requirements into dgx_requirements.txt so it doesn't depend on requirements.txt at all, but then both files have to be maintained: any new library added to requirements.txt would also have to be added to dgx_requirements.txt for DGX OS devices.

I’ll chat to Ostris, see which way he wants to go and then make the change.

It is unfortunately difficult to avoid this kind of problem, especially since Ostris doesn’t have access to a DGX Spark, so it’s impossible for him to validate that changes don’t break things on DGX devices. This is why I want NVIDIA to give Ostris a DGX Spark, so things like this don’t happen, and these devices get the support they deserve (which I can’t provide, I’m just doing what I can to keep it running).


The first training run was successful, taking 4 hours 33 minutes to complete 3000 training steps while generating one sample image every 250 steps.
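If the 5.3 s/it figure quoted earlier in the thread held for this run (an assumption on my part; it may have been a different configuration), the wall-clock time is roughly consistent, with the remainder plausibly going to the periodic sample images:

```shell
# Back-of-the-envelope consistency check using figures quoted in this thread.
awk 'BEGIN {
  steps      = 3000
  sec_per_it = 5.3                      # assumed from the earlier post
  reported   = 4*3600 + 33*60           # 4 h 33 min in seconds
  training   = steps * sec_per_it       # pure training time
  samples    = steps / 250              # 12 sample images
  printf "training time: %d s\n", training
  printf "remainder:     %d s (~%d s per sample image)\n",
         reported - training, (reported - training) / samples
}'
```

That leaves about 480 s unaccounted for, which works out to roughly 40 s per sample image if the assumption holds.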


I’ve created a PR to fix the dgx_requirements.txt issue. After speaking with Ostris, he preferred that we just separate the DGX requirements into their own file and maintain them independently. This also offers the benefit that we can use newer versions of libraries if anything is identified that works much better on the Spark.

I looked into your suggestion for NodeJS, but `apt install nodejs` seems to install NodeJS v18 rather than v24. I think it’s better for people to run the current v24 LTS release, so I left the instructions as-is.


AI Toolkit is working very well. I’ve created LoRAs with Flux1, ZImage, Illustrious, and Qwen 2512.

What’s the best way to upgrade to a new version as new models are released?

Generally, you should be able to just do a git pull. The Python code doesn’t need anything specific unless requirements.txt has been updated, in which case just pip install the new requirements. The command in the instructions for the Node-based UI will rebuild everything for you. In short, you shouldn’t have to do anything specific most of the time: just git pull, and if something breaks, check whether you need to update a requirement.


Would it be possible for you to post some comparisons with any other regular GPU you might have? I feel like the DGX Spark is practically built for this use-case, but I can’t find anyone posting GenAI video training benchmarks.

I’m especially curious about a direct comparison between a regular GPU (like a 5090 or 4090) with larger batch sizes versus more iterations and gradient accumulation. With all of that effective memory, the Spark should be very competitive with big GPUs at this task.

The only other GPU I can compare the DGX Spark to is a Gigabyte laptop with a 3070 Ti (8 GB VRAM). The DGX Spark is 2x to 3x faster with the AI and video models I have tried on both machines. The other advantage of the Spark is that I can make normal-sized videos; the 3070 Ti was limited to 512x768.

The biggest advantage of the Spark over other GPUs is I never really run out of memory. Even when I’m running gpt-oss:120b, I can still run all of the ComfyUI workflows I have with the exception of Flux 2, and there’s a known bug in ComfyUI related to the shared memory of the Spark that causes that problem.

By the way, git pull worked perfectly for updating AI-Toolkit the last time I tried it.

What you’re asking for is really difficult. It took me literally a couple of hours to find a configuration that would even train on a 5090; 32 GB is just not enough for any kind of serious training of video models. To train 109 frames, which is what my dataset is currently made for, I had to use the low-VRAM setting, switch to WAN 2.2 5B, and train at 512. As soon as I tried 768, I kept running out of VRAM on the 5090.

Long story short, in a like-for-like training of WAN 2.2 5B, I got the following for the training steps:
5090: 6.57s/it (for training)
DGX Spark: 21.27s/it (for training)

And the following in like-for-like sample generation:
5090: 1.65s/it (for sample image generation)
DGX Spark: 9.14s/it (for sample image generation)

I’ve never worked out why, but in AI Toolkit, sample generation has always been particularly slow on the Spark: in this case about 5.5x slower than the 5090. Training is closer to what I’d expect, at 3.24x the time it takes on the 5090. In general I tell people the DGX Spark is around 4 times slower than a 5090, but that’s just a rough number; depending on what you’re doing, it can obviously be faster or slower than that.

In terms of raw performance, the DGX Spark is rated at 1000 TOPS compared to the 5090’s 3352 TOPS, so compute-heavy tasks will probably land fairly close to those figures, as we saw above. The memory, however, is a lot slower (273 GB/s vs 1.79 TB/s), which at worst could mean 6.5x lower performance. I’ve personally never seen an example where the difference was that large, but it is technically possible, so workloads that specifically hit the memory hard, such as LLM inference, will be somewhat slower.
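The ratios above can be recomputed directly from the timings and specs quoted in this thread (the bandwidth figures are the published specs; 1.79 TB/s is taken as 1790 GB/s):

```shell
# Slowdown ratios from the timings and specs quoted above.
awk 'BEGIN {
  printf "training:  %.2fx slower\n",    21.27 / 6.57   # Spark vs 5090
  printf "sampling:  %.2fx slower\n",    9.14 / 1.65
  printf "compute:   %.2fx fewer TOPS\n", 3352 / 1000
  printf "bandwidth: %.2fx less\n",      1790 / 273
}'
```

Note how the observed training slowdown (~3.24x) sits close to the compute ratio (~3.35x), while the worst-case bandwidth ratio (~6.56x) bounds memory-bound workloads.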

Now in theory, there are options you can tweak on the Spark, but at that point we’re no longer doing a like-for-like comparison. I’m also not going to experiment further, as it’s really difficult to do any sort of video fine-tuning on the 5090; it just doesn’t have enough VRAM for it.

As for batch sizes, I usually train with batch size 1, as I haven’t found the increase in speed to be significant, and unless something has changed in the last few years, higher batch sizes usually reduce the quality of the training, so it only makes sense if it gives you enough of a performance boost to justify it. As a test, I switched to batch size 2 (something I normally don’t do), and doing twice as much per iteration took about twice as long: 40.83 s/it.

This is not a bad thing: it means the GPU is already well saturated at batch size 1. You’re getting near 100% out of the GPU, so doing twice as much per step just means each sample is processed at about half the speed.
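The saturation argument checks out numerically from the figures above: at batch size 2, the effective per-sample time barely moves.

```shell
# Per-sample time at batch 1 vs batch 2, from the timings above.
awk 'BEGIN {
  b1 = 21.27        # s/it at batch size 1 (1 sample per step)
  b2 = 40.83 / 2    # s per sample at batch size 2
  printf "batch 1: %.2f s/sample\n", b1
  printf "batch 2: %.2f s/sample\n", b2
  printf "gain:    %.1f%%\n", (1 - b2/b1) * 100
}'
```

A gain of only about 4% per sample confirms the GPU was already near full utilization at batch size 1.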

The 5090 is a really fast GPU, but the important thing about the DGX Spark is that it can run and train models that won’t even work on the best consumer GPUs in the first place. If I could only have one or the other, I’d pick the DGX Spark every time.