Basic tutorial to get Ostris’ AI Toolkit running on DGX Spark:
Here are some extra dependencies you MIGHT need. I was experimenting quite a lot, so I'm unsure which of these are actually required; I suspect none of them are. If someone can confirm which, if any, are needed, let me know and I'll update these instructions:
gfortran:
$ sudo apt install gfortran
OpenBLAS:
$ git clone https://github.com/OpenMathLib/OpenBLAS.git
$ cd OpenBLAS
$ make
$ sudo make install
liblapack and libblas:
$ sudo apt install liblapack-dev libblas-dev
Rust + Cargo:
$ sudo apt install cargo
Here are the steps to install AI Toolkit:
1) Install node
I'm not going to go into a huge amount of detail here; you basically want the ARM64 build for Linux:
https://nodejs.org/dist/v24.11.1/node-v24.11.1-linux-arm64.tar.xz
Extract that somewhere and add it to your PATH. I simply added the following to my ~/.bashrc file (note: use straight quotes, not curly ones, or the shell will choke):
export PATH="/opt/node-v24.11.1-linux-arm64/bin:$PATH"
2) Get Python 3.11 (miniconda recommended for this)
Some packages require at least Python 3.10, and others require a version lower than 3.12, so through experimentation I've concluded that it really only works with Python 3.11.
The easiest way to do this is to install miniconda:
$ wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-aarch64.sh
$ chmod u+x Miniconda3-latest-Linux-aarch64.sh
$ ./Miniconda3-latest-Linux-aarch64.sh
If you want to disable it loading the base environment by default (which I recommend), run:
$ conda config --set auto_activate_base false
Now you can create a Python 3.11 environment for ai-toolkit:
$ conda create --name ai-toolkit python=3.11
And then activate the environment:
$ conda activate ai-toolkit
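If you want to double-check that the environment you just activated actually falls in the version window described above (at least 3.10, below 3.12), a quick one-off check like this works; the wording of the message is mine, not part of AI Toolkit:

```shell
# Confirm the active interpreter is >= 3.10 and < 3.12 (i.e. 3.11).
python3 - <<'EOF'
import sys
ok = (3, 10) <= sys.version_info[:2] < (3, 12)
print("Python %d.%d %s in the supported window"
      % (sys.version_info[0], sys.version_info[1], "is" if ok else "is NOT"))
EOF
```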
3) Install PyTorch (make sure your conda environment from step 2 is activated)
To be honest, I'm not sure you need this exact version of PyTorch, but it's the one mentioned on the official AI Toolkit page. That version is not available for CUDA 13.0, only CUDA 12.8, so that's what we're going to install here; I suspect the latest CUDA 13 build will likely work too:
$ pip3 install torch==2.7.0 torchvision==0.22.0 torchaudio==2.7.0 --index-url https://download.pytorch.org/whl/cu128
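Once the install finishes, it's worth confirming that torch imports and can see the GPU before moving on. This is a hedged sanity check of my own (it prints a hint instead of crashing if torch is missing, e.g. because the wrong environment is active):

```shell
# Sanity check: run with the ai-toolkit conda environment active.
python3 - <<'EOF'
import importlib.util
if importlib.util.find_spec("torch") is None:
    print("torch not found - is the ai-toolkit environment active?")
else:
    import torch
    print(torch.__version__)          # expecting 2.7.0 per the install above
    print(torch.cuda.is_available())  # expecting True on the Spark's GPU
EOF
```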
4) Tweak the requirements.txt file
For some reason, pip on the DGX Spark seems to have trouble resolving the versions of some of the dependencies it needs to install, so we need to pin some of them. To do this, add these entries to the requirements.txt file:
scipy==1.16.0
tifffile==2025.6.11
imageio==2.37.0
scikit_image==0.25.2
clean_fid==0.1.35
pywavelets==1.9.0
contourpy==1.3.3
opencv_python_headless==4.11.0.86
Now remove this line:
git+https://github.com/jaretburkett/easy_dwpose.git
Based on my discussions with Ostris, easy_dwpose was used for auto-generating pose estimations for flex.2, but he says he can make it optional if the import fails. I've only tested training Qwen-Image-Edit-2509, which worked fine without it. This dependency pulls in a bunch of troublesome libraries like onnxruntime, so my recommendation is to skip it.
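If you'd rather script both requirements.txt edits than make them by hand, something like this works, assuming you run it from the ai-toolkit repo root (the sed pattern simply matches the easy_dwpose line):

```shell
# Run from the ai-toolkit repo root.
# 1) Pin the versions pip has trouble resolving on the DGX Spark.
cat >> requirements.txt <<'EOF'
scipy==1.16.0
tifffile==2025.6.11
imageio==2.37.0
scikit_image==0.25.2
clean_fid==0.1.35
pywavelets==1.9.0
contourpy==1.3.3
opencv_python_headless==4.11.0.86
EOF
# 2) Drop the easy_dwpose dependency (optional per Ostris; pulls in onnxruntime).
sed -i '/easy_dwpose/d' requirements.txt
```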
5) Install the requirements.txt
$ pip3 install -r requirements.txt
6) Compile and run the node UI
Go to your ai-toolkit/ui folder and run:
$ npm run build_and_start
If all went well, you'll be able to access the UI and kick off training jobs. If you're not getting any output when you start a job, it's most likely crashing before the process has started. The best way to debug these issues is with the CLI (the UI calls the CLI anyway, it just does it in the background). Set up a job in the UI, go to the advanced config screen, copy and paste the config into a file like train.yaml, then, with the virtual environment active, run:
$ python run.py path/to/train.yaml