[SOLVED] DGX OS - Toolkit package conflict

Hello.

I received my DGX Spark in mid-December, but with work and the holidays, I didn’t have time to use it until now. I just opened the box, ran the initial setup wizard, installed ComfyUI, and that’s about it.

Now I want to run serious workloads on this machine and compile llama.cpp, but here’s the issue: when I check nvcc, it reports CUDA 12 (Build cuda_12.0.r12.0/compiler.32267302_0), whereas, if I understand correctly, this machine should be on CUDA 13.

Looking at what’s installed on my system with apt list --installed | grep -E 'toolkit', it seems I have two versions of the toolkit:

cuda-toolkit-13-0-config-common/unknown,now 13.0.96-1 all [installed,automatic]
cuda-toolkit-13-0/unknown,now 13.0.2-1 arm64 [installed]
cuda-toolkit-13-config-common/unknown,now 13.1.80-1 all [installed,automatic]
cuda-toolkit-config-common/unknown,now 13.1.80-1 all [installed,automatic]
nvidia-container-toolkit-base/unknown,now 1.18.1-1 arm64 [installed,automatic]
nvidia-container-toolkit/unknown,now 1.18.1-1 arm64 [installed,automatic]
nvidia-cuda-toolkit-doc/noble,now 12.0.1-4build4 all [installed,automatic]
nvidia-cuda-toolkit/noble,now 12.0.140~12.0.1-4build4 arm64 [installed]

I really don’t know how I ended up in this situation. I barely touched the system, just did a few updates. Looking at my history, I ran an apt full-upgrade; maybe that’s the culprit, but I’m not sure.

Any advice on how to clean up this mess and keep only CUDA 13 (which is supposed to be the default on this system, right)? I can “reset” the Spark by reinstalling everything if necessary, but since this issue might recur, I would prefer a permanent solution.

Thanks for your help.

Same problem here. To build llama.cpp in the meantime, try pointing CMake at the CUDA 13 compiler explicitly:
cmake -B build -DCMAKE_CUDA_COMPILER=/usr/local/cuda-13.0/bin/nvcc -DLLAMA_CURL=ON -DGGML_CUDA=ON

[SOLVED] I found the culprit… it’s me! 😅

After investigating with grep in /var/log/apt/history.log*, I found this:

Start-Date: 2025-12-08 13:52:55
Commandline: apt install nvidia-cuda-toolkit
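For anyone repeating this search, here is a rough sketch of how the log can be scanned. I originally just used grep; this version swaps in awk so the Start-Date line is printed alongside the matching Commandline line. The function name apt_install_history is made up for this example, and gzip -cdf is used on the assumption that rotated logs may be gzip-compressed.

```shell
#!/bin/sh
# Print the Start-Date and Commandline of every apt "install" entry
# that names the given package as a whole word.
# Usage: apt_install_history PACKAGE LOGFILE...
apt_install_history() {
    pkg=$1
    shift
    # gzip -cdf decompresses .gz files and passes plain files through unchanged.
    gzip -cdf -- "$@" | awk -v pkg="$pkg" '
        /^Start-Date:/  { start = $0 }
        /^Commandline:/ {
            n = split($0, words, " ")
            is_install = 0
            has_pkg = 0
            for (i = 2; i <= n; i++) {
                if (words[i] == "install") is_install = 1
                if (words[i] == pkg)       has_pkg = 1
            }
            if (is_install && has_pkg) {
                print start
                print $0
            }
        }'
}
```

On the Spark this would be called as: apt_install_history nvidia-cuda-toolkit /var/log/apt/history.log*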

So I installed it myself. I don’t remember exactly why I installed it in the first place, but here’s what happened:

I ran sudo bash to get a root shell, but this opens a non-login shell, which means the /etc/profile.d/*.sh scripts are not sourced. DGX OS sets up the CUDA path in /etc/profile.d/nv_paths.sh:

export PATH=/usr/local/cuda/bin:/opt/bin/:$PATH

Since this wasn’t loaded, nvcc wasn’t in my PATH. I assumed CUDA wasn’t installed and ran apt install nvidia-cuda-toolkit, which pulled in the Ubuntu repo’s CUDA 12 version, shadowing the proper CUDA 13 from /usr/local/cuda/bin.
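If you want to see this kind of shadowing for yourself, a small helper like the one below lists every nvcc found on the PATH, in lookup order, together with the file it resolves to. This is only a sketch: list_nvcc is a name I made up, and it assumes GNU readlink (for -f), which Ubuntu ships.

```shell
#!/bin/sh
# List every executable named nvcc on a PATH-style string (default: $PATH),
# in the order the shell would search, with the resolved target of each.
list_nvcc() (
    # The body runs in a subshell so the IFS and globbing changes stay local.
    IFS=:
    set -f
    for dir in ${1:-$PATH}; do
        [ -n "$dir" ] && [ -x "$dir/nvcc" ] || continue
        printf '%s -> %s\n' "$dir/nvcc" "$(readlink -f "$dir/nvcc")"
    done
)

list_nvcc
```

The first line printed is the nvcc your shell actually runs. More than one line means two toolkits are competing.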

The fix:

sudo apt remove nvidia-cuda-toolkit
sudo apt autoremove

This removed CUDA 12 (and ~60 orphaned dependencies). The proper CUDA 13 (cuda-toolkit-13-0) was already installed and remains untouched at /usr/local/cuda/.
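To double-check the result, something like the following can confirm which release nvcc reports now. It is a sketch, not an official check: nvcc_release is a hypothetical helper that pulls the version out of the "release X.Y," line that nvcc --version prints.

```shell
#!/bin/sh
# Extract the release number (e.g. 12.0) from `nvcc --version` output on stdin.
nvcc_release() {
    sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p'
}

if command -v nvcc >/dev/null 2>&1; then
    # Drop any cached command locations left over from before the removal.
    hash -r
    echo "nvcc: $(command -v nvcc), release $(nvcc --version | nvcc_release)"
else
    echo "nvcc not found on PATH" >&2
fi
```

If this still reports release 12, or nothing at all, the shell was probably started without the profile scripts.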

Lesson learned: On DGX OS, use sudo -i or sudo su - instead of sudo bash to get a proper login shell with all paths configured. Or just remember that CUDA lives in /usr/local/cuda/bin/ and the package is called cuda-toolkit-13-0, not nvidia-cuda-toolkit.
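As a guard against repeating my mistake, a check along these lines can be run before concluding that CUDA is missing. It is a minimal sketch: path_has_dir is a made-up helper, and the two paths are the ones from this thread, so adjust them if your system differs.

```shell
#!/bin/sh
# Succeed if directory $2 is an exact entry in the PATH-style string $1.
path_has_dir() {
    case ":$1:" in
        *":$2:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

cuda_bin=/usr/local/cuda/bin
profile=/etc/profile.d/nv_paths.sh   # location quoted earlier in this thread

if ! path_has_dir "$PATH" "$cuda_bin"; then
    echo "$cuda_bin is not on PATH (non-login shell?)" >&2
    # Load the DGX OS path setup by hand if the file exists on this machine.
    if [ -r "$profile" ]; then
        . "$profile"
    fi
fi
```

The point is that a missing PATH entry is not the same thing as a missing package.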

Hope this helps anyone who falls into the same trap!