Reinstalling the NVIDIA driver on DGX Spark

~$ nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

How can I reinstall the driver?

it says nvidia-driver-580-open already installed.

Have a look into this thread:

Can you do a

$ dpkg -l | grep nvidia-driver

before installing that new version - just to check which exact version you currently have installed?
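If you want to script that check, a small sketch like the one below pulls just the version field out of the `dpkg -l` listing. The sample line in the heredoc mirrors the output posted later in this thread; on the DGX itself you would pipe `dpkg -l | grep nvidia-driver` into the `awk` instead:

```shell
# Extract the installed NVIDIA driver package version from dpkg output.
# The heredoc sample stands in for the live `dpkg -l | grep nvidia-driver` output.
version=$(awk '/^ii.*nvidia-driver/ {print $3}' <<'EOF'
ii  nvidia-driver-580-open  580.95.05-0ubuntu0.24.04.2  arm64  NVIDIA driver (open kernel) metapackage
EOF
)
echo "$version"
```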

Driver updates are delivered through the DGX Spark update process, so you should not need to update the driver yourself.
The latest driver version is 580.95.05.

Can you share the contents of /etc/fastos-release and /etc/dgx-release?

Are you able to restore from the USB recovery?

ii nvidia-driver-580-open 580.95.05-0ubuntu0.24.04.2 arm64 NVIDIA driver (open kernel) metapackage

sudo: /etc/fastos-release: command not found

sudo: /etc/dgx-release: command not found

sudo cat /etc/fastos-release

Still trying to see the contents of these files:

NAME="DGX SPARK FASTOS"
DATE="2025-09-12T22:05:35+00:00"
VERSION="1.81.38"
BUILD_TYPE="customer"

DGX_NAME="DGX Spark"
DGX_PRETTY_NAME="NVIDIA DGX Spark"
DGX_SWBUILD_DATE="2025-09-10-13-50-03"
DGX_SWBUILD_VERSION="7.2.3"
DGX_COMMIT_ID="833b4a7"
DGX_PLATFORM="DGX Server for KVM"
DGX_SERIAL_NUMBER="Not Specified"

When did this problem start? Was it after an update, or after modifications you made?

It looks like you have pending updates. With the latest updates applied, /etc/dgx-release would contain the following lines:

DGX_SWBUILD_DATE="2025-10-04-06-28-28"
DGX_SWBUILD_VERSION="7.2.3"
DGX_COMMIT_ID="03dc741"

You can check what updates are available: apt list --upgradable

Can you update your DGX Spark, reboot it and check nvidia-smi?

It's self-inflicted. I wanted to use faiss-gpu, which is not available for the current DGX stack, so I installed a lower driver version with apt-get. That broke nvidia-smi, and now I'm unable to connect to the GPU.

I'm trying to find a way to fix this without recovering the whole system.

~$ sudo apt list --upgradable
[sudo] password for pothineni:
Listing... Done
code/stable 1.106.0-1762878358 arm64 [upgradable from: 1.105.1-1760482225]
N: There are 162 additional versions. Please use the '-a' switch to see them.
(base) pothineni~$ sudo apt list --upgradable -a
Listing... Done
code/stable 1.106.0-1762878358 arm64 [upgradable from: 1.105.1-1760482225]
code/stable,now 1.105.1-1760482225 arm64 [installed,upgradable to: 1.106.0-1762878358]
code/stable 1.105.0-1759933569 arm64
code/stable 1.104.3-1759409526 arm64
code/stable 1.104.2-1758714550 arm64
code/stable 1.104.1-1758154116 arm64
code/stable 1.104.0-1757488163 arm64
code/stable 1.103.2-1755710123 arm64
code/stable 1.103.1-1755017259 arm64
code/stable 1.103.0-1754517464 arm64
code/stable 1.102.3-1753759610 arm64
code/stable 1.102.2-1753188107 arm64

………………..

$ docker run --gpus all -it --rm --ipc=host \
    -v $HOME/.cache/huggingface:/root/.cache/huggingface \
    -v ${PWD}:/workspace -w /workspace \
    nvcr.io/nvidia/pytorch:25.09-py3

docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running prestart hook #0: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: initialization error: nvml error: driver not loaded: unknown

Run 'docker run --help' for more information
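When nvidia-container-cli reports "driver not loaded", it can help to confirm whether the nvidia kernel modules are actually loaded before blaming Docker. A minimal check (assuming a standard Linux /proc/modules; the variable name is just for illustration):

```shell
# Confirm whether any nvidia kernel module is currently loaded.
# On a healthy DGX this prints "yes"; after a broken driver install it prints "no".
if grep -q '^nvidia' /proc/modules 2>/dev/null; then
    driver_loaded="yes"
else
    driver_loaded="no"
fi
echo "nvidia kernel module loaded: $driver_loaded"
```

If this prints "no", the container error is just a symptom and the driver install itself needs fixing first.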

Disabling Secure Boot resolved the issue.

I think Secure Boot was blocking the driver after it was reinstalled.

If you are using the signed drivers, they will work with secure boot. If you reinstalled the driver, you need to see if you used the signed drivers.
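To confirm whether Secure Boot is actually enabled before reinstalling, `mokutil --sb-state` is the usual check on Ubuntu-based systems. A hedged sketch (mokutil may not be installed everywhere, so it falls back gracefully):

```shell
# Report the Secure Boot state; fall back if mokutil is unavailable.
if command -v mokutil >/dev/null 2>&1; then
    sb_state=$(mokutil --sb-state 2>/dev/null || echo "unknown")
else
    sb_state="mokutil not installed; check Secure Boot in the firmware menu"
fi
echo "$sb_state"
```

With Secure Boot enabled, only signed kernel modules will load, which matches the symptom of an unsigned reinstalled driver silently failing.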


Did you run this command?

sudo apt install nvidia-driver-580-open

This tool is also a good way to provide logs back to us; feel free to send the report as a DM:

sudo nvidia-bug-report.sh
 apt install nvidia-driver-580-open

Yes

sudo nvidia-bug-report.sh

Sent.
