~$ nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
How can I reinstall the driver?
It says nvidia-driver-580-open is already installed.
Have a look into this thread:
Can you do a
$ dpkg -l|grep nvidia-driver
before installing that new version - just to check which exact version you currently have installed?
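A slightly broader version of that check, as a rough sketch: list every installed package whose name contains "nvidia" together with its version, so a mix of driver versions stands out. The helper name `list_nvidia_pkgs` is made up for illustration. It reads `dpkg -l`-style text on stdin, so it also works on saved output.

```shell
# Hypothetical helper (not an NVIDIA tool): print "name version" for every
# fully installed ("ii") package whose name contains "nvidia".
# Reads `dpkg -l` output on stdin.
list_nvidia_pkgs() {
    awk '$1 == "ii" && $2 ~ /nvidia/ { print $2, $3 }'
}

# Usage on a live system:
#   dpkg -l | list_nvidia_pkgs
```

If the versions printed do not all belong to the same driver release, that mismatch would be a reasonable first thing to look at.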
Driver updates are delivered through the DGX Spark update process, so you should not need to update the driver yourself.
Latest driver version is: 580.95.05
Can you share the contents of /etc/fastos-release and /etc/dgx-release?
Are you able to restore from the USB recovery?
ii nvidia-driver-580-open 580.95.05-0ubuntu0.24.04.2 arm64 NVIDIA driver (open kernel) metapackage
sudo: /etc/fastos-release: command not found
sudo: /etc/dgx-release: command not found
sudo cat /etc/fastos-release
abull trying to see the contents of these files
NAME="DGX SPARK FASTOS"
DATE="2025-09-12T22:05:35+00:00"
VERSION="1.81.38"
BUILD_TYPE="customer"
DGX_NAME="DGX Spark"
DGX_PRETTY_NAME="NVIDIA DGX Spark"
DGX_SWBUILD_DATE="2025-09-10-13-50-03"
DGX_SWBUILD_VERSION="7.2.3"
DGX_COMMIT_ID="833b4a7"
DGX_PLATFORM="DGX Server for KVM"
DGX_SERIAL_NUMBER="Not Specified"
When did this problem start, was it an update, or modifications you made?
It looks like you have pending updates. Latest updates listed in /etc/dgx-release would have the following lines:
DGX_SWBUILD_DATE="2025-10-04-06-28-28"
DGX_SWBUILD_VERSION="7.2.3"
DGX_COMMIT_ID="03dc741"
You can check what updates are available: apt list --upgradable
Can you update your DGX Spark, reboot it and check nvidia-smi?
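For anyone who wants to script the comparison described above, here is a minimal sketch. The helper names `release_value` and `is_older` are hypothetical, and the "latest" build date is simply the one quoted in this thread, so treat it as an assumption that will go stale.

```shell
# Hypothetical helper: print the value of KEY from a KEY="value" file,
# with the double quotes stripped. $1 = key name, $2 = file path.
release_value() {
    sed -n "s/^$1=//p" "$2" | tr -d '"' | head -n 1
}

# True if timestamp $1 sorts strictly before timestamp $2.
# YYYY-MM-DD-HH-MM-SS strings order correctly as plain text.
is_older() {
    [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | LC_ALL=C sort | head -n 1)" = "$1" ]
}

# Assumption: newest build date, copied from the post above.
latest="2025-10-04-06-28-28"

# Usage on a live system:
#   current=$(release_value DGX_SWBUILD_DATE /etc/dgx-release)
#   is_older "$current" "$latest" && echo "updates pending"
```

The files are read as plain text here, which is also why running them as commands fails with "command not found".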
It's self-inflicted. I want to use faiss-gpu, which isn't available for the current DGX, so I installed a lower version of the driver with apt-get. That broke nvidia-smi, and now I'm unable to connect to the GPU.
I'm trying to find a fix that doesn't require recovering the whole system.
~$ sudo apt list --upgradable
[sudo] password for pothineni:
Listing… Done
code/stable 1.106.0-1762878358 arm64 [upgradable from: 1.105.1-1760482225]
N: There are 162 additional versions. Please use the '-a' switch to see them.
(base) pothineni~$ sudo apt list --upgradable -a
Listing… Done
code/stable 1.106.0-1762878358 arm64 [upgradable from: 1.105.1-1760482225]
code/stable,now 1.105.1-1760482225 arm64 [installed,upgradable to: 1.106.0-1762878358]
code/stable 1.105.0-1759933569 arm64
code/stable 1.104.3-1759409526 arm64
code/stable 1.104.2-1758714550 arm64
code/stable 1.104.1-1758154116 arm64
code/stable 1.104.0-1757488163 arm64
code/stable 1.103.2-1755710123 arm64
code/stable 1.103.1-1755017259 arm64
code/stable 1.103.0-1754517464 arm64
code/stable 1.102.3-1753759610 arm64
code/stable 1.102.2-1753188107 arm64
………………..
$ docker run --gpus all -it --rm --ipc=host \
    -v $HOME/.cache/huggingface:/root/.cache/huggingface \
    -v ${PWD}:/workspace -w /workspace \
    nvcr.io/nvidia/pytorch:25.09-py3
docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running prestart hook #0: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: initialization error: nvml error: driver not loaded: unknown
Run 'docker run --help' for more information
Disabling Secure Boot resolved the issue.
I think Secure Boot blocks the driver from loading once it has been reinstalled.
If you are using the signed drivers, they will work with Secure Boot. If you reinstalled the driver, check whether you installed the signed version.
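One way to check this from the command line, sketched under some assumptions: `mokutil --sb-state` reports whether Secure Boot is enabled, and `modinfo -F signer nvidia` prints the signer of the installed kernel module (empty if it is unsigned). The helper `sb_hint` below is hypothetical and only turns those two outputs into a hint. It does not verify that the signing key is actually enrolled.

```shell
# Hypothetical helper: combine Secure Boot state and module signer into a hint.
# $1 = output of `mokutil --sb-state`
# $2 = output of `modinfo -F signer nvidia`
sb_hint() {
    case "$1" in
        *"SecureBoot enabled"*)
            if [ -z "$2" ]; then
                echo "unsigned module with Secure Boot on: likely blocked"
            else
                # A signer is present; its key still has to be trusted by the system.
                echo "module signed by: $2"
            fi ;;
        *)
            echo "Secure Boot not enabled: signature usually not enforced" ;;
    esac
}

# Usage on a live system:
#   sb_hint "$(mokutil --sb-state 2>/dev/null)" \
#           "$(modinfo -F signer nvidia 2>/dev/null)"
```

An empty signer while Secure Boot is on would be consistent with the "driver not loaded" errors earlier in the thread.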
Did you run this command?
sudo apt install nvidia-driver-580-open
This tool is also a good way to get logs back to us. Feel free to send them as a DM.
sudo nvidia-bug-report.sh
apt install nvidia-driver-580-open
Yes
sudo nvidia-bug-report.sh
Sent.