Weird DGX Spark issues - unable to do a docker pull

New DGX Spark, having a lot of issues like this: when I do a docker pull from the Docker registry, it always complains that there is a sha256 mismatch. However, that same pull works flawlessly on my other Linux machine on the same network.

I’ve tried:

  • booting a different kernel
  • restoring the complete system
  • running the field diagnostic test - PASS
  • switching from docker-ce to docker.io
  • connecting it to a different network

But I’m unable to do anything with Spark.

Any ideas? SSL broken? File truncating issues?

$ docker pull nvidia/cuda:13.2.0-devel-ubuntu24.04
13.2.0-devel-ubuntu24.04: Pulling from nvidia/cuda
9e4aabb282ff: Download complete
314a2f652b6c: Download complete
66a4bbbfab88: Pull complete
15039e7d116d: Downloading [====>                                              ]  131.1MB/1.426GB
e5e5f1aa8be2: Download complete
18218423db25: Pull complete
949f935e0756: Download complete
6f596c6704f1: Download complete
32cc32da7083: Downloading [===>                                               ]  139.5MB/1.956GB
067f10bdf22e: Downloading [==========================================>        ]  126.9MB/147.8MB
4f4fb700ef54: Download complete
09e9408034f3: Downloading [==================================================>]  138.9MB/138.9MB
3fa23874706e: Download complete
failed commit on ref "layer-sha256:09e9408034f3348fbee4f5265943b9138f2ec94d181f5790ea2661e5d9c47436": commit failed: unexpected commit digest sha256:040837fb823953c2ad1b0a6cff95ecc231d76ba1d5beee5f4170bce4d766c144, expected sha256:09e9408034f3348fbee4f5265943b9138f2ec94d181f5790ea2661e5d9c47436: failed precondition

Try the pull again without any cached layers ("docker pull" has no --no-cache option, so clear the local data first). I did a test pull and it worked okay:

elsaco@spark2:~$ docker pull nvidia/cuda:13.2.0-devel-ubuntu24.04
13.2.0-devel-ubuntu24.04: Pulling from nvidia/cuda
9e4aabb282ff: Pull complete
314a2f652b6c: Pull complete
15039e7d116d: Pull complete
66a4bbbfab88: Pull complete
18218423db25: Pull complete
e5e5f1aa8be2: Pull complete
09e9408034f3: Pull complete
3fa23874706e: Pull complete
949f935e0756: Pull complete
32cc32da7083: Pull complete
6f596c6704f1: Pull complete
4f4fb700ef54: Pull complete
067f10bdf22e: Pull complete
Digest: sha256:f9492f2eea77fbc3d0c14fa8738f35946b42da72917bf5959d284ca39b4f209a
Status: Downloaded newer image for nvidia/cuda:13.2.0-devel-ubuntu24.04
docker.io/nvidia/cuda:13.2.0-devel-ubuntu24.04

You might have old data that needs to be purged.

It’s getting even weirder. I’m running sha256sum on the same file and getting different results?!? Is this hardware broken?

Hi Igor70

The key clue is:

failed commit on ref "layer-sha256:..." unexpected commit digest ... expected ...

That means Docker downloaded the bytes for a layer, but the bytes written to the local content store did not hash to the digest advertised by the registry. The same image pulling cleanly on another machine on the same network makes the registry and the network path much less likely as the root cause.
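
The same check can be reproduced by hand on any file saved to disk. Here is a minimal Python sketch, assuming a layer blob has been saved locally; the helper names are hypothetical and this is not Docker's or containerd's actual code. It hashes the file in chunks and compares the result with the hex part of the advertised digest.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def matches_digest(path, expected):
    """Compare a file against a digest string of the form 'sha256:<hex>'."""
    algo, _, hex_digest = expected.partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported digest algorithm: {algo!r}")
    return sha256_of_file(path) == hex_digest.lower()
```

If matches_digest() returns False for a blob that verifies fine on another machine, the problem is local to this box rather than in the registry.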

Stale or corrupted local content is a likely suspect. Try this:

docker system prune -a --volumes
docker builder prune -a
sudo systemctl restart docker containerd

Then pull the CUDA image again.

I did try that, no help. But look at this: when I run sha256sum on the same file multiple times, I get different sums?! It’s the exact same file; the sha1 and md5 sums are the same, but sha256sum is acting weird.

32cc32da70838bdb5ffb9cde096fc0459a8c212e4830df8ddff30594dddc676a test3.tar.gz
-rw-rw-r-- 1 nvidia nvidia 1955557337 Mar 26 21:23 test3.tar.gz
32cc32da70838bdb5ffb9cde096fc0459a8c212e4830df8ddff30594dddc676a test3.tar.gz
-rw-rw-r-- 1 nvidia nvidia 1955557337 Mar 26 21:23 test3.tar.gz
b1f7e219d54819b1d3ed23d069012d130f36308f0976081fbe95cf97fbf512c1 test3.tar.gz
-rw-rw-r-- 1 nvidia nvidia 1955557337 Mar 26 21:23 test3.tar.gz
32cc32da70838bdb5ffb9cde096fc0459a8c212e4830df8ddff30594dddc676a test3.tar.gz
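
A repeat test like this can be scripted so the failure rate is easier to report. Here is a minimal Python sketch, assuming the file name from the listing above; the function names and the run count are hypothetical. It hashes the same file several times with MD5, SHA-1, and SHA-256, all fed from the same read buffers, and counts the distinct results per algorithm. On healthy hardware each algorithm should report exactly one distinct digest.

```python
import hashlib
from collections import Counter


def repeat_hashes(path, runs=10, algorithms=("md5", "sha1", "sha256")):
    """Hash `path` `runs` times; return {algorithm: Counter of hex digests}."""
    results = {name: Counter() for name in algorithms}
    for _ in range(runs):
        hashers = {name: hashlib.new(name) for name in algorithms}
        with open(path, "rb") as f:
            # Within one pass every algorithm sees the same buffers, so a
            # digest that changes for only one algorithm is unlikely to be
            # caused by different bytes coming back from the file.
            for chunk in iter(lambda: f.read(1 << 20), b""):
                for h in hashers.values():
                    h.update(chunk)
        for name, h in hashers.items():
            results[name][h.hexdigest()] += 1
    return results


def unstable_algorithms(results):
    """Return the algorithms that produced more than one distinct digest."""
    return [name for name, counts in results.items() if len(counts) > 1]
```

Usage: print(unstable_algorithms(repeat_hashes("test3.tar.gz"))). Later passes will probably be served from the page cache rather than the disk, so this mostly exercises memory and CPU, not storage.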

Agreed, this is very odd. Please run the NVIDIA DGX Spark Field Diagnostics, then DM me the logs and we can discuss next steps.

I’ve sent the logs in DM.

Thank you. Next steps discussed in DM. Recommend RMA.