New DGX OS 7.4.0

Updated just a minute ago.
NVIDIA-SMI 580.126.09 -
CUDA Version: 13.0
DGX_OTA_VERSION="7.3.1"
linux-image-nvidia-hwe-24.04
Depends: linux-image-6.17.0-1008-nvidia

What I kind of “dislike” is that the NVIDIA Dashboard shows that there is an update but says nothing about what it is. No version numbers, no changelog, nothing.
It feels like a mystery box.

I finally did the full upgrade and the kernel upgrade, and fun fact: we got access to more RAM!

Total RAM (KiB / human-readable): 125511968 / 119Gi → 127601452 / 121Gi

OLD:

drew@spark-2918:~$ free
               total        used        free      shared  buff/cache   available
Mem:       125511968    15067440    54014684       96472    58544652   110444528
Swap:       16777212     9375760     7401452
drew@spark-2918:~$ free -h
               total        used        free      shared  buff/cache   available
Mem:           119Gi        14Gi        51Gi        94Mi        55Gi       105Gi
Swap:           15Gi       8.9Gi       7.1Gi

NEW:

drew@spark-7bea:~$ free
               total        used        free      shared  buff/cache   available
Mem:       127601452   117618200      821208     1250520    11679600     9983252
Swap:       16777212     4401740    12375472
drew@spark-7bea:~$ free -h
               total        used        free      shared  buff/cache   available
Mem:           121Gi       112Gi       909Mi       1.2Gi        10Gi       9.6Gi
Swap:           15Gi       4.2Gi        11Gi
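The raw totals above are in KiB, the default unit of free. As a quick sanity check, here is a minimal Python sketch that parses the Mem: rows of the two pasted transcripts and computes the difference. It works on the text shown here, not on a live system, and the helper names are illustrative only.

```python
# Sketch: compute how much total memory became visible after the update,
# using the "Mem:" rows of the OLD and NEW `free` transcripts (values in KiB).

OLD_FREE = """\
               total        used        free      shared  buff/cache   available
Mem:       125511968    15067440    54014684       96472    58544652   110444528
Swap:       16777212     9375760     7401452
"""

NEW_FREE = """\
               total        used        free      shared  buff/cache   available
Mem:       127601452   117618200      821208     1250520    11679600     9983252
Swap:       16777212     4401740    12375472
"""


def mem_total_kib(free_output: str) -> int:
    """Return the 'total' column of the Mem: row from plain `free` output (KiB)."""
    for line in free_output.splitlines():
        if line.startswith("Mem:"):
            # Columns: Mem: total used free shared buff/cache available
            return int(line.split()[1])
    raise ValueError("no Mem: row found")


def kib_to_gib(kib: int) -> float:
    """Convert KiB to GiB (1 GiB = 1024 * 1024 KiB)."""
    return kib / (1024 ** 2)


if __name__ == "__main__":
    old_kib = mem_total_kib(OLD_FREE)
    new_kib = mem_total_kib(NEW_FREE)
    delta = new_kib - old_kib
    print(f"old total: {old_kib} KiB ({kib_to_gib(old_kib):.2f} GiB)")
    print(f"new total: {new_kib} KiB ({kib_to_gib(new_kib):.2f} GiB)")
    print(f"gained:    {delta} KiB ({kib_to_gib(delta):.2f} GiB)")
```

The gain works out to 2089484 KiB, just under 2 GiB, which lines up with the UEFI carveout reduction from 4GB to 2GB that NVIDIA describes further down the thread.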

Founders Edition or another OEM/partner model?

Founders Edition

Interesting. I’m fully upgraded on an MSI variant, and there is no change in total memory.

Looks like MSI pushed out a new image. I might just try that out... for science.

It’s the new firmware! The FE has a new Embedded Controller package. Not sure if the same firmware would work on a partner variant. Try fwupdmgr get-updates for details.

Did this just release today? Did I miss the release notes? Hah.

WARNING: I figured out the hard way why NVIDIA is NOT recommending manual upgrades to driver 590 (details below), so I would NOT recommend upgrading just yet! I downgraded to 580 and it’s all good now.
NVIDIA Driver 590.48.01 Critical Bug on DGX Spark (GB10) - Memory Leak

The Problem:

NVIDIA driver version 590.48.01 has a critical memory leak bug on GB10/DGX Spark hardware that makes the system unusable for AI/ML workloads.

Symptoms:

  • 114 GiB of RAM leaked/missing out of 121 GiB total (only 5.5 GiB free with nothing running!)

  • 87 GiB of memory is “unaccounted for” in CUDA memory breakdown

  • Model loading fails with error: cannot meet free memory target of 1024 MiB, need to reduce device memory by 25164 MiB

  • Only Xorg and basic system processes running, yet memory shows as consumed

  • nvidia-smi shows massive memory usage even with no GPU workloads

Affected Hardware:

  • NVIDIA DGX Spark (GB10) with unified memory architecture

  • ARM64 platform with 128 GiB shared CPU/GPU memory

Root Cause:

Driver 590 was NOT officially ready for GB10 hardware; NVIDIA confirmed this in the forums. Most GB10 users are still on driver 580.126.09, even after kernel upgrades.

The Fix:

Downgrade to driver 580.126.09, the stable, officially supported driver for GB10.

After downgrade:

  • ✅ Memory returns to normal: ~114 GiB free (not 5.5 GiB!)

  • ✅ Models load successfully

  • ✅ System is usable again

Warning to GB10/DGX Spark Users:

DO NOT upgrade to NVIDIA driver 590 on GB10 hardware! Stay on 580.126.09 until NVIDIA officially releases a fixed version for GB10.
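Before (or after) touching drivers, it helps to confirm which branch is actually running. A minimal Python sketch, assuming nvidia-smi is on PATH: it reads the version through nvidia-smi's --query-gpu=driver_version query and flags the 590 branch. SUSPECT_BRANCHES and the helper names are hypothetical and only encode what this post reports; they are not an NVIDIA compatibility list.

```python
import subprocess

# Hypothetical list: only encodes the branch reported in this post (590.x on GB10).
# It is NOT an official NVIDIA compatibility matrix; update it as fixes ship.
SUSPECT_BRANCHES = {590}


def driver_branch(version: str) -> int:
    """Return the major branch of a driver version, e.g. '580.126.09' -> 580."""
    return int(version.strip().split(".")[0])


def is_suspect(version: str) -> bool:
    """True if the version belongs to a branch listed in SUSPECT_BRANCHES."""
    return driver_branch(version) in SUSPECT_BRANCHES


def installed_driver_version() -> str:
    """Query the running driver version through nvidia-smi (first GPU only)."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return out.splitlines()[0].strip()


if __name__ == "__main__":
    try:
        version = installed_driver_version()
    except (FileNotFoundError, subprocess.CalledProcessError, IndexError) as err:
        print(f"could not query nvidia-smi: {err}")
    else:
        if is_suspect(version):
            print(f"driver {version}: branch reported here to leak memory on GB10")
        else:
            print(f"driver {version}: not in the suspect list")
```

Keeping the version parsing separate from the nvidia-smi call means the check can also be applied to a version string copied from a package listing.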


@kosta, how do you check memory usage with nvidia-smi? All I get is [N/A]:

elsaco@spark2:~$ nvidia-smi --query-gpu=memory.used,memory.total,memory.free
memory.used [MiB], memory.total [MiB], memory.free [MiB]
[N/A], [N/A], [N/A]

On GB10, there IS NO separate GPU RAM!

That’s why nvidia-smi shows [N/A] for memory: the GB10 uses a unified memory architecture. The CPU and GPU share the same 121 GiB of RAM.

So:

  • free -h shows your total system RAM, which is also your GPU memory

  • They’re the same pool of memory
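Since there is no separate GPU memory pool to query, the useful numbers are the kernel’s own counters. A minimal Python sketch, assuming a Linux host: it parses /proc/meminfo (the file free reads its figures from) and reports MemTotal and MemAvailable in GiB; free shows MemAvailable as its "available" column. The function names are illustrative.

```python
from pathlib import Path


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into {field: value}; sizes are in kB (KiB)."""
    info: dict[str, int] = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts:
            # First token is the number; a trailing "kB" unit, if present, is ignored.
            info[key.strip()] = int(parts[0])
    return info


def unified_memory_gib(text: str) -> tuple[float, float]:
    """Return (MemTotal, MemAvailable) in GiB from /proc/meminfo-style text."""
    info = parse_meminfo(text)
    kib_per_gib = 1024 ** 2
    return info["MemTotal"] / kib_per_gib, info["MemAvailable"] / kib_per_gib


if __name__ == "__main__":
    meminfo = Path("/proc/meminfo")
    if meminfo.exists():
        total, avail = unified_memory_gib(meminfo.read_text())
        print(f"unified memory: {avail:.1f} GiB available of {total:.1f} GiB")
```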

My EC firmware is at 10400. What is yours at?

I recommend using uvx nvitop


Per Johnny (at Nvidia):

“There has been an update: the reserved carveout in UEFI (not the Embedded Controller) was reduced from 4GB to 2GB. This change will be available to all partners once they adopt the OTA2 firmware and roll it out to their devices.”

This will hit partner devices soon enough.


It looks like MSI pushed it to testing 5 days ago. Run these commands on your variant to see whether your vendor has pushed anything to testing:

sudo fwupdmgr enable-remote lvfs-testing
sudo fwupdmgr refresh
sudo fwupdmgr update

Read the fine print, then just hit yes.

You’ll be on par with the Founders Edition afterwards. If anything else gets tweaked before the stable release, just run the usual:
sudo fwupdmgr refresh
sudo fwupdmgr get-updates
sudo fwupdmgr update

Enjoy!

Hi all, I updated my system (Founders Edition) today, but my DGX is rebooting randomly. Can somebody help?

Yep. It looks a lot like this:

Throughput tests with kernel 6.17 show just over 13 Gb/s, whilst reverting to kernel 6.11 restores it to 100 Gb/s as expected.