What is the RAM value shown on the DGX Dashboard? It doesn't seem to match `free`

I created a small replacement for the built-in DGX Dashboard because I wanted to be able to access it without having to tunnel a port, I wanted to see RAM usage without cache/buffers, and I wanted some additional stats (like temps and CPU usage).

However, when testing it after a reboot, I noticed a big difference in the memory reported by each, well beyond what is in caches. My dashboard showed around 3GB while the built-in one showed 13GB!

I’m getting my numbers from free and it seems like these don’t match what’s shown in the built-in dashboard. Here’s a screenshot:

The dashboard shows 13GB used, but the output of free shows 3.8Gi used, 2.0Gi buffers/cache.

The GB/Gi difference accounts for some of the gap, but these numbers still don’t add up. My guess is that the dashboard is doing 128GB minus 115Gi (“available”) to get 13, which isn’t sound because the two values are in different units. Shouldn’t it do 119-115 to get 4Gi used (and convert that to GB if required)?
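To make the unit arithmetic above concrete, here is a minimal Python sketch. It only reproduces the guess from this post; it is not the dashboard's actual code, and the variable names are made up for illustration. The 119, 115 and 128 figures are the rounded values quoted above.

```python
GIB = 1024**3  # bytes in one gibibyte ("Gi" in `free -h`)
GB = 1000**3   # bytes in one gigabyte (`free -h --si`)


def gib_to_gb(gib: float) -> float:
    """Convert a quantity in GiB to decimal GB."""
    return gib * GIB / GB


total_gib = 119        # "total" from `free -h` (rounded, from the post)
available_gib = 115    # "available" from `free -h` (rounded, from the post)
nominal_total_gb = 128 # total shown by the dashboard

# Suspected dashboard calculation (an assumption): a GiB value is
# subtracted from a GB total, so the result has no consistent unit.
mixed_units = nominal_total_gb - available_gib

# Consistent calculation: subtract within one unit, then convert.
used_gib = total_gib - available_gib
used_gb = gib_to_gb(used_gib)

print(f"mixed units: {mixed_units}")
print(f"consistent:  {used_gib} GiB = {used_gb:.2f} GB")
```

With these inputs the mixed-unit subtraction gives 13, while the consistent one gives 4 GiB (about 4.29 GB).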

The DGX Dashboard takes info from /proc/meminfo

@aniculescu are you able to share which values are read and how they are used? I’m also reading from there, but your numbers don’t seem to match.

For example, right now the built-in dashboard is showing 74.53 GB / 128 GB (which is 58.2%), and the output of cat /proc/meminfo | grep Mem is:

MemTotal: 125513944 kB
MemFree: 56076940 kB
MemAvailable: 121519624 kB

But 56076940/125513944 is 44.7% free (so 55.3% used, not 58.2%). The numbers in /proc/meminfo are in KiB (despite the confusing kB label), as can be inferred from MemTotal being 125x and not 128x. My suspicion is that MemFree is being treated as if it were in kB, resulting in the wrong amount of free memory, and that the total is hard-coded as 128 rather than converted from MemTotal (converting it would give the correct %, but with the total shown as 125GB).
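For anyone who wants to check the arithmetic, below is a rough Python sketch that parses the three lines quoted above and reports usage in both GB and GiB. The helper names (`parse_meminfo`, `summarize`) are hypothetical, and the two "used" definitions are common conventions, not necessarily what the dashboard does.

```python
SAMPLE = """\
MemTotal:       125513944 kB
MemFree:        56076940 kB
MemAvailable:   121519624 kB
"""


def parse_meminfo(text: str) -> dict[str, int]:
    """Return {field: bytes}. The 'kB' label in /proc/meminfo means KiB."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if not parts:
            continue
        amount = int(parts[0])
        if len(parts) > 1 and parts[1] == "kB":
            amount *= 1024  # KiB to bytes
        values[key.strip()] = amount
    return values


def summarize(mem: dict[str, int]) -> dict[str, int]:
    """Two common 'used' definitions, both in bytes."""
    total = mem["MemTotal"]
    return {
        "total": total,
        # Excludes reclaimable cache/buffers.
        "used_excluding_cache": total - mem["MemAvailable"],
        # Counts cache/buffers as used.
        "used_including_cache": total - mem["MemFree"],
    }


stats = summarize(parse_meminfo(SAMPLE))
for name, value in stats.items():
    print(f"{name}: {value / 1000**3:.2f} GB = {value / 1024**3:.2f} GiB")
```

On a live system, `SAMPLE` can be replaced with `open("/proc/meminfo").read()`. Dividing by a single constant for both used and total keeps the two numbers in the same unit.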

Clearly there’s a bug in the Dashboard calculation.

Spark has UMA (Unified Memory Architecture), so /proc/meminfo is not that useful. That file only reports what the kernel knows about main memory.

@aniculescu are the dashboard widget readings taken from cudaMemGetInfo()?

Can you elaborate on this? In what way would they not be accurate? I’m a bit of a noob and curious (on the surface, the numbers in this file seem reasonable for what I’m expecting).

We released a software update today that addresses this issue. Please make sure to update your systems.

This looks better, but still doesn’t seem correct. Here’s my output of free -h --si and free -h and you can see that the actual usage is 3.6GB/128GB or 3.3Gi/119Gi, but the dashboard is showing 3.3/128. It seems that it’s still mixing up the units (showing the used memory in Gibibytes but the total in Gigabytes).
