NVTOP with DGX Spark unified memory support

3.3.2 is broken; it was the current version when I posted that nvtop has been broken since 3.3.1.

It’s currently being looked at by parallelArchitect, but right now it’s still an ongoing issue, which you can track here: DGX Spark support worked in 3.3.0, has been broken since 3.3.1 · Issue #449 · Syllo/nvtop · GitHub

If you just need to see the numbers (or need a telemetry endpoint to get them off your box), there's GitHub - wentbackward/nv-monitor: Lightweight nVidia telemetry and terminal system monitor - built for any architecture - Jetson, GB10, GB200, H100 · GitHub

It doesn’t have top-like features, though.

Just based on that screenshot, it doesn’t seem to show total and used VRAM like nvtop does, which are the most useful aspects of nvtop. For example, if you are using 15GB of system RAM, your total VRAM is 107GB, of which 90GB is used. So it’s not a replacement for nvtop, as it’s missing the most important feature.

My recommendation is to download and build the version of nvtop that works, as described here: NVTOP with DGX Spark unified memory support - #19 by RazielAU

nvtop never worked for me on the Spark, so I’m not sure what that looks like. Are you asking how much of the unified memory total is allocated as VRAM?

Or are you looking for this view?

Can we stay on topic? This post is about running nvtop on DGX Spark, so please don’t use it as an opportunity to plug your personal project. Between you advertising something different and the other guy recommending that people run a broken version of nvtop, it’s adding a lot of unnecessary noise to an otherwise useful forum post.

There shouldn’t be any reason you can’t run nvtop on the Spark; lots of people have done it. The first post here has pre-compiled binaries attached, so between running those binaries and following the instructions I linked to compile it yourself, everyone should be able to get the correct version running on the Spark.

My recommendation is to compile it yourself, though, as it’s pretty simple to do: NVTOP with DGX Spark unified memory support - #19 by RazielAU

Perhaps you could explain in greater detail - for the benefit of us uninformed folks - what practical utility this “most important feature” on a system with unified RAM actually has.

For me, interpreting these different metrics is actually somewhat confusing.

Again, I’m trying to minimise the noise on a topic that’s purely supposed to be around getting nvtop running on the Spark.

First, nvtop is not intended to monitor system RAM. While it has a basic process view at the bottom that shows the system RAM used by each process, it’s specifically designed to monitor GPU resources, i.e. total VRAM and used VRAM. On a unified memory system, your VRAM is what is actually available for the GPU to allocate.

It’s good to know how much VRAM is actually available to the GPU, as this is not always obvious. If a process is using a lot of system RAM, it reduces the VRAM you have available. As a simple example, consider the screenshot below:

This is a training process running in Ostris’ AI toolkit. From this, I can see I only have about 64GB of VRAM in total, so I can immediately tell that a huge amount of system RAM is allocated. I wouldn’t expect the training process to use this much system RAM, which likely indicates a software bug or a misconfigured setting that needs fixing: perhaps a model is being cached in system RAM when it doesn’t need to be, or something currently running on the CPU in system RAM should really be running on the GPU instead.

If all you have is the total unified memory and the amount of unified memory used, you’re completely flying blind. All you can tell is that the process is using about 90GB. It all looks fine from the outside when, in reality, there’s clearly a problem: you would expect system RAM usage to be closer to 3GB, but instead we’re seeing over 40GB. Knowing that our available VRAM is 64GB, we can tell at a glance that something is wrong.

In short: it’s just one additional data point that’s extremely useful, since without it, you would have no visibility on an issue like the one I mentioned above.

Some tips based on your screenshot: learn about tmux. It means you don’t need to have all those terminal windows open, it keeps processes running even if the SSH session dies, and it lets you reconnect to that session, including all its windows. I often have nvtop and htop running at the same time, which gives me all the information I need, though I find myself using nvtop 99% of the time as it tells me what I need to know from a GPU perspective.

As for the different metrics being confusing, they shouldn’t be. You have 128GB of RAM in total, and the system seems to reserve part of that for something, perhaps related to iGPU functionality, but that’s just a guess. The rest is available to the Linux system, the software running on it, and the GPU. Whatever isn’t being used by the OS and other software is available as VRAM that the GPU can allocate, so different tools will give you different numbers depending on exactly what they show. As mentioned, the main tools I suggest are htop and nvtop. I don’t personally use Nvidia’s web-based monitoring, as I’m not convinced it gives as accurate a view of usage as those tools do.
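To make the arithmetic in this mental model concrete, here is a minimal Python sketch. It assumes the simplification described above (VRAM available to the GPU equals the usable unified memory minus what the OS and CPU-side software already hold); it is not how nvtop actually computes its numbers. The function names and the `factor` threshold are hypothetical, and the inputs are just the example figures quoted earlier in this thread.

```python
def vram_total_gb(usable_unified_gb, system_ram_used_gb):
    """Estimate the VRAM the GPU could use on a unified-memory system.

    Hypothetical helper: assumes VRAM = usable unified memory minus what
    the OS and CPU-side software already hold (a simplified model).
    """
    return usable_unified_gb - system_ram_used_gb


def system_ram_looks_suspicious(observed_gb, expected_gb, factor=2.0):
    """Flag CPU-side usage well above what the workload should need.

    `factor` is an arbitrary, hypothetical threshold, not an nvtop setting.
    """
    return observed_gb > expected_gb * factor


if __name__ == "__main__":
    # Earlier example: 15 GB of system RAM with 107 GB of VRAM total implies
    # roughly 122 GB of unified memory shared by the OS, software, and GPU.
    usable = 15 + 107
    print(vram_total_gb(usable, 15))           # prints 107
    # Training example: ~3 GB of system RAM expected, over 40 GB observed.
    print(system_ram_looks_suspicious(40, 3))  # prints True
```

The point of the second check is the one made in the training example: a total "unified memory used" figure hides the split, whereas comparing system RAM usage against what the workload should need makes the problem stand out.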

I use btop, and I’m completely satisfied with it.

I’m really sorry, but I find it quite rude to trivialize or shut down the contributions of people who are either (a) genuinely offering help or (b) trying to understand and learn.

Many threads on this forum start on one topic and drift into other related topics, and I generally find those conversations enlightening. Going from talking about a tool because it’s missing some information to talking about that information doesn’t seem off topic to me at all; the title of the topic literally says ‘unified memory’, not ‘build this software’.

Thank you for keeping the forums civil and for making people feel welcomed to contribute.

I was trying to keep the discussion focused on the original topic: “NVTOP with DGX Spark unified memory support.” Your replies aren’t about nvtop and have caused the thread to drift off-topic, which is what I was trying to prevent. Instead of having easy access to the build tutorial and precompiled binaries, people now have to sift through several unrelated posts to find the information they were looking for.