Max GPU usage 96%? Is this normal?

I assume it’s normal, but wanted to check with people: in programs like nvidia-smi and nvtop, I’m finding that the GPU on the Spark never seems to hit 100%. From what I’ve seen, mine tops out at 96%.

To show what I mean: right now I’m training a diffusion model, and DGX Dashboard shows the same thing I see everywhere else. No matter where I look, I never seem to get past 96% GPU utilisation.

I just want to check if this is normal.

Can someone please check and let me know if this is normal?

Over 40 people have viewed this post and it feels like no one can be bothered to respond, but it would set my mind at ease to know it’s not a problem with my unit. Just put any heavy AI load on the Spark and check DGX Dashboard or nvtop: does it max out at 96%?
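For anyone willing to check, here is a minimal sketch that polls nvidia-smi while a heavy workload runs and reports the peak reading. It assumes `nvidia-smi` is on the PATH, that the Spark’s GPU is device 0, and that `utilization.gpu` returns a plain number on this platform (if it prints `[N/A]`, the parser will fail). The helper names are made up for this example.

```python
"""Poll nvidia-smi and report the peak GPU utilization seen under load.

Sketch only: assumes `nvidia-smi` is on the PATH and the GPU is device 0.
Start the heavy workload first, then run this from another terminal.
"""

import subprocess
import time


def parse_utilization(csv_text):
    """Parse `--format=csv,noheader,nounits` output into ints, one per line."""
    return [int(line.strip()) for line in csv_text.splitlines() if line.strip()]


def sample_utilization(gpu_index=0):
    """Take one utilization.gpu reading (percent) for the given GPU."""
    out = subprocess.run(
        ["nvidia-smi", f"--id={gpu_index}",
         "--query-gpu=utilization.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_utilization(out)[0]


def peak_utilization(samples):
    """Highest reading in the list, or None if nothing was sampled."""
    return max(samples) if samples else None


if __name__ == "__main__":
    readings = []
    for _ in range(60):  # roughly one minute at ~1 s intervals
        readings.append(sample_utilization())
        time.sleep(1.0)
    print(f"peak={peak_utilization(readings)}%  min={min(readings)}%")
```

If several people post the `peak=` line from a heavy run, it becomes easy to see whether 96% is a common ceiling.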

Are you seeing specific performance concerns? If so, please share the expected performance KPIs vs. the KPIs you are seeing with your unit.
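As a rough illustration of a KPI that is more informative than a utilization percentage, the sketch below times a workload and reports throughput in items per second. `step_fn` is a hypothetical stand-in for one step of your own workload (a training step, one image generation, a batch of LLM tokens), and the warm-up and step counts are arbitrary defaults.

```python
"""Measure a throughput KPI (items per second) for an arbitrary workload step.

Sketch only: `step_fn` is a hypothetical callable standing in for one step of
your own workload. If it launches asynchronous GPU work, it must block until
that work has finished (in PyTorch, e.g. by calling torch.cuda.synchronize()),
otherwise the timing only measures kernel launches.
"""

import time


def measure_throughput(step_fn, items_per_step, n_steps=50, warmup=5):
    """Run step_fn repeatedly and return items processed per second."""
    for _ in range(warmup):  # exclude one-off costs such as compilation or caching
        step_fn()
    start = time.perf_counter()
    for _ in range(n_steps):
        step_fn()
    elapsed = time.perf_counter() - start
    return items_per_step * n_steps / elapsed
```

Comparing a number like this against a published or expected figure for the same model and batch size says far more about whether a unit is underperforming than a 96% vs. 100% utilization reading.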

GPU utilization alone is not enough context. Typically, GPU SM utilization is helpful in giving “directional” context on whether your workload is saturating the GPU. But whether 96% vs. 100% matters is hard to say.
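To illustrate why the headline number needs context, here is a toy model. It assumes NVML’s documented definition of GPU utilization (the percent of the sample period during which one or more kernels was executing), which ignores how many SMs those kernels actually occupied. The kernel timings are invented, and this is not a diagnosis of what the Spark is doing.

```python
"""Toy model: time-based "GPU utilization" vs. SM activity.

Assumption: "GPU utilization" = percent of the sample window in which at least
one kernel was running, regardless of SM occupancy. Timings are invented.
Segments are (duration_ms, sm_fraction) for kernels run back to back.
"""


def time_based_utilization(segments, window_ms):
    """Percent of the window during which some kernel was running."""
    busy_ms = sum(duration for duration, _ in segments)
    return 100.0 * busy_ms / window_ms


def sm_activity(segments, window_ms):
    """Average percent of SMs kept busy across the window."""
    busy_sm_ms = sum(duration * sm_fraction for duration, sm_fraction in segments)
    return 100.0 * busy_sm_ms / window_ms


# One small kernel running the whole 100 ms window but filling only 1/4 of the SMs.
narrow = [(100.0, 0.25)]
# Full-width kernels that leave a 4 ms idle gap (e.g. host-side work) in the window.
gappy = [(48.0, 1.0), (48.0, 1.0)]

for name, segs in [("narrow", narrow), ("gappy", gappy)]:
    print(name, time_based_utilization(segs, 100.0), sm_activity(segs, 100.0))
```

The `narrow` case reports 100% utilization while only a quarter of the SMs are busy, so even a 100% reading does not prove saturation. The `gappy` case shows one hypothetical way a reading can sit just below 100%.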

It’s not a complicated question or something performance-related. I just want to know: when others run anything heavy on the system, does it report 96% max? It’s hard for me to know exactly what to expect performance-wise; all I can say is that I’ve never seen my GPU utilisation go higher than 96%, and I just want to know whether that is normal. I get 96% when generating images, 96% when training, and 96% when running LLMs. My point is that 96% is what I get when my GPU is fully utilised, which leaves me with the question: why 96%, and not 100% or close to it? I just want someone to check whether they’re getting the same.

Hi, based on the context shared, the behavior is normal.

Have you actually checked and confirmed this on a real DGX Spark? I don’t want a theoretical answer; I want confirmation from someone who has verified the behaviour on real hardware and observes the same results I’m seeing.

I don’t think I’ve ever seen 100% on any of my Sparks. Not sure if 96% was the limit, but definitely not 100%.

96% seems to be the highest I see here under load, too. Seems a bit odd to me.

I haven’t seen 100 either. I wonder where the remaining 4% went?

Thanks for responding, everyone. @eugr also confirmed via PM that he’s seeing 96% utilisation at full load, so it seems this is ‘normal’, to quote @NVES. It probably warrants investigation by NVIDIA, but I don’t think we should assume we’re missing 4% of performance; it’s likely just a bug in how the utilisation is calculated.