Quick question about nsys GPU metrics for SM Warp Occupancy.
I’ve noticed that “compute warps in flight” and “unallocated warps in Active SMs” don’t always sum to 100%. Why does this occur? Does it indicate the GPU is waiting for data (e.g., memory latency) during computation?
If you look through the GPU metrics section of the Nsight Systems User Guide (nsight-systems 2025.3 documentation), you’ll see some suggestions of things that could cause this.
Thank you for your explanation; it helped clarify my understanding.
A follow-up question: is it possible to view all of these GPU metrics directly in the nsys GUI panel?
I’ve seen SQL query examples for retrieving these metrics, but I’m unsure if they’re all accessible via the GUI tooltips. Having absolute values for all these metrics would also be helpful for profiling. Thanks in advance for your guidance!
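For anyone landing here with the same question, this is a minimal sketch of the SQL route, assuming the report was first exported with `nsys export --type sqlite`. The table and column names (`GPU_METRICS`, `TARGET_INFO_GPU_METRICS`, `typeId`, `metricId`, `metricName`, `value`) are assumptions about the export schema and may differ between nsys versions, so check them against your own file first. The helper names `load_gpu_metrics` and `mean_per_metric` are made up for illustration.

```python
import sqlite3
import sys
from collections import defaultdict

# Assumed table/column names from the nsys SQLite export schema.
# Verify against your own export (e.g. `.schema GPU_METRICS` in the sqlite3 shell).
QUERY = """
SELECT m.timestamp, i.metricName, m.value
FROM GPU_METRICS AS m
JOIN TARGET_INFO_GPU_METRICS AS i
  ON m.typeId = i.typeId AND m.metricId = i.metricId
ORDER BY m.timestamp
"""


def load_gpu_metrics(conn):
    """Return {metric name: [(timestamp, value), ...]} for every sampled metric."""
    series = defaultdict(list)
    for timestamp, name, value in conn.execute(QUERY):
        series[name].append((timestamp, value))
    return dict(series)


def mean_per_metric(series):
    """Average the raw sample values of each metric over the whole capture."""
    return {
        name: sum(value for _, value in samples) / len(samples)
        for name, samples in series.items()
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python gpu_metrics.py report.sqlite
    connection = sqlite3.connect(sys.argv[1])
    for metric, mean in sorted(mean_per_metric(load_gpu_metrics(connection)).items()):
        print(f"{metric}: {mean:.2f}")
```

This only prints whatever metrics were sampled in the capture; it does not add metrics that the selected metric set does not collect.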
@hwilper
Apologies if I am too pedantic, but I’m hoping to clarify.
Is it possible to enable the GPU metric “Idle SM Unused Warp Slots” in the GUI? As far as I can see it is not shown there, which seems like a contradiction(?)
Thanks for replying! I’m using an A100 GPU.
Could you elaborate on the *-gfxt metric set? I’m not familiar with it, so if you have a link to some documentation, that would be really helpful.
Do you mean this: `nsys profile --gpu-metrics-devices=0 --gpu-metrics-set=tu10x-gfxt`?
The -gfxt sets are graphics throughput metric sets, which are meant for consumer GPUs; we don’t have those for compute GPUs. My first guess was that you have a consumer GPU.
So for the A100 we don’t have a readily available metric set that includes the metric you need. You’d need a custom metric set for that. I’ll check what can be done and let you know later today.