I want to monitor GPU activity from an application perspective. Is there any object “below” the PID that can be monitored to get SM utilization? For example, if an LLM is executing inferences, can I monitor each inference's activity? Or, if there are parallel threads within a process, one running on GPU0 and another on GPU1, GPU2, and GPU3, can I get separate utilization metrics for these two threads? Or are all metrics available at the PID level only?
Hi, @avdhoot.joshi
This forum is for support of the developer tool cuda-gdb.
From your description, I think Nsight Systems may meet your requirement. Please take a look.
Thanks @veraj, I will post in the other forum.
Greetings,
Metrics such as SMs Active, SM Instructions, and Warp Occupancy are collected at the device level, not at the PID level. Even if the GPU were quiescent with respect to the application being profiled, SM utilization would still be captured if other applications were using the GPU at that time. You can observe this by running:
sudo nsys profile --gpu-metrics-device=all
And then running a GPU application in another terminal.
Now, with that said, you can observe kernel launches and execution on a per-thread basis. Clicking on a kernel on the GPU timeline should highlight the correlated activity on the CPU thread timeline, which tells you where the launch came from. If your kernels are sufficiently long (so that the metrics collection frequency yields enough samples during the kernel), you can empirically correlate the SM metric value with that kernel.
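To make the "empirically correlate" step concrete, here is a rough Python sketch. It assumes you have already obtained two things by some means not shown here: the start and end timestamps of a kernel, and a list of (timestamp, value) samples for a device-level metric. The function names, the min_samples threshold, and all of the data below are hypothetical and are not part of nsys or any NVIDIA API.

```python
from statistics import mean


def samples_in_kernel(kernel_start_ns, kernel_end_ns, samples):
    """Return the metric values whose timestamps fall inside the kernel interval."""
    return [value for ts, value in samples if kernel_start_ns <= ts <= kernel_end_ns]


def mean_metric_for_kernel(kernel_start_ns, kernel_end_ns, samples, min_samples=3):
    """Average the metric over the kernel, or return None if too few samples landed in it.

    min_samples is an arbitrary, hypothetical threshold for "sufficiently long".
    """
    values = samples_in_kernel(kernel_start_ns, kernel_end_ns, samples)
    if len(values) < min_samples:
        return None  # kernel too short relative to the sampling period
    return mean(values)


# Hypothetical samples: one every 100,000 ns (10 kHz), values in percent.
metric_values = [0, 0, 80, 90, 85, 88, 0, 0, 0, 0]
samples = [(i * 100_000, v) for i, v in enumerate(metric_values)]

# A long kernel covers four samples; a short kernel covers none.
long_kernel = mean_metric_for_kernel(150_000, 550_000, samples)
short_kernel = mean_metric_for_kernel(610_000, 650_000, samples)
print(long_kernel, short_kernel)
```

Keep in mind that the averaged value is still a device-level number: if another process was using the same GPU during that interval, its activity is included in the result.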
Thanks @mhallock for the detailed reply! Will check further!