How to Get the Exact Amount of Resources the GPU Uses at the Moment (e.g., Used Tensor Cores) Regardless of the Running Process

rsmrsmeeman · January 9, 2025, 12:29pm

Because Nsight needs a process or executable to attach to, and only shows the resources allocated to that specific process, it seems there is no hardware feature from NVIDIA for monitoring which GPU resources are in use overall at a given moment. In other words, Nsight acts as a software layer that attaches to a process and records what it does and what it allocates.
Is that right? If not, how can this be achieved? Thanks.
EDIT: The key question is whether NVIDIA offers any hardware-accurate feature that reports exact resource usage (e.g., tensor cores in use).

Curefab · January 9, 2025, 6:25pm

There is Nsight Compute, which is more process-centric, and Nsight Systems, which collects more information about the whole system.

rsmrsmeeman · January 10, 2025, 1:22pm

Thanks for your answer. However, even with Nsight Systems, there is no feature provided to get the exact amount of resources the GPU uses at the moment (e.g., used tensor cores) without specifying a process or executable, regardless of whether this feature is a GPU hardware feature or a software-implemented feature.

Curefab · January 10, 2025, 4:35pm

Not a perfect solution: you could create a background program that attaches to processes automatically, using the debugger API and the profiler API (Debugger API :: CUDA Toolkit Documentation and NVIDIA CUDA Profiling Tools Interface (CUPTI) - CUDA Toolkit | NVIDIA Developer).

Another way could be to write a program that tries to be active in the background at the same time as other CUDA programs, uses few resources (4 warps per SM), requests access to the Tensor Cores at regular intervals, and measures the speed. It would not work well, however, if another process needs the whole SM (with the maximum number of threads).
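The timing side of that probe idea can be sketched roughly as follows. This is only an illustration in Python, not an NVIDIA-provided method: `probe` is a hypothetical callable that would launch a small Tensor Core kernel and wait for it to finish, `baseline_s` is assumed to be its run time measured on an otherwise idle GPU, and the contention formula is just one plausible way to turn a slowdown into a number.

```python
import statistics
import time
from typing import Callable, List


def time_probe(probe: Callable[[], None], repeats: int = 5) -> float:
    """Return the median wall-clock time (seconds) of `probe` over `repeats` runs."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        probe()  # hypothetical: launch a tiny Tensor Core kernel and synchronize
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def contention_estimate(baseline_s: float, sample_s: float) -> float:
    """Estimate how much of the probed unit other work appears to be using.

    0.0 means the probe ran as fast as the idle baseline; values close to
    1.0 mean the probe was almost completely starved. Clamped to [0, 1].
    """
    if baseline_s <= 0 or sample_s <= 0:
        raise ValueError("timings must be positive")
    return min(1.0, max(0.0, 1.0 - baseline_s / sample_s))


def monitor(probe: Callable[[], None], baseline_s: float,
            interval_s: float = 1.0, iterations: int = 10) -> List[float]:
    """Run the probe at regular intervals and collect contention estimates."""
    readings = []
    for _ in range(iterations):
        readings.append(contention_estimate(baseline_s, time_probe(probe)))
        time.sleep(interval_s)
    return readings
```

A probe that takes twice as long as its baseline would be reported as 0.5 here. As noted above, the readings say little when another process already fills the SM, because the probe may then not get scheduled alongside it at all.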

rsmrsmeeman · January 11, 2025, 7:18pm

Thanks for your answer. So there is no software mechanism from NVIDIA for accessing accurate performance metrics (e.g., tensor cores in use) regardless of the running process. Since PTX/SASS has no access to the GPU's Performance Monitoring Units (PMUs)(?), and these are managed by the driver itself, like context switching(?), it seems that recording performance metrics regardless of the running process would be possible, but it is not implemented at the driver level or in the API libraries(?).

Curefab · January 13, 2025, 2:42pm

I have no knowledge about such a possibility.

SASS has access to some performance monitoring features, but not in a documented way, and mainly for creating monitored events, e.g. incrementing some performance counters whenever those SASS instructions are executed.
