I’m trying to identify bottlenecks in GPU execution performance for deep learning models on Titan V / V100.
I understand that certain requirements must be met for the underlying kernel execution to be performed on Tensor Cores, as described in https://devblogs.nvidia.com/parallelforall/programming-tensor-cores-cuda-9/
nvprof provides an easy way to dump and aggregate API execution stats on the GPU, but the list of API calls made does not seem to indicate whether the execution happened on Tensor Cores or not.
Is there a way to get this info through nvprof or some other way?
If the profiler currently does not (or cannot) split out Tensor Core utilization, it might be a good idea to file an enhancement request with NVIDIA to add that feature. Enhancement requests can be filed using the bug reporting form reached via the registered developer website; simply prefix the synopsis with “RFE:” to mark it as an enhancement request rather than a functional bug.
Thanks for the tip, @njuffa.
Is there any news on the subject? Is there any way to tell whether the load on the GPU is handled by CUDA cores or Tensor Cores (and in what proportion)?
Scanning cuobjdump / nvdisasm output for HMMA instructions may give you an idea what fraction of the code makes use of the Tensor Cores. Combine this with a profiling run that samples GPU time per line of code to get an idea of how much time is spent in that code.
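As a rough sketch of that first step: in real use you would disassemble your own binary (here `my_app` is just a placeholder name) and count HMMA mnemonics. Since that requires a CUDA binary, the example below runs the same filter over a canned, Volta-style SASS snippet standing in for the cuobjdump output; the instruction lines are illustrative, not taken from a real disassembly.

```shell
# Real usage would look something like (my_app is a placeholder binary):
#   cuobjdump -sass my_app | grep -c "HMMA"
#
# Below, a canned SASS-like snippet stands in for the cuobjdump output
# so the filtering step itself can be demonstrated standalone.
grep -c "HMMA" <<'EOF'
        HMMA.884.F32.F32.STEP0 R8, R26.ROW, R16.COL, R8 ;
        FFMA R10, R22, R30, R10 ;
        HMMA.884.F32.F32.STEP1 R12, R26.ROW, R18.COL, R12 ;
EOF
# grep -c prints the number of matching lines (2 for this snippet):
# lines containing HMMA are Tensor Core matrix-multiply-accumulate
# instructions, while FFMA is an ordinary CUDA-core fused multiply-add.
```

Comparing that count against the total instruction count (e.g. `wc -l` on the same SASS dump) gives a crude static estimate of how much of the code targets the Tensor Cores.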