Thanks Kaka, I understand now. You can use the nvprof CUDA profiler tool to capture Tensor Core usage while your application runs. nvprof supports two metrics for Tensor Core utilization:
- tensor_precision_fu_utilization: The utilization level of the multiprocessor function units that execute floating-point tensor core instructions on a scale of 0 to 10
- tensor_int_fu_utilization: The utilization level of the multiprocessor function units that execute int8 tensor core instructions on a scale of 0 to 10
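For reference, here is a minimal sketch of the kind of kernel the first metric measures: one warp performing a single 16x16x16 FP16 matrix multiply through the nvcuda::wmma API. The file and kernel names are only illustrative (this is not one of the CUDA samples), and it assumes CUDA 9.0 or later on a Tensor Core GPU (sm_70 or newer, e.g. Xavier's sm_72):

// minimal_wmma.cu -- illustrative example, not part of the CUDA samples.
// One warp performs a single 16x16x16 FP16 matrix multiply via the WMMA API.
// Build with: nvcc -arch=sm_72 minimal_wmma.cu -o minimal_wmma
#include <cuda_fp16.h>
#include <mma.h>
using namespace nvcuda;

__global__ void wmma_fp16(const half *a, const half *b, float *c) {
    // Fragments for one 16x16x16 tile: FP16 inputs, FP32 accumulator.
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> c_frag;

    wmma::fill_fragment(c_frag, 0.0f);               // zero the accumulator
    wmma::load_matrix_sync(a_frag, a, 16);           // leading dimension 16
    wmma::load_matrix_sync(b_frag, b, 16);
    wmma::mma_sync(c_frag, a_frag, b_frag, c_frag);  // compiles to HMMA instructions
    wmma::store_matrix_sync(c, c_frag, 16, wmma::mem_row_major);
}

int main() {
    half *a, *b;
    float *c;
    // Inputs are left uninitialized; for profiling we only care about
    // the instructions the kernel issues.
    cudaMalloc(&a, 16 * 16 * sizeof(half));
    cudaMalloc(&b, 16 * 16 * sizeof(half));
    cudaMalloc(&c, 16 * 16 * sizeof(float));
    wmma_fp16<<<1, 32>>>(a, b, c);  // WMMA requires a full warp of 32 threads
    cudaDeviceSynchronize();
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}

Profiling this binary with the same --metrics flags should register activity on tensor_precision_fu_utilization, although a kernel this tiny will report a low utilization level.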
Here is example output from running it on the cudaTensorCoreGemm sample, which uses HMMA FP16 operations:
$ sudo /usr/local/cuda/bin/nvprof --kernels compute_gemm --metrics tensor_precision_fu_utilization,tensor_int_fu_utilization ./cudaTensorCoreGemm
Initializing...
==24384== NVPROF is profiling process 24384, command: ./cudaTensorCoreGemm
GPU Device 0: "Xavier" with compute capability 7.2
M: 4096 (16 x 256)
N: 4096 (16 x 256)
K: 4096 (16 x 256)
Preparing data for GPU...
Required shared memory size: 64 Kb
Computing... using high performance kernel compute_gemm
==24384== Some kernel(s) will be replayed on device 0 in order to collect all events/metrics.
Replaying kernel "compute_gemm(__half const *, __half const *, float const *, float*, float, float)" (done)
Time: 1086.695679 ms
TFLOPS: 0.13
==24384== Profiling application: ./cudaTensorCoreGemm
==24384== Profiling result:
==24384== Metric result:
Invocations   Metric Name                        Metric Description                           Min       Max       Avg
Device "Xavier (0)"
    Kernel: compute_gemm(__half const *, __half const *, float const *, float*, float, float)
          1   tensor_precision_fu_utilization    Tensor-Precision Function Unit Utilization   Mid (5)   Mid (5)   Mid (5)
          1   tensor_int_fu_utilization          Tensor-Int Function Unit Utilization         Idle (0)  Idle (0)  Idle (0)
Note that in this example the tensor_int_fu_utilization metric is shown as Idle (0), because the sample uses HMMA FP16 operations rather than IMMA INT8 operations.
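To drive the tensor_int_fu_utilization metric instead, a kernel has to issue IMMA instructions, i.e. use INT8 inputs with an INT32 accumulator. A minimal sketch, assuming CUDA 10.0 or later (which added the integer WMMA fragment types supported on Xavier's sm_72); compared to the FP16 kernel above, only the element types change:

// Same structure as the FP16 kernel above, but with INT8 inputs and INT32
// accumulation; wmma::mma_sync here compiles to IMMA instructions, which is
// what tensor_int_fu_utilization measures.
__global__ void wmma_int8(const signed char *a, const signed char *b, int *c) {
    wmma::fragment<wmma::matrix_a, 16, 16, 16, signed char, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, signed char, wmma::col_major> b_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, int> c_frag;

    wmma::fill_fragment(c_frag, 0);
    wmma::load_matrix_sync(a_frag, a, 16);
    wmma::load_matrix_sync(b_frag, b, 16);
    wmma::mma_sync(c_frag, a_frag, b_frag, c_frag);  // IMMA instructions
    wmma::store_matrix_sync(c, c_frag, 16, wmma::mem_row_major);
}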