How to confirm whether the Tensor Cores are working or not

Hi Community member,

I have a question I would like to confirm.

Could you please tell me how to confirm whether the Tensor Cores are working or not?

Best regards.
Kaka

Hi Kaka, please refer to this topic: https://devtalk.nvidia.com/default/topic/1047176/jetson-agx-xavier/tenson-core-sample

Hi Dusty,

Thank you for your response.
But I could not find the answer to my question there.

Again, how can we confirm whether the Tensor Cores are working or not?
Is there any tool or status indicator to check it?

Best regards.
Kaka

Hi Kaka, that sample runs using Tensor Core HMMA operations, so it wouldn't run if the Tensor Cores weren't working properly for some reason (not that that should occur).

Hi Dusty,

Thank you for your support. I understand that the Tensor Cores are used when running this sample code.
But I would also like to confirm whether the Tensor Cores are being used when running our own original code.
Do you know a way to confirm that?

Best regards.
Kaka

Thanks Kaka, I understand now. You can use the nvprof CUDA profiler tool to capture the Tensor Core usage while your application runs. nvprof supports two metrics for Tensor Core utilization:

  • tensor_precision_fu_utilization: The utilization level of the multiprocessor function units that execute floating-point tensor core instructions on a scale of 0 to 10
  • tensor_int_fu_utilization: The utilization level of the multiprocessor function units that execute int8 tensor core instructions on a scale of 0 to 10

Here is an example output of running it on the HMMA cudaTensorCoreGemm sample:

$ sudo /usr/local/cuda/bin/nvprof --kernels compute_gemm --metrics tensor_precision_fu_utilization,tensor_int_fu_utilization ./cudaTensorCoreGemm
Initializing...
==24384== NVPROF is profiling process 24384, command: ./cudaTensorCoreGemm
GPU Device 0: "Xavier" with compute capability 7.2

M: 4096 (16 x 256)
N: 4096 (16 x 256)
K: 4096 (16 x 256)
Preparing data for GPU...
Required shared memory size: 64 Kb
Computing... using high performance kernel compute_gemm 
==24384== Some kernel(s) will be replayed on device 0 in order to collect all events/metrics.
Replaying kernel "compute_gemm(__half const *, __half const *, float const *, float*, float, float)" (done)
Time: 1086.695679 ms
TFLOPS: 0.13
==24384== Profiling application: ./cudaTensorCoreGemm
==24384== Profiling result:
==24384== Metric result:
Invocations                               Metric Name                           Metric Description         Min         Max         Avg
Device "Xavier (0)"
    Kernel: compute_gemm(__half const *, __half const *, float const *, float*, float, float)
          1           tensor_precision_fu_utilization   Tensor-Precision Function Unit Utilization     Mid (5)     Mid (5)     Mid (5)
          1                 tensor_int_fu_utilization         Tensor-Int Function Unit Utilization    Idle (0)    Idle (0)    Idle (0)

Note that in this example, the tensor_int_fu_utilization metric is shown as Idle, because the sample uses HMMA FP16 operations rather than IMMA INT8.

Hi Dusty,

Thank you for your response and information. I got it!

We would also like to measure the Tensor Core performance. Do you know of any sample code where the Tensor Cores can be enabled or disabled, so we can compare the two?

Best regards.
Kaka

If you are using TensorRT, TensorRT will automatically enable Tensor Cores, so they can't be disabled other than by running in FP32 instead of FP16/INT8.

If you are using cuDNN directly, you can choose not to have layers executed on the Tensor Cores by not specifying CUDNN_TENSOR_OP_MATH: http://docs.nvidia.com/deeplearning/sdk/cudnn-developer-guide/index.html#tensor_ops

If you are using TensorFlow, it looks like there is a setting to disable Tensor Core usage in cuDNN: https://docs.nvidia.com/deeplearning/dgx/tensorflow-user-guide/index.html#tf_disable_tensor_op_math

For cuBLAS, Tensor Cores are used through cublasGemmEx(), so use the regular cublas<t>gemm() functions (e.g. cublasSgemm()) if you don't want to use the Tensor Cores.