Tensor Cores

john.mcinnis · May 1, 2020, 1:19pm

If the Xavier has 64 Tensor cores in addition to the 512 CUDA cores, can we use the Tensor cores to supplement the processing power of the CUDA cores? For example, could a single kernel use both sets of cores to maximize throughput? So far I have only seen support for the Tensor cores in cuDNN and in the GEMMs in cuBLAS, but it seems wasteful to have another set of cores that goes unused most of the time.

Thanks

dkreutz · May 1, 2020, 1:57pm

CUDA and Tensor cores have different capabilities and specializations - see Tensor Cores: Versatility for HPC & AI | NVIDIA for more details and/or google “cuda vs tensor core”.

john.mcinnis · May 1, 2020, 4:56pm

Thanks for the reply dkreutz,
What I take from this is that the Tensor cores are used exclusively to compute matrix multiplications, which explains their limited use in cuBLAS/cuDNN. Is this assumption correct?

dkreutz · May 2, 2020, 1:00am

Yes, Tensor cores perform matrix operations only, but they do so in a highly efficient way.
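
To make "matrix operations only" concrete, here is a minimal sketch of how Tensor Cores can be programmed directly through the CUDA WMMA API (`nvcuda::wmma` in `<mma.h>`). It is not taken from this thread: the kernel names, the all-ones inputs, and the build line are assumptions chosen for illustration. One warp computes a single 16x16x16 tile product, which is the granularity the API exposes.

```cuda
// Illustrative sketch, not code from the thread.
// Assumed build line for Xavier: nvcc -arch=sm_72 wmma_tile.cu -o wmma_tile
#include <cstdio>
#include <cuda_fp16.h>
#include <mma.h>

using namespace nvcuda;

constexpr int N = 16;  // WMMA tile edge (M = N = K = 16)

// Fill both inputs with 1.0 on the device (hypothetical helper).
__global__ void fill_ones(half* a, half* b) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < N * N) {
        a[i] = __float2half(1.0f);
        b[i] = __float2half(1.0f);
    }
}

// Must be launched with one full warp (32 threads): the warp cooperatively
// computes D = A * B + 0 with FP16 inputs and FP32 accumulation.
__global__ void wmma_tile(const half* a, const half* b, float* d) {
    wmma::fragment<wmma::matrix_a, N, N, N, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, N, N, N, half, wmma::col_major> b_frag;
    wmma::fragment<wmma::accumulator, N, N, N, float> acc;

    wmma::fill_fragment(acc, 0.0f);
    wmma::load_matrix_sync(a_frag, a, N);      // leading dimension = 16
    wmma::load_matrix_sync(b_frag, b, N);
    wmma::mma_sync(acc, a_frag, b_frag, acc);  // the Tensor Core operation
    wmma::store_matrix_sync(d, acc, N, wmma::mem_row_major);
}

int main() {
    half *a, *b;
    float* d;
    cudaMallocManaged(&a, N * N * sizeof(half));
    cudaMallocManaged(&b, N * N * sizeof(half));
    cudaMallocManaged(&d, N * N * sizeof(float));

    fill_ones<<<1, N * N>>>(a, b);
    wmma_tile<<<1, 32>>>(a, b, d);

    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    // Every element is a sum of 16 products of 1 * 1.
    printf("d[0] = %.1f\n", d[0]);

    cudaFree(a);
    cudaFree(b);
    cudaFree(d);
    return 0;
}
```

Everything outside the `mma_sync` call (index math, the fill kernel, the loads and stores) still runs on the regular CUDA cores and load/store units, which is why the two kinds of cores are complementary rather than interchangeable.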

eyalhir74 · May 3, 2020, 9:36am

I guess the OP's question still stands, no? If we have to do matrix multiplication for cuDNN and it runs fastest and most efficiently on the Tensor cores, why not give (say) 20% of the work to the regular CUDA cores so they run in parallel with the Tensor cores?

thanks
Eyal

AastaLLL · May 4, 2020, 3:53am

Hi,

May I know your use case?

In most of our SDKs, the GPU cores are used together with the Tensor cores.
For example, TensorRT will choose whichever resource gives the best performance.

To check this, you can profile the application with nvprof:

tensor_precision_fu_utilization: The utilization level of the multiprocessor function units that execute tensor core instructions on a scale of 0 to 10 (HMMA)
tensor_int_fu_utilization: The utilization level of the multiprocessor function units that execute tensor core int8 instructions on a scale of 0 to 10. This metric is only available for devices with compute capability 7.2. (IMMA)

Thanks.

eyalhir74 · May 4, 2020, 3:58am

Hi,
I guess the question is whether they run at the same time, splitting the work between them (Tensor and CUDA cores) to gain maximum performance, or whether only one of them runs at any given time.
I’ll check the tensor_xxx metrics as you’ve suggested, thanks!

thanks
Eyal

AastaLLL · May 5, 2020, 6:50am

Hi,

This depends on the implementation.
If you trigger both function calls in cuDNN, they can run at the same time.

There is no limitation on launching Tensor Core and GPU core work concurrently.

Thanks.
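
One way to try the concurrent launch described above is to put a Tensor Core kernel and an ordinary FP32 kernel into separate CUDA streams. The sketch below is an illustration under stated assumptions, not NVIDIA's or cuDNN's implementation: the kernel names, grid sizes, and iteration counts are hypothetical, and the grids are kept small on the assumption that both kernels can then be resident on the GPU together. Whether they actually overlap is decided by the hardware scheduler.

```cuda
// Illustrative sketch, not code from the thread.
// Assumed build line for Xavier: nvcc -arch=sm_72 overlap.cu -o overlap
#include <cstdio>
#include <cuda_fp16.h>
#include <mma.h>

using namespace nvcuda;

constexpr int T = 16;  // WMMA tile edge

// Tensor Core work: each one-warp block repeatedly accumulates a 16x16 tile product.
__global__ void tensor_kernel(const half* a, const half* b, float* d, int iters) {
    wmma::fragment<wmma::matrix_a, T, T, T, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, T, T, T, half, wmma::col_major> b_frag;
    wmma::fragment<wmma::accumulator, T, T, T, float> acc;

    wmma::fill_fragment(acc, 0.0f);
    wmma::load_matrix_sync(a_frag, a, T);
    wmma::load_matrix_sync(b_frag, b, T);
    for (int i = 0; i < iters; ++i) {
        wmma::mma_sync(acc, a_frag, b_frag, acc);
    }
    // Each block writes its own output tile.
    wmma::store_matrix_sync(d + blockIdx.x * T * T, acc, T, wmma::mem_row_major);
}

// CUDA core work: a chain of FP32 fused multiply-adds per element.
__global__ void fp32_kernel(float* x, int n, int iters) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float v = x[i];
    for (int k = 0; k < iters; ++k) {
        v = fmaf(v, 1.0001f, 0.5f);
    }
    x[i] = v;
}

int main() {
    const int tiles = 4;       // hypothetical sizes, kept small on purpose
    const int n = 4 * 128;
    const int iters = 100000;

    half *a, *b;
    float *d, *x;
    cudaMalloc(&a, T * T * sizeof(half));
    cudaMalloc(&b, T * T * sizeof(half));
    cudaMalloc(&d, tiles * T * T * sizeof(float));
    cudaMalloc(&x, n * sizeof(float));
    // Zero inputs are enough here: only the timing matters, not the values.
    cudaMemset(a, 0, T * T * sizeof(half));
    cudaMemset(b, 0, T * T * sizeof(half));
    cudaMemset(x, 0, n * sizeof(float));

    cudaStream_t s_tensor, s_fp32;
    cudaStreamCreate(&s_tensor);
    cudaStreamCreate(&s_fp32);

    // Kernels in different non-default streams are allowed to overlap.
    tensor_kernel<<<tiles, 32, 0, s_tensor>>>(a, b, d, iters);
    fp32_kernel<<<n / 128, 128, 0, s_fp32>>>(x, n, iters);

    cudaError_t err = cudaDeviceSynchronize();
    printf("status: %s\n", cudaGetErrorString(err));

    cudaStreamDestroy(s_tensor);
    cudaStreamDestroy(s_fp32);
    cudaFree(a);
    cudaFree(b);
    cudaFree(d);
    cudaFree(x);
    return err == cudaSuccess ? 0 : 1;
}
```

To see whether the two kernels overlapped, compare their start times and durations in a timeline, for example with `nvprof --print-gpu-trace ./overlap`. As far as I know, collecting metrics such as `tensor_precision_fu_utilization` with `nvprof --metrics` can serialize kernels, so it is safer to check utilization and overlap in separate profiling runs.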
