Run Parallel Tensor Cores GEMM and Cuda GEMM

uniadam · June 30, 2021, 10:26pm

Hi,

Is it possible to run a Tensor Core GEMM and a CUDA core GEMM in parallel, e.g. by launching them in two different streams?

Is the hardware shared, or are CUDA cores separate from Tensor Cores?

mnicely · July 1, 2021, 12:33pm

You can certainly try, and if resources are available, two kernels could execute in parallel.
It is true that Tensor Cores are separate hardware from the ALUs used for CUDA core GEMMs.
There are other resources to consider, mainly registers and shared memory. I expect one of these to be your limiting factor.
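
To make the register and shared-memory point concrete, here is a rough back-of-the-envelope sketch in plain C++ of how many blocks of one kernel can be resident on an SM, and whether a block of a second kernel still fits next to them. Everything in it is an assumption for illustration: the `SmLimits` defaults are the published per-SM figures for compute capability 8.0 (A100), the `KernelFootprint` values are hypothetical, and the model ignores allocation granularity. It only estimates residency; it says nothing about whether the two kernels' instructions are actually issued concurrently.

```cpp
#include <algorithm>
#include <cassert>

// Per-SM limits. Defaults are the published figures for compute capability
// 8.0 (A100); treat them as assumptions and replace them with the values
// reported for your own device.
struct SmLimits {
    int registers    = 65536;       // 32-bit registers per SM
    int shared_bytes = 164 * 1024;  // shared memory per SM
    int max_threads  = 2048;        // resident threads per SM
    int max_blocks   = 32;          // resident blocks per SM
};

// Hypothetical per-kernel footprint (e.g. taken from a profiler report).
struct KernelFootprint {
    int threads_per_block;
    int regs_per_thread;
    int shared_bytes_per_block;
};

// Upper bound on resident blocks of one kernel per SM.
// Simplified: ignores register and shared-memory allocation granularity.
int blocks_per_sm(const KernelFootprint& k, const SmLimits& sm) {
    assert(k.threads_per_block > 0 && k.regs_per_thread > 0);
    int by_regs    = sm.registers / (k.regs_per_thread * k.threads_per_block);
    int by_smem    = k.shared_bytes_per_block > 0
                         ? sm.shared_bytes / k.shared_bytes_per_block
                         : sm.max_blocks;
    int by_threads = sm.max_threads / k.threads_per_block;
    return std::min({by_regs, by_smem, by_threads, sm.max_blocks});
}

// Can one block of kernel b become resident on an SM that already holds
// resident_a blocks of kernel a?
bool fits_alongside(const KernelFootprint& a, int resident_a,
                    const KernelFootprint& b, const SmLimits& sm) {
    assert(resident_a >= 0);
    int regs_left    = sm.registers    - resident_a * a.regs_per_thread * a.threads_per_block;
    int smem_left    = sm.shared_bytes - resident_a * a.shared_bytes_per_block;
    int threads_left = sm.max_threads  - resident_a * a.threads_per_block;
    int blocks_left  = sm.max_blocks   - resident_a;
    return regs_left    >= b.regs_per_thread * b.threads_per_block &&
           smem_left    >= b.shared_bytes_per_block &&
           threads_left >= b.threads_per_block &&
           blocks_left  >= 1;
}
```

For a hypothetical GEMM with 256 threads per block, 128 registers per thread and 48 KB of shared memory per block, registers cap it at 2 blocks per SM, and at that point no registers are left for a block of any other kernel. With only 1 such block resident, a smaller kernel would still fit.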

uniadam · July 1, 2021, 7:18pm

Thanks for the reply. Do we have any control to limit the resources used by Tensor Cores? I mean forcing them not to be occupied at maximum (maybe by changing some variable in CUDA).

mnicely · July 1, 2021, 7:21pm

I’m not following… Tensor Cores are a resource by themselves.

Data is loaded into Tensor Cores from the registers of threads.

uniadam · July 5, 2021, 11:41pm

I tried to disable Tensor Cores to see whether it is possible.

Based on the documentation below, I wrote this line to switch the computation to the default mode.

> cublasSetMathMode(handle, CUBLAS_DEFAULT_MATH);

But when checking with nsys, I see

cutlass::kernel<cutlass_80_tensorop_d884gemm_64x32_16x4_nn_align1>

which means that Tensor Cores are used.
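
The same check can be automated over a list of kernel names exported from a profiler. The sketch below is only a heuristic: it assumes that Tensor Core kernels carry the substring `tensorop` in their name, as the kernel quoted above does. That is an observation about this one name, not a documented guarantee, and the helper names `looks_like_tensor_op` and `count_tensor_op_kernels` are made up for illustration.

```cpp
#include <string>
#include <vector>

// Heuristic: flag a kernel as a Tensor Core kernel when its name contains
// "tensorop", as in cutlass_80_tensorop_d884gemm_64x32_16x4_nn_align1.
// This is an assumption about naming, not a documented rule.
bool looks_like_tensor_op(const std::string& kernel_name) {
    return kernel_name.find("tensorop") != std::string::npos;
}

// Count how many kernel names in a profiler export match the heuristic.
int count_tensor_op_kernels(const std::vector<std::string>& kernel_names) {
    int count = 0;
    for (const std::string& name : kernel_names) {
        if (looks_like_tensor_op(name)) {
            ++count;
        }
    }
    return count;
}
```

A count of zero does not prove that Tensor Cores were unused; it only means no kernel name matched the assumed pattern.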

mnicely · July 6, 2021, 12:45pm

Depending on the data type, the default math mode may be to use Tensor Cores. What precision/compute type are you using?

uniadam · July 6, 2021, 4:02pm

I am using DGEMM, so the data type is double for all inputs and outputs. I hope to disable Tensor Cores for this stream and run my normal CUDA core operation. I am using an A100 GPU.

mnicely · July 6, 2021, 6:03pm

Did some more digging. FP64 Tensor Cores can’t be disabled in cuBLAS. They are IEEE compliant, so there’s really no need.
For your scenario of running Tensor Cores and CUDA cores in parallel: they can’t be issued in parallel. They do share resources; I wasn’t aware of that earlier. Sorry for leading you down a rabbit hole.

uniadam · July 6, 2021, 7:05pm

Thanks for the information. In this picture from profiling magma_dpotrf, we can see that two different types of GEMM kernel may be in use (maybe this is an issue with the wrapper or nsys). But it actually happens for large dimensions.

plutohk99 · August 14, 2022, 9:36am

Excuse me, do you have any different views on this issue now?
