Is cublasGemmEx a Tensor Core operation or a CUDA core operation?

uniadam · October 1, 2021, 2:16pm

Hi,

I was considering cublasGemmEx to be a Tensor Core operation, and based on that I thought I could execute other operations on the CUDA cores at the same time. But it seems that it is not always possible to run them concurrently.

So my question is:
Is cublasGemmEx a Tensor Core operation or a CUDA core operation? If it is a Tensor Core operation, why can't I use all the resources of the CUDA cores?

Also, on an Ampere machine, I see in nsys that the kernel name for the double-precision Tensor Core GEMM is a CUTLASS kernel. How is that possible? I thought it was a cuBLAS kernel.

Best regards,
Nima

mnicely · October 3, 2021, 12:32pm

First, a GPU is not simply Tensor Cores and CUDA cores. There are other on-chip resources, such as registers, shared memory, and memory bandwidth, that are shared by everything running when kernels are launched. Just because your primary compute is done on the Tensor Cores doesn't mean there are enough resources left to prepare data for CUDA core work. Tensor Cores are used to accelerate GEMMs.
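For reference, here is a minimal sketch of a cublasGemmEx call that is eligible for Tensor Cores, assuming cuBLAS 11 or later (for `CUBLAS_COMPUTE_32F`) and a Volta-or-newer GPU. cublasGemmEx itself is a host-side API: cuBLAS chooses the kernel heuristically, and FP16 inputs with FP32 accumulation are a typical Tensor Core configuration, but the call does not guarantee which hardware unit performs the math. The function name `gemm_ex_fp16_example` is hypothetical.

```cuda
#include <cublas_v2.h>
#include <cuda_fp16.h>
#include <cuda_runtime.h>
#include <cassert>
#include <vector>

// Hypothetical helper: computes C = A * B for n x n matrices where every entry
// of A and B is 1.0, so every entry of C should equal n.
// Returns C[0] on success, or -1.0f if any CUDA/cuBLAS call fails.
float gemm_ex_fp16_example(int n) {
    const size_t count = static_cast<size_t>(n) * n;
    std::vector<__half> h_ab(count, __float2half(1.0f));
    std::vector<float> h_c(count, 0.0f);

    __half *d_a = nullptr, *d_b = nullptr;
    float *d_c = nullptr;
    cublasHandle_t handle = nullptr;
    float result = -1.0f;

    bool ok = cudaMalloc(&d_a, count * sizeof(__half)) == cudaSuccess &&
              cudaMalloc(&d_b, count * sizeof(__half)) == cudaSuccess &&
              cudaMalloc(&d_c, count * sizeof(float)) == cudaSuccess &&
              cudaMemcpy(d_a, h_ab.data(), count * sizeof(__half),
                         cudaMemcpyHostToDevice) == cudaSuccess &&
              cudaMemcpy(d_b, h_ab.data(), count * sizeof(__half),
                         cudaMemcpyHostToDevice) == cudaSuccess &&
              cublasCreate(&handle) == CUBLAS_STATUS_SUCCESS;

    if (ok) {
        const float alpha = 1.0f, beta = 0.0f;
        // FP16 A/B, FP32 C, FP32 accumulation. cuBLAS selects the kernel;
        // this combination is Tensor Core eligible on Volta and newer, but
        // the API does not promise which unit actually runs it.
        ok = cublasGemmEx(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                          &alpha, d_a, CUDA_R_16F, n,
                                  d_b, CUDA_R_16F, n,
                          &beta,  d_c, CUDA_R_32F, n,
                          CUBLAS_COMPUTE_32F, CUBLAS_GEMM_DEFAULT) == CUBLAS_STATUS_SUCCESS &&
             // cudaMemcpy on the default stream waits for the GEMM to finish.
             cudaMemcpy(h_c.data(), d_c, count * sizeof(float),
                        cudaMemcpyDeviceToHost) == cudaSuccess;
        if (ok) result = h_c[0];
    }

    if (handle) cublasDestroy(handle);
    cudaFree(d_a);  // cudaFree(nullptr) is a no-op, so partial setup is safe
    cudaFree(d_b);
    cudaFree(d_c);
    return result;
}
```

Since every input entry is 1.0, `gemm_ex_fp16_example(64)` should return 64.0f on a working setup. The kernel name that Nsight Systems shows for the GEMM reveals which implementation cuBLAS actually picked.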

What you're seeing in Nsys is a cuBLAS kernel that has been built with CUTLASS. It does not affect your program. It simply means someone determined that the CUTLASS implementation was optimal and chose to distribute it with cuBLAS for that release.

uniadam · October 3, 2021, 12:52pm

Thanks for the explanation. Is there any documentation that clearly describes the shared resources?

mnicely · October 3, 2021, 1:06pm

Not at that level. Just know this: if resources are available to allow parallel work, the hardware scheduler will handle it for you.
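To make this concrete, here is a sketch (assuming cuBLAS 11+, a Volta-or-newer GPU, and compilation with nvcc) of how a program can at least allow the overlap the question asks about: the GEMM is placed on one stream and an unrelated kernel on another. Streams only express independence; whether the two kernels actually run at the same time is decided by the hardware and depends on free SM resources, which a large GEMM often consumes entirely. `scale_kernel` and `gemm_plus_kernel_on_two_streams` are hypothetical names.

```cuda
#include <cublas_v2.h>
#include <cuda_fp16.h>
#include <cuda_runtime.h>
#include <cassert>
#include <vector>

// Hypothetical stand-in for "other work" on the regular CUDA cores.
__global__ void scale_kernel(float *x, int len, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < len) x[i] *= factor;
}

// Hypothetical helper: enqueues an n x n FP16 GEMM on one stream and
// scale_kernel on another, waits for both, and returns x[0]
// (2.0f on success, -1.0f if any call fails).
float gemm_plus_kernel_on_two_streams(int n, int len) {
    const size_t mat = static_cast<size_t>(n) * n;
    std::vector<float> h_x(len, 1.0f);
    __half *d_a = nullptr, *d_b = nullptr;
    float *d_c = nullptr, *d_x = nullptr;
    cudaStream_t gemm_stream = nullptr, other_stream = nullptr;
    cublasHandle_t handle = nullptr;
    float result = -1.0f;

    bool ok = cudaMalloc(&d_a, mat * sizeof(__half)) == cudaSuccess &&
              cudaMalloc(&d_b, mat * sizeof(__half)) == cudaSuccess &&
              cudaMalloc(&d_c, mat * sizeof(float)) == cudaSuccess &&
              cudaMalloc(&d_x, len * sizeof(float)) == cudaSuccess &&
              // All-zero bytes encode 0.0 in both FP16 and FP32.
              cudaMemset(d_a, 0, mat * sizeof(__half)) == cudaSuccess &&
              cudaMemset(d_b, 0, mat * sizeof(__half)) == cudaSuccess &&
              cudaMemcpy(d_x, h_x.data(), len * sizeof(float),
                         cudaMemcpyHostToDevice) == cudaSuccess &&
              cudaStreamCreate(&gemm_stream) == cudaSuccess &&
              cudaStreamCreate(&other_stream) == cudaSuccess &&
              cublasCreate(&handle) == CUBLAS_STATUS_SUCCESS &&
              cublasSetStream(handle, gemm_stream) == CUBLAS_STATUS_SUCCESS &&
              // Make sure all setup work is done before the two streams start.
              cudaDeviceSynchronize() == cudaSuccess;

    if (ok) {
        const float alpha = 1.0f, beta = 0.0f;
        // Both launches are asynchronous and target different streams, so the
        // hardware *may* overlap them, but only if SM resources (registers,
        // shared memory, warp slots) are left over after the GEMM's blocks.
        ok = cublasGemmEx(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                          &alpha, d_a, CUDA_R_16F, n, d_b, CUDA_R_16F, n,
                          &beta, d_c, CUDA_R_32F, n,
                          CUBLAS_COMPUTE_32F, CUBLAS_GEMM_DEFAULT) == CUBLAS_STATUS_SUCCESS;
        if (ok) {
            const int threads = 256;
            scale_kernel<<<(len + threads - 1) / threads, threads, 0, other_stream>>>(
                d_x, len, 2.0f);
            ok = cudaGetLastError() == cudaSuccess &&
                 cudaDeviceSynchronize() == cudaSuccess &&
                 cudaMemcpy(h_x.data(), d_x, len * sizeof(float),
                            cudaMemcpyDeviceToHost) == cudaSuccess;
        }
        if (ok) result = h_x[0];
    }

    if (handle) cublasDestroy(handle);
    if (gemm_stream) cudaStreamDestroy(gemm_stream);
    if (other_stream) cudaStreamDestroy(other_stream);
    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c); cudaFree(d_x);
    return result;
}
```

On a working setup `gemm_plus_kernel_on_two_streams(512, 1 << 16)` should return 2.0f. Profiling it with Nsight Systems shows on the timeline whether the two kernels actually overlapped; with a large `n` they often do not, because the GEMM alone can occupy every SM.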
