Is CUBLAS_GEMM_DEFAULT_TENSOR_OP in cublasGemmEx no longer supported?

Hi, I am using the Nvidia Jetson Orin Developer Kit 64GB (Jetpack 5.0.2).

I was trying to use cublasGemmEx to run GEMM operations using only Tensor Cores.

My questions are as follows.

  1. Is it correct that I can execute a GEMM operation using only Tensor Cores with the function call below? I will also leave a link to the source of the function. (https://github.com/NVIDIA-developer-blog/code-samples/blob/master/posts/tensor-cores/simpleTensorCoreGEMM.cu)
cublasErrCheck(cublasGemmEx(cublasHandle, CUBLAS_OP_N, CUBLAS_OP_N,
                MATRIX_M, MATRIX_N, MATRIX_K,
                &alpha,
                a_fp32, CUDA_R_32F, MATRIX_M,
                b_fp32, CUDA_R_32F, MATRIX_K,
                &beta,
                c_cublas, CUDA_R_32F, MATRIX_M,
                CUDA_R_32F, CUBLAS_GEMM_DEFAULT_TENSOR_OP));
  2. I checked the documentation and saw that CUBLAS_GEMM_DEFAULT_TENSOR_OP is no longer supported. Is that correct? If so, is there any way to achieve something similar, either with another BLAS library or with other cuBLAS functions?
    (see 1. Introduction — cuBLAS 12.2 documentation)

  3. While researching how to execute GEMM operations using only Tensor Cores, I heard that I can use GEMM from a library called cuTENSOR. Is it possible to use cuTENSOR's tensor contraction to execute GEMM operations using only Tensor Cores?

Are there any libraries or functions provided by Nvidia for GEMM operations that run on Tensor Cores alone?

Just use CUBLAS_GEMM_DEFAULT. The heuristics will choose the fastest implementation, whether it uses Tensor Cores or not.

Same with cuTENSOR: the library will choose the best implementation.

Tensor Cores are generally used by default, if a kernel exists.
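To illustrate, here is a minimal, self-contained sketch (not from the thread) of calling cublasGemmEx with CUBLAS_GEMM_DEFAULT. It uses FP16 inputs with FP32 accumulation, a type combination for which the heuristics will typically pick a Tensor Core kernel on Orin; the matrix sizes and uninitialized buffers are placeholders for illustration only.

```cuda
// Sketch: FP16 inputs, FP32 output, CUBLAS_GEMM_DEFAULT. Compile with:
//   nvcc gemm_default.cu -lcublas
#include <cublas_v2.h>
#include <cuda_fp16.h>
#include <cstdio>

int main() {
    const int m = 256, n = 256, k = 256;  // placeholder sizes
    half  *a, *b;
    float *c;
    cudaMalloc(&a, (size_t)m * k * sizeof(half));
    cudaMalloc(&b, (size_t)k * n * sizeof(half));
    cudaMalloc(&c, (size_t)m * n * sizeof(float));

    cublasHandle_t handle;
    cublasCreate(&handle);

    float alpha = 1.0f, beta = 0.0f;
    // CUBLAS_GEMM_DEFAULT replaces the deprecated CUBLAS_GEMM_DEFAULT_TENSOR_OP;
    // since cuBLAS 11 both select the same heuristically chosen kernel.
    cublasStatus_t stat = cublasGemmEx(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                                       m, n, k, &alpha,
                                       a, CUDA_R_16F, m,
                                       b, CUDA_R_16F, k, &beta,
                                       c, CUDA_R_32F, m,
                                       CUBLAS_COMPUTE_32F, CUBLAS_GEMM_DEFAULT);
    printf("cublasGemmEx status: %d\n", (int)stat);

    cublasDestroy(handle);
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```

You can confirm which kernel was actually dispatched by profiling the run with Nsight Systems or Nsight Compute and checking the kernel name.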

What I want is to run GEMM using only Tensor Cores, without using CUDA cores.

If I use the option you suggested (CUBLAS_GEMM_DEFAULT), I don't think the GEMM operation is guaranteed to run using only Tensor Cores. Is that right?

Then can you tell me the option to run GEMM using only Tensor Cores (without CUDA cores)?

GEMM kernels are either Tensor Core accelerated or SIMT (using CUDA cores). An example of a SIMT kernel would be FP64 GEMM on Pascal. TC kernels are usually faster than SIMT kernels on the same hardware. The older flag, CUBLAS_GEMM_DEFAULT_TENSOR_OP, dates from a time when the Tensor Core path wasn't the default. It is today. So just use CUBLAS_GEMM_DEFAULT and let the heuristics choose the best kernel. If you don't think it has, you can manually benchmark each algorithm.
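The manual benchmarking mentioned above could be sketched as follows. This fragment assumes a cuBLAS handle, device buffers a/b/c (FP16 inputs, FP32 output), and sizes m/n/k have already been set up as in the earlier call; it loops over the cublasGemmAlgo_t enum values CUBLAS_GEMM_DEFAULT through CUBLAS_GEMM_ALGO23 and times each with CUDA events, skipping algorithms the library reports as unsupported. (Note that in cuBLAS 11+ the explicit algo values are largely ignored in favor of heuristics, so timings may not differ.)

```cuda
// Sketch: time each cublasGemmEx algorithm with CUDA events.
float alpha = 1.0f, beta = 0.0f;
for (int algo = CUBLAS_GEMM_DEFAULT; algo <= CUBLAS_GEMM_ALGO23; ++algo) {
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    cublasStatus_t stat = cublasGemmEx(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                                       m, n, k, &alpha,
                                       a, CUDA_R_16F, m,
                                       b, CUDA_R_16F, k, &beta,
                                       c, CUDA_R_32F, m,
                                       CUBLAS_COMPUTE_32F,
                                       (cublasGemmAlgo_t)algo);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    if (stat == CUBLAS_STATUS_SUCCESS) {  // unsupported algos are skipped
        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);
        printf("algo %d: %.3f ms\n", algo, ms);
    }
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
}
```

For a fair comparison you would warm up each algorithm first and average over several runs, since the first launch includes one-time setup cost.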