Hi, I am using the NVIDIA Jetson Orin Developer Kit 64GB (JetPack 5.0.2).
I am trying to use cublasGemmEx to run GEMM operations using only Tensor Cores.
My questions are as follows.
- Is it correct that I can run a GEMM operation on Tensor Cores only by calling the function below? Here is a link to the source of the function: (https://github.com/NVIDIA-developer-blog/code-samples/blob/master/posts/tensor-cores/simpleTensorCoreGEMM.cu)
cublasErrCheck(cublasGemmEx(cublasHandle, CUBLAS_OP_N, CUBLAS_OP_N, MATRIX_M, MATRIX_N, MATRIX_K, &alpha, a_fp32, CUDA_R_32F, MATRIX_M, b_fp32, CUDA_R_32F, MATRIX_K, &beta, c_cublas, CUDA_R_32F, MATRIX_M, CUDA_R_32F, CUBLAS_GEMM_DEFAULT_TENSOR_OP));
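To make clear what I expect this call to compute, here is a minimal CPU sketch of the same operation, C = alpha * A * B + beta * C. It assumes column-major storage and no transposes, as in the call above (CUBLAS_OP_N for both operands). The function name `reference_gemm_col_major` is my own and is not part of cuBLAS; it is only meant as a way to check the GPU result, not as a description of how cuBLAS works internally.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not a cuBLAS API): CPU reference for
//   C = alpha * A * B + beta * C
// with column-major storage and no transposes.
// A is m x k (leading dimension lda), B is k x n (ldb), C is m x n (ldc).
void reference_gemm_col_major(int m, int n, int k, float alpha,
                              const float* a, int lda,
                              const float* b, int ldb,
                              float beta, float* c, int ldc) {
    for (int col = 0; col < n; ++col) {
        for (int row = 0; row < m; ++row) {
            // Dot product of row `row` of A with column `col` of B.
            float acc = 0.0f;
            for (int i = 0; i < k; ++i) {
                const std::size_t a_idx = row + static_cast<std::size_t>(i) * lda;
                const std::size_t b_idx = i + static_cast<std::size_t>(col) * ldb;
                acc += a[a_idx] * b[b_idx];
            }
            const std::size_t c_idx = row + static_cast<std::size_t>(col) * ldc;
            // Follow the BLAS convention: when beta is zero, the old C is not read.
            c[c_idx] = (beta == 0.0f) ? alpha * acc
                                      : alpha * acc + beta * c[c_idx];
        }
    }
}
```

With the sizes from the call above, this would be invoked as `reference_gemm_col_major(MATRIX_M, MATRIX_N, MATRIX_K, alpha, a_host, MATRIX_M, b_host, MATRIX_K, beta, c_host, MATRIX_M)`, where `a_host`, `b_host`, and `c_host` are assumed host copies of the device buffers. If the GPU path uses a reduced-precision Tensor Core mode, the results may differ slightly from this FP32 reference, so the comparison should probably use a tolerance rather than exact equality.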
I checked the documentation and saw that CUBLAS_GEMM_DEFAULT_TENSOR_OP is no longer supported. Is that correct? If so, is there a similar way to do this, for example with another BLAS library or with other cuBLAS functions?
(Reference: cuBLAS 12.2 documentation, "1. Introduction")
While researching how to run GEMM operations using only Tensor Cores, I heard that I can use GEMM from a library called cuTENSOR. Is it possible to use its Tensor Core contraction to run GEMM operations using only Tensor Cores?
Does NVIDIA provide any libraries or functions for GEMM operations that run on Tensor Cores alone?