We tried to use GEMM with INT8 (using cuBLAS GEMMEX API), but we met the following issues, In our typical settings, M=768, N=786432, K=128, GEMM with INT8 (volta_sgemm_int8_128x128_nt) is much slower than FP16 (turing_h1688gemm_128x128_ldg8_nt), 21.443ms vs. 8.6957ms. I changed to CUDA version fr…

cuBLAS GEMM INT8 is much slower than FP16 in T4

Accelerated Computing GPU-Accelerated Libraries

kenny5312012 April 19, 2023, 9:09pm 7

After studying the document for a few hours, I guess the API does support int8 with tensor cores.

According to the document, cublasGemmEx computeType with tensor core only supports: CUBLAS_COMPUTE_16F, CUBLAS_COMPUTE_32F_FAST_16F, CUBLAS_COMPUTE_32F_FAST_16BF, CUBLAS_COMPUTE_32F_FAST_TF32

Not sure if this is the reason. Would someone confirm this? Thanks.

Cublas and tflop measure, is it possible at all to measure tflop to any reasonable degree of accuracy?

Topic		Replies	Views
Xavier Tensor Core int8 Peformance cannot reach 22TOPS with cublasGemmEx API? Jetson AGX Xavier	8	1036	October 18, 2021
cublasGemmEx doesn't work with INT8 utilizing __dp4a instruction on NVIDIA 1080TI CUDA Programming and Performance	12	3798	September 25, 2017
cuBLAS INT8 tensor core mode vs. FP16 mode GPU-Accelerated Libraries cublas	13	6141	December 5, 2022
cuBLAS INT8 tensor core mode vs. FP16 mode GPU-Accelerated Libraries	0	939	February 15, 2019
About cublasGemm INT8 support GPU-Accelerated Libraries	3	2871	September 15, 2017
How can I perform GEMM with INT8 in cuBLAS CUDA Programming and Performance	3	2205	February 24, 2017
INT8 cublasGemmEx support on Tegra X2 and Tesla P100 GPU-Accelerated Libraries	4	1914	October 17, 2017
How can I perform GEMM with INT8 in cuBLAS with DRIVE PX2 General	6	2284	May 18, 2017
cublasGemmEx cant use CUDA_R_8I compute type on GTX1080 GPU-Accelerated Libraries	4	1430	February 12, 2018
cuBLAS convolution does not use Tensor Cores GPU-Accelerated Libraries cublas	6	2375	June 8, 2021

cuBLAS GEMM INT8 is much slower than FP16 in T4

Related topics