How can I perform GEMM with INT8 in cuBLAS

CUDA version: 8.0.50
OS: Ubuntu 16.04
Hardware: DRIVE PX 2, on dGPU (GP106)

Calling cublasGemmEx with M=N=K=lda=ldb=ldc=4096, alpha=1, beta=0 (both int32_t on the host), Atype=Btype=CUDA_R_8I, and Ctype=computeType=CUDA_R_32I always returns CUBLAS_STATUS_NOT_SUPPORTED, no matter which algorithm I use (CUBLAS_GEMM_DFALT/ALGO0/1/2/3/4/5/6/7).
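For reference, here is a minimal sketch of the call described above. It assumes a GPU with compute capability 6.1 or higher (the DP4A instruction used for INT8 GEMM first appears there; GP106 qualifies), and it elides error checking on the CUDA allocation calls for brevity:

```cpp
#include <cublas_v2.h>
#include <cuda_runtime.h>
#include <cstdint>
#include <cstdio>

int main() {
    const int n = 4096;  // M = N = K = lda = ldb = ldc = 4096

    // INT8 inputs, INT32 output
    int8_t *A, *B;
    int32_t *C;
    cudaMalloc(&A, (size_t)n * n * sizeof(int8_t));
    cudaMalloc(&B, (size_t)n * n * sizeof(int8_t));
    cudaMalloc(&C, (size_t)n * n * sizeof(int32_t));

    cublasHandle_t handle;
    cublasCreate(&handle);

    // alpha/beta live on the host as int32_t because computeType is CUDA_R_32I
    const int32_t alpha = 1, beta = 0;

    cublasStatus_t st = cublasGemmEx(
        handle, CUBLAS_OP_N, CUBLAS_OP_N,
        n, n, n,
        &alpha,
        A, CUDA_R_8I, n,
        B, CUDA_R_8I, n,
        &beta,
        C, CUDA_R_32I, n,
        CUDA_R_32I,          // computeType
        CUBLAS_GEMM_DFALT);  // also tried ALGO0 through ALGO7

    // On CUDA 8.0.50 / DRIVE PX 2 this prints status 15 (CUBLAS_STATUS_NOT_SUPPORTED)
    printf("cublasGemmEx status: %d\n", (int)st);

    cublasDestroy(handle);
    cudaFree(A);
    cudaFree(B);
    cudaFree(C);
    return 0;
}
```

Compile with something like `nvcc -arch=sm_61 gemm_int8.cu -lcublas`.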

I noticed that the CUDA 8 Performance Overview (released in November 2016, page 22) has a benchmark for GEMM with INT8 on a Tesla P40, achieving 32 TFLOPS throughput.
cuBLAS’s main page (in the Key Features section) also says that cuBLAS supports integer (INT8) matrix multiplication operations.

The test passes on CUDA 8.0.61, Ubuntu 16.04 x86_64, and a GTX 1080.
But it still does not work on the DRIVE PX 2.

Note that there is a dedicated forum section for DRIVE PX-related questions:

Thanks for the kind reminder.

I have opened a new thread in the DRIVE PX forum.