How can I perform GEMM with INT8 in cuBLAS

CUDA version: 8.0.50
OS: Ubuntu 16.04
Hardware: DRIVE PX 2, on dGPU (GP106)

Calling cublasGemmEx with M=N=K=lda=ldb=ldc=4096, alpha=1, beta=0 (both int32_t on the host), Atype=Btype=CUDA_R_8I, and Ctype=computeType=CUDA_R_32I always returns CUBLAS_STATUS_NOT_SUPPORTED, no matter which algorithm I use (CUBLAS_GEMM_DFALT/ALGO0/1/2/3/4/5/6/7).

I noticed that the CUDA 8 Performance Overview (released in November 2016, page 22) has benchmarks for INT8 GEMM on a Tesla P40, achieving 32 TFLOPS throughput.
cuBLAS’s main page (https://developer.nvidia.com/cublas, in the Key Features section) also states that cuBLAS supports integer (INT8) matrix multiplication operations.

Update:
The test passes on CUDA 8.0.61, Ubuntu 16.04 x86_64, and a GTX 1080.
But it still does not work on the DRIVE PX 2.

Note that there is a special forum section for Drive PX related questions:

https://devtalk.nvidia.com/default/board/182/drive-platforms/

Thanks for the kind reminder.

I have opened a new thread in the DRIVE PX forum:
https://devtalk.nvidia.com/default/topic/996059/drive-platforms/how-can-i-perform-gemm-with-int8-in-cublas-with-drive-px2/