Hi folks,

I am trying to use **cublasLtMatmul** to perform int8 matrix multiplications. According to the documentation, **cuBLASLt** supports matrix multiplication of the form D = alpha*(A*B) + beta*C, and one of the supported data-type combinations has A, B, C and D all in int8. I am trying to get this **int8 output** mode to work.

https://docs.nvidia.com/cuda/cublas/index.html#cublasLtMatmul

My configuration for the multiplication is the following:

A: CUDA_R_8I

B: CUDA_R_8I

C: CUDA_R_8I

D: CUDA_R_8I

Scale type: CUDA_R_32F

Compute type: CUBLAS_COMPUTE_32I

CUDA Version: 11.2
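
For reference, this is roughly how the descriptors are wired up (a simplified sketch, not the exact code from the attached archive; `m`, `n`, `k` are placeholders and error checking is omitted):

```cpp
#include <cublasLt.h>

// Descriptor setup matching the configuration above (sketch only).
cublasLtMatmulDesc_t   opDesc = nullptr;
cublasLtMatrixLayout_t aDesc = nullptr, bDesc = nullptr, cDesc = nullptr, dDesc = nullptr;

// Accumulate in int32, scale with float.
cublasLtMatmulDescCreate(&opDesc, CUBLAS_COMPUTE_32I, CUDA_R_32F);

// The int8 path is documented as needing the "TN" form: A transposed, B non-transposed.
cublasOperation_t opT = CUBLAS_OP_T, opN = CUBLAS_OP_N;
cublasLtMatmulDescSetAttribute(opDesc, CUBLASLT_MATMUL_DESC_TRANSA, &opT, sizeof(opT));
cublasLtMatmulDescSetAttribute(opDesc, CUBLASLT_MATMUL_DESC_TRANSB, &opN, sizeof(opN));

// All four operands are int8, column-major.
cublasLtMatrixLayoutCreate(&aDesc, CUDA_R_8I, k, m, k);  // A stored k x m (transposed by the op)
cublasLtMatrixLayoutCreate(&bDesc, CUDA_R_8I, k, n, k);
cublasLtMatrixLayoutCreate(&cDesc, CUDA_R_8I, m, n, m);
cublasLtMatrixLayoutCreate(&dDesc, CUDA_R_8I, m, n, m);
```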

The documentation does not mention any architecture dependence for this combination. However, when I run the code it **errors out on Volta (Titan V) but runs fine on Turing (RTX 2060)**: the algorithm heuristic query is unable to find any algorithm for the multiplication on Volta.
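
Concretely, the failure shows up at the heuristic query: on the RTX 2060 it returns at least one result, while on the Titan V `returnedResults` comes back as 0, so there is no algorithm to pass to `cublasLtMatmul`. The sketch below continues from the setup above; `ltHandle`, the device pointers and the workspace size are illustrative:

```cpp
#include <cstdio>

// Heuristic query: this is the step that comes back empty on the Titan V.
cublasLtMatmulPreference_t pref = nullptr;
cublasLtMatmulPreferenceCreate(&pref);

size_t workspaceSize = 4 * 1024 * 1024;  // illustrative 4 MiB workspace
cublasLtMatmulPreferenceSetAttribute(pref, CUBLASLT_MATMUL_PREF_MAX_WORKSPACE_BYTES,
                                     &workspaceSize, sizeof(workspaceSize));

cublasLtMatmulHeuristicResult_t heuristic = {};
int returnedResults = 0;
cublasLtMatmulAlgoGetHeuristic(ltHandle, opDesc, aDesc, bDesc, cDesc, dDesc,
                               pref, 1, &heuristic, &returnedResults);

if (returnedResults == 0) {
    // This branch is taken on Volta: no algorithm is reported for the int8-output config.
    std::fprintf(stderr, "no cublasLt algorithm found for this configuration\n");
} else {
    float alpha = 1.0f, beta = 0.0f;  // scale type is CUDA_R_32F
    cublasLtMatmul(ltHandle, opDesc, &alpha,
                   dA, aDesc, dB, bDesc, &beta,
                   dC, cDesc, dD, dDesc,
                   &heuristic.algo, workspace, workspaceSize, stream);
}
```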

Some other observations:

- cublasLtMatmul only worked on Turing when the outer dimensions of the matrices were multiples of 16.
- nvprof analysis of the binary indicates that Tensor Cores are being used for the int8 computation on Turing.

My question is: why does the multiplication not work on Volta? Is this a bug, or is it the expected behaviour?

I have attached a minimal reproducible example that runs on Turing but fails on Volta, demonstrating the issue.

ltmatmul_bug.rar (2.1 KB)

Thanks in advance :)