I am trying to use cublasLtMatmul to perform int8 matrix multiplications. According to the documentation, cuBLASLt supports matrix multiplication of the form D = alpha*(A*B) + beta*C. One of the supported data-type combinations has A, B, C and D all as int8. I am trying to get this int8 output mode to work.
My configuration for the multiplication is the following:
Scale type: CUDA_R_32F
Compute type: CUBLAS_COMPUTE_32I
CUDA Version: 11.2
The documentation does not mention any architecture dependence. However, when I run the code, it errors out on the Volta architecture (Titan V) but runs on Turing (RTX 2060). On Volta, the algorithm heuristic is unable to find any algorithm for the multiplication.
Some other observations:
- cublasLtMatmul only worked on Turing when the external dimensions of the matrices were multiples of 16.
- nvprof analysis of the binary indicates that Tensor Cores are being used for the int8 computation on Turing.
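For sizes that are not multiples of 16, one possible workaround for the first observation is to zero-pad the operands on the host before uploading them. The sketch below is only an illustration: roundUpToMultiple and padInt8Matrix are hypothetical helper names, not part of cuBLASLt. It assumes column-major storage (the default cuBLASLt layout order) and that a padded leading dimension is acceptable. Zero padding does not change the values in the original region of D.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not a cuBLASLt call): round n up to the next
// multiple of `multiple`.
inline std::size_t roundUpToMultiple(std::size_t n, std::size_t multiple) {
    assert(multiple > 0);
    return ((n + multiple - 1) / multiple) * multiple;
}

// Hypothetical helper: copy a rows x cols column-major int8 matrix into a
// zero-filled buffer whose rows and cols are both rounded up to `multiple`.
// The leading dimension of the result is the padded row count.
inline std::vector<std::int8_t> padInt8Matrix(const std::vector<std::int8_t>& src,
                                              std::size_t rows,
                                              std::size_t cols,
                                              std::size_t multiple = 16) {
    assert(src.size() == rows * cols);
    const std::size_t paddedRows = roundUpToMultiple(rows, multiple);
    const std::size_t paddedCols = roundUpToMultiple(cols, multiple);

    std::vector<std::int8_t> dst(paddedRows * paddedCols, 0);
    for (std::size_t c = 0; c < cols; ++c) {
        for (std::size_t r = 0; r < rows; ++r) {
            // Column-major indexing: element (r, c) lives at c * ld + r.
            dst[c * paddedRows + r] = src[c * rows + r];
        }
    }
    return dst;
}
```

The padded row count would then be passed as both the row count and the leading dimension when creating the matrix layouts, and only the original rows x cols block of D read back.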
My question is: why does the multiplication not work on Volta? Is it a bug, or is this the expected behaviour?
I have attached a simple reproducible example script which runs on Turing but not on Volta demonstrating this issue.
ltmatmul_bug.rar (2.1 KB)
Thanks in advance :)