Hi guys, is there currently any way to perform INT4 ops with Turing tensor cores? cuBLAS only exposes tensor-core math for float16 and float32, according to https://docs.nvidia.com/cuda/cublas/index.html#cublassetmathmode
The cuDNN docs say int8 data types are available for tensor ops, but only on sm_72, which is Xavier rather than Turing (Turing is sm_75): https://docs.nvidia.com/deeplearning/sdk/cudnn-developer-guide/index.html#tensor-ops-speedup-tips
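The closest thing I've found so far is the experimental sub-byte WMMA API in CUDA 10's `mma.h` (the `nvcuda::wmma::experimental::precision::s4`/`u4` types with an 8x8x32 fragment shape), but it's marked experimental and I haven't verified it on real hardware. A rough sketch of what I'd expect a single-warp INT4 tile multiply to look like, assuming the experimental namespace works as documented (untested, details like pointer types may be off):

```cuda
#include <mma.h>

using namespace nvcuda;
using namespace nvcuda::wmma::experimental;

// One warp computes an 8x8 int32 tile D = A * B, where A is an 8x32
// matrix of signed 4-bit values and B is 32x8, both packed 8 per int.
__global__ void int4_gemm_tile(int *D, const int *A, const int *B) {
    wmma::fragment<wmma::matrix_a, 8, 8, 32, precision::s4, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 8, 8, 32, precision::s4, wmma::col_major> b_frag;
    wmma::fragment<wmma::accumulator, 8, 8, 32, int> c_frag;

    wmma::fill_fragment(c_frag, 0);
    // ldm is given in elements (4-bit values), so 32 for an 8x32 tile.
    wmma::load_matrix_sync(a_frag, A, 32);
    wmma::load_matrix_sync(b_frag, B, 32);
    wmma::mma_sync(c_frag, a_frag, b_frag, c_frag);
    wmma::store_matrix_sync(D, c_frag, 8, wmma::mem_row_major);
}
```

I'd launch that with one warp (`int4_gemm_tile<<<1, 32>>>(d, a, b)`) and compile with `nvcc -arch=sm_75`, but that's pieced together from the programming guide, not something I've run.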
Is a new API coming out soon or something like that? Cheers.