I’m trying to implement INT8 convolution on cuDNN 6, and I am seeing errors that I’ve never seen with 32-bit float. I followed the instructions on page 64 of the User Manual, which state (copied directly):
For the datatype configurations INT8_CONFIG and INT8_EXT_CONFIG, the only algo supported is CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMPUTED_GEMM with the following conditions:
- xDesc Format Support: CUDNN_TENSOR_NHWC
- yDesc Format Support: CUDNN_TENSOR_NHWC
- Input and output features maps must be multiple of 4
- wDesc Format Support: CUDNN_TENSOR_NHWC
- Dilation: 1 for all dimensions
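For reference, a sanity check along these lines (purely illustrative; it reuses the in_channels/out_channels variables from my snippet below) would catch a violation of the multiple-of-4 condition before any descriptor is created:

#include <cassert>

// Illustrative check: input and output feature-map counts must both be
// divisible by 4 for the INT8 configurations, per the requirement above.
assert(in_channels % 4 == 0);
assert(out_channels % 4 == 0);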
I see a few problems here:
- CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMPUTED_GEMM is NOT a member of cudnnConvolutionFwdAlgo_t. The closest alternative seems to be CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM ("PRECOMP" instead of "PRECOMPUTED").
- The job fails at cudnnConvolutionForward() with a CUDNN_STATUS_NOT_SUPPORTED error. This happens regardless of which algorithm I choose; I tested every algo type listed on page 16 of the manual.
- The 32-bit float implementation (CUDNN_DATA_FLOAT) doesn't have this issue.
Can someone take a look? The code snippet is attached below. The same problem occurs on both a GTX 1070 and a Titan X (Pascal), running CUDA 8 and cuDNN 6.0.
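For what it's worth, both cards report compute capability 6.1, which as far as I know is the minimum cuDNN requires for INT8 convolutions. A quick check like this sketch (plain CUDA runtime API, reusing gpuid from the snippet) confirms what the device reports:

// Sketch: print the device's compute capability (cuDNN INT8 needs sm_61 or newer).
cudaDeviceProp prop;
checkCudaErrors(cudaGetDeviceProperties(&prop, gpuid));
printf("Compute capability: %d.%d\n", prop.major, prop.minor);

The failing snippet follows.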
checkCudaErrors(cudaSetDevice(gpuid));
checkCUDNN(cudnnCreate(&cudnnHandle));
checkCUDNN(cudnnCreateTensorDescriptor(&dataTensor));
checkCUDNN(cudnnCreateFilterDescriptor(&conv1filterDesc));
checkCUDNN(cudnnCreateConvolutionDescriptor(&conv1Desc));
checkCUDNN(cudnnCreateTensorDescriptor(&conv1Tensor));
// Input tensor: NHWC layout with INT8 elements, per the quoted requirements
checkCUDNN(cudnnSetTensor4dDescriptor(dataTensor, CUDNN_TENSOR_NHWC,
    CUDNN_DATA_INT8, n, c, h, w));
// Filter descriptor: also NHWC / INT8
checkCUDNN(cudnnSetFilter4dDescriptor(conv1filterDesc, CUDNN_DATA_INT8,
    CUDNN_TENSOR_NHWC, out_channels, in_channels, conv.kernel_size, conv.kernel_size));
// Convolution descriptor: stride 1, dilation 1, INT32 compute type
checkCUDNN(cudnnSetConvolution2dDescriptor(conv1Desc, pad_height, pad_width, 1, 1, 1, 1,
    CUDNN_CONVOLUTION, CUDNN_DATA_INT32));
checkCUDNN(cudnnGetConvolution2dForwardOutputDim(conv1Desc, dataTensor,
    conv1filterDesc, &n, &c, &h, &w));
// Output tensor: NHWC / INT8
checkCUDNN(cudnnSetTensor4dDescriptor(conv1Tensor, CUDNN_TENSOR_NHWC,
    CUDNN_DATA_INT8, n, c, h, w));
// The manual says CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMPUTED_GEMM, but that name
// is not in cudnnConvolutionFwdAlgo_t (see page 16 of the manual), so I use the
// closest alternative:
conv1algo = CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM;
// Get workspace size. Calling
//   checkCUDNN(cudnnGetConvolutionForwardWorkspaceSize(cudnnHandle, dataTensor,
//       conv1filterDesc, conv1Desc, conv1Tensor, conv1algo, &sizeInBytes));
// also fails with CUDNN_STATUS_NOT_SUPPORTED, so I hardcode the size instead.
size_t sizeInBytes = <some hardcoded number>;
// ... allocate workspace and device buffers ...

// Fails here with CUDNN_STATUS_NOT_SUPPORTED
checkCUDNN(cudnnConvolutionForward(cudnnHandle, &alpha, dataTensor, data,
    conv1filterDesc, pconv1, conv1Desc, conv1algo, workspace, m_workspaceSize,
    &beta, conv1Tensor, conv1));
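For completeness, here is a probe I'd use to see which forward algorithms cuDNN itself reports for these descriptors (a sketch using cudnnFindConvolutionForwardAlgorithm, which benchmarks each algorithm internally and returns a per-algorithm status; no data pointers are required):

// Sketch: ask cuDNN which forward algorithms it can actually run for the
// descriptors above, and with what status / runtime / workspace requirement.
const int requestedAlgoCount = 8; // number of forward algorithms in cuDNN 6
int returnedAlgoCount = 0;
cudnnConvolutionFwdAlgoPerf_t perfResults[8];
checkCUDNN(cudnnFindConvolutionForwardAlgorithm(cudnnHandle, dataTensor,
    conv1filterDesc, conv1Desc, conv1Tensor, requestedAlgoCount,
    &returnedAlgoCount, perfResults));
for (int i = 0; i < returnedAlgoCount; ++i) {
    printf("algo %d: status=%s time=%.3f ms workspace=%zu bytes\n",
           (int)perfResults[i].algo, cudnnGetErrorString(perfResults[i].status),
           perfResults[i].time, (size_t)perfResults[i].memory);
}

I'd expect this to show whether anything at all is reported as supported for the INT8 descriptors.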