I want to run INT8 convolutions on DP4A-enabled GPUs (i.e., GPUs with the 4-element dot-product instruction) for up to 4x faster inference.
I checked the cuDNN user guide and found the "INT8x4_EXT_CONFIG" configuration, which takes xDesc and wDesc as CUDNN_DATA_INT8x4 (4-byte packed signed integers), convDesc as CUDNN_DATA_INT32, and produces output as CUDNN_DATA_FLOAT.
According to the NVIDIA cuDNN guide, Pg 63: "Tensors can be converted to/from CUDNN_TENSOR_NCHW_VECT_C with cudnnTransformTensor()."
I read this statement to mean that if I load my input images in NCHW format, I can convert them to CUDNN_TENSOR_NCHW_VECT_C format using cudnnTransformTensor(). Fine so far. Is my understanding correct?
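To make sure I understand what the transform actually does to the data, here is a CPU-side sketch of the NCHW → NCHW_VECT_C (i.e., N x C/4 x H x W x 4) reorder for int8 data, which is what I expect cudnnTransformTensor() to perform on the device given the two descriptors. The helper name and dimensions are my own, just for illustration:

```c
#include <stddef.h>
#include <stdint.h>

/* Reorder int8 data from NCHW to NCHW_VECT_C (N x C/4 x H x W x 4).
 * C must be a multiple of 4; each group of 4 consecutive channels is
 * interleaved into the innermost dimension of the destination. */
static void nchw_to_nchw_vect_c(const int8_t *src, int8_t *dst,
                                int n, int c, int h, int w) {
    int cg = c / 4; /* number of 4-channel groups */
    for (int in = 0; in < n; ++in)
        for (int ic = 0; ic < c; ++ic)
            for (int ih = 0; ih < h; ++ih)
                for (int iw = 0; iw < w; ++iw) {
                    int g = ic / 4, v = ic % 4;
                    size_t s = (((size_t)in * c + ic) * h + ih) * w + iw;
                    size_t d = ((((size_t)in * cg + g) * h + ih) * w + iw) * 4 + v;
                    dst[d] = src[s];
                }
}
```

So the layouts describe the same values; only the memory order changes, which is why a single transform call can convert between them.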
But how do I convert the filter data into the same format? I can't use the same cudnnTransformTensor() API, because its input and transformed-output arguments are of type cudnnTensorDescriptor_t, not cudnnFilterDescriptor_t.
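One workaround I am considering (an assumption on my part, not something the guide spells out): cudnnTransformTensor() operates on raw device memory through descriptors, and a KCRS filter is laid out in memory exactly like an NCHW tensor with N=K, H=R, W=S. So it should be possible to describe the filter data with temporary cudnnTensorDescriptor_t objects and transform through those. A sketch, with error checking elided:

```c
#include <cudnn.h>

/* Hypothetical helper: pack an int8 KCRS filter (on the device) into
 * NCHW_VECT_C order by describing the same memory with plain tensor
 * descriptors, treating K as the N dimension. Error checking elided. */
void transform_filter_to_vect_c(cudnnHandle_t handle,
                                const void *d_w_kcrs, /* device ptr, int8 KCRS */
                                void *d_w_vect,       /* device ptr, packed out */
                                int k, int c, int r, int s) {
    cudnnTensorDescriptor_t srcDesc, dstDesc;
    cudnnCreateTensorDescriptor(&srcDesc);
    cudnnCreateTensorDescriptor(&dstDesc);

    /* Source: ordinary NCHW int8, with n := k */
    cudnnSetTensor4dDescriptor(srcDesc, CUDNN_TENSOR_NCHW,
                               CUDNN_DATA_INT8, k, c, r, s);
    /* Destination: vectorized layout, 4 channels packed per element */
    cudnnSetTensor4dDescriptor(dstDesc, CUDNN_TENSOR_NCHW_VECT_C,
                               CUDNN_DATA_INT8x4, k, c, r, s);

    float alpha = 1.0f, beta = 0.0f;
    cudnnTransformTensor(handle, &alpha, srcDesc, d_w_kcrs,
                         &beta, dstDesc, d_w_vect);

    cudnnDestroyTensorDescriptor(srcDesc);
    cudnnDestroyTensorDescriptor(dstDesc);
}
```

The cudnnFilterDescriptor_t passed to the convolution would then be set separately with cudnnSetFilter4dDescriptor() using CUDNN_DATA_INT8x4 and CUDNN_TENSOR_NCHW_VECT_C; it only describes the layout, while the actual reordering happens on the data as above. Is this the intended approach?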
Please let me know!