Recently, I found that for the same convolution operation, using float as the data type allocates much less workspace than using FP16, even with the same convolution algorithm. How can I use Tensor Cores with FP16 as the data type for convolution while allocating less workspace?
Hi,
Which cuDNN convolution forward preference are you using in this case?
Please see the link below for reference:
https://docs.nvidia.com/deeplearning/sdk/cudnn-archived/cudnn_765/cudnn-api/index.html#cudnnConvolutionFwdPreference_t
Thanks