I have a question concerning the recommended usage of cudnnHandle_t contexts.
1. We are using a Caffe implementation of convolutions that creates a lot of different cudnnHandles in order to use different CUDA streams; see https://github.com/BVLC/caffe/blob/master/src/caffe/layers/cudnn_conv_layer.cpp and https://github.com/BVLC/caffe/blob/master/src/caffe/layers/cudnn_conv_layer.cu
2. According to the cuDNN documentation at https://docs.nvidia.com/deeplearning/sdk/cudnn-api/index.html#cudnnCreate this seems fine, especially for using different CUDA streams (which is the whole point of this Caffe implementation).
3. When cuDNN handles are destroyed, the GPU memory they held is not released back to other processes (at least on our standard Ubuntu 18.04 + CUDA 10.1 + cuDNN 7.6.5 setup), although it does remain reusable within the same process.
4. Other implementations, such as PyTorch, seem to use a fairly involved handle pool to limit the number of handles (i.e., one per thread); see https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/cudnn/Handle.cpp and https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/cuda/detail/DeviceThreadHandles.h
So my questions are: Is there some limitation on the usage of cudnnHandles that I am missing? Is there a way to force a full memory release, beyond calling cudnnDestroy()? Is using a limited number of cudnnHandles the only way to limit memory consumption?
Thank you a lot!