cudnn create() / handle_t usage and memory reuse


I have a question concerning the recommended usage of cudnnHandle_t contexts.

  1. We are using a Caffe implementation of convolutions that uses many different cudnnHandles in order to use different CUDA streams (which is the point of this Caffe implementation).

  2. According to the cuDNN documentation this seems OK, especially for using different CUDA streams.

  3. When cuDNN handles are destroyed, they do not make GPU memory reusable for other processes (at least on our standard Ubuntu 18.04 + CUDA 10.1 + cuDNN 7.6.5 setup), but the memory is still reusable by the same process.

  4. In other implementations, like PyTorch, it seems that a fairly complicated handle pool is used in order to limit the number of handles (i.e. one per thread).

So my questions are: is there some limitation on the usage of cudnnHandles that I am missing? Is there a way to force a full memory release, beyond calling cudnnDestroy()? Or is limiting the number of cudnnHandles the only way to limit memory consumption?
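For reference, the per-stream handle pattern from point 1 looks roughly like this. This is a minimal sketch, not the actual Caffe code; the stream count and the absence of error checking are simplifications for illustration.

```cpp
#include <cudnn.h>
#include <cuda_runtime.h>
#include <vector>

int main() {
    const int kStreams = 4;  // assumed number of parallel streams
    std::vector<cudaStream_t> streams(kStreams);
    std::vector<cudnnHandle_t> handles(kStreams);

    for (int i = 0; i < kStreams; ++i) {
        cudaStreamCreate(&streams[i]);
        cudnnCreate(&handles[i]);               // each handle allocates its own GPU memory
        cudnnSetStream(handles[i], streams[i]); // bind handle i to stream i
    }

    // ... enqueue cuDNN work on each handle/stream pair here ...

    for (int i = 0; i < kStreams; ++i) {
        cudnnDestroy(handles[i]);   // memory becomes reusable by this process,
                                    // but is not necessarily returned to other processes
        cudaStreamDestroy(streams[i]);
    }
    return 0;
}
```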

Thank you a lot!


The cudnnHandle_t context should be destroyed at the end using cudnnDestroy().

As you pointed out, the two main constraints are as mentioned in the link below:

  • The recommended best practice is to call cudnnCreate/cudnnDestroy outside of performance-critical code paths.

  • For multithreaded applications that use the same device from different threads, the recommended programming model is to create one cuDNN handle per thread and use that cuDNN handle for the entire life of the thread.
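Under those two constraints, a one-handle-per-thread lifetime can be sketched as follows. The `worker` function and the stream plumbing here are illustrative assumptions, not a fixed API; the point is that create/destroy happen once per thread, outside any performance-critical path, and cudnnSetStream lets the same handle target different streams over its lifetime.

```cpp
#include <cudnn.h>
#include <cuda_runtime.h>
#include <thread>

// Hypothetical worker: one cuDNN handle for the entire life of the thread.
void worker(cudaStream_t stream) {
    cudnnHandle_t handle;
    cudnnCreate(&handle);           // once, outside the performance-critical loop
    cudnnSetStream(handle, stream); // the same handle can later be retargeted to
                                    // another stream with another cudnnSetStream call

    // ... all of this thread's cuDNN calls go through `handle` ...

    cudnnDestroy(handle);           // once, when the thread is done
}

int main() {
    cudaStream_t stream;
    cudaStreamCreate(&stream);
    std::thread t(worker, stream);
    t.join();
    cudaStreamDestroy(stream);
    return 0;
}
```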