I have a question concerning the recommended usage of cudnnHandle_t contexts.
1. We are using a Caffe implementation of convolutions that creates a lot of different cudnnHandles in order to use different CUDA streams; see https://github.com/BVLC/caffe/blob/master/src/caffe/layers/cudnn_conv_layer.cpp and https://github.com/BVLC/caffe/blob/master/src/caffe/layers/cudnn_conv_layer.cu
2. According to the cuDNN documentation at https://docs.nvidia.com/deeplearning/sdk/cudnn-api/index.html#cudnnCreate this seems fine, especially for using different CUDA streams (which is the whole point of this Caffe implementation).
3. When cuDNN handles are destroyed, the GPU memory they held is not released back to other processes (at least on our standard Ubuntu 18.04 + CUDA 10.1 + cuDNN 7.6.5 setup), although it does remain reusable within the same process.
4. Other implementations, such as PyTorch, seem to use a fairly involved handle pool to limit the number of handles (i.e., one per thread); see https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/cudnn/Handle.cpp and https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/cuda/detail/DeviceThreadHandles.h
So my questions are: Is there some limitation on the usage of cudnnHandles that I am missing? Is there a way to force a full memory release, beyond calling cudnnDestroy()? Is using a limited number of cudnnHandles the only way to limit memory consumption?
Thank you a lot!