With multiple GPUs, cuDNN convolution API calls issued from the CPU do not complete in parallel (at the same time)

I tried to run two convolution operations in parallel, each on a separate GPU. I used OpenMP threads to call the cuDNN convolution operation. Since cuDNN is thread-safe, I assumed both calls would run in parallel. However, I found that pthread_mutex_lock is blocking the threads, so the convolution API calls do not complete in parallel. I am attaching the Nsight trace… Is this normal?


Sorry for the delayed response.
Could you please share the API log output and, if possible, a minimal repro of the issue so that we can debug it better?
By the way, which version of cuDNN are you using?
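
In case it helps with collecting the API log: a minimal sketch of enabling it through environment variables. The variable names below are the ones I know from cuDNN 8.x; as far as I know cuDNN 9.x replaces CUDNN_LOGINFO_DBG with CUDNN_LOGLEVEL_DBG, so please check the documentation for your version. The log file name and the binary name are placeholders, not something taken from your setup.

```shell
# Assumed cuDNN 8.x variable names; newer releases may use CUDNN_LOGLEVEL_DBG instead.
export CUDNN_LOGINFO_DBG=1
# Destination can be stdout, stderr, or a file name (the file name here is a placeholder).
export CUDNN_LOGDEST_DBG=cudnn_api.log
# Then run the application from the same shell, e.g. ./conv_app (hypothetical binary name).
```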

The cuDNN legacy API is believed to be thread-safe as long as each thread has its own cudnnHandle and uses that handle for its entire lifetime. Could you please check whether your app conforms to the following doc?

The API log would, of course, help show how cudnnHandle is used.

Thank you.