cuDNN 8.0.5: loading libcudnn_cnn_infer.so crashes

I am trying to run an inference application on a VM (32 GB system RAM, RTX 6000 allocated) on which the same application worked under cuDNN 7.6.5. With version 8.0.5, as soon as I hit an API that’s implemented in libcudnn_cnn_infer.so, I get a segmentation fault. To isolate the problem to library loading alone, I made a toy application that simply does

printf ("CUDNN version %d\n", cudnnGetVersion ());

printf ("open library ops_infer pointer %p\n", dlopen ("libcudnn_ops_infer.so", RTLD_LAZY));
printf ("open library ops_infer pointer %p\n", dlopen ("libcudnn_adv_infer.so", RTLD_LAZY));
printf ("open library adv_infer pointer %p\n", dlopen ("libcudnn_ops_train.so", RTLD_LAZY));
printf ("open library ops_train pointer %p\n", dlopen ("libcudnn_adv_train.so", RTLD_LAZY));
printf ("open library cnn_infer pointer %p\n", dlopen ("libcudnn_cnn_infer.so", RTLD_LAZY));
printf ("open library cnn_train pointer %p\n", dlopen ("libcudnn_cnn_train.so", RTLD_LAZY));

No CUDA code. My output is

CUDNN version 8005
open library ops_infer pointer 0x12900f0
open library ops_infer pointer 0x13d9570
open library adv_infer pointer 0x1403080
open library ops_train pointer 0x140f820
Segmentation fault (core dumped)

The crash occurs with either cnn_*.so library: the output is the same if I reorder them. The same application runs without incident on a bare metal system (48 GB RAM, RTX 4000). The equivalent application, loading only libcudnn.so, runs on a VM with cuDNN 7.6.5. All systems are CentOS 7.

Clues?

Hi @jbaumgart,

Does that mean the fourth dlopen call crashes, or is it the load of libcudnn_cnn_infer.so that crashes?
Also, can you generate an API log to show us which cuDNN API calls you have made?

Thanks!
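
The API logging requested above is controlled by the environment variables CUDNN_LOGINFO_DBG and CUDNN_LOGDEST_DBG. Normally they are just exported in the shell before launching the application. The sketch below does the same from a small C launcher, so the target process starts with logging already configured. The function names and the log file name are hypothetical, chosen only for illustration.

```c
#define _POSIX_C_SOURCE 200112L
#include <stdlib.h>
#include <unistd.h>

/* Set cuDNN's API-logging variables in the current environment.
   dest may be "stdout", "stderr", or a file name.
   Returns 0 on success, -1 on failure. */
int set_cudnn_log_env(const char *dest)
{
    if (setenv("CUDNN_LOGINFO_DBG", "1", 1) != 0)
        return -1;
    return setenv("CUDNN_LOGDEST_DBG", dest, 1);
}

/* Replace the current process with the target program (argv[0]),
   which inherits the logging variables set above.
   Returns only if something failed. */
int exec_with_cudnn_log(char *const argv[], const char *dest)
{
    if (set_cudnn_log_env(dest) != 0)
        return -1;
    execvp(argv[0], argv);
    return -1; /* execvp returns only on error */
}
```

A launcher's main could call exec_with_cudnn_log(argv + 1, "cudnn_api.log") after checking that argc is at least 2. Note that if the process crashes while a library is still loading, before any cuDNN API is called, the log may contain nothing useful.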

All dlopen calls that show a pointer succeeded. It looks like I have some cut and paste errors in the print statements, so it should say that those that succeed are ops_infer, adv_infer, ops_train, and adv_train, in that order. The crash in this case is when loading cnn_infer.so.

In the code that I’m running, no cuDNN APIs are being called other than cudnnGetVersion. When I first encountered this problem, the API being called was cudnnCreateConvolutionDescriptor().
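
For reference, the load test can be restated so that each label is taken from the library name itself, which avoids the mismatched print statements. This is only a sketch of the same dlopen sequence, not the exact program from the original post. The helper names try_load and load_all are hypothetical. Two details are added: dlerror() is reported when a load fails cleanly, and stdout is flushed after every line so that the last successful load is still visible if a later dlopen crashes.

```c
#include <dlfcn.h>
#include <stddef.h>
#include <stdio.h>

/* cuDNN 8 sub-libraries, in the order described in this thread. */
const char *const cudnn_libs[] = {
    "libcudnn_ops_infer.so",
    "libcudnn_adv_infer.so",
    "libcudnn_ops_train.so",
    "libcudnn_adv_train.so",
    "libcudnn_cnn_infer.so",
    "libcudnn_cnn_train.so",
};

/* Try to load one shared library and report the result.
   Returns 1 on success, 0 if dlopen fails cleanly. */
int try_load(const char *soname)
{
    void *handle = dlopen(soname, RTLD_LAZY);
    if (handle == NULL) {
        fprintf(stderr, "dlopen(%s) failed: %s\n", soname, dlerror());
        return 0;
    }
    printf("open library %s pointer %p\n", soname, handle);
    fflush(stdout); /* keep the line even if a later load crashes */
    return 1;
}

/* Load each library in turn; returns how many loaded successfully. */
size_t load_all(const char *const *names, size_t count)
{
    size_t loaded = 0;
    for (size_t i = 0; i < count; i++)
        loaded += (size_t)try_load(names[i]);
    return loaded;
}
```

A main function would call load_all(cudnn_libs, sizeof cudnn_libs / sizeof cudnn_libs[0]). On older glibc versions the program has to be linked with -ldl.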

Hi @jbaumgart,
Are you calling cudnnCreateConvolutionDescriptor() or not?

Thanks!

Not in the sample I show in the OP.

Hi @jbaumgart ,
Are you still facing the issue?

I haven’t tried it recently. I’ll have to upgrade the VM to cuDNN 8.1.2 and test it.

Please let us know if the issue still persists.