I'm trying to run an inference application on a VM (32 GB system RAM, RTX 6000 allocated) on which the application had been working under cuDNN 7.6.5. With cuDNN 8.0.5, as soon as I call an API that is implemented in libcudnn_cnn_infer.so, I get a segmentation fault. To rule out everything except library loading, I made a toy application that does only this:
printf ("CUDNN version %zu\n", cudnnGetVersion ());
printf ("open library ops_infer pointer %p\n", dlopen ("libcudnn_ops_infer.so", RTLD_LAZY));
printf ("open library adv_infer pointer %p\n", dlopen ("libcudnn_adv_infer.so", RTLD_LAZY));
printf ("open library ops_train pointer %p\n", dlopen ("libcudnn_ops_train.so", RTLD_LAZY));
printf ("open library adv_train pointer %p\n", dlopen ("libcudnn_adv_train.so", RTLD_LAZY));
printf ("open library cnn_infer pointer %p\n", dlopen ("libcudnn_cnn_infer.so", RTLD_LAZY));
printf ("open library cnn_train pointer %p\n", dlopen ("libcudnn_cnn_train.so", RTLD_LAZY));
There is no other CUDA code. My output is:
CUDNN version 8005
open library ops_infer pointer 0x12900f0
open library adv_infer pointer 0x13d9570
open library ops_train pointer 0x1403080
open library adv_train pointer 0x140f820
Segmentation fault (core dumped)
The crash occurs with either cnn_*.so library; the output is the same if I reorder the dlopen calls. The same application runs without incident on a bare-metal system (48 GB RAM, RTX 4000). The equivalent application, loading only libcudnn.so, runs fine on a VM with cuDNN 7.6.5. All systems run CentOS 7.
Clues?