cuDNN 8 samples crash with illegal instruction error

I just did a fresh install of CUDA 11.0 (driver 450.51.06) together with cuDNN 8.0.5 on a machine running Ubuntu 18.04 LTS with a Titan RTX. Here is the header of the nvidia-smi output:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.51.06    Driver Version: 450.51.06    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  TITAN RTX           On   | 00000000:03:00.0 Off |                  N/A |
| 41%   30C    P8     9W / 280W |     20MiB / 24219MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

Running the CUDA samples works fine, but the cuDNN samples crash with an illegal instruction:

Executing: mnistCUDNN
cudnnGetVersion() : 8005 , CUDNN_VERSION from cudnn.h : 8005 (8.0.5)
Host compiler version : GCC 7.5.0

There are 1 CUDA capable devices on your machine :
device 0 : sms 72  Capabilities 7.5, SmClock 1770.0 Mhz, MemSize (Mb) 24219, MemClock 7001.0 Mhz, Ecc=0, boardGroupID=0
Using device 0

Testing single precision
[...]
Performing forward propagation ...
Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 184784 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 2057744 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.024992 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.026176 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.062848 time requiring 2057744 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.064416 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.101920 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.120832 time requiring 184784 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
[...]
Test passed!

Testing half precision (math in single precision)
Loading binary file data/conv1.bin
Loading binary file data/conv1.bias.bin
Loading binary file data/conv2.bin
Loading binary file data/conv2.bias.bin
Loading binary file data/ip1.bin
Loading binary file data/ip1.bias.bin
Loading binary file data/ip2.bin
Loading binary file data/ip2.bias.bin
Loading image data/one_28x28.pgm
Performing forward propagation ...
Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 184784 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 2057744 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.025696 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.053280 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.055040 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.062816 time requiring 2057744 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.072736 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.102176 time requiring 184784 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
Illegal instruction (core dumped)

Apart from the illegal instruction error, I’m also puzzled that the times reported for cudnnGetConvolutionForwardAlgorithm_v7 are negative and fixed at -1.0.

Does anyone have a clue what is going on, or know how to debug this? If you need additional information about the system or output from another program, please let me know.

Hi @meier.philip,
cudnnGetConvolutionForwardAlgorithm_v7 does not run any GPU code, so no time is measured for the individual algorithms and the reported -1.0 is only a placeholder.
The illegal instruction seems to be related to the CPU.

Thanks!
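
To make the difference between the heuristic results (time -1.0) and the measured results from cudnnFindConvolutionForwardAlgorithm easier to see in a long log, the result lines can be parsed and the placeholder mapped to "not measured". This is only a rough sketch: the regular expression is derived from the log lines posted above, the helper name parse_algo_line is made up for illustration, and treating every negative time as "not measured" is an assumption.

```python
import re

# Matches one result line as printed in the log above, e.g.
# "^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 0 memory"
LINE_RE = re.compile(
    r"\^\^\^\^ (CUDNN_STATUS_\w+) for Algo (\d+): "
    r"(-?\d+\.\d+) time requiring (\d+) memory"
)


def parse_algo_line(line):
    """Return (status, algo, time_or_None, memory), or None if the line does not match."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    status, algo, time_str, memory = match.groups()
    time_value = float(time_str)
    # Assumption: a negative time is a placeholder meaning "not measured".
    measured = time_value if time_value >= 0 else None
    return status, int(algo), measured, int(memory)
```

Applied to the output above, the lines printed after "Testing cudnnGetConvolutionForwardAlgorithm_v7" would come back with None as the time, while the lines after "Testing cudnnFindConvolutionForwardAlgorithm" would carry the measured value.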

The illegal instruction seems to be related to the CPU.

True, but why? I built the cuDNN sample on the same machine I ran it on. Doesn’t that mean the build used an instruction that is not available on my CPU?

Yes, this might be related to the compiler: it may have generated an instruction that is not available on your CPU.
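
One way to narrow this down is to check which instruction-set extensions the CPU actually reports. The sketch below parses text in the format of Linux's /proc/cpuinfo and lists the extensions from a wanted list that are absent. The function names and the default list (sse4_2, avx, avx2, fma) are illustrative assumptions; the thread does not say which instruction caused the crash. Also note that the cuDNN library itself ships as a prebuilt binary, so the faulting instruction need not come from the locally compiled sample code.

```python
def cpu_flags(cpuinfo_text):
    """Collect the feature flags listed in /proc/cpuinfo-style text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # On x86 Linux the feature list is on lines whose key is "flags".
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def missing_flags(cpuinfo_text, wanted=("sse4_2", "avx", "avx2", "fma")):
    """Return the wanted flags that the CPU does not report.

    The default list is only an illustrative guess, not a known
    requirement of cuDNN or of the sample.
    """
    have = cpu_flags(cpuinfo_text)
    return [flag for flag in wanted if flag not in have]
```

On the affected machine this could be called as missing_flags(open("/proc/cpuinfo").read()). A non-empty result would be consistent with the explanation above, but it would not prove it; running the sample under a debugger to see the faulting instruction would be the more direct check.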