cuDNN v4 sample core dumps on Tesla M40

Hi all,

I ran the cuDNN v4 samples on a Tesla M40.

The driver version is 352.79, which is the newest for the M40.

The OS is CentOS 7 (64-bit).

When I run the mnistCUDNN sample, it prints the following and then crashes:

./mnistCUDNN
cudnnGetVersion() : 3002 , CUDNN_VERSION from cudnn.h : 4007 (4.0.7)
Host compiler version : GCC 4.8.3
There are 2 CUDA capable devices on your machine :
device 0 : sms 24  Capabilities 5.2, SmClock 1112.0 Mhz, MemSize (Mb) 11519, MemClock 3004.0 Mhz, Ecc=1, boardGroupID=0
device 1 : sms 24  Capabilities 5.2, SmClock 1112.0 Mhz, MemSize (Mb) 11519, MemClock 3004.0 Mhz, Ecc=1, boardGroupID=1
Using device 0

Testing single precision
Loading image data/one_28x28.pgm
Performing forward propagation ...
Testing cudnnGetConvolutionForwardAlgorithm ...
Fastest algorithm is Algo 1
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.031136 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.044992 time requiring 3464 memory
Segment Fault(Core Dumped)

You seem to have a version mismatch: the library loaded at runtime is version 3, but the header you compiled against is from version 4. See the first line of your output:

cudnnGetVersion() : 3002 , CUDNN_VERSION from cudnn.h : 4007 (4.0.7)
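If it helps, here is a minimal sketch of a check that would catch this before the crash. The helper names (cudnn_major, cudnn_minor, cudnn_patch, cudnn_versions_match) are made up for illustration and are not part of cuDNN. The decoding assumes the MAJOR*1000 + MINOR*100 + PATCH encoding, which is inferred from the "4007 (4.0.7)" in the log and may not hold for other releases. The block does not include cudnn.h, so it compiles on its own.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helpers, not part of cuDNN.
 * Assumption: version = MAJOR*1000 + MINOR*100 + PATCH,
 * which matches "4007 (4.0.7)" in the sample output. */
int cudnn_major(size_t v) { return (int)(v / 1000); }
int cudnn_minor(size_t v) { return (int)((v % 1000) / 100); }
int cudnn_patch(size_t v) { return (int)(v % 100); }

/* Returns 1 if the loaded library and the compile-time header report
 * the same version, 0 otherwise (and prints both to stderr). */
int cudnn_versions_match(size_t runtime_version, size_t header_version)
{
    if (runtime_version == header_version) {
        return 1;
    }
    fprintf(stderr,
            "cuDNN mismatch: library is %d.%d.%d, header is %d.%d.%d\n",
            cudnn_major(runtime_version), cudnn_minor(runtime_version),
            cudnn_patch(runtime_version),
            cudnn_major(header_version), cudnn_minor(header_version),
            cudnn_patch(header_version));
    return 0;
}
```

In a real program you would include cudnn.h and call cudnn_versions_match(cudnnGetVersion(), CUDNN_VERSION) at startup, then exit if it returns 0. With the values from your log (3002 and 4007) it would report library 3.0.2 against header 4.0.7.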

Solved! I removed the old libcudnn.so* files and copied all of the cuDNN v4 libraries into my lib path. The results:

# ./mnistCUDNN
cudnnGetVersion() : 4007 , CUDNN_VERSION from cudnn.h : 4007 (4.0.7)
Host compiler version : GCC 4.8.3
There are 2 CUDA capable devices on your machine :
device 0 : sms 24  Capabilities 5.2, SmClock 1112.0 Mhz, MemSize (Mb) 11519, MemClock 3004.0 Mhz, Ecc=1, boardGroupID=0
device 1 : sms 24  Capabilities 5.2, SmClock 1112.0 Mhz, MemSize (Mb) 11519, MemClock 3004.0 Mhz, Ecc=1, boardGroupID=1
Using device 0

Testing single precision
Loading image data/one_28x28.pgm
Performing forward propagation ...
Testing cudnnGetConvolutionForwardAlgorithm ...
Fastest algorithm is Algo 1
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.032320 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.055776 time requiring 57600 memory
Resulting weights from Softmax:
0.0000000 0.9999399 0.0000000 0.0000000 0.0000561 0.0000000 0.0000012 0.0000017 0.0000010 0.0000000
Loading image data/three_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000000 0.0000000 0.9999288 0.0000000 0.0000711 0.0000000 0.0000000 0.0000000 0.0000000
Loading image data/five_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000008 0.0000000 0.0000002 0.0000000 0.9999820 0.0000154 0.0000000 0.0000012 0.0000006

Result of classification: 1 3 5

Test passed!

Testing half precision (math in single precision)
Loading image data/one_28x28.pgm
Performing forward propagation ...
Testing cudnnGetConvolutionForwardAlgorithm ...
Fastest algorithm is Algo 1
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.033504 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.055488 time requiring 28800 memory
Resulting weights from Softmax:
0.0000001 1.0000000 0.0000001 0.0000000 0.0000563 0.0000001 0.0000012 0.0000017 0.0000010 0.0000001
Loading image data/three_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000000 0.0000000 1.0000000 0.0000000 0.0000720 0.0000000 0.0000000 0.0000000 0.0000000
Loading image data/five_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000008 0.0000000 0.0000002 0.0000000 1.0000000 0.0000154 0.0000000 0.0000012 0.0000006

Result of classification: 1 3 5

Test passed!