I noticed something curious about this behavior.
I downloaded the libcudnn8_8.2.1.32-1+cuda10.2_arm64.deb package and reinstalled it after booting from the SD card (/dev/sda) using:
sudo dpkg -i ~/libcudnn8_8.2.1.32-1+cuda10.2_arm64.deb
Preparing to unpack .../libcudnn8_8.2.1.32-1+cuda10.2_arm64.deb ...
Unpacking libcudnn8 (8.2.1.32-1+cuda10.2) over (8.2.1.32-1+cuda10.2) ...
Setting up libcudnn8 (8.2.1.32-1+cuda10.2) ...
Processing triggers for libc-bin (2.27-3ubuntu1.6) ...
Then, when I run bash run_conv_sample.sh, all the tests pass, so cuDNN seems to start working properly:
jetson@jetson-desktop:/usr/src/cudnn_samples_v8/conv_sample$ bash run_conv_sample.sh
Executing: conv_sample -c2048 -h7 -w7 -k512 -r1 -s1 -pad_h0 -pad_w0 -u1 -v1
Using format CUDNN_TENSOR_NCHW (for INT8x4 and INT8x32 tests use CUDNN_TENSOR_NCHW_VECT_C)
Testing single precision
====USER DIMENSIONS====
input dims are 1, 2048, 7, 7
filter dims are 512, 2048, 1, 1
output dims are 1, 512, 7, 7
====PADDING DIMENSIONS====
padded input dims are 1, 2048, 7, 7
padded filter dims are 512, 2048, 1, 1
padded output dims are 1, 512, 7, 7
Testing conv
^^^^ CUDA : elapsed = 1.68304 sec,
Test PASSED
Testing half precision (math in single precision)
====USER DIMENSIONS====
input dims are 1, 2048, 7, 7
filter dims are 512, 2048, 1, 1
output dims are 1, 512, 7, 7
====PADDING DIMENSIONS====
padded input dims are 1, 2048, 7, 7
padded filter dims are 512, 2048, 1, 1
padded output dims are 1, 512, 7, 7
Testing conv
^^^^ CUDA : elapsed = 0.016382 sec,
Test PASSED
Executing: conv_sample -c512 -h28 -w28 -k128 -r1 -s1 -pad_h0 -pad_w0 -u1 -v1
Using format CUDNN_TENSOR_NCHW (for INT8x4 and INT8x32 tests use CUDNN_TENSOR_NCHW_VECT_C)
Testing single precision
====USER DIMENSIONS====
input dims are 1, 512, 28, 28
filter dims are 128, 512, 1, 1
output dims are 1, 128, 28, 28
====PADDING DIMENSIONS====
padded input dims are 1, 512, 28, 28
padded filter dims are 128, 512, 1, 1
padded output dims are 1, 128, 28, 28
Testing conv
^^^^ CUDA : elapsed = 1.65834 sec,
Test PASSED
Testing half precision (math in single precision)
====USER DIMENSIONS====
input dims are 1, 512, 28, 28
filter dims are 128, 512, 1, 1
output dims are 1, 128, 28, 28
====PADDING DIMENSIONS====
padded input dims are 1, 512, 28, 28
padded filter dims are 128, 512, 1, 1
padded output dims are 1, 128, 28, 28
Testing conv
^^^^ CUDA : elapsed = 0.0119801 sec,
Test PASSED
Executing: conv_sample -c512 -h28 -w28 -k1024 -r1 -s1 -pad_h0 -pad_w0 -u2 -v2
Using format CUDNN_TENSOR_NCHW (for INT8x4 and INT8x32 tests use CUDNN_TENSOR_NCHW_VECT_C)
Testing single precision
====USER DIMENSIONS====
input dims are 1, 512, 28, 28
filter dims are 1024, 512, 1, 1
output dims are 1, 1024, 14, 14
====PADDING DIMENSIONS====
padded input dims are 1, 512, 28, 28
padded filter dims are 1024, 512, 1, 1
padded output dims are 1, 1024, 14, 14
Testing conv
^^^^ CUDA : elapsed = 1.61455 sec,
Test PASSED
Testing half precision (math in single precision)
====USER DIMENSIONS====
input dims are 1, 512, 28, 28
filter dims are 1024, 512, 1, 1
output dims are 1, 1024, 14, 14
====PADDING DIMENSIONS====
padded input dims are 1, 512, 28, 28
padded filter dims are 1024, 512, 1, 1
padded output dims are 1, 1024, 14, 14
Testing conv
^^^^ CUDA : elapsed = 0.027364 sec,
Test PASSED
Executing: conv_sample -c512 -h28 -w28 -k256 -r1 -s1 -pad_h0 -pad_w0 -u2 -v2
Using format CUDNN_TENSOR_NCHW (for INT8x4 and INT8x32 tests use CUDNN_TENSOR_NCHW_VECT_C)
Testing single precision
====USER DIMENSIONS====
input dims are 1, 512, 28, 28
filter dims are 256, 512, 1, 1
output dims are 1, 256, 14, 14
====PADDING DIMENSIONS====
padded input dims are 1, 512, 28, 28
padded filter dims are 256, 512, 1, 1
padded output dims are 1, 256, 14, 14
Testing conv
^^^^ CUDA : elapsed = 1.65665 sec,
Test PASSED
Testing half precision (math in single precision)
====USER DIMENSIONS====
input dims are 1, 512, 28, 28
filter dims are 256, 512, 1, 1
output dims are 1, 256, 14, 14
====PADDING DIMENSIONS====
padded input dims are 1, 512, 28, 28
padded filter dims are 256, 512, 1, 1
padded output dims are 1, 256, 14, 14
Testing conv
^^^^ CUDA : elapsed = 0.00695896 sec,
Test PASSED
Executing: conv_sample -c256 -h14 -w14 -k256 -r3 -s3 -pad_h1 -pad_w1 -u1 -v1
Using format CUDNN_TENSOR_NCHW (for INT8x4 and INT8x32 tests use CUDNN_TENSOR_NCHW_VECT_C)
Testing single precision
====USER DIMENSIONS====
input dims are 1, 256, 14, 14
filter dims are 256, 256, 3, 3
output dims are 1, 256, 14, 14
====PADDING DIMENSIONS====
padded input dims are 1, 256, 14, 14
padded filter dims are 256, 256, 3, 3
padded output dims are 1, 256, 14, 14
Testing conv
^^^^ CUDA : elapsed = 1.67795 sec,
Test PASSED
Testing half precision (math in single precision)
====USER DIMENSIONS====
input dims are 1, 256, 14, 14
filter dims are 256, 256, 3, 3
output dims are 1, 256, 14, 14
====PADDING DIMENSIONS====
padded input dims are 1, 256, 14, 14
padded filter dims are 256, 256, 3, 3
padded output dims are 1, 256, 14, 14
Testing conv
^^^^ CUDA : elapsed = 0.019665 sec,
Test PASSED
Executing: conv_sample -c256 -h14 -w14 -k1024 -r1 -s1 -pad_h0 -pad_w0 -u1 -v1
Using format CUDNN_TENSOR_NCHW (for INT8x4 and INT8x32 tests use CUDNN_TENSOR_NCHW_VECT_C)
Testing single precision
====USER DIMENSIONS====
input dims are 1, 256, 14, 14
filter dims are 1024, 256, 1, 1
output dims are 1, 1024, 14, 14
====PADDING DIMENSIONS====
padded input dims are 1, 256, 14, 14
padded filter dims are 1024, 256, 1, 1
padded output dims are 1, 1024, 14, 14
Testing conv
^^^^ CUDA : elapsed = 1.64909 sec,
Test PASSED
Testing half precision (math in single precision)
====USER DIMENSIONS====
input dims are 1, 256, 14, 14
filter dims are 1024, 256, 1, 1
output dims are 1, 1024, 14, 14
====PADDING DIMENSIONS====
padded input dims are 1, 256, 14, 14
padded filter dims are 1024, 256, 1, 1
padded output dims are 1, 1024, 14, 14
Testing conv
^^^^ CUDA : elapsed = 0.014518 sec,
Test PASSED
Executing: conv_sample -c1024 -h14 -w14 -k256 -r1 -s1 -pad_h0 -pad_w0 -u1 -v1
Using format CUDNN_TENSOR_NCHW (for INT8x4 and INT8x32 tests use CUDNN_TENSOR_NCHW_VECT_C)
Testing single precision
====USER DIMENSIONS====
input dims are 1, 1024, 14, 14
filter dims are 256, 1024, 1, 1
output dims are 1, 256, 14, 14
====PADDING DIMENSIONS====
padded input dims are 1, 1024, 14, 14
padded filter dims are 256, 1024, 1, 1
padded output dims are 1, 256, 14, 14
Testing conv
^^^^ CUDA : elapsed = 1.63442 sec,
Test PASSED
Testing half precision (math in single precision)
====USER DIMENSIONS====
input dims are 1, 1024, 14, 14
filter dims are 256, 1024, 1, 1
output dims are 1, 256, 14, 14
====PADDING DIMENSIONS====
padded input dims are 1, 1024, 14, 14
padded filter dims are 256, 1024, 1, 1
padded output dims are 1, 256, 14, 14
Testing conv
^^^^ CUDA : elapsed = 0.0136962 sec,
Test PASSED
Executing: conv_sample -c1024 -h14 -w14 -k2048 -r1 -s1 -pad_h0 -pad_w0 -u2 -v2
Using format CUDNN_TENSOR_NCHW (for INT8x4 and INT8x32 tests use CUDNN_TENSOR_NCHW_VECT_C)
Testing single precision
====USER DIMENSIONS====
input dims are 1, 1024, 14, 14
filter dims are 2048, 1024, 1, 1
output dims are 1, 2048, 7, 7
====PADDING DIMENSIONS====
padded input dims are 1, 1024, 14, 14
padded filter dims are 2048, 1024, 1, 1
padded output dims are 1, 2048, 7, 7
Testing conv
^^^^ CUDA : elapsed = 1.68824 sec,
Test PASSED
Testing half precision (math in single precision)
====USER DIMENSIONS====
input dims are 1, 1024, 14, 14
filter dims are 2048, 1024, 1, 1
output dims are 1, 2048, 7, 7
====PADDING DIMENSIONS====
padded input dims are 1, 1024, 14, 14
padded filter dims are 2048, 1024, 1, 1
padded output dims are 1, 2048, 7, 7
Testing conv
^^^^ CUDA : elapsed = 0.0313749 sec,
Test PASSED
Executing: conv_sample -c1024 -h14 -w14 -k512 -r1 -s1 -pad_h0 -pad_w0 -u2 -v2
Using format CUDNN_TENSOR_NCHW (for INT8x4 and INT8x32 tests use CUDNN_TENSOR_NCHW_VECT_C)
Testing single precision
====USER DIMENSIONS====
input dims are 1, 1024, 14, 14
filter dims are 512, 1024, 1, 1
output dims are 1, 512, 7, 7
====PADDING DIMENSIONS====
padded input dims are 1, 1024, 14, 14
padded filter dims are 512, 1024, 1, 1
padded output dims are 1, 512, 7, 7
Testing conv
^^^^ CUDA : elapsed = 1.66685 sec,
Test PASSED
Testing half precision (math in single precision)
====USER DIMENSIONS====
input dims are 1, 1024, 14, 14
filter dims are 512, 1024, 1, 1
output dims are 1, 512, 7, 7
====PADDING DIMENSIONS====
padded input dims are 1, 1024, 14, 14
padded filter dims are 512, 1024, 1, 1
padded output dims are 1, 512, 7, 7
Testing conv
^^^^ CUDA : elapsed = 0.00818205 sec,
Test PASSED
Executing: conv_sample -c512 -h7 -w7 -k512 -r3 -s3 -pad_h1 -pad_w1 -u1 -v1
Using format CUDNN_TENSOR_NCHW (for INT8x4 and INT8x32 tests use CUDNN_TENSOR_NCHW_VECT_C)
Testing single precision
====USER DIMENSIONS====
input dims are 1, 512, 7, 7
filter dims are 512, 512, 3, 3
output dims are 1, 512, 7, 7
====PADDING DIMENSIONS====
padded input dims are 1, 512, 7, 7
padded filter dims are 512, 512, 3, 3
padded output dims are 1, 512, 7, 7
Testing conv
^^^^ CUDA : elapsed = 1.71264 sec,
Test PASSED
Testing half precision (math in single precision)
====USER DIMENSIONS====
input dims are 1, 512, 7, 7
filter dims are 512, 512, 3, 3
output dims are 1, 512, 7, 7
====PADDING DIMENSIONS====
padded input dims are 1, 512, 7, 7
padded filter dims are 512, 512, 3, 3
padded output dims are 1, 512, 7, 7
Testing conv
^^^^ CUDA : elapsed = 0.051836 sec,
Test PASSED
Executing: conv_sample -c512 -h7 -w7 -k2048 -r1 -s1 -pad_h0 -pad_w0 -u1 -v1
Using format CUDNN_TENSOR_NCHW (for INT8x4 and INT8x32 tests use CUDNN_TENSOR_NCHW_VECT_C)
Testing single precision
====USER DIMENSIONS====
input dims are 1, 512, 7, 7
filter dims are 2048, 512, 1, 1
output dims are 1, 2048, 7, 7
====PADDING DIMENSIONS====
padded input dims are 1, 512, 7, 7
padded filter dims are 2048, 512, 1, 1
padded output dims are 1, 2048, 7, 7
Testing conv
^^^^ CUDA : elapsed = 1.65965 sec,
Test PASSED
Testing half precision (math in single precision)
====USER DIMENSIONS====
input dims are 1, 512, 7, 7
filter dims are 2048, 512, 1, 1
output dims are 1, 2048, 7, 7
====PADDING DIMENSIONS====
padded input dims are 1, 512, 7, 7
padded filter dims are 2048, 512, 1, 1
padded output dims are 1, 2048, 7, 7
Testing conv
^^^^ CUDA : elapsed = 0.0163169 sec,
Test PASSED
Executing: conv_sample -c2048 -h7 -w7 -k512 -r1 -s1 -pad_h0 -pad_w0 -u1 -v1
Using format CUDNN_TENSOR_NCHW (for INT8x4 and INT8x32 tests use CUDNN_TENSOR_NCHW_VECT_C)
Testing single precision
====USER DIMENSIONS====
input dims are 1, 2048, 7, 7
filter dims are 512, 2048, 1, 1
output dims are 1, 512, 7, 7
====PADDING DIMENSIONS====
padded input dims are 1, 2048, 7, 7
padded filter dims are 512, 2048, 1, 1
padded output dims are 1, 512, 7, 7
Testing conv
^^^^ CUDA : elapsed = 1.66518 sec,
Test PASSED
Testing half precision (math in single precision)
====USER DIMENSIONS====
input dims are 1, 2048, 7, 7
filter dims are 512, 2048, 1, 1
output dims are 1, 512, 7, 7
====PADDING DIMENSIONS====
padded input dims are 1, 2048, 7, 7
padded filter dims are 512, 2048, 1, 1
padded output dims are 1, 512, 7, 7
Testing conv
^^^^ CUDA : elapsed = 0.017077 sec,
Test PASSED
Executing: conv_sample -mathType1 -filterFormat2 -dataType2 -n1 -c512 -h100 -w100 -k64 -r8 -s8 -pad_h0 -pad_w0 -u1 -v1 -b
Using format CUDNN_TENSOR_NCHW_VECT_C (for single and double precision tests use a different format)
Device version 53 does not support int8x4!
Skipping test, SM53 does not support int8x32
Executing: conv_sample -mathType1 -filterFormat2 -dataType2 -n1 -c4096 -h64 -w64 -k64 -r4 -s4 -pad_h1 -pad_w1 -u1 -v1 -b
Using format CUDNN_TENSOR_NCHW_VECT_C (for single and double precision tests use a different format)
Device version 53 does not support int8x4!
Skipping test, SM53 does not support int8x32
Executing: conv_sample -mathType1 -filterFormat2 -dataType2 -n1 -c512 -h100 -w100 -k64 -r8 -s8 -pad_h1 -pad_w1 -u1 -v1 -b
Using format CUDNN_TENSOR_NCHW_VECT_C (for single and double precision tests use a different format)
Device version 53 does not support int8x4!
Skipping test, SM53 does not support int8x32
Executing: conv_sample -mathType1 -filterFormat2 -dataType2 -n1 -c512 -h128 -w128 -k64 -r13 -s13 -pad_h1 -pad_w1 -u1 -v1 -b
Using format CUDNN_TENSOR_NCHW_VECT_C (for single and double precision tests use a different format)
Device version 53 does not support int8x4!
Skipping test, SM53 does not support int8x32
Executing: conv_sample -mathType1 -filterFormat2 -dataType3 -n1 -c512 -h100 -w100 -k64 -r8 -s8 -pad_h0 -pad_w0 -u1 -v1 -b
Using format CUDNN_TENSOR_NCHW_VECT_C (for single and double precision tests use a different format)
Device version 53 does not support int8x4!
Skipping test, SM53 does not support int8x32
Executing: conv_sample -mathType1 -filterFormat2 -dataType3 -n1 -c4096 -h64 -w64 -k64 -r4 -s4 -pad_h1 -pad_w1 -u1 -v1 -b
Using format CUDNN_TENSOR_NCHW_VECT_C (for single and double precision tests use a different format)
Device version 53 does not support int8x4!
Skipping test, SM53 does not support int8x32
Executing: conv_sample -mathType1 -filterFormat2 -dataType3 -n1 -c512 -h100 -w100 -k64 -r8 -s8 -pad_h1 -pad_w1 -u1 -v1 -b
Using format CUDNN_TENSOR_NCHW_VECT_C (for single and double precision tests use a different format)
Device version 53 does not support int8x4!
Skipping test, SM53 does not support int8x32
Executing: conv_sample -mathType1 -filterFormat2 -dataType3 -n1 -c512 -h128 -w128 -k64 -r13 -s13 -pad_h1 -pad_w1 -u1 -v1 -b
Using format CUDNN_TENSOR_NCHW_VECT_C (for single and double precision tests use a different format)
Device version 53 does not support int8x4!
Skipping test, SM53 does not support int8x32
Executing: conv_sample -mathType1 -filterFormat2 -dataType3 -n5 -c32 -h16 -w16 -k32 -r5 -s5 -pad_h0 -pad_w0 -u1 -v1 -b -transformFromNCHW
Using format CUDNN_TENSOR_NCHW_VECT_C (for single and double precision tests use a different format)
Device version 53 does not support int8x4!
Skipping test, SM53 does not support int8x32
Executing: conv_sample -dgrad -c1024 -h14 -w14 -k2048 -r1 -s1 -pad_h0 -pad_w0 -u2 -v2 -fold
Using format CUDNN_TENSOR_NCHW (for INT8x4 and INT8x32 tests use CUDNN_TENSOR_NCHW_VECT_C)
Testing single precision
====USER DIMENSIONS====
input dims are 1, 1024, 14, 14
filter dims are 2048, 1024, 1, 1
output dims are 1, 2048, 7, 7
====PADDING DIMENSIONS====
padded input dims are 1, 1024, 14, 14
padded filter dims are 2048, 1024, 1, 1
padded output dims are 1, 2048, 7, 7
Testing dgrad
WORKSPACE = 1146699776
^^^^ CUDA : elapsed = 3.10011 sec,
Test PASSED
Testing half precision (math in single precision)
====USER DIMENSIONS====
input dims are 1, 1024, 14, 14
filter dims are 2048, 1024, 1, 1
output dims are 1, 2048, 7, 7
====PADDING DIMENSIONS====
padded input dims are 1, 1024, 14, 14
padded filter dims are 2048, 1024, 1, 1
padded output dims are 1, 2048, 7, 7
Testing dgrad
WORKSPACE = 1146699776
^^^^ CUDA : elapsed = 1.51487 sec,
Test PASSED
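Since the output is long, a small helper could make it easier to compare a run right after the reinstall with a run after a reboot. This is only a sketch: summarize_conv_log and conv.log are hypothetical names, and it assumes the result lines look exactly like the ones printed above ("Test PASSED", "Test FAILED", "Skipping test").

```shell
#!/usr/bin/env bash
# Sketch: count outcomes in conv_sample output read from stdin.
# summarize_conv_log is a hypothetical helper name, not part of the cuDNN samples.
summarize_conv_log() {
    awk '
        /^Test PASSED/   { passed++ }    # one line per passing precision test
        /^Test FAILED/   { failed++ }    # e.g. "Test FAILED, num errors = 1"
        /^Skipping test/ { skipped++ }   # e.g. int8 tests skipped on SM53
        END { printf "passed=%d failed=%d skipped=%d\n", passed, failed, skipped }
    '
}

# Intended use on the Nano (conv.log is a hypothetical file name):
#   bash run_conv_sample.sh 2>&1 | tee conv.log | summarize_conv_log
```

Saving one summary before the reboot and one after would show at a glance whether every test flips from passed to failed or only some of them.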
However, the fix does not persist. After I reboot the Nano, the old failure reappears:
jetson@jetson-desktop:/usr/src/cudnn_samples_v8/conv_sample$ ./conv_sample
Executing: conv_sample
Using format CUDNN_TENSOR_NCHW (for INT8x4 and INT8x32 tests use CUDNN_TENSOR_NCHW_VECT_C)
Testing single precision
====USER DIMENSIONS====
input dims are 1, 32, 4, 4
filter dims are 32, 32, 1, 1
output dims are 1, 32, 4, 4
====PADDING DIMENSIONS====
padded input dims are 1, 32, 4, 4
padded filter dims are 32, 32, 1, 1
padded output dims are 1, 32, 4, 4
Testing conv
CUDNN error at conv_sample.cpp:721, code=8 (CUDNN_STATUS_EXECUTION_FAILED) in 'cudnnConvolutionForward(handle_, (void*)(&alpha), cudnnIdesc, devPtrI, cudnnFdesc, devPtrF, cudnnConvDesc, algo, workSpace, workSpaceSize, (void*)(&beta), cudnnOdesc, devPtrO)'
Test FAILED, num errors = 1
Testing half precision (math in single precision)
====USER DIMENSIONS====
input dims are 1, 32, 4, 4
filter dims are 32, 32, 1, 1
output dims are 1, 32, 4, 4
====PADDING DIMENSIONS====
padded input dims are 1, 32, 4, 4
padded filter dims are 32, 32, 1, 1
padded output dims are 1, 32, 4, 4
Testing conv
CUDNN error at conv_sample.cpp:721, code=8 (CUDNN_STATUS_EXECUTION_FAILED) in 'cudnnConvolutionForward(handle_, (void*)(&alpha), cudnnIdesc, devPtrI, cudnnFdesc, devPtrF, cudnnConvDesc, algo, workSpace, workSpaceSize, (void*)(&beta), cudnnOdesc, devPtrO)'
Test FAILED, num errors = 1
I then have to reinstall the .deb package to get the tests passing again.
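One way to narrow this down might be to check whether the files installed by libcudnn8 actually change across the reboot. The sketch below records checksums right after the reinstall and re-checks them after rebooting. snapshot_files, verify_snapshot and ~/cudnn.sha256 are hypothetical names; dpkg -L and sha256sum are standard tools, but I have not confirmed that changed files are the cause here.

```shell
#!/usr/bin/env bash
# Sketch: record checksums of a package's files, then re-check them later.
# snapshot_files and verify_snapshot are hypothetical helper names.

# Read file paths from stdin and write "checksum  path" lines to the manifest $1.
snapshot_files() {
    local manifest="$1"
    local path
    : > "$manifest"
    while IFS= read -r path; do
        # dpkg -L also lists directories; keep only regular files
        # (symlinks to regular files are followed).
        if [ -f "$path" ]; then
            sha256sum "$path" >> "$manifest"
        fi
    done
}

# Re-check every file listed in the manifest $1.
# Exit status is non-zero if any file changed or is missing.
verify_snapshot() {
    sha256sum -c "$1"
}

# Intended use on the Nano (not executed by this sketch):
#   dpkg -L libcudnn8 | snapshot_files ~/cudnn.sha256   # right after the reinstall
#   verify_snapshot ~/cudnn.sha256                       # after the reboot
```

If verify_snapshot reports every file as OK after the reboot while conv_sample still fails, the library files themselves are probably not what changes, and something the reinstall touches as a side effect would be the next thing to look at.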