Hi there, sure, please take a look at the log below.
By the way, I had to change the hardcoded CUDNN_DATA_FLOAT to CUDNN_DATA_HALF on lines 1392 and 1396 of the example to make it work.
Thanks
/cudnn_samples_v7/conv_sample$ /usr/local/cuda/bin/cuda-memcheck ./conv_sample -convBiasAct -mathType1 -filterFormat1 -dataType1 -c8 -h704 -w512 -k24 -r1 -s1 -pad_h0 -pad_w0 -u1 -v1 -x
====USER DIMENSIONS====
input dims are 1, 8, 704, 512
filter dims are 24, 8, 1, 1
output dims are 1, 24, 704, 512
====PADDING DIMENSIONS====
padded input dims are 1, 8, 704, 512
padded filter dims are 24, 8, 1, 1
padded output dims are 1, 24, 704, 512
I! CuDNN (v7605) function cudnnCreate() called:
i! Time: 2021-08-24T11:44:12.717629 (0d+0h+0m+0s since start)
i! Process=6390; Thread=6390; GPU=NULL; Handle=NULL; StreamId=NULL.
I! CuDNN (v7605) function cudnnCreateTensorDescriptor() called:
i! Time: 2021-08-24T11:44:20.364463 (0d+0h+0m+8s since start)
i! Process=6390; Thread=6390; GPU=NULL; Handle=NULL; StreamId=NULL.
I! CuDNN (v7605) function cudnnCreateFilterDescriptor() called:
i! Time: 2021-08-24T11:44:20.364501 (0d+0h+0m+8s since start)
i! Process=6390; Thread=6390; GPU=NULL; Handle=NULL; StreamId=NULL.
I! CuDNN (v7605) function cudnnCreateTensorDescriptor() called:
i! Time: 2021-08-24T11:44:20.364506 (0d+0h+0m+8s since start)
i! Process=6390; Thread=6390; GPU=NULL; Handle=NULL; StreamId=NULL.
I! CuDNN (v7605) function cudnnCreateTensorDescriptor() called:
i! Time: 2021-08-24T11:44:20.364510 (0d+0h+0m+8s since start)
i! Process=6390; Thread=6390; GPU=NULL; Handle=NULL; StreamId=NULL.
I! CuDNN (v7605) function cudnnCreateActivationDescriptor() called:
i! Time: 2021-08-24T11:44:20.364514 (0d+0h+0m+8s since start)
i! Process=6390; Thread=6390; GPU=NULL; Handle=NULL; StreamId=NULL.
I! CuDNN (v7605) function cudnnCreateConvolutionDescriptor() called:
i! Time: 2021-08-24T11:44:20.364519 (0d+0h+0m+8s since start)
i! Process=6390; Thread=6390; GPU=NULL; Handle=NULL; StreamId=NULL.
I! CuDNN (v7605) function cudnnSetTensorNdDescriptor() called:
i! dataType: type=cudnnDataType_t; val=CUDNN_DATA_HALF (2);
i! nbDims: type=int; val=4;
i! dimA: type=int; val=[1,8,704,512];
i! strideA: type=int; val=[2883584,1,4096,8];
i! Time: 2021-08-24T11:44:20.725572 (0d+0h+0m+8s since start)
i! Process=6390; Thread=6390; GPU=NULL; Handle=NULL; StreamId=NULL.
I! CuDNN (v7605) function cudnnSetTensorNdDescriptor() called:
i! dataType: type=cudnnDataType_t; val=CUDNN_DATA_HALF (2);
i! nbDims: type=int; val=4;
i! dimA: type=int; val=[1,24,704,512];
i! strideA: type=int; val=[8650752,1,12288,24];
i! Time: 2021-08-24T11:44:20.725616 (0d+0h+0m+8s since start)
i! Process=6390; Thread=6390; GPU=NULL; Handle=NULL; StreamId=NULL.
I! CuDNN (v7605) function cudnnSetConvolutionNdDescriptor() called:
i! arrayLength: type=int; val=2;
i! padA: type=int; val=[0,0];
i! strideA: type=int; val=[1,1];
i! dilationA: type=int; val=[1,1];
i! mode: type=cudnnConvolutionMode_t; val=CUDNN_CROSS_CORRELATION (1);
i! dataType: type=cudnnDataType_t; val=CUDNN_DATA_HALF (2);
i! Time: 2021-08-24T11:44:20.725623 (0d+0h+0m+8s since start)
i! Process=6390; Thread=6390; GPU=NULL; Handle=NULL; StreamId=NULL.
I! CuDNN (v7605) function cudnnSetFilterNdDescriptor() called:
i! dataType: type=cudnnDataType_t; val=CUDNN_DATA_HALF (2);
i! format: type=cudnnTensorFormat_t; val=CUDNN_TENSOR_NHWC (1);
i! nbDims: type=int; val=4;
i! filterDimA: type=int; val=[24,8,1,1];
i! Time: 2021-08-24T11:44:20.725630 (0d+0h+0m+8s since start)
i! Process=6390; Thread=6390; GPU=NULL; Handle=NULL; StreamId=NULL.
I! CuDNN (v7605) function cudnnSetTensorNdDescriptor() called:
i! dataType: type=cudnnDataType_t; val=CUDNN_DATA_HALF (2);
i! nbDims: type=int; val=4;
i! dimA: type=int; val=[1,24,1,1];
i! strideA: type=int; val=[24,1,1,1];
i! Time: 2021-08-24T11:44:20.725635 (0d+0h+0m+8s since start)
i! Process=6390; Thread=6390; GPU=NULL; Handle=NULL; StreamId=NULL.
I! CuDNN (v7605) function cudnnSetActivationDescriptor() called:
i! mode: type=cudnnActivationMode_t; val=CUDNN_ACTIVATION_RELU (1);
i! reluNanOpt: type=cudnnNanPropagation_t; val=CUDNN_NOT_PROPAGATE_NAN (0);
i! coef: type=double; val=10000.000000;
i! Time: 2021-08-24T11:44:20.725645 (0d+0h+0m+8s since start)
i! Process=6390; Thread=6390; GPU=NULL; Handle=NULL; StreamId=NULL.
I! CuDNN (v7605) function cudnnSetConvolutionMathType() called:
i! mathType: type=cudnnMathType_t; val=CUDNN_TENSOR_OP_MATH (1);
i! Time: 2021-08-24T11:44:20.725676 (0d+0h+0m+8s since start)
i! Process=6390; Thread=6390; GPU=NULL; Handle=NULL; StreamId=NULL.
Testing convBiasAct
I! CuDNN (v7605) function cudnnGetConvolutionForwardWorkspaceSize() called:
i! handle: type=cudnnHandle_t; streamId=(nil) (defaultStream);
i! xDesc: type=cudnnTensorDescriptor_t:
i! dataType: type=cudnnDataType_t; val=CUDNN_DATA_HALF (2);
i! nbDims: type=int; val=4;
i! dimA: type=int; val=[1,8,704,512];
i! strideA: type=int; val=[2883584,1,4096,8];
i! wDesc: type=cudnnFilterDescriptor_t:
i! dataType: type=cudnnDataType_t; val=CUDNN_DATA_HALF (2);
i! nbDims: type=int; val=4;
i! dimA: type=int; val=[24,8,1,1];
i! format: type=cudnnTensorFormat_t; val=CUDNN_TENSOR_NHWC (1);
i! convDesc: type=cudnnConvolutionDescriptor_t:
i! mode: type=cudnnConvolutionMode_t; val=CUDNN_CROSS_CORRELATION (1);
i! dataType: type=cudnnDataType_t; val=CUDNN_DATA_HALF (2);
i! mathType: type=cudnnMathType_t; val=CUDNN_TENSOR_OP_MATH (1);
i! reorderType: type=int; val=0;
i! arrayLength: type=int; val=2;
i! padA: type=int; val=[0,0];
i! strideA: type=int; val=[1,1];
i! dilationA: type=int; val=[1,1];
i! groupCount: type=int; val=1;
i! yDesc: type=cudnnTensorDescriptor_t:
i! dataType: type=cudnnDataType_t; val=CUDNN_DATA_HALF (2);
i! nbDims: type=int; val=4;
i! dimA: type=int; val=[1,24,704,512];
i! strideA: type=int; val=[8650752,1,12288,24];
i! algo: type=cudnnConvolutionFwdAlgo_t; val=CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM (1);
i! Time: 2021-08-24T11:44:20.725741 (0d+0h+0m+8s since start)
i! Process=6390; Thread=6390; GPU=0; Handle=0x56086bc80140; StreamId=(nil) (defaultStream).
I! CuDNN (v7605) function cudnnConvolutionBiasActivationForward() called:
i! handle: type=cudnnHandle_t; streamId=(nil) (defaultStream);
i! alpha1: type=CUDNN_DATA_FLOAT; val=0.800000;
i! xDesc: type=cudnnTensorDescriptor_t:
i! dataType: type=cudnnDataType_t; val=CUDNN_DATA_HALF (2);
i! nbDims: type=int; val=4;
i! dimA: type=int; val=[1,8,704,512];
i! strideA: type=int; val=[2883584,1,4096,8];
i! xData: location=dev; addr=0x7f2adb600000;
i! wDesc: type=cudnnFilterDescriptor_t:
i! dataType: type=cudnnDataType_t; val=CUDNN_DATA_HALF (2);
i! nbDims: type=int; val=4;
i! dimA: type=int; val=[24,8,1,1];
i! format: type=cudnnTensorFormat_t; val=CUDNN_TENSOR_NHWC (1);
i! wData: location=dev; addr=0x7f2adaa2d800;
i! convDesc: type=cudnnConvolutionDescriptor_t:
i! mode: type=cudnnConvolutionMode_t; val=CUDNN_CROSS_CORRELATION (1);
i! dataType: type=cudnnDataType_t; val=CUDNN_DATA_HALF (2);
i! mathType: type=cudnnMathType_t; val=CUDNN_TENSOR_OP_MATH (1);
i! reorderType: type=int; val=0;
i! arrayLength: type=int; val=2;
i! padA: type=int; val=[0,0];
i! strideA: type=int; val=[1,1];
i! dilationA: type=int; val=[1,1];
i! groupCount: type=int; val=1;
i! algo: type=cudnnConvolutionFwdAlgo_t; val=CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM (1);
i! workSpace: location=dev; addr=0x7f2ade000000;
i! workSpaceSizeInBytes: type=size_t; val=2162696;
i! alpha2: type=CUDNN_DATA_FLOAT; val=0.000000;
i! zDesc: type=cudnnTensorDescriptor_t:
i! dataType: type=cudnnDataType_t; val=CUDNN_DATA_HALF (2);
i! nbDims: type=int; val=4;
i! dimA: type=int; val=[1,24,704,512];
i! strideA: type=int; val=[8650752,1,12288,24];
i! zData: location=dev; addr=0x7f2adce00000;
i! biasDesc: type=cudnnTensorDescriptor_t:
i! dataType: type=cudnnDataType_t; val=CUDNN_DATA_HALF (2);
i! nbDims: type=int; val=4;
i! dimA: type=int; val=[1,24,1,1];
i! strideA: type=int; val=[24,1,1,1];
i! bias: location=dev; addr=0x7f2adaa2dc00;
i! activationDesc: type=cudnnActivationDescriptor_t:
i! coef: type=double; val=10000.000000;
i! mode: type=cudnnActivationMode_t; val=CUDNN_ACTIVATION_RELU (1);
i! reluNanOpt: type=cudnnNanPropagation_t; val=CUDNN_NOT_PROPAGATE_NAN (0);
i! yDesc: type=cudnnTensorDescriptor_t:
i! dataType: type=cudnnDataType_t; val=CUDNN_DATA_HALF (2);
i! nbDims: type=int; val=4;
i! dimA: type=int; val=[1,24,704,512];
i! strideA: type=int; val=[8650752,1,12288,24];
i! yData: location=dev; addr=0x7f2adbc00000;
i! Time: 2021-08-24T11:44:20.726253 (0d+0h+0m+8s since start)
i! Process=6390; Thread=6390; GPU=0; Handle=0x56086bc80140; StreamId=(nil) (defaultStream).
========= CUDA-MEMCHECK
========= Invalid __global__ read of size 16
========= at 0x00002380 in turing_h1688cudnn_256x64_sliced1x2_ldg8_relu_exp_interior_nhwc_tn_v1
========= by thread (127,0,0) in block (67,0,0)
========= Address 0x7f2adaa2dc70 is out of bounds
========= Device Frame:turing_h1688cudnn_256x64_sliced1x2_ldg8_relu_exp_interior_nhwc_tn_v1 (turing_h1688cudnn_256x64_sliced1x2_ldg8_relu_exp_interior_nhwc_tn_v1 : 0x2380)
========= Saved host backtrace up to driver entry point at kernel launch time
========= Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 (cuLaunchKernel + 0x2b8) [0x2235d8]
========= Host Frame:/usr/lib/x86_64-linux-gnu/libcudnn.so.7 [0x163b469]
========= Host Frame:/usr/lib/x86_64-linux-gnu/libcudnn.so.7 [0x163b4f7]
========= Host Frame:/usr/lib/x86_64-linux-gnu/libcudnn.so.7 [0x1671855]
========= Host Frame:/usr/lib/x86_64-linux-gnu/libcudnn.so.7 [0x119fadb]
========= Host Frame:/usr/lib/x86_64-linux-gnu/libcudnn.so.7 [0x119fafe]
========= Host Frame:/usr/lib/x86_64-linux-gnu/libcudnn.so.7 [0xa5d77e]
========= Host Frame:/usr/lib/x86_64-linux-gnu/libcudnn.so.7 [0x954cc2]
========= Host Frame:/usr/lib/x86_64-linux-gnu/libcudnn.so.7 [0x95607b]
========= Host Frame:/usr/lib/x86_64-linux-gnu/libcudnn.so.7 [0xd9ddd]
========= Host Frame:/usr/lib/x86_64-linux-gnu/libcudnn.so.7 [0xda2df]
========= Host Frame:/usr/lib/x86_64-linux-gnu/libcudnn.so.7 (cudnnConvolutionBiasActivationForward + 0x879) [0xdb249]
========= Host Frame:./conv_sample [0x12b87]
========= Host Frame:./conv_sample [0xe850]
========= Host Frame:./conv_sample [0x62a7]
========= Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xe7) [0x21bf7]
========= Host Frame:./conv_sample [0x23ca]
=========
========= Invalid __global__ read of size 16
========= at 0x00002380 in turing_h1688cudnn_256x64_sliced1x2_ldg8_relu_exp_interior_nhwc_tn_v1
========= by thread (126,0,0) in block (67,0,0)
========= Address 0x7f2adaa2dc60 is out of bounds
========= Device Frame:turing_h1688cudnn_256x64_sliced1x2_ldg8_relu_exp_interior_nhwc_tn_v1 (turing_h1688cudnn_256x64_sliced1x2_ldg8_relu_exp_interior_nhwc_tn_v1 : 0x2380)
========= Saved host backtrace up to driver entry point at kernel launch time
========= Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 (cuLaunchKernel + 0x2b8) [0x2235d8]
========= Host Frame:/usr/lib/x86_64-linux-gnu/libcudnn.so.7 [0x163b469]
========= Host Frame:/usr/lib/x86_64-linux-gnu/libcudnn.so.7 [0x163b4f7]
========= Host Frame:/usr/lib/x86_64-linux-gnu/libcudnn.so.7 [0x1671855]
========= Host Frame:/usr/lib/x86_64-linux-gnu/libcudnn.so.7 [0x119fadb]
========= Host Frame:/usr/lib/x86_64-linux-gnu/libcudnn.so.7 [0x119fafe]
========= Host Frame:/usr/lib/x86_64-linux-gnu/libcudnn.so.7 [0xa5d77e]
========= Host Frame:/usr/lib/x86_64-linux-gnu/libcudnn.so.7 [0x954cc2]
========= Host Frame:/usr/lib/x86_64-linux-gnu/libcudnn.so.7 [0x95607b]
========= Host Frame:/usr/lib/x86_64-linux-gnu/libcudnn.so.7 [0xd9ddd]
========= Host Frame:/usr/lib/x86_64-linux-gnu/libcudnn.so.7 [0xda2df]
========= Host Frame:/usr/lib/x86_64-linux-gnu/libcudnn.so.7 (cudnnConvolutionBiasActivationForward + 0x879) [0xdb249]
========= Host Frame:./conv_sample [0x12b87]
========= Host Frame:./conv_sample [0xe850]
========= Host Frame:./conv_sample [0x62a7]
========= Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xe7) [0x21bf7]
========= Host Frame:./conv_sample [0x23ca]
=========
========= Invalid __global__ read of size 16
========= at 0x00002380 in turing_h1688cudnn_256x64_sliced1x2_ldg8_relu_exp_interior_nhwc_tn_v1
========= by thread (119,0,0) in block (67,0,0)
========= Address 0x7f2adaa2dc70 is out of bounds
========= Device Frame:turing_h1688cudnn_256x64_sliced1x2_ldg8_relu_exp_interior_nhwc_tn_v1 (turing_h1688cudnn_256x64_sliced1x2_ldg8_relu_exp_interior_nhwc_tn_v1 : 0x2380)
========= Saved host backtrace up to driver entry point at kernel launch time
========= Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 (cuLaunchKernel + 0x2b8) [0x2235d8]
========= Host Frame:/usr/lib/x86_64-linux-gnu/libcudnn.so.7 [0x163b469]
========= Host Frame:/usr/lib/x86_64-linux-gnu/libcudnn.so.7 [0x163b4f7]
========= Host Frame:/usr/lib/x86_64-linux-gnu/libcudnn.so.7 [0x1671855]
========= Host Frame:/usr/lib/x86_64-linux-gnu/libcudnn.so.7 [0x119fadb]
========= Host Frame:/usr/lib/x86_64-linux-gnu/libcudnn.so.7 [0x119fafe]
========= Host Frame:/usr/lib/x86_64-linux-gnu/libcudnn.so.7 [0xa5d77e]
========= Host Frame:/usr/lib/x86_64-linux-gnu/libcudnn.so.7 [0x954cc2]
========= Host Frame:/usr/lib/x86_64-linux-gnu/libcudnn.so.7 [0x95607b]
========= Host Frame:/usr/lib/x86_64-linux-gnu/libcudnn.so.7 [0xd9ddd]
========= Host Frame:/usr/lib/x86_64-linux-gnu/libcudnn.so.7 [0xda2df]
========= Host Frame:/usr/lib/x86_64-linux-gnu/libcudnn.so.7 (cudnnConvolutionBiasActivationForward + 0x879) [0xdb249]
========= Host Frame:./conv_sample [0x12b87]
========= Host Frame:./conv_sample [0xe850]
========= Host Frame:./conv_sample [0x62a7]
========= Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xe7) [0x21bf7]
========= Host Frame:./conv_sample [0x23ca]
=========
========= Invalid __global__ read of size 16
========= at 0x00002380 in turing_h1688cudnn_256x64_sliced1x2_ldg8_relu_exp_interior_nhwc_tn_v1
========= by thread (118,0,0) in block (67,0,0)
========= Address 0x7f2adaa2dc60 is out of bounds
========= Device Frame:turing_h1688cudnn_256x64_sliced1x2_ldg8_relu_exp_interior_nhwc_tn_v1 (turing_h1688cudnn_256x64_sliced1x2_ldg8_relu_exp_interior_nhwc_tn_v1 : 0x2380)
========= Saved host backtrace up to driver entry point at kernel launch time
========= Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 (cuLaunchKernel + 0x2b8) [0x2235d8]
========= Host Frame:/usr/lib/x86_64-linux-gnu/libcudnn.so.7 [0x163b469]
========= Host Frame:/usr/lib/x86_64-linux-gnu/libcudnn.so.7 [0x163b4f7]
========= Host Frame:/usr/lib/x86_64-linux-gnu/libcudnn.so.7 [0x1671855]
========= Host Frame:/usr/lib/x86_64-linux-gnu/libcudnn.so.7 [0x119fadb]
========= Host Frame:/usr/lib/x86_64-linux-gnu/libcudnn.so.7 [0x119fafe]
========= Host Frame:/usr/lib/x86_64-linux-gnu/libcudnn.so.7 [0xa5d77e]
========= Host Frame:/usr/lib/x86_64-linux-gnu/libcudnn.so.7 [0x954cc2]
========= Host Frame:/usr/lib/x86_64-linux-gnu/libcudnn.so.7 [0x95607b]
========= Host Frame:/usr/lib/x86_64-linux-gnu/libcudnn.so.7 [0xd9ddd]
========= Host Frame:/usr/lib/x86_64-linux-gnu/libcudnn.so.7 [0xda2df]
========= Host Frame:/usr/lib/x86_64-linux-gnu/libcudnn.so.7 (cudnnConvolutionBiasActivationForward + 0x879) [0xdb249]
========= Host Frame:./conv_sample [0x12b87]
========= Host Frame:./conv_sample [0xe850]
========= Host Frame:./conv_sample [0x62a7]
========= Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xe7) [0x21bf7]
========= Host Frame:./conv_sample [0x23ca]
=========
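
A rough sanity check on the addresses above, as a sketch rather than a diagnosis: the two faulting addresses lie just past the `bias` pointer that cuDNN logged (0x7f2adaa2dc00), and the bias tensor is only 24 half-precision values, i.e. 48 bytes. The snippet below only redoes that arithmetic with numbers copied from the log. The helper name `describe` and the constants are made up for illustration, and it assumes the nearest logged buffer (bias) is the one being over-read, which the log does not prove.

```python
# Values copied from the cuDNN log and cuda-memcheck report above.
BIAS_ADDR = 0x7f2adaa2dc00      # "bias: location=dev; addr=..."
HALF_BYTES = 2                  # CUDNN_DATA_HALF is 2 bytes per element
BIAS_ELEMS = 1 * 24 * 1 * 1     # biasDesc dimA = [1,24,1,1]
READ_SIZE = 16                  # "Invalid __global__ read of size 16"
FAULT_ADDRS = [0x7f2adaa2dc70, 0x7f2adaa2dc60]


def describe(fault_addr, base_addr, n_elems):
    """Return (offset of the read from base, bytes by which the read ends past the buffer)."""
    size = n_elems * HALF_BYTES
    offset = fault_addr - base_addr
    overrun = offset + READ_SIZE - size
    return offset, overrun


# Assumption: the over-read buffer is the bias tensor (closest logged pointer).
bias_report = [describe(addr, BIAS_ADDR, BIAS_ELEMS) for addr in FAULT_ADDRS]

for addr, (offset, overrun) in zip(FAULT_ADDRS, bias_report):
    print(f"{addr:#x}: bias + {offset} bytes, read ends {overrun} bytes "
          f"past the {BIAS_ELEMS * HALF_BYTES}-byte bias buffer")
```

Both reads end at or before byte 128 from the bias base, which would be consistent with the kernel loading 64 half values (the "64" in the 256x64 tile name) even though k is only 24, but that interpretation is a guess on my side.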