cuDNN 7.6.5 conv_sample run with half + NHWC + Tensor Cores on a 2080 Ti gives an out-of-bounds error

Hi guys,

I am trying the cuDNN 7.6.5 conv_sample with the following command:

/usr/local/cuda/bin/cuda-memcheck ./conv_sample -convBiasAct -mathType1 -filterFormat1 -dataType1 -c8 -h704 -w512 -k24 -r1 -s1 -pad_h0 -pad_w0 -u1 -v1 -x

which matches the scenario we have in our own code (half + NHWC + Tensor Cores). It then fails with:

CUDNN error at conv_sample.cpp:872, code=9 (CUDNN_STATUS_NOT_SUPPORTED) in 'cudnnConvolutionBiasActivationForward(handle_, (void*)(&alpha), cudnnIdesc, devPtrI, cudnnFdesc, devPtrF, cudnnConvDesc, algo, workSpace, workSpaceSize, (void*)(&beta), cudnnOdesc, devPtrZ, cudnnBiasdesc, devPtrBias, activationDesc, cudnnOdesc, devPtrO)'

I checked the conv_sample code and found that it hardcodes CUDNN_DATA_FLOAT at lines 1392 and 1396. I don't know why, but I replaced those with CUDNN_DATA_HALF and ran it again. It then executes, but reports an out-of-bounds error, exactly as in our own code:

I! CuDNN (v7605) function cudnnConvolutionBiasActivationForward() called:
i!     handle: type=cudnnHandle_t; streamId=(nil) (defaultStream);
i!     alpha1: type=CUDNN_DATA_FLOAT; val=0.800000;
i!     xDesc: type=cudnnTensorDescriptor_t:
i!         dataType: type=cudnnDataType_t; val=CUDNN_DATA_HALF (2);
i!         nbDims: type=int; val=4;
i!         dimA: type=int; val=[1,8,704,512];
i!         strideA: type=int; val=[2883584,1,4096,8];
i!     xData: location=dev; addr=0x7f6bc0000000;
i!     wDesc: type=cudnnFilterDescriptor_t:
i!         dataType: type=cudnnDataType_t; val=CUDNN_DATA_HALF (2);
i!         nbDims: type=int; val=4;
i!         dimA: type=int; val=[24,8,1,1];
i!         format: type=cudnnTensorFormat_t; val=CUDNN_TENSOR_NHWC (1);
i!     wData: location=dev; addr=0x7f6bdc02d800;
i!     convDesc: type=cudnnConvolutionDescriptor_t:
i!         mode: type=cudnnConvolutionMode_t; val=CUDNN_CROSS_CORRELATION (1);
i!         dataType: type=cudnnDataType_t; val=CUDNN_DATA_HALF (2);
i!         mathType: type=cudnnMathType_t; val=CUDNN_TENSOR_OP_MATH (1);
i!         reorderType: type=int; val=0;
i!         arrayLength: type=int; val=2;
i!         padA: type=int; val=[0,0];
i!         strideA: type=int; val=[1,1];
i!         dilationA: type=int; val=[1,1];
i!         groupCount: type=int; val=1;
i!     algo: type=cudnnConvolutionFwdAlgo_t; val=CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM (1);
i!     workSpace: location=dev; addr=0x7f6bbf200000;
i!     workSpaceSizeInBytes: type=size_t; val=2162696;
i!     alpha2: type=CUDNN_DATA_FLOAT; val=0.000000;
i!     zDesc: type=cudnnTensorDescriptor_t:
i!         dataType: type=cudnnDataType_t; val=CUDNN_DATA_HALF (2);
i!         nbDims: type=int; val=4;
i!         dimA: type=int; val=[1,24,704,512];
i!         strideA: type=int; val=[8650752,1,12288,24];
i!     zData: location=dev; addr=0x7f6bbe000000;
i!     biasDesc: type=cudnnTensorDescriptor_t:
i!         dataType: type=cudnnDataType_t; val=CUDNN_DATA_HALF (2);
i!         nbDims: type=int; val=4;
i!         dimA: type=int; val=[1,24,1,1];
i!         strideA: type=int; val=[24,1,1,1];
i!     bias: location=dev; addr=0x7f6bdc02dc00;
i!     activationDesc: type=cudnnActivationDescriptor_t: 
i!         coef: type=double; val=10000.000000;
i!         mode: type=cudnnActivationMode_t; val=CUDNN_ACTIVATION_RELU (1);
i!         reluNanOpt: type=cudnnNanPropagation_t; val=CUDNN_NOT_PROPAGATE_NAN (0);
i!     yDesc: type=cudnnTensorDescriptor_t:
i!         dataType: type=cudnnDataType_t; val=CUDNN_DATA_HALF (2);
i!         nbDims: type=int; val=4;
i!         dimA: type=int; val=[1,24,704,512];
i!         strideA: type=int; val=[8650752,1,12288,24];
i!     yData: location=dev; addr=0x7f6bc0600000;
i! Time: 2021-08-03T17:15:38.077186 (0d+0h+0m+9s since start)
i! Process=81411; Thread=81411; GPU=0; Handle=0x5579a18b5390; StreamId=(nil) (defaultStream).

========= Invalid __global__ read of size 16
=========     at 0x00002380 in turing_h1688cudnn_256x64_sliced1x2_ldg8_relu_exp_interior_nhwc_tn_v1
=========     by thread (63,0,0) in block (67,0,0)
=========     Address 0x7f6bdc02dc70 is out of bounds
=========     Device Frame:turing_h1688cudnn_256x64_sliced1x2_ldg8_relu_exp_interior_nhwc_tn_v1 (turing_h1688cudnn_256x64_sliced1x2_ldg8_relu_exp_interior_nhwc_tn_v1 : 0x2380)
=========     Saved host backtrace up to driver entry point at kernel launch time
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 (cuLaunchKernel + 0x2b8) [0x2235d8]
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcudnn.so.7 [0x163b469]
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcudnn.so.7 [0x163b4f7]
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcudnn.so.7 [0x1671855]
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcudnn.so.7 [0x119fadb]
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcudnn.so.7 [0x119fafe]
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcudnn.so.7 [0xa5d77e]
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcudnn.so.7 [0x954cc2]
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcudnn.so.7 [0x95607b]
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcudnn.so.7 [0xd9ddd]
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcudnn.so.7 [0xda2df]
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcudnn.so.7 (cudnnConvolutionBiasActivationForward + 0x879) [0xdb249]
=========     Host Frame:./conv_sample [0x12b87]
=========     Host Frame:./conv_sample [0xe850]
=========     Host Frame:./conv_sample [0x62a7]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xe7) [0x21bf7]
=========     Host Frame:./conv_sample [0x23ca]
=========

From the faulting address I can tell it goes out of bounds on the bias. But I haven't found any documentation on how to handle this, whether by padding or something else. Even when I pad the bias to a 256-byte alignment (in our own code), it still goes out of bounds in some other cases. What is the root cause of this error, and what is the best way to handle it?

Thanks


Hi @zandwork,
Can you please share the detailed API logs with us?

Thanks!

Hi there, sure, please take a look at the log below.
By the way, I had to change the hardcoded CUDNN_DATA_FLOAT to CUDNN_DATA_HALF at lines 1392 and 1396 of the sample to make it run.
Thanks

/cudnn_samples_v7/conv_sample$ /usr/local/cuda/bin/cuda-memcheck ./conv_sample -convBiasAct -mathType1 -filterFormat1 -dataType1 -c8 -h704 -w512 -k24 -r1 -s1 -pad_h0 -pad_w0 -u1 -v1 -x 

====USER DIMENSIONS====
input dims are 1, 8, 704, 512
filter dims are 24, 8, 1, 1
output dims are 1, 24, 704, 512
====PADDING DIMENSIONS====
padded input dims are 1, 8, 704, 512
padded filter dims are 24, 8, 1, 1
padded output dims are 1, 24, 704, 512

I! CuDNN (v7605) function cudnnCreate() called:
i! Time: 2021-08-24T11:44:12.717629 (0d+0h+0m+0s since start)
i! Process=6390; Thread=6390; GPU=NULL; Handle=NULL; StreamId=NULL.


I! CuDNN (v7605) function cudnnCreateTensorDescriptor() called:
i! Time: 2021-08-24T11:44:20.364463 (0d+0h+0m+8s since start)
i! Process=6390; Thread=6390; GPU=NULL; Handle=NULL; StreamId=NULL.


I! CuDNN (v7605) function cudnnCreateFilterDescriptor() called:
i! Time: 2021-08-24T11:44:20.364501 (0d+0h+0m+8s since start)
i! Process=6390; Thread=6390; GPU=NULL; Handle=NULL; StreamId=NULL.


I! CuDNN (v7605) function cudnnCreateTensorDescriptor() called:
i! Time: 2021-08-24T11:44:20.364506 (0d+0h+0m+8s since start)
i! Process=6390; Thread=6390; GPU=NULL; Handle=NULL; StreamId=NULL.


I! CuDNN (v7605) function cudnnCreateTensorDescriptor() called:
i! Time: 2021-08-24T11:44:20.364510 (0d+0h+0m+8s since start)
i! Process=6390; Thread=6390; GPU=NULL; Handle=NULL; StreamId=NULL.


I! CuDNN (v7605) function cudnnCreateActivationDescriptor() called:
i! Time: 2021-08-24T11:44:20.364514 (0d+0h+0m+8s since start)
i! Process=6390; Thread=6390; GPU=NULL; Handle=NULL; StreamId=NULL.


I! CuDNN (v7605) function cudnnCreateConvolutionDescriptor() called:
i! Time: 2021-08-24T11:44:20.364519 (0d+0h+0m+8s since start)
i! Process=6390; Thread=6390; GPU=NULL; Handle=NULL; StreamId=NULL.


I! CuDNN (v7605) function cudnnSetTensorNdDescriptor() called:
i!     dataType: type=cudnnDataType_t; val=CUDNN_DATA_HALF (2);
i!     nbDims: type=int; val=4;
i!     dimA: type=int; val=[1,8,704,512];
i!     strideA: type=int; val=[2883584,1,4096,8];
i! Time: 2021-08-24T11:44:20.725572 (0d+0h+0m+8s since start)
i! Process=6390; Thread=6390; GPU=NULL; Handle=NULL; StreamId=NULL.


I! CuDNN (v7605) function cudnnSetTensorNdDescriptor() called:
i!     dataType: type=cudnnDataType_t; val=CUDNN_DATA_HALF (2);
i!     nbDims: type=int; val=4;
i!     dimA: type=int; val=[1,24,704,512];
i!     strideA: type=int; val=[8650752,1,12288,24];
i! Time: 2021-08-24T11:44:20.725616 (0d+0h+0m+8s since start)
i! Process=6390; Thread=6390; GPU=NULL; Handle=NULL; StreamId=NULL.


I! CuDNN (v7605) function cudnnSetConvolutionNdDescriptor() called:
i!     arrayLength: type=int; val=2;
i!     padA: type=int; val=[0,0];
i!     strideA: type=int; val=[1,1];
i!     dilationA: type=int; val=[1,1];
i!     mode: type=cudnnConvolutionMode_t; val=CUDNN_CROSS_CORRELATION (1);
i!     dataType: type=cudnnDataType_t; val=CUDNN_DATA_HALF (2);
i! Time: 2021-08-24T11:44:20.725623 (0d+0h+0m+8s since start)
i! Process=6390; Thread=6390; GPU=NULL; Handle=NULL; StreamId=NULL.


I! CuDNN (v7605) function cudnnSetFilterNdDescriptor() called:
i!     dataType: type=cudnnDataType_t; val=CUDNN_DATA_HALF (2);
i!     format: type=cudnnTensorFormat_t; val=CUDNN_TENSOR_NHWC (1);
i!     nbDims: type=int; val=4;
i!     filterDimA: type=int; val=[24,8,1,1];
i! Time: 2021-08-24T11:44:20.725630 (0d+0h+0m+8s since start)
i! Process=6390; Thread=6390; GPU=NULL; Handle=NULL; StreamId=NULL.


I! CuDNN (v7605) function cudnnSetTensorNdDescriptor() called:
i!     dataType: type=cudnnDataType_t; val=CUDNN_DATA_HALF (2);
i!     nbDims: type=int; val=4;
i!     dimA: type=int; val=[1,24,1,1];
i!     strideA: type=int; val=[24,1,1,1];
i! Time: 2021-08-24T11:44:20.725635 (0d+0h+0m+8s since start)
i! Process=6390; Thread=6390; GPU=NULL; Handle=NULL; StreamId=NULL.


I! CuDNN (v7605) function cudnnSetActivationDescriptor() called:
i!     mode: type=cudnnActivationMode_t; val=CUDNN_ACTIVATION_RELU (1);
i!     reluNanOpt: type=cudnnNanPropagation_t; val=CUDNN_NOT_PROPAGATE_NAN (0);
i!     coef: type=double; val=10000.000000;
i! Time: 2021-08-24T11:44:20.725645 (0d+0h+0m+8s since start)
i! Process=6390; Thread=6390; GPU=NULL; Handle=NULL; StreamId=NULL.


I! CuDNN (v7605) function cudnnSetConvolutionMathType() called:
i!     mathType: type=cudnnMathType_t; val=CUDNN_TENSOR_OP_MATH (1);
i! Time: 2021-08-24T11:44:20.725676 (0d+0h+0m+8s since start)
i! Process=6390; Thread=6390; GPU=NULL; Handle=NULL; StreamId=NULL.

Testing convBiasAct

I! CuDNN (v7605) function cudnnGetConvolutionForwardWorkspaceSize() called:
i!     handle: type=cudnnHandle_t; streamId=(nil) (defaultStream);
i!     xDesc: type=cudnnTensorDescriptor_t:
i!         dataType: type=cudnnDataType_t; val=CUDNN_DATA_HALF (2);
i!         nbDims: type=int; val=4;
i!         dimA: type=int; val=[1,8,704,512];
i!         strideA: type=int; val=[2883584,1,4096,8];
i!     wDesc: type=cudnnFilterDescriptor_t:
i!         dataType: type=cudnnDataType_t; val=CUDNN_DATA_HALF (2);
i!         nbDims: type=int; val=4;
i!         dimA: type=int; val=[24,8,1,1];
i!         format: type=cudnnTensorFormat_t; val=CUDNN_TENSOR_NHWC (1);
i!     convDesc: type=cudnnConvolutionDescriptor_t:
i!         mode: type=cudnnConvolutionMode_t; val=CUDNN_CROSS_CORRELATION (1);
i!         dataType: type=cudnnDataType_t; val=CUDNN_DATA_HALF (2);
i!         mathType: type=cudnnMathType_t; val=CUDNN_TENSOR_OP_MATH (1);
i!         reorderType: type=int; val=0;
i!         arrayLength: type=int; val=2;
i!         padA: type=int; val=[0,0];
i!         strideA: type=int; val=[1,1];
i!         dilationA: type=int; val=[1,1];
i!         groupCount: type=int; val=1;
i!     yDesc: type=cudnnTensorDescriptor_t:
i!         dataType: type=cudnnDataType_t; val=CUDNN_DATA_HALF (2);
i!         nbDims: type=int; val=4;
i!         dimA: type=int; val=[1,24,704,512];
i!         strideA: type=int; val=[8650752,1,12288,24];
i!     algo: type=cudnnConvolutionFwdAlgo_t; val=CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM (1);
i! Time: 2021-08-24T11:44:20.725741 (0d+0h+0m+8s since start)
i! Process=6390; Thread=6390; GPU=0; Handle=0x56086bc80140; StreamId=(nil) (defaultStream).


I! CuDNN (v7605) function cudnnConvolutionBiasActivationForward() called:
i!     handle: type=cudnnHandle_t; streamId=(nil) (defaultStream);
i!     alpha1: type=CUDNN_DATA_FLOAT; val=0.800000;
i!     xDesc: type=cudnnTensorDescriptor_t:
i!         dataType: type=cudnnDataType_t; val=CUDNN_DATA_HALF (2);
i!         nbDims: type=int; val=4;
i!         dimA: type=int; val=[1,8,704,512];
i!         strideA: type=int; val=[2883584,1,4096,8];
i!     xData: location=dev; addr=0x7f2adb600000;
i!     wDesc: type=cudnnFilterDescriptor_t:
i!         dataType: type=cudnnDataType_t; val=CUDNN_DATA_HALF (2);
i!         nbDims: type=int; val=4;
i!         dimA: type=int; val=[24,8,1,1];
i!         format: type=cudnnTensorFormat_t; val=CUDNN_TENSOR_NHWC (1);
i!     wData: location=dev; addr=0x7f2adaa2d800;
i!     convDesc: type=cudnnConvolutionDescriptor_t:
i!         mode: type=cudnnConvolutionMode_t; val=CUDNN_CROSS_CORRELATION (1);
i!         dataType: type=cudnnDataType_t; val=CUDNN_DATA_HALF (2);
i!         mathType: type=cudnnMathType_t; val=CUDNN_TENSOR_OP_MATH (1);
i!         reorderType: type=int; val=0;
i!         arrayLength: type=int; val=2;
i!         padA: type=int; val=[0,0];
i!         strideA: type=int; val=[1,1];
i!         dilationA: type=int; val=[1,1];
i!         groupCount: type=int; val=1;
i!     algo: type=cudnnConvolutionFwdAlgo_t; val=CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM (1);
i!     workSpace: location=dev; addr=0x7f2ade000000;
i!     workSpaceSizeInBytes: type=size_t; val=2162696;
i!     alpha2: type=CUDNN_DATA_FLOAT; val=0.000000;
i!     zDesc: type=cudnnTensorDescriptor_t:
i!         dataType: type=cudnnDataType_t; val=CUDNN_DATA_HALF (2);
i!         nbDims: type=int; val=4;
i!         dimA: type=int; val=[1,24,704,512];
i!         strideA: type=int; val=[8650752,1,12288,24];
i!     zData: location=dev; addr=0x7f2adce00000;
i!     biasDesc: type=cudnnTensorDescriptor_t:
i!         dataType: type=cudnnDataType_t; val=CUDNN_DATA_HALF (2);
i!         nbDims: type=int; val=4;
i!         dimA: type=int; val=[1,24,1,1];
i!         strideA: type=int; val=[24,1,1,1];
i!     bias: location=dev; addr=0x7f2adaa2dc00;
i!     activationDesc: type=cudnnActivationDescriptor_t: 
i!         coef: type=double; val=10000.000000;
i!         mode: type=cudnnActivationMode_t; val=CUDNN_ACTIVATION_RELU (1);
i!         reluNanOpt: type=cudnnNanPropagation_t; val=CUDNN_NOT_PROPAGATE_NAN (0);
i!     yDesc: type=cudnnTensorDescriptor_t:
i!         dataType: type=cudnnDataType_t; val=CUDNN_DATA_HALF (2);
i!         nbDims: type=int; val=4;
i!         dimA: type=int; val=[1,24,704,512];
i!         strideA: type=int; val=[8650752,1,12288,24];
i!     yData: location=dev; addr=0x7f2adbc00000;
i! Time: 2021-08-24T11:44:20.726253 (0d+0h+0m+8s since start)
i! Process=6390; Thread=6390; GPU=0; Handle=0x56086bc80140; StreamId=(nil) (defaultStream).

========= CUDA-MEMCHECK
========= Invalid __global__ read of size 16
=========     at 0x00002380 in turing_h1688cudnn_256x64_sliced1x2_ldg8_relu_exp_interior_nhwc_tn_v1
=========     by thread (127,0,0) in block (67,0,0)
=========     Address 0x7f2adaa2dc70 is out of bounds
=========     Device Frame:turing_h1688cudnn_256x64_sliced1x2_ldg8_relu_exp_interior_nhwc_tn_v1 (turing_h1688cudnn_256x64_sliced1x2_ldg8_relu_exp_interior_nhwc_tn_v1 : 0x2380)
=========     Saved host backtrace up to driver entry point at kernel launch time
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 (cuLaunchKernel + 0x2b8) [0x2235d8]
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcudnn.so.7 [0x163b469]
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcudnn.so.7 [0x163b4f7]
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcudnn.so.7 [0x1671855]
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcudnn.so.7 [0x119fadb]
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcudnn.so.7 [0x119fafe]
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcudnn.so.7 [0xa5d77e]
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcudnn.so.7 [0x954cc2]
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcudnn.so.7 [0x95607b]
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcudnn.so.7 [0xd9ddd]
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcudnn.so.7 [0xda2df]
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcudnn.so.7 (cudnnConvolutionBiasActivationForward + 0x879) [0xdb249]
=========     Host Frame:./conv_sample [0x12b87]
=========     Host Frame:./conv_sample [0xe850]
=========     Host Frame:./conv_sample [0x62a7]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xe7) [0x21bf7]
=========     Host Frame:./conv_sample [0x23ca]
=========
========= Invalid __global__ read of size 16
=========     at 0x00002380 in turing_h1688cudnn_256x64_sliced1x2_ldg8_relu_exp_interior_nhwc_tn_v1
=========     by thread (126,0,0) in block (67,0,0)
=========     Address 0x7f2adaa2dc60 is out of bounds
=========     Device Frame:turing_h1688cudnn_256x64_sliced1x2_ldg8_relu_exp_interior_nhwc_tn_v1 (turing_h1688cudnn_256x64_sliced1x2_ldg8_relu_exp_interior_nhwc_tn_v1 : 0x2380)
=========     Saved host backtrace up to driver entry point at kernel launch time
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 (cuLaunchKernel + 0x2b8) [0x2235d8]
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcudnn.so.7 [0x163b469]
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcudnn.so.7 [0x163b4f7]
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcudnn.so.7 [0x1671855]
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcudnn.so.7 [0x119fadb]
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcudnn.so.7 [0x119fafe]
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcudnn.so.7 [0xa5d77e]
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcudnn.so.7 [0x954cc2]
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcudnn.so.7 [0x95607b]
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcudnn.so.7 [0xd9ddd]
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcudnn.so.7 [0xda2df]
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcudnn.so.7 (cudnnConvolutionBiasActivationForward + 0x879) [0xdb249]
=========     Host Frame:./conv_sample [0x12b87]
=========     Host Frame:./conv_sample [0xe850]
=========     Host Frame:./conv_sample [0x62a7]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xe7) [0x21bf7]
=========     Host Frame:./conv_sample [0x23ca]
=========
========= Invalid __global__ read of size 16
=========     at 0x00002380 in turing_h1688cudnn_256x64_sliced1x2_ldg8_relu_exp_interior_nhwc_tn_v1
=========     by thread (119,0,0) in block (67,0,0)
=========     Address 0x7f2adaa2dc70 is out of bounds
=========     Device Frame:turing_h1688cudnn_256x64_sliced1x2_ldg8_relu_exp_interior_nhwc_tn_v1 (turing_h1688cudnn_256x64_sliced1x2_ldg8_relu_exp_interior_nhwc_tn_v1 : 0x2380)
=========     Saved host backtrace up to driver entry point at kernel launch time
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 (cuLaunchKernel + 0x2b8) [0x2235d8]
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcudnn.so.7 [0x163b469]
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcudnn.so.7 [0x163b4f7]
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcudnn.so.7 [0x1671855]
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcudnn.so.7 [0x119fadb]
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcudnn.so.7 [0x119fafe]
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcudnn.so.7 [0xa5d77e]
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcudnn.so.7 [0x954cc2]
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcudnn.so.7 [0x95607b]
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcudnn.so.7 [0xd9ddd]
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcudnn.so.7 [0xda2df]
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcudnn.so.7 (cudnnConvolutionBiasActivationForward + 0x879) [0xdb249]
=========     Host Frame:./conv_sample [0x12b87]
=========     Host Frame:./conv_sample [0xe850]
=========     Host Frame:./conv_sample [0x62a7]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xe7) [0x21bf7]
=========     Host Frame:./conv_sample [0x23ca]
=========
========= Invalid __global__ read of size 16
=========     at 0x00002380 in turing_h1688cudnn_256x64_sliced1x2_ldg8_relu_exp_interior_nhwc_tn_v1
=========     by thread (118,0,0) in block (67,0,0)
=========     Address 0x7f2adaa2dc60 is out of bounds
=========     Device Frame:turing_h1688cudnn_256x64_sliced1x2_ldg8_relu_exp_interior_nhwc_tn_v1 (turing_h1688cudnn_256x64_sliced1x2_ldg8_relu_exp_interior_nhwc_tn_v1 : 0x2380)
=========     Saved host backtrace up to driver entry point at kernel launch time
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 (cuLaunchKernel + 0x2b8) [0x2235d8]
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcudnn.so.7 [0x163b469]
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcudnn.so.7 [0x163b4f7]
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcudnn.so.7 [0x1671855]
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcudnn.so.7 [0x119fadb]
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcudnn.so.7 [0x119fafe]
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcudnn.so.7 [0xa5d77e]
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcudnn.so.7 [0x954cc2]
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcudnn.so.7 [0x95607b]
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcudnn.so.7 [0xd9ddd]
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcudnn.so.7 [0xda2df]
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcudnn.so.7 (cudnnConvolutionBiasActivationForward + 0x879) [0xdb249]
=========     Host Frame:./conv_sample [0x12b87]
=========     Host Frame:./conv_sample [0xe850]
=========     Host Frame:./conv_sample [0x62a7]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xe7) [0x21bf7]
=========     Host Frame:./conv_sample [0x23ca]
=========