cuDNN crash in v8.1.x

Hi all,

Recently I have run into the following errors with cuDNN 8.1.x:

CUDA Exception: Warp Illegal Address
The exception was triggered at PC 0x7fff29ae3590

Thread 39 "<app>" received signal CUDA_EXCEPTION_14, Warp Illegal Address.
[Switching focus to CUDA kernel 0, grid 444169, block (0,0,0), thread (0,0,0), device 0, sm 0, warp 3, lane 0]
0x00007fff29ae3788 in void LSTM_elementWise_fp<float, float, float, (cudnnRNNBiasMode_t)1>(int, int, int, int, float const*, float const*, float const*, float const*, cudnn::reduced_divisor, float*, float*, float*, float const*, float*, bool, int, cudnnRNNClipMode_t, cudnnNanPropagation_t, float, float)<<<(1,1,1),(256,1,1)>>> ()

and

CUDA Exception: Warp Illegal Address
The exception was triggered at PC 0x7fff2ae1d418

Thread 39 "<app>" received signal CUDA_EXCEPTION_14, Warp Illegal Address.
[Switching focus to CUDA kernel 3, grid 21015, block (72,0,0), thread (224,0,0), device 0, sm 0, warp 6, lane 0]
0x00007fff2ae1d5d8 in void GRU_elementWise_fp<float, float, float, (cudnnRNNBiasMode_t)2>(int, int, int, int, float const*, float const*, float const*, float const*, cudnn::reduced_divisor, float*, float const*, float*, float*, bool, bool, int)<<<(8,9,1),(128,1,1)>>> ()

The reason I think this is related to cuDNN is that these issues are not reproducible on cuDNN 8.2.x.

Has anyone seen something like this on cuDNN 8.1.x?

Hi @redradist
Could you please share a repro script and detailed logs so we can help better?

Thanks

Hi @SunilJB

One of my teammates created a small sample that reproduces this issue; see the attachment:

gru-test.zip (5.7 KB)

@SunilJB Any update on this issue?

Hi,

Could you please confirm whether you’re still facing this issue on 8.3.1, which is the latest public version of cuDNN?
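
When testing this, it may help to confirm which cuDNN version the process actually loads. The snippet below is a minimal sketch, not an official tool: it assumes Linux, a shared library named libcudnn.so.8 on the loader path, and the cuDNN 8.x encoding of the integer returned by cudnnGetVersion() (major*1000 + minor*100 + patch). The helper names decode_cudnn_version and loaded_cudnn_version are hypothetical.

```python
import ctypes


def decode_cudnn_version(version):
    """Split a cuDNN 8.x version integer (e.g. 8301) into (major, minor, patch).

    Assumes the 8.x encoding: major * 1000 + minor * 100 + patch.
    """
    major, rest = divmod(version, 1000)
    minor, patch = divmod(rest, 100)
    return major, minor, patch


def loaded_cudnn_version(lib_name="libcudnn.so.8"):
    """Return (major, minor, patch) from the cuDNN shared library.

    Returns None if the library (name is an assumption) cannot be loaded
    or does not export cudnnGetVersion.
    """
    try:
        lib = ctypes.CDLL(lib_name)
        lib.cudnnGetVersion.restype = ctypes.c_size_t
        return decode_cudnn_version(lib.cudnnGetVersion())
    except (OSError, AttributeError):
        return None


if __name__ == "__main__":
    print(decode_cudnn_version(8301))  # (8, 3, 1)
    print(loaded_cudnn_version())      # None when cuDNN is not installed
```

If the application links cuDNN through a framework, the version the framework reports may differ from the one found on the loader path, so it is worth checking both.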

Thank you.