Memory corruption after calling cudnnRNNForward with CUDNN_RNN_DATA_LAYOUT_BATCH_MAJOR_UNPACKED layout (CUDA 11.0 + cuDNN 8.0.5)

Hello,

I am facing device memory corruption when trying to use the ‘cudnnRNNForward’ method (new in cuDNN 8)
with the data layout set to CUDNN_RNN_DATA_LAYOUT_BATCH_MAJOR_UNPACKED (cudnnRNNDataLayout_t).

I have no issues when using the CUDNN_RNN_DATA_LAYOUT_SEQ_MAJOR_UNPACKED layout.

We can use a very small input/output to reproduce the issue, as described below:

  • CUDA 11.0 on Windows 10 / x64

    • cuda_11.0.1_451.22_win10.exe
  • cuDNN 8.0.5

    • cudnn-11.0-windows-x64-v8.0.5.39.zip
  • simple unidirectional RNN (no LSTM, no GRU)

  • input ‘x’ shape for the cudnnRNNForward method: (1,1,1)

    • timeSteps == batchSize == inputSize == 1
  • output ‘y’ shape for the cudnnRNNForward method: (1,1,1)

    • timeSteps == batchSize == hiddenSize == 1
  • pseudo code to create the cudnnRNNDescriptor_t:

    • cudnnCreateRNNDescriptor(cudnnRNNDescriptor_t* rnnDesc)
    • cudnnSetRNNDescriptor_v8(rnnDesc, CUDNN_RNN_ALGO_STANDARD, CUDNN_RNN_TANH, CUDNN_RNN_SINGLE_INP_BIAS, CUDNN_UNIDIRECTIONAL, CUDNN_LINEAR_INPUT, CUDNN_DATA_FLOAT, CUDNN_DATA_FLOAT, CUDNN_DEFAULT_MATH, 1, 1, 1, 1, NULL /* dropoutDesc */, CUDNN_RNN_PADDED_IO_ENABLED);
  • pseudo code to create the ‘cudnnRNNDataDescriptor_t’ associated with ‘x’ and ‘y’ (a fuller code sketch follows this list)

    • cudnnCreateRNNDataDescriptor(cudnnRNNDataDescriptor_t * RNNDataDesc)
    • cudnnSetRNNDataDescriptor(RNNDataDesc, CUDNN_DATA_FLOAT, CUDNN_RNN_DATA_LAYOUT_BATCH_MAJOR_UNPACKED, 1, 1, 1, [1], &paddingFill);
      where:
      • [1] is an array of 1 integer element containing ‘1’
      • paddingFill is a float set to 0
  • in the cudnnRNNForward call:

    • ‘cudnnForwardMode_t fwdMode’ parameter is set to CUDNN_FWD_MODE_INFERENCE
    • parameters related to LSTM are left empty (NULL); we are only using a simple unidirectional RNN
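
Spelled out in actual cuDNN 8 calls, the descriptor setup above looks roughly like this (a sketch only: status checks are omitted, NULL is passed as the dropout descriptor, and ‘yDesc’ is built the same way with vectorSize == hiddenSize):

    // Sketch of the descriptor setup described above (status checks omitted).
    cudnnRNNDescriptor_t rnnDesc;
    cudnnCreateRNNDescriptor(&rnnDesc);
    cudnnSetRNNDescriptor_v8(rnnDesc,
        CUDNN_RNN_ALGO_STANDARD,      // algo
        CUDNN_RNN_TANH,               // cellMode: plain RNN with tanh
        CUDNN_RNN_SINGLE_INP_BIAS,    // biasMode
        CUDNN_UNIDIRECTIONAL,         // dirMode
        CUDNN_LINEAR_INPUT,           // inputMode
        CUDNN_DATA_FLOAT,             // dataType
        CUDNN_DATA_FLOAT,             // mathPrec
        CUDNN_DEFAULT_MATH,           // mathType
        1,                            // inputSize
        1,                            // hiddenSize
        1,                            // projSize (== hiddenSize, i.e. no projection)
        1,                            // numLayers
        NULL,                         // dropoutDesc (NULL = no dropout)
        CUDNN_RNN_PADDED_IO_ENABLED); // auxFlags

    cudnnRNNDataDescriptor_t xDesc;
    cudnnCreateRNNDataDescriptor(&xDesc);
    int seqLengthArray[1] = {1};      // one batch entry with sequence length 1
    float paddingFill = 0.0f;
    cudnnSetRNNDataDescriptor(xDesc,
        CUDNN_DATA_FLOAT,
        CUDNN_RNN_DATA_LAYOUT_BATCH_MAJOR_UNPACKED,
        1,                            // maxSeqLength (timeSteps)
        1,                            // batchSize
        1,                            // vectorSize (inputSize for ‘x’)
        seqLengthArray,
        &paddingFill);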

For such a setup (with timeSteps == batchSize == inputSize == hiddenSize == 1), I would have expected the two data layouts (CUDNN_RNN_DATA_LAYOUT_BATCH_MAJOR_UNPACKED and CUDNN_RNN_DATA_LAYOUT_SEQ_MAJOR_UNPACKED) to behave the same in the cudnnRNNForward method, but that doesn’t seem to be the case.

If you have already used the CUDNN_RNN_DATA_LAYOUT_BATCH_MAJOR_UNPACKED layout with the cudnnRNNForward method, I would be very thankful for your help.

Thank you.

Franck

Hi @franck.zibi,
Your concern has been reported, and the team is looking into this.
Please allow us some time.

Thanks!

Hi @franck.zibi
Can you please share the code with us?

Thanks!

Hello Aakanksha,

The code is in an open source project on GitHub.

The relevant parts:

If you have a small C/C++ standalone project using cuDNN 8, I would be happy to update it to try to reproduce the issue.

Thank you.

Franck

Hi @franck.zibi,
Can you please check whether devSeqLengths (in this call of the cudnnRNNForward method) is allocated on the host or on the GPU?
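
For reference, cudaPointerGetAttributes can report where a pointer lives; a minimal sketch (‘seqLengthsPtr’ is an illustrative name for whatever you currently pass as devSeqLengths):

    #include <cuda_runtime.h>
    #include <cstdio>

    // Print whether a pointer refers to device, managed, or host memory.
    void printMemoryLocation(const void* seqLengthsPtr) {
        cudaPointerAttributes attr;
        cudaError_t err = cudaPointerGetAttributes(&attr, seqLengthsPtr);
        if (err != cudaSuccess) {
            printf("query failed: %s\n", cudaGetErrorString(err));
            return;
        }
        switch (attr.type) {
            case cudaMemoryTypeDevice:       printf("device memory\n");       break;
            case cudaMemoryTypeManaged:      printf("managed memory\n");      break;
            case cudaMemoryTypeHost:         printf("pinned host memory\n");  break;
            case cudaMemoryTypeUnregistered: printf("plain host memory\n");   break;
        }
    }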

Thanks!

Hello Aakanksha,

Thanks for pointing me in the right direction:

  • when I allocate devSeqLengths on the device (instead of the host), everything works fine!

If someone faces the same issue:
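
the fix is to copy the sequence-length array to GPU memory before the call; a minimal sketch (‘hostSeqLengths’ is an illustrative name for the host-side array, {1} in this repro):

    // devSeqLengths must point to GPU memory, not host memory.
    int hostSeqLengths[1] = {1};
    int* devSeqLengths = NULL;
    cudaMalloc(&devSeqLengths, sizeof(hostSeqLengths));
    cudaMemcpy(devSeqLengths, hostSeqLengths, sizeof(hostSeqLengths),
               cudaMemcpyHostToDevice);

    // ... pass devSeqLengths to cudnnRNNForward ...

    cudaFree(devSeqLengths);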

Again, thanks for your accurate and fast answer.

Franck