Memory corruption after calling cudnnRNNForward with CUDNN_RNN_DATA_LAYOUT_BATCH_MAJOR_UNPACKED layout (CUDA 11.0 + cuDNN 8.0.5)

Hello,

I am facing device memory corruption when trying to use the ‘cudnnRNNForward’ method (new in cuDNN 8)
with the data layout set to CUDNN_RNN_DATA_LAYOUT_BATCH_MAJOR_UNPACKED (cudnnRNNDataLayout_t).

I have no issues when using the CUDNN_RNN_DATA_LAYOUT_SEQ_MAJOR_UNPACKED layout.

We can use a very small input/output to reproduce the issue, as described below:

  • CUDA 11.0 on Windows 10 / x64

    • cuda_11.0.1_451.22_win10.exe
  • cuDNN 8.0.5

    • cudnn-11.0-windows-x64-v8.0.5.39.zip
  • simple unidirectional RNN (no LSTM, no GRU)

  • input ‘x’ shape for the cudnnRNNForward method: (1,1,1)

    • timeSteps == batchSize == inputSize == 1
  • output ‘y’ shape for the cudnnRNNForward method: (1,1,1)

    • timeSteps == batchSize == hiddenSize == 1
  • pseudo code to create the cudnnRNNDescriptor_t:

    • cudnnCreateRNNDescriptor(cudnnRNNDescriptor_t* rnnDesc)
    • cudnnSetRNNDescriptor_v8(rnnDesc, CUDNN_RNN_ALGO_STANDARD, CUDNN_RNN_TANH, CUDNN_RNN_SINGLE_INP_BIAS, CUDNN_UNIDIRECTIONAL, CUDNN_LINEAR_INPUT, CUDNN_DATA_FLOAT, CUDNN_DATA_FLOAT, CUDNN_DEFAULT_MATH, 1, 1, 1, 1, NULL /* dropoutDesc */, CUDNN_RNN_PADDED_IO_ENABLED);
  • pseudo code to create the ‘cudnnRNNDataDescriptor_t’ associated with ‘x’ and ‘y’ (a fuller code sketch follows this list)

    • cudnnCreateRNNDataDescriptor(cudnnRNNDataDescriptor_t * RNNDataDesc)
    • cudnnSetRNNDataDescriptor(RNNDataDesc, CUDNN_DATA_FLOAT, CUDNN_RNN_DATA_LAYOUT_BATCH_MAJOR_UNPACKED, 1, 1, 1, [1], &paddingFill);
      where:
      • [1] is an array of 1 integer element containing ‘1’
      • paddingFill is a float set to 0
  • in the cudnnRNNForward call:

    • ‘cudnnForwardMode_t fwdMode’ parameter is set to CUDNN_FWD_MODE_INFERENCE
    • parameters related to LSTM are left empty (NULL); we are only using a simple unidirectional RNN
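
Spelled out in actual cuDNN 8 calls, the descriptor setup above looks roughly like this (a sketch only: status checks are omitted, NULL is passed as the dropout descriptor, and ‘yDesc’ is built the same way with vectorSize == hiddenSize):

    // Sketch of the descriptor setup described above (status checks omitted).
    cudnnRNNDescriptor_t rnnDesc;
    cudnnCreateRNNDescriptor(&rnnDesc);
    cudnnSetRNNDescriptor_v8(rnnDesc,
        CUDNN_RNN_ALGO_STANDARD,      // algo
        CUDNN_RNN_TANH,               // cellMode: plain RNN with tanh
        CUDNN_RNN_SINGLE_INP_BIAS,    // biasMode
        CUDNN_UNIDIRECTIONAL,         // dirMode
        CUDNN_LINEAR_INPUT,           // inputMode
        CUDNN_DATA_FLOAT,             // dataType
        CUDNN_DATA_FLOAT,             // mathPrec
        CUDNN_DEFAULT_MATH,           // mathType
        1,                            // inputSize
        1,                            // hiddenSize
        1,                            // projSize (== hiddenSize, i.e. no projection)
        1,                            // numLayers
        NULL,                         // dropoutDesc (NULL = no dropout)
        CUDNN_RNN_PADDED_IO_ENABLED); // auxFlags

    cudnnRNNDataDescriptor_t xDesc;
    cudnnCreateRNNDataDescriptor(&xDesc);
    int seqLengthArray[1] = {1};      // one batch entry with sequence length 1
    float paddingFill = 0.0f;
    cudnnSetRNNDataDescriptor(xDesc,
        CUDNN_DATA_FLOAT,
        CUDNN_RNN_DATA_LAYOUT_BATCH_MAJOR_UNPACKED,
        1,                            // maxSeqLength (timeSteps)
        1,                            // batchSize
        1,                            // vectorSize (inputSize for ‘x’)
        seqLengthArray,
        &paddingFill);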

For such a setup (with timeSteps == batchSize == inputSize == hiddenSize == 1), I would have expected the two data layouts (CUDNN_RNN_DATA_LAYOUT_BATCH_MAJOR_UNPACKED and CUDNN_RNN_DATA_LAYOUT_SEQ_MAJOR_UNPACKED) to behave the same in the cudnnRNNForward method, but that doesn’t seem to be the case.

If you have already used the CUDNN_RNN_DATA_LAYOUT_BATCH_MAJOR_UNPACKED layout with the cudnnRNNForward method, I would be very thankful for your help.

Thank you.

Franck

Hi @franck.zibi,
Your concern has been reported, and the team is looking into this.
Please allow us some time.

Thanks!

Hi @franck.zibi
Can you please share the code with us?

Thanks!

Hello Aakanksha,

The code is in an open source project on GitHub.

The relevant parts:

If you have a small C/C++ standalone project using cuDNN 8, I would be happy to update it to try to reproduce the issue.

Thank you.

Franck

Hi @franck.zibi,
Can you please check whether devSeqLengths (in this call of the cudnnRNNForward method) is allocated on the host or on the GPU?
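
For reference, cudaPointerGetAttributes can report where a pointer lives; a minimal sketch (‘seqLengthsPtr’ is an illustrative name for whatever you currently pass as devSeqLengths):

    #include <cuda_runtime.h>
    #include <cstdio>

    // Print whether a pointer refers to device, managed, or host memory.
    void printMemoryLocation(const void* seqLengthsPtr) {
        cudaPointerAttributes attr;
        cudaError_t err = cudaPointerGetAttributes(&attr, seqLengthsPtr);
        if (err != cudaSuccess) {
            printf("query failed: %s\n", cudaGetErrorString(err));
            return;
        }
        switch (attr.type) {
            case cudaMemoryTypeDevice:       printf("device memory\n");       break;
            case cudaMemoryTypeManaged:      printf("managed memory\n");      break;
            case cudaMemoryTypeHost:         printf("pinned host memory\n");  break;
            case cudaMemoryTypeUnregistered: printf("plain host memory\n");   break;
        }
    }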

Thanks!

Hello Aakanksha,

Thanks for pointing me in the right direction:

  • when I allocate devSeqLengths on the device (instead of the host), everything works fine!

If someone faces the same issue:
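
the fix is to copy the sequence-length array to GPU memory before the call; a minimal sketch (‘hostSeqLengths’ is an illustrative name for the host-side array, {1} in this repro):

    // devSeqLengths must point to GPU memory, not host memory.
    int hostSeqLengths[1] = {1};
    int* devSeqLengths = NULL;
    cudaMalloc(&devSeqLengths, sizeof(hostSeqLengths));
    cudaMemcpy(devSeqLengths, hostSeqLengths, sizeof(hostSeqLengths),
               cudaMemcpyHostToDevice);

    // ... pass devSeqLengths to cudnnRNNForward ...

    cudaFree(devSeqLengths);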

Again, thanks for your accurate and fast answer.

Franck