cuDNN's Tensor Op AUTO CONVERT does not work on Xavier in an RNN (LSTM) model

Hi There,

According to this blog post: https://devblogs.nvidia.com/tensor-ops-made-easier-in-cudnn/ and this section of the cuDNN documentation (https://docs.nvidia.com/deeplearning/sdk/cudnn-developer-guide/index.html#tensor-ops-rnn-functions),
there is an easy way to use float16 Tensor Cores on Xavier's Volta GPU: the CUDNN_TENSOR_OP_MATH_ALLOW_CONVERSION flag lets cuDNN convert float32 data into float16 and use Tensor Cores for the computation.
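For the convolution case, this is a minimal sketch of how the flag is set on a convolution descriptor (error checking and the rest of the descriptor setup omitted; `convDesc` is assumed to be already created):

```cpp
#include <cudnn.h>

// Sketch: allow cuDNN to down-convert FP32 data to FP16 internally and
// dispatch Tensor Core (s884) convolution kernels where supported.
// Requires cuDNN >= 7.2.
void enableConvAutoConvert(cudnnConvolutionDescriptor_t convDesc) {
    cudnnStatus_t status = cudnnSetConvolutionMathType(
        convDesc, CUDNN_TENSOR_OP_MATH_ALLOW_CONVERSION);
    // In real code, check status != CUDNN_STATUS_SUCCESS and handle the error.
    (void)status;
}
```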

I tried the above method, and the convolution part works; I will attach the modified cuDNN sample code below.

The profiler shows that, when computing the convolution, the kernel changed from

volta_scudnn_128x64_relu_interior_nn_v1

to

volta_s884cudnn_fp16_256x64_ldg8_relu_exp_interior_nhwc2nchw_tn_v1
volta_fp16_s884cudnn_fp16_256x64_ldg8_relu_f2f_exp_interior_nhwc2nchw_tn_v1

which means AUTO CONVERT worked and Tensor Cores are utilized.

But when I try this method in RNN mode, it always fails: the profiler shows no half-precision kernel is invoked during the RNN computation.
The FC kernel between LSTM cells is still:

"volta_sgemm_128x64_nn"

My input size, hidden size, and batch size are already aligned to multiples of 8, as the documentation requires.

Because cuDNN is like a black box to me, could someone help with this?

The modified cuDNN sample code can be downloaded from this link: https://www.dropbox.com/s/up18uoyq9szwaqx/cudnn_samples_v7_auto_conv_tc.tgz?dl=0

The RNN run commands:

cd RNN
make clean
make
nvprof ./RNN 24 8 512 64 2

Machine: Xavier
Ubuntu 18.04.2 LTS
Linux jetson-0423218010724 4.9.108-tegra #1 SMP PREEMPT Wed Oct 31 15:17:21 PDT 2018 aarch64 aarch64 aarch64 GNU/Linux


Hi,

Thanks for reporting this issue.
We will check it and update you with more information later.

Hi,

We just got feedback from our internal team: this issue is not a bug.

You need to call cudnnSetRNNMatrixMathType after cudnnSetRNNDescriptor_v6,
since cudnnSetRNNDescriptor_v6 re-initializes the math-type field to CUDNN_DEFAULT_MATH.
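As a sketch, the correct call order looks like this (descriptor creation, dropout setup, and error checking omitted; the hiddenSize/numLayers values and the LSTM configuration are illustrative):

```cpp
#include <cudnn.h>

// Sketch: cudnnSetRNNDescriptor_v6 resets the descriptor's math type to
// CUDNN_DEFAULT_MATH, so cudnnSetRNNMatrixMathType must be called AFTER it.
void configureRnnTensorOps(cudnnHandle_t handle,
                           cudnnRNNDescriptor_t rnnDesc,
                           cudnnDropoutDescriptor_t dropoutDesc) {
    // 1. Initialize the RNN descriptor first.
    cudnnSetRNNDescriptor_v6(handle, rnnDesc,
                             /*hiddenSize=*/512, /*numLayers=*/2,
                             dropoutDesc,
                             CUDNN_LINEAR_INPUT, CUDNN_UNIDIRECTIONAL,
                             CUDNN_LSTM, CUDNN_RNN_ALGO_STANDARD,
                             CUDNN_DATA_FLOAT);

    // 2. Then enable auto-conversion, so the setting is not overwritten.
    cudnnSetRNNMatrixMathType(rnnDesc, CUDNN_TENSOR_OP_MATH_ALLOW_CONVERSION);
}
```

If the calls are made in the opposite order, the descriptor silently ends up with CUDNN_DEFAULT_MATH, which matches the FP32 volta_sgemm kernels seen in the profile.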

You can find this information here:
https://docs.nvidia.com/deeplearning/sdk/cudnn-developer-guide/index.html#cudnnSetRNNMatrixMathType

<b>rnnDesc</b>
    Input. A previously created and initialized RNN descriptor.

Thanks.