cuDNN's Tensor Op AUTO CONVERT does not work on Xavier in an RNN (LSTM) model.

Hi There,

According to this blog: and this cuDNN document:
there is an easy way to use float16 Tensor Cores on Xavier's Volta GPU: the CUDNN_TENSOR_OP_MATH_ALLOW_CONVERSION flag converts float32 data into float16 and uses Tensor Cores for the compute.
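For the convolution path this is a one-line change on the convolution descriptor. A minimal sketch (the helper name is mine, the descriptor is assumed to be created elsewhere, and error checking is left to the caller):

```c
#include <cudnn.h>

/* Sketch: opt a convolution into Tensor Core math with implicit
 * float32 -> float16 conversion. The cuDNN status is returned so
 * the caller can check whether the math type was accepted. */
static cudnnStatus_t enable_tensor_op_conversion(cudnnConvolutionDescriptor_t convDesc)
{
    return cudnnSetConvolutionMathType(convDesc,
                                       CUDNN_TENSOR_OP_MATH_ALLOW_CONVERSION);
}
```

With this set, cuDNN is allowed (but not required) to pick a half-precision Tensor Core kernel for the convolution.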

I tried the above method, and the convolution part works; I will attach the modified cuDNN sample code below.

The profiler shows that, when computing the convolution, the kernel changed from:



This means AUTO CONVERT worked and Tensor Cores were utilized.

But when I try this method on the RNN model, it always fails; the profiler shows that no half-precision kernel is invoked during the RNN computation.
The FC kernel between LSTM cells is still:


My input size, hidden size, and batch size are already multiples of 8, as the documentation requires.

Since cuDNN is a black box to me, could someone help with this?

The modified cuDNN sample code can be downloaded from this link:

The RNN run command:

cd RNN; 
make clean;
nvprof ./RNN 24 8 512 64 2

Machine: Xavier
Ubuntu 18.04.2 LTS
Linux jetson-0423218010724 4.9.108-tegra #1 SMP PREEMPT Wed Oct 31 15:17:21 PDT 2018 aarch64 aarch64 aarch64 GNU/Linux


Thanks for reporting this issue.
We will check it and share more information with you later.


We just got the feedback from our internal team. This issue is not a bug.

You need to call cudnnSetRNNMatrixMathType after cudnnSetRNNDescriptor_v6,
since cudnnSetRNNDescriptor_v6 resets the descriptor's math type to CUDNN_DEFAULT_MATH.
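In other words, the call ordering is what matters. A minimal sketch of the fix (the helper name is mine, the LSTM parameters mirror the sample's setup and are placeholders):

```c
#include <cudnn.h>

/* Sketch: cudnnSetRNNDescriptor_v6 resets the descriptor's math type
 * to CUDNN_DEFAULT_MATH, so cudnnSetRNNMatrixMathType must be called
 * AFTER it, not before. */
static cudnnStatus_t setup_lstm(cudnnHandle_t handle,
                                cudnnRNNDescriptor_t rnnDesc,
                                cudnnDropoutDescriptor_t dropoutDesc,
                                int hiddenSize, int numLayers)
{
    cudnnStatus_t st = cudnnSetRNNDescriptor_v6(handle, rnnDesc,
                                                hiddenSize, numLayers,
                                                dropoutDesc,
                                                CUDNN_LINEAR_INPUT,
                                                CUDNN_UNIDIRECTIONAL,
                                                CUDNN_LSTM,
                                                CUDNN_RNN_ALGO_STANDARD,
                                                CUDNN_DATA_FLOAT);
    if (st != CUDNN_STATUS_SUCCESS)
        return st;

    /* Placing this call before cudnnSetRNNDescriptor_v6 silently
     * leaves the descriptor in CUDNN_DEFAULT_MATH, so no half-precision
     * Tensor Core kernels are dispatched for the RNN. */
    return cudnnSetRNNMatrixMathType(rnnDesc,
                                     CUDNN_TENSOR_OP_MATH_ALLOW_CONVERSION);
}
```

After reordering the calls this way, nvprof should show half-precision kernels during the RNN compute.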

You can find this information here:

    Input. A previously created and initialized RNN descriptor.