Linux Distro and version: Arch Linux 5.2.13-arch1-1-ARCH
GPU type: GeForce RTX 2080
nvidia driver version: 435.21
CUDA version: 10.0
cuDNN version: 7.4.2
Python version: 3.7.4
TensorFlow version: 1.14
TensorRT version: 184.108.40.206
I’m trying to use Mozilla DeepSpeech with TensorRT.
I’ve set up the RNN to use CudnnCompatibleLSTMCell, since that is listed as one of only two TensorFlow LSTM layers supported by TensorRT in the supported-operations section of the TensorRT Developer Guide.
My exported model file converts to UFF using convert-to-uff, but the result contains incompatible layers. The strange thing is that the incompatibilities come directly from LSTMBlockCell, which is the direct parent class of CudnnCompatibleLSTMCell. In fact, nearly the entire definition of CudnnCompatibleLSTMCell comes from LSTMBlockCell, since CudnnCompatibleLSTMCell is a very thin subclass on top which only defines an __init__() that calls the parent class with some explicit parameters.
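To make the relationship concrete, here is a minimal, purely illustrative Python sketch (not the real TensorFlow source): the child class only overrides __init__() to pin one argument before delegating everything to the parent. All names with the `Sketch` suffix are my own stand-ins.

```python
# Illustrative sketch only -- NOT the actual TensorFlow implementation.
class LSTMBlockCellSketch:
    """Stand-in for LSTMBlockCell: owns num_units and the peephole flag."""
    def __init__(self, num_units, use_peephole=True):
        self.num_units = num_units
        self.use_peephole = use_peephole

class CudnnCompatibleLSTMCellSketch(LSTMBlockCellSketch):
    """Stand-in for CudnnCompatibleLSTMCell: a thin subclass whose only
    job is to forward to the parent with use_peephole pinned to False."""
    def __init__(self, num_units):
        super().__init__(num_units, use_peephole=False)

cell = CudnnCompatibleLSTMCellSketch(2048)
print(cell.use_peephole)  # False: forced by the subclass
```

So everything the graph actually executes, including the ops that show up in the UFF conversion output, comes from the parent class.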
First, when converting to UFF, I received this warning:
Warning: No conversion function registered for layer: Fill yet.
Converting cudnn_lstm/rnn/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/zeros as custom op: Fill
Then, when attempting to create a TensorRT engine file, an error occurs on the call to UffParser.parse():
[TensorRT] ERROR: UffParser: Validator error: cudnn_lstm/rnn/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/zeros: Unsupported operation _Fill
CudnnCompatibleLSTMCell always sets use_peephole to False explicitly in its call to the parent class’s LSTMBlockCell.__init__(). This is expected, as the docs clearly state that CudnnCompatibleLSTMCell must always have use_peephole set to False.
I was trying to figure out where this call to zeros was even coming from. Tracing into the TensorFlow source for LSTMBlockCell, I noticed that when use_peephole is False, each call to LSTMBlockCell.call() initializes all of the peephole-related tensors to zeros with this line:
wci = wcf = wco = array_ops.zeros([self._num_units], dtype=self.dtype)
So I tested changing this line to:
wci = wcf = wco = None
The result was that the zeros/Fill layers were no longer generated in the UFF file, so I got past that error.
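This substitution should be numerically safe: a zero peephole weight contributes nothing to a gate’s pre-activation, so skipping the term entirely (None) yields the same value. A scalar sketch of one gate computation, simplified from the real per-unit tensor math (function names and values here are my own, for illustration only):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gate(pre_activation, peephole_w, c_prev):
    # One gate of a peephole LSTM, reduced to scalars: the peephole
    # term peephole_w * c_prev is skipped entirely when the weight is None.
    peep = peephole_w * c_prev if peephole_w is not None else 0.0
    return sigmoid(pre_activation + peep)

# With use_peephole=False the real cell feeds zeros, which is
# numerically identical to omitting the term:
assert gate(0.3, 0.0, 1.7) == gate(0.3, None, 1.7)
```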
Now I am encountering this warning on UFF conversion:
Warning: No conversion function registered for layer: LSTMBlockCell yet.
Converting cudnn_lstm/rnn/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/LSTMBlockCell as custom op: LSTMBlockCell
And of course, when trying to build an engine from the newly generated UFF file, I now get a different error on the call to UffParser.parse():
[TensorRT] ERROR: UffParser: Validator error: cudnn_lstm/rnn/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/LSTMBlockCell: Unsupported operation _LSTMBlockCell
I don’t know how to resolve this one.
My question is: why does one of the only two TensorFlow LSTM classes that is supposed to be TensorRT-compatible appear to be completely incompatible? If CudnnCompatibleLSTMCell is supposed to be compatible with TensorRT, and it is only a thin subclass extending LSTMBlockCell, then how can LSTMBlockCell itself not be supported?
Do you know how I can resolve this?