Recurrent convolution

Hey guys,

I’m trying to get a simple RCNN to run in TensorRT 7.0 (or 6.0, I tried both) in C++ using the ONNX parser.

I want to find out whether there is an out-of-the-box way of doing this with any of the frameworks, or whether writing a custom layer in C++ is the way to go.

I’ve tried a lot of combinations. All of them work in the frameworks but fail to parse in TRT. Let’s take one as an example.

Using TensorFlow 1.14 (also tried 1.15, 2.0, 2.1) and Keras 2.3.1 (and tf.keras for TF 2.0+) for training, and ONNX/keras2onnx 1.6 for exporting with opsets anywhere from 8 to 11, this example fails to parse on the TRT C++ side: https://keras.io/examples/conv_lstm/

You can reduce the architecture from the link to the most basic one, with a single ConvLSTM2D layer, or even use a Reshape followed by a vanilla LSTM layer (where the parser fails with [Transpose]: ERROR: builtin_op_importers.cpp:1928 In function importTranspose: [8] Assertion failed: perm.order[BATCH_DIM] == BATCH_DIM). Every variant fails to parse, with a few different errors, even though the exported ONNX file is validated/checked. Changing the opset at ONNX export may change the reported error, but parsing still fails. The same applies to changing the versions of TF, ONNX, and keras2onnx. I also tried changing the data format (channels first/last), but it has no effect on the errors (as expected).
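For readers hitting the same Transpose assertion: the check quoted in the error requires that the permutation leaves the batch axis where it is. The sketch below only illustrates that condition in plain Python. It assumes the batch axis is axis 0, and the node names and perm values are hypothetical (in a real model they would come from the `perm` attribute of the Transpose nodes in the ONNX graph). A time-major transpose such as [1, 0, 2] in front of an LSTM is one plausible way to trip it.

```python
from typing import Dict, List, Sequence

# Assumption: the batch axis is axis 0, matching BATCH_DIM in the error message.
BATCH_DIM = 0


def keeps_batch_dim(perm: Sequence[int], batch_dim: int = BATCH_DIM) -> bool:
    """Return True if the permutation leaves the batch axis in place."""
    return len(perm) > batch_dim and perm[batch_dim] == batch_dim


def offending_transposes(perms: Dict[str, Sequence[int]]) -> List[str]:
    """Return the names of Transpose nodes whose perm moves the batch axis."""
    return [name for name, perm in perms.items() if not keeps_batch_dim(perm)]


# Hypothetical node names and permutations, for illustration only.
example_perms = {
    "transpose_nchw_to_nhwc": [0, 2, 3, 1],  # batch axis stays at position 0
    "transpose_time_major": [1, 0, 2],       # batch axis swapped with time axis
}

if __name__ == "__main__":
    for name in offending_transposes(example_perms):
        print("moves the batch axis:", name)
```

Running such a check on the exported graph before handing it to the parser can at least tell you which node the assertion is about.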

This is true whether the model is created in Keras in a functional or sequential style, with an explicitly specified batch size. I’ve also tried the same architecture in PyTorch (1.2, 1.3) and in TensorFlow alone, with similar results.

Thank you for your time.

Hi unrealdev,

Can you share some of these ONNX models so I can better debug the issues? Or perhaps the scripts to produce the ONNX models plus the commands to run, whichever is easier.

TF 1.14
Keras 2.3.1
PyTorch 1.3

Of course! I’ve sent you a PM with a link to download the models.

Hi unrealdev,

Sorry for the delay.

I tried a few things with each of your models using trtexec in TensorRT 7.

  1. TensorRT 7 Release
# Try each model with TensorRT 7 Release
for m in *.onnx; do trtexec --onnx=$m --explicitBatch 2>&1 | tee /workspace/$m.log; done
  2. TensorRT 7 Release + Updating OSS ONNX Parser
# Try each model with TensorRT 7 Release + update OSS ONNX Parser
wget https://raw.githubusercontent.com/rmccorm4/tensorrt-utils/20.01/OSS/build_OSS.sh
source build_OSS.sh
for m in *.onnx; do trtexec --onnx=$m --explicitBatch 2>&1 | tee /workspace/OSS_$m.log; done
  3. TensorRT 7 Release + Updating OSS ONNX Parser + Running onnx-simplifier on the model beforehand
# Try each model with TensorRT 7 Release + update OSS ONNX Parser + run onnx-simplifier on them
wget https://raw.githubusercontent.com/rmccorm4/tensorrt-utils/20.01/OSS/build_OSS.sh
source build_OSS.sh
pip3 install onnx-simplifier
for m in *.onnx; do python3 -m onnxsim $m /workspace/simple.$m; trtexec --onnx=/workspace/simple.$m --explicitBatch 2>&1 | tee /workspace/onnxsim_OSS_$m.log; done

These models PASSED all permutations:

Keras_2.3.1_AE_LSTM*.onnx
TF_1.14_RAE*.onnx

These models FAILED all permutations:

Keras_2.3.1_ConvLSTM2D*.onnx
PyTorch_1.3.1_RAE*.onnx
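For anyone reproducing this, a pass/fail tally like the one above can be collected from the logs with a small script. This is a minimal sketch, not the script actually used here. It assumes the logs were written to /workspace by the loops above, and that trtexec ends each run with a line containing "&&&& PASSED" or "&&&& FAILED"; if your trtexec build prints something different, adjust the markers.

```python
from pathlib import Path
from typing import Dict

# Assumption: trtexec prints one of these markers at the end of a run.
PASS_MARKER = "&&&& PASSED"
FAIL_MARKER = "&&&& FAILED"


def classify_log(text: str) -> str:
    """Classify one trtexec log as PASSED, FAILED, or UNKNOWN."""
    if PASS_MARKER in text:
        return "PASSED"
    if FAIL_MARKER in text:
        return "FAILED"
    return "UNKNOWN"


def summarize_logs(log_dir: str) -> Dict[str, str]:
    """Map each *.log file name in log_dir to its status."""
    directory = Path(log_dir)
    if not directory.is_dir():
        return {}
    return {
        path.name: classify_log(path.read_text(errors="replace"))
        for path in sorted(directory.glob("*.log"))
    }


if __name__ == "__main__":
    # /workspace is where the loops above tee their logs.
    for name, status in summarize_logs("/workspace").items():
        print(f"{status:8s} {name}")
```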

Because of this similar issue, https://github.com/NVIDIA/TensorRT/issues/284, I expected the PyTorch models to pass after building the OSS ONNX Parser or running onnx-simplifier, as in this comment: https://github.com/NVIDIA/TensorRT/issues/284#issuecomment-572835659. However, they still failed, and I’m not sure exactly why. It seems that PyTorch is producing unusual ONNX graphs when it comes to Upsample / Resize ops. I’m not sure whether torch 1.4 or a nightly build produces a different ONNX graph, but it might be worth checking.

Here are the logs if interested: https://drive.google.com/file/d/114SExuzb9dGur2nMOZsSbiKBGaIA2W5O/view?usp=sharing

Hi NVES_R!

Thank you for all the information! Very peculiar.

I went ahead and created an Amazon EC2 p3 (Tesla V100 GPU) Ubuntu 18.04 instance with the latest pre-installed CUDA and ML components to try to replicate your results. There I used nvidia docker and the latest GPU cloud TensorRT container (docker pull nvcr.io/nvidia/tensorrt:19.12-py3), but failed to replicate your results. Unfortunately, I ended up replicating the results from my local Windows machine instead.

Would you be willing to connect to this cloud instance via SSH if I give you access?
Maybe you could try to find the key environment differences between your machine in the above examples and this instance? That might totally solve my problem, as my local machine has a similar configuration and the same errors.

Please let me know, thanks!

Hi unrealdev,

nvcr.io/nvidia/tensorrt:19.12-py3 is still running TensorRT 6. I believe the next release, 20.01, will have TensorRT 7. I think each version is usually released around the 25th of the month.

Can you try doing a tar.gz install of TensorRT 7 on your AWS instance?

Ouu! Updated it and I can confirm now that I get the same results as you!

This is fantastic. I’ll have another way to debug my Windows system state now. This led me to recheck my TRT 7 install on Windows. It looked fine, but it turns out I solved the problem by copying the TRT 7 dll files to my application’s directory, which got me the same results from the parser. I guess something somewhere was linked to the older version (even though my VS paths and environment variables look right). Maybe I copied some files somewhere at some point and they were getting picked up. It doesn’t matter; I’ll figure that tiny thing out now that I know where to look.
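For anyone chasing a similar stale-library problem, listing every TensorRT library visible on the search path can show where a second copy is hiding. The sketch below is only one way to look, under a few assumptions: that the libraries are named like nvinfer*.dll on Windows or libnvinfer.so* on Linux, and that the copy being picked up lives in a directory listed in PATH (on Linux you would more likely pass the LD_LIBRARY_PATH entries instead). Windows searches the application's own directory before PATH, which fits with copying the dll files next to the executable fixing it.

```python
import os
from pathlib import Path
from typing import Iterable, List

# Assumption: TensorRT core libraries follow these naming patterns.
PATTERNS = ("nvinfer*.dll", "libnvinfer.so*")


def find_tensorrt_libs(search_dirs: Iterable[str]) -> List[Path]:
    """Return TensorRT-looking libraries found in search_dirs, in search order."""
    hits: List[Path] = []
    for entry in search_dirs:
        if not entry:
            continue  # skip empty PATH entries
        directory = Path(entry)
        if not directory.is_dir():
            continue
        for pattern in PATTERNS:
            hits.extend(sorted(directory.glob(pattern)))
    return hits


if __name__ == "__main__":
    path_dirs = os.environ.get("PATH", "").split(os.pathsep)
    for lib in find_tensorrt_libs(path_dirs):
        print(lib)
```

If the output shows the library in more than one directory, the earliest entry is a good first suspect.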

I did transfer the resulting trt models from AWS to my local machine, but they don’t deserialize successfully on Windows.

C:\source\rtSafe\coreReadArchive.cpp (38) - Serialization Error in nvinfer1::rt::CoreReadArchive::verifyHeader: 0 (Version tag does not match)
INVALID_STATE: Unknown exception
INVALID_CONFIG: Deserialize the cuda engine failed.

I think this is expected behavior (trt models are not cross-platform). Just pointing it out for anyone coming across this post.
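One way to avoid loading an engine built elsewhere by accident is to put the build environment into the engine file name and rebuild whenever it does not match. The helper below is a hypothetical naming scheme, not anything TensorRT provides. The thread only confirms that an engine built on the Linux instance failed to load on Windows; including the TensorRT version and GPU name in the key is an extra assumption based on general TensorRT guidance, and the version string and GPU name in the usage line are example values.

```python
import platform
from pathlib import Path
from typing import Optional


def engine_file_name(
    onnx_path: str,
    trt_version: str,
    gpu_name: str,
    system: Optional[str] = None,
    machine: Optional[str] = None,
) -> str:
    """Build an engine file name that encodes where the engine was built.

    system and machine default to the current host (for example Linux/x86_64).
    """
    system = system or platform.system()
    machine = machine or platform.machine()
    gpu = gpu_name.replace(" ", "_")
    stem = Path(onnx_path).stem
    return f"{stem}.{system}-{machine}.{gpu}.trt{trt_version}.engine"


if __name__ == "__main__":
    # Example values only; pass the version and GPU of the machine doing the build.
    print(engine_file_name("models/rae.onnx", "7.0.0", "Tesla V100"))
```

With names like this, an engine copied from the cloud instance would simply not be found on the Windows machine, which then builds its own.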

Let us consider this matter resolved. Thank you for your help.

Yeah I would think that’s expected. Glad it worked out.

It was my TensorRT 6 install for TensorFlow 1.14 that was getting loaded instead of the version I thought was (and should have been) loaded. After I sorted everything out, I was getting a crash in the createExecutionContext() function. That was resolved by updating my logger code with the new code from the TRT 7 “sampleONNXMNIST” sample (the parameter then changes from gLogger to gLogger.getTRTLogger()). Everything runs smoothly now. Hopefully all of this helps someone else, cheers!