Recurrent convolution

unrealdev · January 13, 2020, 1:28pm

Hey guys,

I’m trying to get a simple RCNN to run in TensorRT 7.0 (or 6.0, tried both) in C++ using the ONNX parser.

Is there an out-of-the-box way of doing this with any of the frameworks, or is writing a custom layer in C++ the way to go?

I’ve tried a lot of combinations - all of them work in the frameworks but fail to parse in TRT. Let’s take one as an example.

Using TensorFlow 1.14 (also tried 1.15, 2.0, 2.1) and Keras 2.3.1 (and tf.keras for TF 2.0+) for training, and ONNX/keras2onnx 1.6 for exporting with opset anywhere from 8 to 11, this example fails to parse on the TRT C++ side: https://keras.io/examples/conv_lstm/

You can break down the architecture from the link above to the most basic one, with a single ConvLSTM2D layer, or even use a Reshape followed by a vanilla LSTM layer (the parser fails with [Transpose]: ERROR: builtin_op_importers.cpp:1928 In function importTranspose: [8] Assertion failed: perm.order[BATCH_DIM] == BATCH_DIM). Every variant fails to parse from an exported and validated/checked ONNX file, with a few different errors. Changing the opset at ONNX export may change the reported error, but parsing still fails; the same applies to changing the versions of TF, ONNX, and keras2onnx. I also tried changing the data format (channels first/last), but it has no effect on the errors (as expected).

The failures occur with both the functional and the sequential style of creating the model in Keras, with an explicitly specified batch size. I’ve also tried the same architecture in PyTorch (1.2, 1.3) and in TensorFlow alone, with similar results.

Thank you for your time.
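
The Transpose assertion quoted earlier reads perm.order[BATCH_DIM] == BATCH_DIM, which suggests the parser is rejecting a permutation that moves the batch axis. As a rough illustration only, here is a Python sketch of that condition. It assumes BATCH_DIM means axis 0, and the helper name is hypothetical (it is not part of TensorRT or ONNX).

```python
# Hypothetical helper mirroring the condition in the assertion text.
# Assumption: BATCH_DIM refers to axis 0.
BATCH_DIM = 0


def keeps_batch_axis(perm):
    """Return True if a Transpose permutation leaves the batch axis in place."""
    perm = list(perm)
    # A valid permutation is non-empty and uses each axis index exactly once.
    if not perm or sorted(perm) != list(range(len(perm))):
        raise ValueError(f"not a permutation: {perm}")
    return perm[BATCH_DIM] == BATCH_DIM


# A channels-last to channels-first swap keeps axis 0 where it is.
print(keeps_batch_axis([0, 3, 1, 2]))
# Swapping the batch and time axes, as a batch-major to time-major
# conversion in front of an LSTM would, moves axis 0.
print(keeps_batch_axis([1, 0, 2]))
```

Running a check like this over the perm attribute of each Transpose node in an exported graph would show which nodes could trip that assertion.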

NVES_R · January 13, 2020, 9:11pm

Hi unrealdev,

Can you share some of these ONNX models so I can better debug the issues? Or perhaps the scripts to produce the ONNX models plus the commands to run, whichever is easier.

TF 1.14
Keras 2.3.1
PyTorch 1.3

unrealdev · January 14, 2020, 10:58am

Of course! I’ve sent you a PM with a link to download the models.

NVES_R · January 16, 2020, 3:20am

Hi unrealdev,

Sorry for the delay.

I tried a few things with each of your models using trtexec in TensorRT 7.

TensorRT 7 Release

# Try each model with TensorRT 7 Release
for m in *.onnx; do trtexec --onnx=$m --explicitBatch 2>&1 | tee /workspace/$m.log; done

TensorRT 7 Release + Updating OSS ONNX Parser

# Try each model with TensorRT 7 Release + update OSS ONNX Parser
wget https://raw.githubusercontent.com/rmccorm4/tensorrt-utils/20.01/OSS/build_OSS.sh
source build_OSS.sh
for m in *.onnx; do trtexec --onnx=$m --explicitBatch 2>&1 | tee /workspace/OSS_$m.log; done

TensorRT 7 Release + Updating OSS ONNX Parser + Running onnx-simplifier on the model beforehand

# Try each model with TensorRT 7 Release + update OSS ONNX Parser + run onnx-simplifier on them
wget https://raw.githubusercontent.com/rmccorm4/tensorrt-utils/20.01/OSS/build_OSS.sh
source build_OSS.sh
pip3 install onnx-simplifier
for m in *.onnx; do python3 -m onnxsim $m /workspace/simple.$m; trtexec --onnx=/workspace/simple.$m --explicitBatch 2>&1 | tee /workspace/onnxsim_OSS_$m.log; done

These models PASSED all permutations:

Keras_2.3.1_AE_LSTM*.onnx
TF_1.14_RAE*.onnx

These models FAILED all permutations:

Keras_2.3.1_ConvLSTM2D*.onnx
PyTorch_1.3.1_RAE*.onnx
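
For anyone repeating these runs, the pass/fail lists can be rebuilt from the saved logs. The sketch below is a minimal Python take on that. It assumes trtexec ends each run with a status line containing "&&&& PASSED" or "&&&& FAILED"; the function names are hypothetical, and /workspace is simply the log directory used in the loops above.

```python
from pathlib import Path


def classify_log(text):
    """Classify one trtexec log by its final status marker."""
    # Assumption: trtexec prints a status line containing one of these markers.
    if "&&&& PASSED" in text:
        return "PASSED"
    if "&&&& FAILED" in text:
        return "FAILED"
    return "UNKNOWN"


def summarize(log_dir):
    """Map each *.log file name in log_dir to PASSED, FAILED, or UNKNOWN."""
    root = Path(log_dir)
    if not root.is_dir():
        return {}
    results = {}
    for path in sorted(root.glob("*.log")):
        results[path.name] = classify_log(path.read_text(errors="replace"))
    return results


if __name__ == "__main__":
    for name, status in summarize("/workspace").items():
        print(f"{status:8} {name}")
```

Logs that end without either marker (for example, if the process crashed) are reported as UNKNOWN rather than guessed at.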

Based on this similar issue, "(Upsample) How can I use onnx parser with opset 11 ?" (Issue #284, NVIDIA/TensorRT on GitHub), I expected the PyTorch models to pass after building the OSS ONNX parser or running onnx-simplifier as in this comment: https://github.com/NVIDIA/TensorRT/issues/284#issuecomment-572835659. However, they still failed, and I’m not sure exactly why. PyTorch seems to produce unusual ONNX graphs when it comes to Upsample / Resize ops. torch 1.4 or a nightly build may produce a different ONNX graph, so that might be worth checking.

Here are the logs if interested: https://drive.google.com/file/d/114SExuzb9dGur2nMOZsSbiKBGaIA2W5O/view?usp=sharing

unrealdev · January 16, 2020, 4:52pm

Hi NVES_R!

Thank you for all the information! Very peculiar.

I went ahead and created an Amazon EC2 p3 (Tesla V100 GPU) Ubuntu 18.04 instance with the latest pre-installed CUDA and ML components to try to replicate your results. There I used nvidia docker and the latest GPU Cloud TensorRT container (docker pull nvcr.io/nvidia/tensorrt:19.12-py3), but I failed to replicate your results. Unfortunately, I ended up replicating the results from my local Windows machine instead.

Would you be willing to connect to this cloud instance via SSH if I give you access?
Maybe you could try to find the key environment differences between your machine in the above examples and this instance? That might completely solve my problem, as my local machine has a similar configuration and the same errors.

Please let me know, thanks!

NVES_R · January 16, 2020, 7:26pm

Hi unrealdev,

nvcr.io/nvidia/tensorrt:19.12-py3 is still running TensorRT 6. I believe the next release, 20.01, will have TensorRT 7. I think each version is usually released around the 25th of the month.

Can you try doing a tar.gz install of TensorRT 7 on your AWS instance?

unrealdev · January 16, 2020, 10:26pm

Ouu! Updated it and I can confirm now that I get the same results as you!

This is fantastic. I’ll have another way to debug my Windows system state now. This led me to recheck my TRT 7 install on Windows. It looked fine, but it turns out I solved the problem by copying the TRT 7 DLL files to my application’s directory, which got me the same results from the parser. I guess something somewhere was linked to the older version (even though my VS paths and environment variables look right). Maybe I copied some files somewhere at some point and they were getting picked up. It doesn’t matter; I’ll figure that out now that I know where to look.

I did transfer the resulting TRT models from AWS to my local machine, but they don’t deserialize successfully on Windows.

C:\source\rtSafe\coreReadArchive.cpp (38) - Serialization Error in nvinfer1::rt::CoreReadArchive::verifyHeader: 0 (Version tag does not match)
INVALID_STATE: Unknown exception
INVALID_CONFIG: Deserialize the cuda engine failed.

I think this is expected behavior (TRT models are not cross-platform). I’m just pointing it out for anyone coming across this post.

Let us consider this matter resolved. Thank you for your help.

NVES_R · January 17, 2020, 1:07am

Yeah I would think that’s expected. Glad it worked out.

unrealdev · January 17, 2020, 1:11pm

It was my TensorRT 6 for TensorFlow 1.14 that was getting loaded instead of the version I thought was (or should be) loaded. After I sorted everything out, I was getting a crash in the createExecutionContext() function, which was resolved by updating my logger code with the new code from the TRT 7 “sampleONNXMNIST” sample (the parameters then change from gLogger to gLogger.getTRTLogger()). Everything runs smoothly now. Hopefully all of this helps someone else, cheers!
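
Since the root cause was an older TensorRT library being loaded at runtime, it can help to print the version the process actually sees (in C++, getInferLibVersion() returns it as a single integer) and compare it with the version the application was built against. Below is a minimal Python sketch of decoding and comparing such integers. It assumes the major*1000 + minor*100 + patch encoding of the NV_TENSORRT_VERSION macro in the TensorRT 6/7 headers; the helper names and the sample values are hypothetical.

```python
def decode_trt_version(value):
    """Split a packed TensorRT version integer into (major, minor, patch).

    Assumption: value = major * 1000 + minor * 100 + patch, as in the
    NV_TENSORRT_VERSION macro of the TensorRT 6/7 headers.
    """
    major, rest = divmod(value, 1000)
    minor, patch = divmod(rest, 100)
    return major, minor, patch


def same_major(built_against, loaded):
    """Return True if both packed versions share the same major version."""
    return decode_trt_version(built_against)[0] == decode_trt_version(loaded)[0]


# Illustrative values only: built against 7.0.0, but 6.0.1 loaded at runtime.
built, loaded = 7000, 6001
if not same_major(built, loaded):
    print("version mismatch:", decode_trt_version(built), "vs", decode_trt_version(loaded))
```

A mismatch like the one in the sample values would be consistent with both the parser errors and the "Version tag does not match" deserialization error seen in this thread.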
