Accuracy and performance problems after converting a Keras model to TensorRT for production

Hello folks,
I recently trained a Keras model and saved the weights in .h5 format. I then converted those weights into an engine file and loaded it into the TensorRT runtime, but the results and performance I get from the TensorRT runtime are much worse than those from the trained Keras model.
Can you please tell me the possible reason behind this? It would be really helpful.

Thanks

Moved to the TensorRT forum

Hi,
Please share the model, script, profiler, and performance output, if you have not already, so that we can help you better.
Alternatively, you can try running your model with the trtexec command.

When measuring model performance, make sure you consider the latency and throughput of the network inference only, excluding the data pre- and post-processing overhead.
Please refer to the link below for more details:
https://docs.nvidia.com/deeplearning/tensorrt/archives/tensorrt-722/best-practices/index.html#measure-performance
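
As a rough illustration of the advice above, the sketch below times only the inference call and leaves pre- and post-processing outside the timed region. The names `benchmark` and `fake_infer` are hypothetical; `fake_infer` is only a stand-in for the real engine call, which would have to block until the output is ready (for asynchronous GPU execution, that means synchronizing inside the call).

```python
import statistics
import time


def benchmark(infer, batch, warmup=10, runs=100):
    """Time only the infer(batch) call; pre/post-processing stays outside."""
    for _ in range(warmup):
        infer(batch)  # warm-up runs are not timed
    latencies_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        infer(batch)  # must block until the result is ready
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    mean_ms = statistics.mean(latencies_ms)
    return {
        "mean_ms": mean_ms,
        "median_ms": statistics.median(latencies_ms),
        # batches per second, derived from the mean latency
        "throughput_per_s": 1000.0 / mean_ms if mean_ms > 0 else float("inf"),
    }


def fake_infer(batch):
    # Hypothetical stand-in for the real engine call
    return [x * 2 for x in batch]


if __name__ == "__main__":
    # The input is prepared once, outside the timed loop
    print(benchmark(fake_infer, list(range(1000))))
```

Running the same harness against both the Keras model and the TensorRT engine, with identical prepared inputs, would give numbers that can be compared directly.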

Thanks!

Hello NVES,
Thanks for your reply.

I followed this article by NVIDIA exactly: I first converted the .pb weights to .onnx format, and then converted the .onnx file to .plan/.engine to build the engine and run TensorRT inference.

My model performs violence detection with binary labels, 0 or 1 (non-fight or fight). During inference it analyzes a group of frames (20 frames) and predicts output probabilities using a softmax function. But when I run inference with the .plan/.engine file, it always predicts class 0 (non-fight), even when the test video contains fight frames.

So I wondered how it is possible that the TensorRT model does not give results as good as the ones I was getting with Keras. That is why I raised this issue in the forums.

Can you help me now? I would really appreciate it, as this model has to be deployed in production on an NVIDIA Jetson Nano, so I need to pass the test cases during the testing phase before deploying it.

Thanks

Hi @yugal.jain1999,

Please share an ONNX model that reproduces the issue, along with the scripts/steps, so that we can try it on our end. Also, please let us know the environment details.
TensorRT Version :
GPU Type :
Nvidia Driver Version :
CUDA Version :
CUDNN Version :
Operating System + Version :
Python Version (if applicable) :
TensorFlow Version (if applicable) :
PyTorch Version (if applicable) :
Baremetal or Container (if container which image + tag) :

Thank you.

Yeah sure,
I was using Google Colab to convert .pb to .onnx and then to .plan/.engine for TensorRT.
TensorRT Version : 7.1.3.4
GPU Type :(Colab GPU)
Nvidia Driver Version : 460.32.03
CUDA Version : 10.2
CUDNN Version : 8.0
Operating System + Version : Linux-5.4.109+-x86_64-with-Ubuntu-18.04-bionic
Python Version (if applicable) : Python 3.7
TensorFlow Version (if applicable) : Tensorflow 2.3.0
PyTorch Version (if applicable) : Not applicable (using a TensorFlow model in .pb format initially)
Baremetal or Container (if container which image + tag) : Not applicable

Scripts -

  • To convert .pb to .onnx - Run this command
    python -m tf2onnx.convert --input /content/mdl6_wts.pb --inputs input_1:0 --outputs dense_12/Sigmoid:0 --output mdl6_wts.onnx

  • To convert .onnx to .plan - Used the onnx_to_plan.py script (I have attached that file below)

  • After that, I ran the inference script - I am attaching inference.py below, in which I have defined the allocate_buffers function and run the final inference. I am also attaching the test video yts.mp4 so that you can test on your end.

onnx_to_plan.py (2.0 KB)
inference.py (6.6 KB)

Can you now please help me?

Thanks

Hi @yugal.jain1999,

Thank you for sharing. Meanwhile, could you please try the latest TensorRT version, 8.0, and let us know if you still face this issue?

Hello @spolisetty,
But I do not think it is a TensorRT version problem, because the version I mentioned worked perfectly and did its job while building the engine.
May I know if there is any specific reason to try TensorRT 8.0 to solve this problem?

Hi @yugal.jain1999,

You are using an old version of TensorRT. Performance-related and other issues have been fixed in later releases. That is why we recommend that you test on the latest version and let us know if you still face this issue.

Thank you.

Okay, so is the latest TensorRT version compatible with the TF, CUDA, and cuDNN versions I am using?

Please refer to Support Matrix :: NVIDIA Deep Learning TensorRT Documentation to check the requirements.

Hello @spolisetty
I also tried the latest version, TensorRT 8.0.0.3 EA, and still have the same problem.
Can you tell me what I should do now? Have you gone through the scripts I sent you and tried to find the flaw?
I would really appreciate your help with this.
Thanks

Hello @spolisetty
I am running into one more minor issue along with the performance issue: the session crashes every time I try to load the engine, allocate buffers in memory again, and run predictions with TensorRT inference in Python. Can you please tell me how I can free the CUDA memory allocated to the engine after every run, or suggest some other way to free memory?

Thanks

Hi @yugal.jain1999,

In the Python API, the destructor __del__() is the equivalent of the C++ destroy() for freeing the CUDA memory allocated to the engine.
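
A minimal sketch of what relying on the destructor means in Python. It uses a hypothetical `FakeEngine` class, not the real TensorRT objects: in CPython, `__del__` runs only once every reference to the object is gone, so any name that still points at the engine (or at something holding it) keeps its resources alive. In a long-lived session this may be worth checking before an engine is loaded again.

```python
import gc


class FakeEngine:
    """Hypothetical stand-in for an object that owns GPU memory."""

    live = 0  # number of engines whose resources are still held

    def __init__(self):
        FakeEngine.live += 1

    def __del__(self):
        # A real binding would release its native resources here
        FakeEngine.live -= 1


def run_once():
    engine = FakeEngine()
    context = [engine]  # anything that references the engine keeps it alive
    return FakeEngine.live  # 1 while the engine is in use


in_use = run_once()  # locals go out of scope when the function returns
gc.collect()         # also clears any objects caught in reference cycles
print(in_use, FakeEngine.live)
```

Keeping the engine, context, and buffers as locals of one function, or deleting those names explicitly with `del` before reloading, is one way to make sure the destructor actually gets a chance to run.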
The inference script looks normal.
Could you please let us know whether you have tested the model using onnx-runtime? Do you face the accuracy problem with onnx-runtime as well, or only after converting to TensorRT?
That would help to check whether this is an ONNX model conversion problem.
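
One way to make that check concrete is to run the same inputs through both runtimes and compare the per-class scores. The helper below is only a sketch: `compare_outputs` is a hypothetical name, and the scores are made-up numbers rather than output from the model in this thread.

```python
def compare_outputs(reference, candidate, atol=1e-3):
    """Compare per-sample class scores from two runtimes on the same inputs."""
    if len(reference) != len(candidate):
        raise ValueError("both runtimes must be run on the same samples")
    max_abs_diff = 0.0
    label_mismatches = []
    for i, (ref, cand) in enumerate(zip(reference, candidate)):
        diff = max(abs(r - c) for r, c in zip(ref, cand))
        max_abs_diff = max(max_abs_diff, diff)
        # Predicted label = index of the largest score
        if ref.index(max(ref)) != cand.index(max(cand)):
            label_mismatches.append(i)
    return {
        "max_abs_diff": max_abs_diff,
        "label_mismatches": label_mismatches,
        "match": max_abs_diff <= atol and not label_mismatches,
    }


# Made-up scores for three clips, ordered as [non-fight, fight]
keras_scores = [[0.9, 0.1], [0.2, 0.8], [0.3, 0.7]]
onnx_scores = [[0.9, 0.1], [0.7, 0.3], [0.8, 0.2]]
report = compare_outputs(keras_scores, onnx_scores)
print(report)
```

If the Keras and ONNX outputs already disagree, the problem was introduced during conversion to ONNX; if they agree and only the TensorRT output differs, the engine build or the inference script is the more likely place to look.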

Please share the ONNX model as well.

Thank you.

Hey @spolisetty
So I do not need to call destroy() to free the CUDA memory allocated to the engine?

I have not tried running inference on the model with onnx-runtime.
I converted the ONNX model to a TensorRT model and tested that, because later I want to run the TensorRT model in NVIDIA DeepStream as well.

You can run inference with the ONNX model if you want to test it. I am sharing the ONNX model below.
I look forward to your response.

Thanks

mdl6_wts.onnx (824 KB)

@yugal.jain1999,

Yes. Since it is a destructor, freeing the memory should not be a problem. We will try it on our end as well. Meanwhile, we recommend that you test inference with onnx-runtime too. As I mentioned in my previous reply, that would help isolate whether this is an ONNX conversion problem that occurs before the engine is generated.

Thank you.

Okay, sure. I will try onnx-runtime and test inference.
And please look into the issue with freeing CUDA memory, if you can figure it out.

Thanks

@spolisetty I see the same accuracy problem with onnx-runtime inference that I was facing with TensorRT. It always predicts the first label, but it should predict the second label as well, as it did when I ran inference with the .hdf5 model.

So is it an ONNX conversion problem, then?
To convert the model from .pb to .onnx, I used tf2onnx.
I ran this command to convert from .pb to .onnx:
    python -m tf2onnx.convert --input /content/mdl6_wts.pb --inputs input_1:0 --outputs dense_12/Sigmoid:0 --output mdl6_wts.onnx

Now what should we do?
PS - I first converted .hdf5 to .pb, then to .onnx, and then to .plan/.engine.

Thanks

@yugal.jain1999,

Then it looks like a model conversion problem. Have you tried converting Keras to ONNX directly? This may help you. If you still face the issue after that conversion, we recommend posting your concern in their Git issues.

Once the ONNX problem is resolved, if you still face this issue with TensorRT, we can help you.

Thank you.

Hey @spolisetty

Using keras2onnx, the ONNX model now works fine.

Now what should we do to solve the TensorRT inference problem and the CUDA out-of-memory problem when running in Google Colab?

One more small question: to deploy the model on an NVIDIA Jetson Nano using DeepStream, must we first convert the TensorRT model into a DeepStream-compatible model, or can we convert the ONNX model directly into a DeepStream model for deployment on the Jetson Nano?

Thanks