Accuracy and performance problems when converting a Keras model to TensorRT for production

yugal.jain1999 · May 30, 2021, 7:24pm

Hello folks,
I recently trained a Keras model and saved the weights in .h5 format. I then converted those weights into an engine file and loaded it into the TensorRT runtime, but the results and performance I am getting from the TensorRT runtime are not good at all compared to the results from the trained Keras model.
Can you please tell me the possible reason behind this? It would be really helpful.

Thanks

mjain · June 1, 2021, 7:01am

Moved to the TensorRT forum

NVES · June 1, 2021, 7:37am

Hi,
Please share the model, script, profiler output, and performance output, if you have not already, so that we can help you better.
Alternatively, you can try running your model with the trtexec command.
https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/trtexec

When measuring model performance, make sure you consider the latency and throughput of the network inference only, excluding the data pre- and post-processing overhead.
Please refer to the link below for more details:
https://docs.nvidia.com/deeplearning/tensorrt/archives/tensorrt-722/best-practices/index.html#measure-performance
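
As a rough illustration of measuring inference only, here is a minimal Python sketch. The `benchmark` helper and the stand-in `fake_infer` function are hypothetical and not part of TensorRT. It assumes the real inference callable blocks until the result is ready (for GPU inference, that it synchronizes before returning), and that preprocessing has already been done on the batch before timing starts.

```python
import statistics
import time


def benchmark(infer, batch, warmup=10, runs=100):
    """Time only the inference call; pre/post-processing stays outside."""
    # Warm-up iterations let lazy initialisation and caches settle.
    for _ in range(warmup):
        infer(batch)

    latencies_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        infer(batch)  # must block until the result is ready
        latencies_ms.append((time.perf_counter() - start) * 1000.0)

    mean_ms = statistics.mean(latencies_ms)
    return {
        "mean_ms": mean_ms,
        "median_ms": statistics.median(latencies_ms),
        # Batches (not frames) completed per second.
        "batches_per_s": 1000.0 / mean_ms if mean_ms > 0 else float("inf"),
    }


if __name__ == "__main__":
    # Stand-in for the real engine call (hypothetical).
    def fake_infer(batch):
        return [sum(frame) for frame in batch]

    preprocessed = [[0.1, 0.2, 0.3]] * 20  # preprocessing done before timing
    print(benchmark(fake_infer, preprocessed))
```

Running the same helper against both the Keras predict call and the TensorRT inference call would give comparable numbers, since neither measurement includes video decoding or frame resizing.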

Thanks!

yugal.jain1999 · June 1, 2021, 8:15am

Hello NVES,
Thanks for your reply.

I followed this article by NVIDIA exactly: I first converted the .pb weights to .onnx format, and then .onnx to .plan/.engine to build the engine and run TensorRT inference.

My model does violence detection with binary labels, 0 or 1 (non-fight or fight). During inference it analyzes a group of frames (20 frames) and predicts output probabilities using a softmax function. But when I run inference with the .plan/.engine file, it always predicts class 0 (non-fight), even when there are fight frames in the test video.

So I was wondering how it is possible that the TensorRT model does not give results as good as the ones I was getting with Keras. That’s why I raised this issue in the forums.

Can you help me now? I would really appreciate it, as this model has to be deployed in production on an NVIDIA Jetson Nano, so I need to pass the test cases during the testing phase before deploying it.

Thanks

spolisetty · June 1, 2021, 8:40am

Hi @yugal.jain1999,

Please share an ONNX model that reproduces the issue, along with the scripts/steps, so that we can try it on our end. Also, please let us know the environment details.
TensorRT Version :
GPU Type :
Nvidia Driver Version :
CUDA Version :
CUDNN Version :
Operating System + Version :
Python Version (if applicable) :
TensorFlow Version (if applicable) :
PyTorch Version (if applicable) :
Baremetal or Container (if container which image + tag) :

Thank you.

yugal.jain1999 · June 1, 2021, 9:29am

Yeah sure,
I was using Google Colab to convert .pb to .onnx and then to .plan/.engine for TensorRT.
TensorRT Version : 7.1.3.4
GPU Type : (Colab GPU)
Nvidia Driver Version : 460.32.03
CUDA Version : 10.2
CUDNN Version : 8.0
Operating System + Version : Linux-5.4.109+-x86_64-with-Ubuntu-18.04-bionic
Python Version (if applicable) : Python 3.7
TensorFlow Version (if applicable) : Tensorflow 2.3.0
PyTorch Version (if applicable) : Not applicable (using a TensorFlow model in .pb format initially)
Baremetal or Container (if container which image + tag) : Not applicable

Scripts:

To convert .pb to .onnx, I ran this command:
python -m tf2onnx.convert --input /content/mdl6_wts.pb --inputs input_1:0 --outputs dense_12/Sigmoid:0 --output mdl6_wts.onnx
To convert .onnx to .plan, I used the onnx_to_plan.py script (attached below).
After that, I ran the inference script. I am attaching the inference.py script below, in which I have defined the allocate_buffers function and tried to do the final inference. I am also attaching the test video yts.mp4 so that you can test on your end.

onnx_to_plan.py (2.0 KB)
inference.py (6.6 KB)

Can you now please help me?

Thanks

spolisetty · June 2, 2021, 5:57pm

Hi @yugal.jain1999,

Thank you for sharing. Meanwhile, could you please try the latest TensorRT version, 8.0, and let us know if you still face this issue?

yugal.jain1999 · June 2, 2021, 6:09pm

Hello @spolisetty,
But I don’t think it is a TensorRT version problem, because the version I mentioned worked perfectly and did its job while building the engine.
May I know if there is any specific reason to try TensorRT version 8.0 to solve this problem?

spolisetty · June 3, 2021, 6:18am

Hi @yugal.jain1999,

You’re using an old version of TensorRT. Performance-related and other issues have been fixed in later releases. That’s why we recommend testing on the latest version and letting us know if you still face this issue.

Thank you.

yugal.jain1999 · June 3, 2021, 6:24am

Okay, so is the latest TensorRT version compatible with the TF, CUDA, and cuDNN versions I am using?

spolisetty · June 3, 2021, 6:30am

Please refer to the Support Matrix (NVIDIA Deep Learning TensorRT Documentation) to check the requirements.

yugal.jain1999 · June 3, 2021, 1:45pm

Hello @spolisetty
I also tried the latest version, TensorRT 8.0.0.3 EA, and the problem is still the same.
Can you tell me what I should do now? Have you gone through the scripts I sent and tried to find the flaw?
I would really appreciate your help with this.
Thanks

yugal.jain1999 · June 4, 2021, 11:25am

Hello @spolisetty
I am getting one more minor issue along with the performance issue: the session crashes every time I try to load the engine, allocate the buffers in memory again, and run predictions with TensorRT inference in Python. Can you please tell me how I can free the CUDA memory allocated to the engine after every run, or suggest some other way to free the memory?

Thanks

spolisetty · June 8, 2021, 5:42am

Hi @yugal.jain1999,

The destructor __del__() in the Python API is equivalent to the C++ destroy() for freeing the CUDA memory allocated to the engine.
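
To illustrate what relying on the destructor means in practice, here is a minimal sketch that needs no GPU. `FakeEngine` and `run_once` are hypothetical stand-ins, not TensorRT classes. The sketch assumes CPython, where dropping the last reference to an object runs its `__del__()` immediately; a real engine would release its device memory at that point.

```python
class FakeEngine:
    """Stand-in for an engine object (hypothetical, no GPU needed)."""

    released = []  # records which engines have been destroyed

    def __init__(self, name):
        self.name = name

    def __del__(self):
        # A real engine would release its device memory here.
        FakeEngine.released.append(self.name)


def run_once(name):
    engine = FakeEngine(name)
    # ... create the context, allocate buffers, run inference ...
    del engine  # drop the last reference so the destructor runs now


if __name__ == "__main__":
    run_once("first load")
    run_once("second load")
    print(FakeEngine.released)
```

The destructor only runs once no references remain, so an engine or context still held in a notebook variable stays alive between runs. Buffers that a script allocates itself are separate objects and would need the same treatment.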
The inference script looks normal.
Could you please let us know whether you have tested the model with ONNX Runtime? Do you see the accuracy problem with ONNX Runtime as well, or only after converting to TensorRT?
That would help us check whether this is an ONNX model conversion problem.
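
A check of this kind could look like the following sketch. The helper names `run_onnx` and `compare_predictions` are hypothetical. The first assumes the `onnxruntime` package is installed and that the model has a single input; the second is plain Python and works on nested lists (NumPy outputs can be passed through `.tolist()` first).

```python
def run_onnx(model_path, batch):
    """Run one batch through ONNX Runtime on CPU and return the first output."""
    import onnxruntime as ort  # imported lazily; only needed for this call

    session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    input_name = session.get_inputs()[0].name
    return session.run(None, {input_name: batch})[0]


def compare_predictions(reference, candidate, atol=1e-3):
    """Compare two batches of per-class scores, given as nested sequences."""
    if len(reference) != len(candidate):
        raise ValueError("batches differ in length")

    max_abs_diff = 0.0
    label_mismatches = 0
    for ref_row, cand_row in zip(reference, candidate):
        diffs = [abs(r - c) for r, c in zip(ref_row, cand_row)]
        max_abs_diff = max(max_abs_diff, max(diffs))
        # Index of the highest score is the predicted label.
        ref_label = max(range(len(ref_row)), key=ref_row.__getitem__)
        cand_label = max(range(len(cand_row)), key=cand_row.__getitem__)
        label_mismatches += ref_label != cand_label

    return {
        "max_abs_diff": max_abs_diff,
        "label_mismatches": label_mismatches,
        "close": max_abs_diff <= atol,
    }
```

Feeding the same preprocessed batch to the original Keras model and to `run_onnx`, then passing both outputs to `compare_predictions`, shows whether the outputs already diverge at the ONNX stage, before any TensorRT engine is built.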

Please share the ONNX model as well.

Thank you.

yugal.jain1999 · June 8, 2021, 6:48am

Hey @spolisetty
So I don’t need to add a destroy() call to free the CUDA memory allocated to the engine?

I didn’t try running inference on the model with ONNX Runtime.
I converted the ONNX model to a TensorRT model and then tested it, because later I want to run the TensorRT model in NVIDIA DeepStream as well.

You can run inference with the ONNX model if you want to test it. I am sharing the ONNX model below.
I look forward to hearing from you soon.

Thanks

mdl6_wts.onnx (824 KB)

spolisetty · June 8, 2021, 7:15am

@yugal.jain1999,

Yes, since it is a destructor, freeing the memory should not be a problem. We will try from our end as well. Meanwhile, we recommend that you test inference with ONNX Runtime too. As I mentioned in my previous reply, it would help isolate whether this is an ONNX conversion problem before generating the engine.

Thank you.

yugal.jain1999 · June 8, 2021, 8:02am

Okay, sure, I will test inference with ONNX Runtime.
Please also look into the issue with freeing CUDA memory, if you can figure it out.

Thanks

yugal.jain1999 · June 8, 2021, 9:14am

@spolisetty ONNX Runtime inference has the same accuracy problem I was facing with TensorRT. It always predicts the first label as output, but it should predict the second label as well, as it did when I ran inference with the .hdf5 model.

So is it an ONNX conversion problem now?
To convert the model from .pb to .onnx, I used tf2onnx.
I ran this command to convert from .pb to .onnx: python -m tf2onnx.convert --input /content/mdl6_wts.pb --inputs input_1:0 --outputs dense_12/Sigmoid:0 --output mdl6_wts.onnx

What should we do now?
PS: I first converted .hdf5 to .pb, then to .onnx, and then to .plan/.engine.

Thanks

spolisetty · June 8, 2021, 12:44pm

@yugal.jain1999,

Then it looks like a model conversion problem. Have you tried converting directly from Keras to ONNX? This may help you. If you still face the issue after that conversion, we recommend posting your concern in the converter’s GitHub issues.

Once the ONNX problem is resolved, if you still face this issue with TensorRT, we can help you.

Thank you.

yugal.jain1999 · June 8, 2021, 1:47pm

Hey @spolisetty

Using keras2onnx, the ONNX model is now working fine.
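
For reference, a direct Keras-to-ONNX conversion of this kind can be sketched roughly as below. This is not the exact script used in this thread. The function name and file paths are hypothetical, and the sketch assumes that the `keras2onnx`, `onnx`, and `tensorflow` packages are installed in mutually compatible versions and that the .h5 file holds the full model (architecture plus weights), not weights alone.

```python
def convert_keras_to_onnx(h5_path, onnx_path):
    """Load a Keras .h5/.hdf5 model and write it out as ONNX."""
    # Imports are local so this file loads even where the packages are absent.
    import keras2onnx
    import onnx
    from tensorflow import keras

    model = keras.models.load_model(h5_path)
    onnx_model = keras2onnx.convert_keras(model, model.name)
    onnx.save(onnx_model, onnx_path)
    return onnx_path
```

A call such as `convert_keras_to_onnx("model.h5", "model.onnx")` (placeholder file names) skips the intermediate .pb step, so the input and output tensor names do not have to be given by hand.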

Now, what should we do to solve the TensorRT inference problem and the CUDA out-of-memory problem while running in Google Colab?

One more small query: to deploy the model on an NVIDIA Jetson Nano using DeepStream, do we need to convert the TensorRT model into a DeepStream-compatible model, or can we convert the ONNX model directly to a DeepStream model for deployment on the Jetson Nano?

Thanks
