Output tensor values differ between DeepStream and the Keras / Python TensorRT implementations

• Hardware Platform (Jetson / GPU) Jetson Nano 4GB Dev Kit
• DeepStream Version 5.1
• JetPack Version (valid for Jetson only) 4.5.1 rev1
• TensorRT Version 7.1.3

I have a modified MobileNetV2 model whose structure is like this:

Layer (type)                 Output Shape              Param #   
mobilenetv2_1.00_224 (Functi (None, 7, 7, 1280)        2257984   
average_pooling2d_18 (Averag (None, 1, 1, 1280)        0         
flatten_13 (Flatten)         (None, 1280)              0         
dense_20 (Dense)             (None, 256)               327936    
dropout_10 (Dropout)         (None, 256)               0         
dense_21 (Dense)             (None, 4)                 1028      
Total params: 2,586,948
Trainable params: 2,552,836
Non-trainable params: 34,112

I am trying to run inference with this model in DeepStream, and the output values are very different from the values I get when I infer with Keras in Python or with TensorRT in Python.
As a measure of the difference I am using the RMSE (root mean square error) over a set of 740 frames.

DeepStream vs TensorRT on Python
avg rmse of 740 frames=1.96015
max rmse of 740 frames=2.54657

Keras python vs TensorRT on Python
avg rmse of 740 frames=0.0087
max rmse of 740 frames=0.016 
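For reference, the per-frame RMSE comparison described above can be sketched as follows (the `outputs_a`/`outputs_b` names and the data are placeholders, not taken from the original scripts):

```python
import numpy as np

def rmse(a, b):
    """Root mean square error between two output tensors."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return float(np.sqrt(np.mean((a - b) ** 2)))

# Compare per-frame outputs from two backends (placeholder data here).
outputs_a = [np.array([0.1, 0.2, 0.3, 0.4])] * 3  # e.g. DeepStream outputs
outputs_b = [np.array([0.1, 0.2, 0.3, 0.4])] * 3  # e.g. TensorRT outputs
errors = [rmse(x, y) for x, y in zip(outputs_a, outputs_b)]
print("avg rmse =", np.mean(errors), "max rmse =", max(errors))
```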

nvposrt mobilenet - Google Drive — this is the link to the drive folder with the DeepStream app, model file, engine, and inference scripts. It also contains the TensorRT engine conversion script, dottrtbuild.py.


Could you first check whether the pre-processing steps in dstest1_pgie_config.txt are identical to the TensorRT case?
In DeepStream, the pre-processing equation is y = net-scale-factor * (x - mean).
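A minimal NumPy sketch of that equation (the scale and mean values here are illustrative, not taken from the config file):

```python
import numpy as np

def deepstream_preprocess(x, net_scale_factor, mean):
    """y = net-scale-factor * (x - mean), applied element-wise."""
    return net_scale_factor * (x.astype(np.float32) - mean)

frame = np.full((224, 224, 3), 128, dtype=np.uint8)  # dummy RGB frame
# With net-scale-factor=1 and offsets=0;0;0 the output equals the input.
y = deepstream_preprocess(frame, 1.0, np.zeros(3, dtype=np.float32))
```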


I have set net-scale-factor=1 and offsets=0;0;0, i.e. no pre-processing, but the RMSE is still high. In the TensorRT case there is also no pre-processing; raw RGB inputs are fed to the engine.

To check what is being fed to the model, I made a dummy neural network like this:

Layer (type)                 Output Shape              Param #   
activation (Activation)      (None, 224, 224, 3)       0         
Total params: 0
Trainable params: 0
Non-trainable params: 0

which returns the input as its output. I converted it to a TensorRT engine, deployed it in all the scripts, and saved the output images from Keras, TensorRT, and DeepStream.
Keras and TensorRT gave exactly the same image; their RMSE was 0.
DeepStream had a very large RMSE of 2000+ compared with both, and the image from DeepStream looked different as well. Note: net-scale-factor=1 and offsets=0;0;0 in this case too.
On TensorRT
Screenshot from 2021-11-10 11-02-43

On DeepStream
Screenshot from 2021-11-10 11-02-24

On DeepStream the image is gray and 9 copies of the frame appear.

@AastaLLL any update on this?


It’s also recommended to check the down-sampling from the video size to the network input size.
Please set compute-hw=1 and enable-padding=0 to see if it helps.
(This tells the pipeline to use the GPU for scaling and not to preserve the aspect ratio.)
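For reference, a sketch of where these keys would sit in a deepstream-app style configuration file. Treat this as illustrative: the group name follows the standard deepstream-app layout, and the repro script here actually builds its pipeline in code, so the properties may instead be set on the nvstreammux element directly.

```
[streammux]
## 1 = use GPU for scaling/conversion
compute-hw=1
## 0 = scale straight to the network input size without aspect-ratio padding
enable-padding=0
```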


I checked nvstreammux and also set compute-hw=1 and enable-padding=0, but the results are still the same.

@AastaLLL any update on this?


Thanks for the testing.

We need to investigate more to figure out the issue.
We will share more information with you after checking the source shared above.

@AastaLLL any update?


Thanks for your patience.

We have checked your source and found that you are using an NHWC ONNX model.
Please note that DeepStream expects the model to be in NCHW format.
(TensorRT itself can support both NCHW and NHWC formats.)

Could you convert the model into NCHW format and try it again?
This can be done with the --inputs-as-nchw option in tf2onnx.
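The layout difference itself is just an axis permutation; a minimal NumPy sketch of what the NCHW conversion changes (illustrative only, not the tf2onnx implementation):

```python
import numpy as np

# A batch of one 224x224 RGB image in NHWC layout (what Keras/TF use).
nhwc = np.zeros((1, 224, 224, 3), dtype=np.float32)

# NCHW layout (what DeepStream expects): move channels before height/width.
nchw = np.transpose(nhwc, (0, 3, 1, 2))
print(nchw.shape)  # (1, 3, 224, 224)

# Feeding an NHWC buffer to an engine built for NCHW (or vice versa)
# scrambles the data silently rather than raising an error, so the
# outputs can differ wildly without any obvious failure.
```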

If the issue persists — since TensorRT engines are not portable — could you share the ONNX model with us as well?


Will check and let you know soon.

I have been using NCHW for DeepStream all along, since the previous drive link was for the MobileNetV2 model. Since then I have tried the simple model where the input is the same as the output, so I am attaching a new drive link: inputoutputdiscrepency - Google Drive
This contains:
1. DeepStream app with the simple model
2. Simple model .h5, .onnx, .trt
3. DSTRTBS1.txt, the model output file from DeepStream
4. frametest.py, the script to view the output from DSTRTBS1.txt
5. Keras inference
6. TensorRT inference

Note: the DeepStream script uses videotestsrc for input.

@AastaLLL if it speeds up the debugging, I can make a new post with the observations from the simple model.
I believe this issue is what is causing the high RMSE of the MobileNetV2 model, so if we can explain the simple model's behaviour, that will also answer the MobileNetV2 high-RMSE issue.


Do you use the same TensorRT engine for TensorRT and DeepStream?

In your TRTInferance.py, the buffer is arranged in the NHWC format.

# Function to get an image from the local directory.
def get_img_tensor(img):
    newsize = (224, 224, 3)          # HWC shape -> NHWC when batched
    return np.resize(img, newsize)

while success:
    trt_predictions = predict(get_img_tensor(image)).astype(np.float32)

That’s why we think the engine requires NHWC input to produce the correct result.


No, TensorRT uses the engine with NHWC and DeepStream uses NCHW. They are two different engines built from the same model.


Could you try TensorRT + NCHW to see if the result is correct first?

Sure, will do and get back.

TensorRT + NCHW works fine; I am getting the right output.


Thanks for the experiment.

Could you update the source accordingly?
So we can check it with the same NCHW engine directly.

I have updated the drive with the NCHW version.