Difference between Engine file output in Google Colab (Python) and Local Machine (C++)

Description

I am learning how to perform inference with a TensorRT engine file.

To start with,

  1. I created a PyTorch-based classification model in Google Colab.

  2. Trained the classification model on the MNIST dataset and verified that the trained model achieves 95%+ accuracy.

  3. Exported the model to ONNX (within Google Colab) and verified that the model works (a minimal sketch of this flow appears after this list).

    • At this point I printed the output vector (10 neurons, so shape 1x10):
    ** ONNX OUTPUT IN COLAB ** 
    NEURON   OUTPUT_VALUE
     0 --> -8.656696319580078
     1 --> -8.746328353881836
     2 --> -7.678134441375732
     3 --> -8.662064552307129
     4 --> -0.0022408869117498398
     5 --> -8.611289978027344
     6 --> -8.606916427612305
     7 --> -7.697903633117676
     8 --> -8.510454177856445
     9 --> -8.295673370361328
    
  4. Next, exported the model to a TensorRT engine (again within Google Colab) and verified that it works.

    • Just like before, I printed the output vector; the output is below:
    ** ENGINE FILE OUTPUT IN COLAB **
    NEURON   OUTPUT_VALUE
    0 --> -8.653183937072754
    1 --> -8.742051124572754
    2 --> -7.669453144073486
    3 --> -8.658066749572754
    4 --> -0.0022506495006382465
    5 --> -8.608777046203613
    6 --> -8.602890968322754
    7 --> -7.695576190948486
    8 --> -8.506699562072754
    9 --> -8.293564796447754
    
  5. As we can see, the two outputs closely match. Next, I wanted to explore further with C++.

  6. I downloaded this ONNX file to my local Linux machine (specifications are given in the Environment section below).

  7. Used /usr/src/tensorrt/bin/trtexec --onnx=MNIST_Classifier.onnx --saveEngine=MNIST_Classifier_f16.engine --fp16 to build the engine file.

  8. Now, I used C++ code to execute the engine file. To my surprise, I got an output tensor like this:

    ** ENGINE FILE OUTPUT ON LOCAL MACHINE (C++) **
    NEURON   OUTPUT_VALUE
    0 --> -8.794
    1 --> -8.88364
    2 --> -8.26091
    3 --> -8.79937
    4 --> -0.00169589
    5 --> -8.74809
    6 --> -8.74422
    7 --> -8.21721
    8 --> -8.64776
    9 --> -8.37213
    
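For reference, here is a minimal sketch of the export-and-verify flow from steps 1–3. The architecture below is a stand-in (my actual model is not shown in this post), and names like "input" / "output" are illustrative:

    import numpy as np
    import torch
    import torch.nn as nn
    import onnxruntime as ort

    # Stand-in for the trained classifier from step 1; the real architecture
    # is not shown in this post, so this tiny net is purely illustrative.
    model = nn.Sequential(
        nn.Flatten(),
        nn.Linear(28 * 28, 128), nn.ReLU(),
        nn.Linear(128, 10),
        nn.LogSoftmax(dim=1),  # consistent with the all-negative outputs above
    )
    model.eval()

    # Step 3: export to ONNX
    dummy = torch.randn(1, 1, 28, 28)  # MNIST input: batch=1, 1 channel, 28x28
    torch.onnx.export(model, dummy, "MNIST_Classifier.onnx",
                      input_names=["input"], output_names=["output"])

    # Verify the ONNX model against the PyTorch output
    sess = ort.InferenceSession("MNIST_Classifier.onnx",
                                providers=["CPUExecutionProvider"])
    onnx_out = sess.run(None, {"input": dummy.numpy()})[0]  # shape (1, 10)
    torch_out = model(dummy).detach().numpy()
    print("max abs diff vs PyTorch:", np.abs(onnx_out - torch_out).max())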

Basically, what I am seeing is ONNX_IN_COLAB == TENSORRT_IN_COLAB != TENSORRT_IN_LOCAL_MACHINE_C++.

Is this difference in the output tensor expected?

Environment

TensorRT Version: 8.4.1
GPU Type: A5000
Nvidia Driver Version:
CUDA Version: 11.6
CUDNN Version:
Operating System + Version: Ubuntu 20.04
Python Version (if applicable): 3.8.10
TensorFlow Version (if applicable):
PyTorch Version (if applicable): 2.0.1+cu117
Baremetal or Container (if container which image + tag):

Relevant Files

For local-machine C++ inference, I used the code at the link below (modified to suit classification):
YOLOv8-TensorRT/csrc/detect/normal at main · triple-Mu/YOLOv8-TensorRT (github.com)

Hi,

Slight variations between Colab and your local machine’s TensorRT output are expected due to potential non-determinism and optimization techniques.

Please let us know if the difference is relatively high and impacts accuracy.
We also recommend trying the latest TensorRT version, 10.0.1.
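For instance, one quick way to quantify the deviation, sketched with the vectors you posted above (values copied verbatim from the two tables):

    import numpy as np

    colab_engine = np.array([
        -8.653183937072754, -8.742051124572754, -7.669453144073486,
        -8.658066749572754, -0.0022506495006382465, -8.608777046203613,
        -8.602890968322754, -7.695576190948486, -8.506699562072754,
        -8.293564796447754])
    local_engine = np.array([
        -8.794, -8.88364, -8.26091, -8.79937, -0.00169589,
        -8.74809, -8.74422, -8.21721, -8.64776, -8.37213])

    print("max abs diff:", np.abs(colab_engine - local_engine).max())       # ~0.59
    print("same argmax :", colab_engine.argmax() == local_engine.argmax())  # True

Here the predicted class (argmax) is unchanged even though individual logits shift by up to roughly 0.59.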

@spolisetty

Thanks for replying. I understand that the difference is expected due to optimization.
And you mean to say:
PyTorch to ONNX → lossless
ONNX to TensorRT → slightly lossy?
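
For example, a toy illustration of what "slightly lossy" could look like, assuming the local engine really runs at reduced precision because of the --fp16 flag in step 7 (the value below is copied from my ONNX output):

    import numpy as np

    # Toy illustration (my own, not from the model): FP16 rounding alone
    # perturbs values at the magnitude of the logits above.
    x = np.float32(-8.656696319580078)  # neuron 0 output from the ONNX run
    print(np.float16(x))  # ~-8.656; FP16 spacing near 8.7 is 2**-7 (~0.0078)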

One more question: suppose my convolutional neural network contains many loops (see the representative image below). In that case, non-determinism and optimization techniques during engine conversion may result in larger errors, correct?
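
Purely as a toy illustration of what I mean (this is not my actual network; the weights, activation, and depth are made up), repeatedly applying a layer in FP16 lets rounding error drift away from the FP32 result, and the drift generally grows with depth:

    import numpy as np

    rng = np.random.default_rng(0)
    x32 = rng.standard_normal(256).astype(np.float32)
    x16 = x32.astype(np.float16)
    W32 = rng.standard_normal((256, 256)).astype(np.float32) / 16.0
    W16 = W32.astype(np.float16)

    for i in range(20):  # 20 repeated ("looped") layers
        x32 = np.tanh(W32 @ x32)
        x16 = np.tanh(W16 @ x16)
        if (i + 1) % 5 == 0:
            drift = np.abs(x32 - x16.astype(np.float32)).max()
            print(f"after {i + 1:2d} layers: max abs drift = {drift:.1e}")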


Thanks & Regards,
Aravind