I converted TensorFlow weights to an ONNX model and ran it in C++, but the C++ results didn’t match the Python ones. I then checked the Python version using TensorRT, and it produced the correct output. You can verify this yourself: I’ve put both the Python and C++ code in a GitHub repository, along with the weights and steps to reproduce the issue. I’ve made the example as simple as possible, so I think the issue is probably with TensorRT.
Environment
TensorRT Version: TensorRT-8.4.3.1 (two other versions tested as well)
GPU Type: 2080 Ti
Nvidia Driver Version: 536.67
CUDA Version: 11.5
CUDNN Version: cudnn-windows-x86_64-8.6.0.163_cuda11
Operating System + Version: Windows 10
Python Version (if applicable): 3.9
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):
We recommend that you try the latest TensorRT version, 8.6.1, and let us know if you still face the same issue. Also, please make sure your inference script is correct and, if you are doing any post-processing, that it is correct as well.
@spolisetty I appreciate your response. For your information, I have already tried multiple versions, including TensorRT-8.6.1.6 and 8.2. Furthermore, to make testing straightforward, I have simplified the script and eliminated all pre-processing and post-processing in this scenario. The input binding is filled with a constant value to make sure we are not doing something different between Python and C++.
Python input data:
# Data feed: assume every image pixel after preprocessing is -0.99609375
image = np.ones((1, 112, 112, 3), np.float32) * -0.99609375
C++ input data:
// Allocate memory on the GPU for the input and output data
float *input_data;
float *output_data;
int input_size = 1 * 112 * 112 * 3; // matches the Python input shape (1, 112, 112, 3)
int output_size = 512;              // size of the output tensor in floats
cudaMalloc((void **) &input_data, input_size * sizeof(float));
cudaMalloc((void **) &output_data, output_size * sizeof(float));
// Set the input data
float input_value = -0.99609375;
cudaMemset(input_data, input_value, input_size * sizeof(float));
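Note that cudaMemset writes a repeated byte value into the buffer, so it cannot initialize floats to a value like -0.99609375. For comparison (this is a minimal sketch, not taken from the repository, and fillDeviceBuffer is a hypothetical helper name), one way to fill the device buffer with a float constant is to build the data on the host and copy it over:

#include <vector>
#include <cuda_runtime_api.h>

// Sketch: fill a device buffer with a float constant by building it on the
// host and copying it to the GPU (cudaMemset only repeats a single byte).
void fillDeviceBuffer(float *device_ptr, int count, float value) {
    std::vector<float> host(count, value);
    cudaMemcpy(device_ptr, host.data(), count * sizeof(float), cudaMemcpyHostToDevice);
}

// Usage with the buffers above (hypothetical):
// fillDeviceBuffer(input_data, input_size, -0.99609375f);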
Full C++ code:
#include <iostream>
#include <fstream>
#include <NvInfer.h>
#include <cuda_runtime_api.h>
#include "NvOnnxParser.h"

using namespace nvinfer1;

// Simple logger for TensorRT
class Logger : public nvinfer1::ILogger {
public:
    void log(Severity severity, const char *msg) noexcept override {
        // print every TensorRT message without filtering by severity
        std::cout << msg << std::endl;
    }
} gLogger;

int main() {
    std::string engine_file = "Example.engine";

    // Create a TensorRT runtime
    IRuntime *runtime = createInferRuntime(gLogger);

    // Read the serialized engine file
    std::ifstream engineStream(engine_file, std::ios::binary);
    std::string engineString((std::istreambuf_iterator<char>(engineStream)), std::istreambuf_iterator<char>());
    engineStream.close();

    // Deserialize the engine
    ICudaEngine *engine = runtime->deserializeCudaEngine(engineString.data(), engineString.size());

    // Create an execution context
    IExecutionContext *context = engine->createExecutionContext();

    // Allocate memory on the GPU for the input and output data
    float *input_data;
    float *output_data;
    int input_size = 1 * 112 * 112 * 3; // input shape (1, 112, 112, 3)
    int output_size = 512;              // size of the output tensor in floats
    cudaMalloc((void **) &input_data, input_size * sizeof(float));
    cudaMalloc((void **) &output_data, output_size * sizeof(float));

    // Set the input data
    float input_value = -0.99609375;
    cudaMemset(input_data, input_value, input_size * sizeof(float));

    // Set up the execution bindings
    void *bindings[2] = {input_data, output_data};

    // Run inference
    context->executeV2(bindings);

    // Copy the output data back to the host
    float *host_output = new float[output_size];
    cudaMemcpy(host_output, output_data, output_size * sizeof(float), cudaMemcpyDeviceToHost);

    // Print the first ten output values
    for (int i = 0; i < 10; ++i) {
        std::cout << host_output[i] << "\n";
    }
    std::cout << std::endl;

    // Clean up
    cudaFree(input_data);
    cudaFree(output_data);
    delete[] host_output;
    context->destroy();
    engine->destroy();
    runtime->destroy();

    return 0;
}
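For completeness, the bindings array above assumes binding 0 is the input and binding 1 is the output. A small sketch (not part of the repository; printBindings is a hypothetical helper) that lists the engine’s binding order, names, and shapes via the TensorRT 8.x binding API could be used to confirm that assumption:

#include <iostream>
#include <NvInfer.h>

// Sketch: list every binding of a deserialized engine so the input/output
// order and shapes can be checked before filling the bindings array.
void printBindings(const nvinfer1::ICudaEngine *engine) {
    for (int i = 0; i < engine->getNbBindings(); ++i) {
        nvinfer1::Dims dims = engine->getBindingDimensions(i);
        std::cout << "binding " << i << ": " << engine->getBindingName(i)
                  << (engine->bindingIsInput(i) ? " (input), dims:" : " (output), dims:");
        for (int d = 0; d < dims.nbDims; ++d) {
            std::cout << " " << dims.d[d];
        }
        std::cout << std::endl;
    }
}

// Usage right after deserializing the engine (hypothetical):
// printBindings(engine);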