I converted TensorFlow weights to an ONNX model and ran it in C++, but the C++ results didn’t match the Python ones. I then checked the Python version using TensorRT, and it produced the correct output. You can verify this yourself: I’ve put both the Python and C++ code in a GitHub repository, along with the weights and steps to reproduce the issue. I’ve made the example as simple as possible, so I think the issue is probably with TensorRT.
Environment
TensorRT Version: TensorRT-8.4.3.1 (two other versions tested as well)
GPU Type: 2080 Ti
Nvidia Driver Version: 536.67
CUDA Version: 11.5
CUDNN Version: cudnn-windows-x86_64-8.6.0.163_cuda11
Operating System + Version: Windows 10
Python Version (if applicable): 3.9
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):
We recommend that you try the latest TensorRT version, 8.6.1, and let us know if you still face the same issue. Also, please make sure your inference script is correct and, if you are doing any post-processing, that it is correct as well.
@spolisetty I appreciate your response. For your information, I have already tried multiple versions, including TensorRT-8.6.1.6 and 8.2. Furthermore, to make testing straightforward, I have simplified the script and eliminated all pre-processing and post-processing in this scenario. The input binding is filled with a constant value to make sure we are not doing something different between Python and C++.
Python input data:
# Data feed: assume every image pixel after preprocessing is -0.99609375
image = np.ones((1, 112, 112, 3), np.float32) * -0.99609375
C++ input data:
// Allocate memory on the GPU for the input and output data
float *input_data;
float *output_data;
int input_size = 1 * 112 * 112 * 3; // matches the Python input shape (1, 112, 112, 3)
int output_size = 512;              // size of the output tensor in floats
cudaMalloc((void **) &input_data, input_size * sizeof(float));
cudaMalloc((void **) &output_data, output_size * sizeof(float));
// Set the input data
float input_value = -0.99609375;
cudaMemset(input_data, input_value, input_size * sizeof(float));
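Note that cudaMemset writes a repeated byte value into the buffer, so it cannot initialize floats to a value like -0.99609375. For comparison (this is a minimal sketch, not taken from the repository, and fillDeviceBuffer is a hypothetical helper name), one way to fill the device buffer with a float constant is to build the data on the host and copy it over:

#include <vector>
#include <cuda_runtime_api.h>

// Sketch: fill a device buffer with a float constant by building it on the
// host and copying it to the GPU (cudaMemset only repeats a single byte).
void fillDeviceBuffer(float *device_ptr, int count, float value) {
    std::vector<float> host(count, value);
    cudaMemcpy(device_ptr, host.data(), count * sizeof(float), cudaMemcpyHostToDevice);
}

// Usage with the buffers above (hypothetical):
// fillDeviceBuffer(input_data, input_size, -0.99609375f);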
Full C++ code:
#include <iostream>
#include <fstream>
#include <NvInfer.h>
#include <cuda_runtime_api.h>
#include "NvOnnxParser.h"

using namespace nvinfer1;

// Simple logger for TensorRT
class Logger : public nvinfer1::ILogger {
public:
    void log(Severity severity, const char *msg) noexcept override {
        // print every TensorRT message without filtering by severity
        std::cout << msg << std::endl;
    }
} gLogger;

int main() {
    std::string engine_file = "Example.engine";

    // Create a TensorRT runtime
    IRuntime *runtime = createInferRuntime(gLogger);

    // Read the serialized engine file
    std::ifstream engineStream(engine_file, std::ios::binary);
    std::string engineString((std::istreambuf_iterator<char>(engineStream)), std::istreambuf_iterator<char>());
    engineStream.close();

    // Deserialize the engine
    ICudaEngine *engine = runtime->deserializeCudaEngine(engineString.data(), engineString.size());

    // Create an execution context
    IExecutionContext *context = engine->createExecutionContext();

    // Allocate memory on the GPU for the input and output data
    float *input_data;
    float *output_data;
    int input_size = 1 * 112 * 112 * 3; // input shape (1, 112, 112, 3)
    int output_size = 512;              // size of the output tensor in floats
    cudaMalloc((void **) &input_data, input_size * sizeof(float));
    cudaMalloc((void **) &output_data, output_size * sizeof(float));

    // Set the input data
    float input_value = -0.99609375;
    cudaMemset(input_data, input_value, input_size * sizeof(float));

    // Set up the execution bindings
    void *bindings[2] = {input_data, output_data};

    // Run inference
    context->executeV2(bindings);

    // Copy the output data back to the host
    float *host_output = new float[output_size];
    cudaMemcpy(host_output, output_data, output_size * sizeof(float), cudaMemcpyDeviceToHost);

    // Print the first ten output values
    for (int i = 0; i < 10; ++i) {
        std::cout << host_output[i] << "\n";
    }
    std::cout << std::endl;

    // Clean up
    cudaFree(input_data);
    cudaFree(output_data);
    delete[] host_output;
    context->destroy();
    engine->destroy();
    runtime->destroy();

    return 0;
}
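For completeness, the bindings array above assumes binding 0 is the input and binding 1 is the output. A small sketch (not part of the repository; printBindings is a hypothetical helper) that lists the engine’s binding order, names, and shapes via the TensorRT 8.x binding API could be used to confirm that assumption:

#include <iostream>
#include <NvInfer.h>

// Sketch: list every binding of a deserialized engine so the input/output
// order and shapes can be checked before filling the bindings array.
void printBindings(const nvinfer1::ICudaEngine *engine) {
    for (int i = 0; i < engine->getNbBindings(); ++i) {
        nvinfer1::Dims dims = engine->getBindingDimensions(i);
        std::cout << "binding " << i << ": " << engine->getBindingName(i)
                  << (engine->bindingIsInput(i) ? " (input), dims:" : " (output), dims:");
        for (int d = 0; d < dims.nbDims; ++d) {
            std::cout << " " << dims.d[d];
        }
        std::cout << std::endl;
    }
}

// Usage right after deserializing the engine (hypothetical):
// printBindings(engine);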