Python-serialized TensorRT engine outputs wrong data in the TensorRT C++ runtime

Description

Hi! I have referred to this and here is my case:
I used torch2trt to convert my PyTorch model (ERFNet) to a TensorRT engine.
Since the TensorRT API used by torch2trt requires version 5 or higher, I modified some of the API calls to fit my TensorRT 4 installation, because that version is compatible with my DRIVE PX2 (DriveWorks 1.2, CUDA 9.2), to which I want to port the engine.
I think I converted the model successfully; below is the model architecture that torch2trt printed out.

torch.Tensor.get_device
torch.nn.Conv2d.forward
torch.nn.functional.max_pool2d
torch.cat
torch.nn.BatchNorm2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.functional.max_pool2d
torch.cat
torch.nn.BatchNorm2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.BatchNorm2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.BatchNorm2d.forward
torch.nn.Dropout2d.forward
torch.Tensor.__add__
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.BatchNorm2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.BatchNorm2d.forward
torch.nn.Dropout2d.forward
torch.Tensor.__add__
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.BatchNorm2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.BatchNorm2d.forward
torch.nn.Dropout2d.forward
torch.Tensor.__add__
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.BatchNorm2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.BatchNorm2d.forward
torch.nn.Dropout2d.forward
torch.Tensor.__add__
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.BatchNorm2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.BatchNorm2d.forward
torch.nn.Dropout2d.forward
torch.Tensor.__add__
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.functional.max_pool2d
torch.cat
torch.nn.BatchNorm2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.BatchNorm2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.BatchNorm2d.forward
torch.nn.Dropout2d.forward
torch.Tensor.__add__
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.BatchNorm2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.BatchNorm2d.forward
torch.nn.Dropout2d.forward
torch.Tensor.__add__
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.BatchNorm2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.BatchNorm2d.forward
torch.nn.Dropout2d.forward
torch.Tensor.__add__
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.BatchNorm2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.BatchNorm2d.forward
torch.nn.Dropout2d.forward
torch.Tensor.__add__
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.BatchNorm2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.BatchNorm2d.forward
torch.nn.Dropout2d.forward
torch.Tensor.__add__
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.BatchNorm2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.BatchNorm2d.forward
torch.nn.Dropout2d.forward
torch.Tensor.__add__
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.BatchNorm2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.BatchNorm2d.forward
torch.nn.Dropout2d.forward
torch.Tensor.__add__
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.BatchNorm2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.BatchNorm2d.forward
torch.nn.Dropout2d.forward
torch.Tensor.__add__
torch.nn.functional.relu
torch.nn.ConvTranspose2d.forward
torch.nn.BatchNorm2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.BatchNorm2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.BatchNorm2d.forward
torch.Tensor.__add__
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.BatchNorm2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.BatchNorm2d.forward
torch.Tensor.__add__
torch.nn.functional.relu
torch.nn.ConvTranspose2d.forward
torch.nn.BatchNorm2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.BatchNorm2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.BatchNorm2d.forward
torch.Tensor.__add__
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.BatchNorm2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.BatchNorm2d.forward
torch.Tensor.__add__
torch.nn.functional.relu
torch.nn.ConvTranspose2d.forward
torch.nn.Conv2d.forward
torch.nn.BatchNorm2d.forward
torch.nn.functional.relu
torch.nn.Dropout2d.forward
torch.nn.Conv2d.forward
torch.nn.functional.softmax
torch.nn.functional.max_pool2d
torch.Tensor.view
torch.nn.Linear.forward
torch.nn.functional.relu
torch.nn.Linear.forward
torch.nn.functional.sigmoid

The model’s input size is (1 * 3 * 208 * 976) and the output size is (1 * 5 * 202 * 970 + 4).
I use tensorrt.utils.write_engine_to_file to save my serialized engine.
After generating the engine, I want to deserialize it and run inference in C++.
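For context, the `data` and `length` passed to `deserializeCudaEngine` below come from reading the serialized engine file into host memory. The post does not show that step, so here is a minimal sketch; the helper name `readEngineFile` and the exact signature are my own assumptions, not the original code:

```cpp
#include <fstream>
#include <memory>
#include <string>

// Read a serialized engine file into a byte buffer.
// Returns the buffer (or nullptr on failure) and stores its size in `length`.
std::unique_ptr<char[]> readEngineFile(const std::string& path, size_t& length)
{
    // Open at the end so tellg() immediately gives us the file size.
    std::ifstream file(path, std::ios::binary | std::ios::ate);
    if (!file)
        return nullptr;  // file missing or unreadable
    length = static_cast<size_t>(file.tellg());
    file.seekg(0, std::ios::beg);
    std::unique_ptr<char[]> data(new char[length]);
    file.read(data.get(), static_cast<std::streamsize>(length));
    return data;
}
```

The result would then feed the deserialization call, e.g. `size_t length = 0; auto data = readEngineFile("ERFNet_trt.engine", length);` before `deserializeCudaEngine(data.get(), length, nullptr)`.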

std::cout << "*** Deserializing ***" << std::endl;
mTrtRunTime = createInferRuntime(gLogger);
assert(mTrtRunTime != nullptr);
mTrtEngine = mTrtRunTime->deserializeCudaEngine(data.get(), length, nullptr);
assert(mTrtEngine != nullptr);

std::cout << "*** Initialize the engine ***" << std::endl;        
const int maxBatchSize = 1;
mTrtContext = mTrtEngine->createExecutionContext();
assert(mTrtContext != nullptr);
mTrtContext->setProfiler(&mTrtProfiler);

// Allocate one buffer per binding: exactly IEngine::getNbBindings() of them.
int nbBindings = mTrtEngine->getNbBindings();
mTrtCudaBuffer.resize(nbBindings);
mTrtBindBufferSize.resize(nbBindings);
for (int i = 0; i < nbBindings; ++i)
{
    Dims dims = mTrtEngine->getBindingDimensions(i);
    DataType dtype = mTrtEngine->getBindingDataType(i);
    int64_t totalSize = volume(dims) * maxBatchSize * getElementSize(dtype);
    mTrtBindBufferSize[i] = totalSize;
    mTrtCudaBuffer[i] = safeCudaMalloc(totalSize);
    if(mTrtEngine->bindingIsInput(i))
        mTrtInputCount++;
}
CUDA_CHECK(cudaStreamCreate(&mTrtCudaStream));        
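The snippet above relies on two small helpers, `volume` and `getElementSize`, that the post does not show. A self-contained sketch of their usual shape follows; note it uses stand-in `Dims`/`DataType` definitions instead of the real `nvinfer1` types, which I am assuming the original code uses:

```cpp
#include <cstddef>
#include <cstdint>

// Stand-ins for nvinfer1::Dims and nvinfer1::DataType (assumption: the
// original code uses the TensorRT types, which carry the same information).
struct Dims { int nbDims; int d[8]; };
enum class DataType { kFLOAT, kHALF, kINT8, kINT32 };

// Number of elements in a binding: the product of all its dimensions.
int64_t volume(const Dims& dims)
{
    int64_t v = 1;
    for (int i = 0; i < dims.nbDims; ++i)
        v *= dims.d[i];
    return v;
}

// Bytes per element for each supported data type.
size_t getElementSize(DataType t)
{
    switch (t)
    {
    case DataType::kFLOAT: return 4;
    case DataType::kHALF:  return 2;
    case DataType::kINT8:  return 1;
    case DataType::kINT32: return 4;
    }
    return 0;
}
```

With these, the binding size computed above for the (3, 208, 976) input is `volume * maxBatchSize * elementSize` bytes, which is what gets passed to `cudaMalloc` and later to `cudaMemcpyAsync`.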

Up to this point, no error occurred.
Then I feed a random float vector into the model and run inference, but the output is all 0 or NaN. I have already checked the data type, and the output still never shows a normal number.

vector<float> inputData(h * w * c); // 208 * 976 * 3
std::generate(inputData.begin(), inputData.end(), []() {
    return float(rand() % 255);
});
vector<float> outputData;
outputData.resize(net.getOutputSize()/sizeof(float));

std::cout << "*** Inference ***" << std::endl;
static const int batchSize = 1;
assert(mTrtInputCount == 1);

int inputIndex = 0;
CUDA_CHECK(cudaMemcpyAsync(mTrtCudaBuffer[inputIndex], inputData.data(), mTrtBindBufferSize[inputIndex], cudaMemcpyHostToDevice, mTrtCudaStream));
auto t_start = std::chrono::high_resolution_clock::now();
mTrtContext->execute(batchSize, &mTrtCudaBuffer[inputIndex]);
auto t_end = std::chrono::high_resolution_clock::now();

float total = std::chrono::duration<float, std::milli>(t_end - t_start).count();
std::cout << "Time taken for inference is " << total << " ms." << std::endl;

char* outPtr = reinterpret_cast<char*>(outputData.data());
for (size_t bindingIdx = mTrtInputCount; bindingIdx < mTrtBindBufferSize.size(); ++bindingIdx) {
    auto size = mTrtBindBufferSize[bindingIdx];
    CUDA_CHECK(cudaMemcpyAsync(outPtr, mTrtCudaBuffer[bindingIdx], size, cudaMemcpyDeviceToHost, mTrtCudaStream));
    outPtr += size;  // advance into the host buffer for the next binding
}
CUDA_CHECK(cudaStreamSynchronize(mTrtCudaStream));  // wait for the async copies before reading outputData
mTrtIterationTime++;

How can I solve this kind of problem? Does it have something to do with the TensorRT version? And how can I verify that the generated engine really works?
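One quick sanity check on "does the engine really work" is to scan the host output buffer and measure the fraction of NaN, Inf, and exact-zero values: if nearly every element falls in those buckets, the failure is almost certainly in the pipeline (binding order, buffer sizes, input layout) rather than in the weights. A minimal sketch (the helper name `badFraction` is my own, not from the post):

```cpp
#include <cmath>
#include <vector>

// Fraction of pathological values (NaN, Inf, or exactly 0) in an
// inference output buffer. Close to 1.0 means the output is garbage.
double badFraction(const std::vector<float>& out)
{
    if (out.empty())
        return 1.0;
    size_t bad = 0;
    for (float v : out)
        if (std::isnan(v) || std::isinf(v) || v == 0.0f)
            ++bad;
    return static_cast<double>(bad) / static_cast<double>(out.size());
}
```

Used right after the device-to-host copy, e.g. `if (badFraction(outputData) > 0.99) { /* pipeline problem, not accuracy */ }`, this separates "engine produces nothing sensible" from "engine produces slightly wrong numbers".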

Environment

TensorRT Version: 4.0.1.6
GPU Type: GeForce GTX 1070
Nvidia Driver Version: 396.26
CUDA Version: 9.2
CUDNN Version: 7
Operating System + Version: Ubuntu 16.04
Python Version (if applicable): 2.7
TensorFlow Version (if applicable):
PyTorch Version (if applicable): 1.0.0
Baremetal or Container (if container which image + tag):

Relevant Files

ERFNet_trt.engine

Steps To Reproduce


Hi,

TRT 4.0 is a very old version; we recommend using the latest TRT version supported on your device.
To verify the model, you can use the trtexec command-line tool:
https://docs.nvidia.com/deeplearning/sdk/tensorrt-archived/tensorrt_401/tensorrt-developer-guide/index.html#giexec

Thanks

Dear @SunilJB:
Thanks for your reply.
But here I have a few questions.
According to the documentation, CUDA 9.2 is no longer supported by TensorRT 5 or later; it seems that only CUDA 10.x is supported. However, my DRIVE PX2 runs DriveWorks 1.2 with CUDA 9.2. Is it possible to deploy an engine generated with the TensorRT 5 API on it? Thanks!

Hi,

I don’t have much knowledge of the DRIVE PX2 platform; I recommend posting your query in the forum below so that the DRIVE PX2 team can take a look:
https://forums.developer.nvidia.com/c/agx-autonomous-machines/drive-px2/61

Thanks

Dear @SunilJB :
OK! I will post my question over there.
Thanks for your help!