TensorRT-7.1.3.4 Deserialize the cuda engine failed

641263629 · October 14, 2020, 3:59am

Description

$ ./main image
ERROR: /home/jenkins/workspace/TensorRT/helpers/rel-7.1/L1_Nightly_Internal/build/source/rtSafe/resources.h (460) - Cuda Error in loadKernel: -1 (TensorRT internal error)
ERROR: INVALID_STATE: std::exception
ERROR: INVALID_CONFIG: Deserialize the cuda engine failed.
Segmentation fault

The same ONNX file works with CUDA 10.0 + cuDNN 7.5 + TensorRT-5.1.5.0,
and the PyTorch-to-ONNX export produces no errors or warnings.

value of engine->getNbLayers()
in TensorRT-5.1.5.0 is 40
in TensorRT-7.1.3.4 is 35

Environment

TensorRT Version: 7.1.3.4
GPU Type: Tesla T4
Nvidia Driver Version: 450.51.05
CUDA Version: 11.0
CUDNN Version: cudnn-11.0-linux-x64-v8.0.2.39
Operating System + Version: Ubuntu 16.04
Python Version (if applicable): 3.7
TensorFlow Version (if applicable):
PyTorch Version (if applicable): 1.6.0
Baremetal or Container (if container which image + tag):

Relevant Files

Steps To Reproduce

logs:

ubuntu @ ~/tools/TensorRT-7.1.3.4/bin
$ ./trtexec --onnx=/tmp/test.onnx --shapes=input:32x3x160x96 --explicitBatch --workspace=1024 --fp16 --saveEngine=/tmp/test.engine
&&&& RUNNING TensorRT.trtexec # ./trtexec --onnx=/tmp/test.onnx --shapes=input:32x3x160x96 --explicitBatch --workspace=1024 --fp16 --saveEngine=/tmp/test.engine
[10/14/2020-10:21:48] [I] === Model Options ===
[10/14/2020-10:21:48] [I] Format: ONNX
[10/14/2020-10:21:48] [I] Model: /tmp/test.onnx
[10/14/2020-10:21:48] [I] Output:
[10/14/2020-10:21:48] [I] === Build Options ===
[10/14/2020-10:21:48] [I] Max batch: explicit
[10/14/2020-10:21:48] [I] Workspace: 1024 MB
[10/14/2020-10:21:48] [I] minTiming: 1
[10/14/2020-10:21:48] [I] avgTiming: 8
[10/14/2020-10:21:48] [I] Precision: FP32+FP16
[10/14/2020-10:21:48] [I] Calibration:
[10/14/2020-10:21:48] [I] Safe mode: Disabled
[10/14/2020-10:21:48] [I] Save engine: /tmp/test.engine
[10/14/2020-10:21:48] [I] Load engine:
[10/14/2020-10:21:48] [I] Builder Cache: Enabled
[10/14/2020-10:21:48] [I] NVTX verbosity: 0
[10/14/2020-10:21:48] [I] Inputs format: fp32:CHW
[10/14/2020-10:21:48] [I] Outputs format: fp32:CHW
[10/14/2020-10:21:48] [I] Input build shape: input=32x3x160x96+32x3x160x96+32x3x160x96
[10/14/2020-10:21:48] [I] Input calibration shapes: model
[10/14/2020-10:21:48] [I] === System Options ===
[10/14/2020-10:21:48] [I] Device: 0
[10/14/2020-10:21:48] [I] DLACore:
[10/14/2020-10:21:48] [I] Plugins:
[10/14/2020-10:21:48] [I] === Inference Options ===
[10/14/2020-10:21:48] [I] Batch: Explicit
[10/14/2020-10:21:48] [I] Input inference shape: input=32x3x160x96
[10/14/2020-10:21:48] [I] Iterations: 10
[10/14/2020-10:21:48] [I] Duration: 3s (+ 200ms warm up)
[10/14/2020-10:21:48] [I] Sleep time: 0ms
[10/14/2020-10:21:48] [I] Streams: 1
[10/14/2020-10:21:48] [I] ExposeDMA: Disabled
[10/14/2020-10:21:48] [I] Spin-wait: Disabled
[10/14/2020-10:21:48] [I] Multithreading: Disabled
[10/14/2020-10:21:48] [I] CUDA Graph: Disabled
[10/14/2020-10:21:48] [I] Skip inference: Disabled
[10/14/2020-10:21:48] [I] Inputs:
[10/14/2020-10:21:48] [I] === Reporting Options ===
[10/14/2020-10:21:48] [I] Verbose: Disabled
[10/14/2020-10:21:48] [I] Averages: 10 inferences
[10/14/2020-10:21:48] [I] Percentile: 99
[10/14/2020-10:21:48] [I] Dump output: Disabled
[10/14/2020-10:21:48] [I] Profile: Disabled
[10/14/2020-10:21:48] [I] Export timing to JSON file:
[10/14/2020-10:21:48] [I] Export output to JSON file:
[10/14/2020-10:21:48] [I] Export profile to JSON file:
[10/14/2020-10:21:48] [I]

Input filename: /tmp/test.onnx
ONNX IR version: 0.0.4
Opset version: 9
Producer name: pytorch
Producer version: 1.6
Domain:
Model version: 0
Doc string:

[10/14/2020-10:23:31] [I] [TRT] Detected 1 inputs and 1 output network tensors.
[10/14/2020-10:23:31] [I] Starting inference threads
[10/14/2020-10:23:34] [I] Warmup completed 0 queries over 200 ms
[10/14/2020-10:23:34] [I] Timing trace has 0 queries over 3.00063 s
[10/14/2020-10:23:34] [I] Trace averages of 10 runs:
[10/14/2020-10:23:34] [I] Average on 10 runs - GPU latency: 0.24994 ms - Host latency: 0.416225 ms (end to end 0.478523 ms, enqueue 0.0757492 ms)
…
0.3979 ms, enqueue 0.0746338 ms)
[10/14/2020-10:23:34] [I] Average on 10 runs - GPU latency: 0.209033 ms - Host latency: 0.37102 ms (end to end 0.403516 ms, enqueue 0.0744385 ms)
[10/14/2020-10:23:34] [I] Host Latency
[10/14/2020-10:23:34] [I] min: 0.357666 ms (end to end 0.368225 ms)
[10/14/2020-10:23:34] [I] max: 0.459991 ms (end to end 0.510849 ms)
[10/14/2020-10:23:34] [I] mean: 0.371668 ms (end to end 0.401351 ms)
[10/14/2020-10:23:34] [I] median: 0.369873 ms (end to end 0.401855 ms)
[10/14/2020-10:23:34] [I] percentile: 0.408752 ms at 99% (end to end 0.467224 ms at 99%)
[10/14/2020-10:23:34] [I] throughput: 0 qps
[10/14/2020-10:23:34] [I] walltime: 3.00063 s
[10/14/2020-10:23:34] [I] Enqueue Time
[10/14/2020-10:23:34] [I] min: 0.0690918 ms
[10/14/2020-10:23:34] [I] max: 0.124512 ms
[10/14/2020-10:23:34] [I] median: 0.0737915 ms
[10/14/2020-10:23:34] [I] GPU Compute
[10/14/2020-10:23:34] [I] min: 0.196289 ms
[10/14/2020-10:23:34] [I] max: 0.274429 ms
[10/14/2020-10:23:34] [I] mean: 0.208811 ms
[10/14/2020-10:23:34] [I] median: 0.207275 ms
[10/14/2020-10:23:34] [I] percentile: 0.246078 ms at 99%
[10/14/2020-10:23:34] [I] total compute time: 2.94236 s
&&&& PASSED TensorRT.trtexec # ./trtexec --onnx=/tmp/test.onnx --shapes=input:32x3x160x96 --explicitBatch --workspace=1024 --fp16 --saveEngine=/tmp/test.engine
ubuntu@ ~/tools/TensorRT-7.1.3.4/bin

{
    // Print a summary of the network inputs/outputs and the serialized engine.
    int nbInput = network->getNbInputs();
    auto inDim = network->getInput(0)->getDimensions();
    int nbOutput = network->getNbOutputs();
    auto outDim = network->getOutput(0)->getDimensions();
    // volume() returns a 64-bit count and size() returns size_t, so use matching specifiers.
    printf("%s\n inputs %d, inputDims %d, inputCount %lld\n outputs %d, outputDims %d, outputCount %lld, "
           "engine_size %zu, engine_nbLayers %d\n", output_file.c_str(),
           nbInput, inDim.nbDims, static_cast<long long>(samplesCommon::volume(inDim)),
           nbOutput, outDim.nbDims, static_cast<long long>(samplesCommon::volume(outDim)),
           static_cast<size_t>(data->size()), engine->getNbLayers());
}
inputs 1, inputDims 4, inputCount 491520
outputs 1, outputDims 2, outputCount 96, engine_size 316916, engine_nbLayers 35

ubuntu @ ~/work
$ ./main image
ERROR: /home/jenkins/workspace/TensorRT/helpers/rel-7.1/L1_Nightly_Internal/build/source/rtSafe/resources.h (460) - Cuda Error in loadKernel: -1 (TensorRT internal error)
ERROR: INVALID_STATE: std::exception
ERROR: INVALID_CONFIG: Deserialize the cuda engine failed.

AakankshaS · October 14, 2020, 6:22pm

Hi @641263629,
Could you please share your ONNX model so that we can assist you better?

Thanks!

641263629 · October 18, 2020, 3:31am

Hi, is there any problem with the ONNX file?

AakankshaS · October 20, 2020, 6:52am

Hi @641263629,
I could not reproduce the issue.
Are you deserializing the engine with the same TRT version that you used to build it?
Thanks!

641263629 · October 20, 2020, 7:25am

Yes, there is only one version of TensorRT installed on my system. Does the GPU type matter, e.g. T4 vs. 2080 Ti?

AakankshaS · October 20, 2020, 7:31am

Yes, it does.
The generated plan files are not portable across platforms or TensorRT versions. Plans are specific to the exact GPU model they were built on (in addition to the platforms and the TensorRT version) and must be re-targeted to the specific GPU in case you want to run them on a different GPU.
Thanks!
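As a rough illustration of the portability rule above, an application could store a small tag next to each plan file when it is built and compare it with the current environment before deserializing. This is only a sketch: `PlanTag` and `planMismatch` are hypothetical names, not part of TensorRT, and the current values are assumed to come from calls such as `getInferLibVersion()` and `cudaGetDeviceProperties()`.

```cpp
#include <string>

// Hypothetical metadata written next to a plan file at build time.
struct PlanTag {
    std::string trtVersion;  // e.g. "7.1.3"
    std::string gpuName;     // e.g. "Tesla T4"
};

// Returns an empty string when the tags match, otherwise a human-readable
// reason why the plan should be rebuilt instead of deserialized.
std::string planMismatch(const PlanTag& built, const PlanTag& current) {
    if (built.trtVersion != current.trtVersion) {
        return "TensorRT version differs: built with " + built.trtVersion +
               ", running " + current.trtVersion;
    }
    if (built.gpuName != current.gpuName) {
        return "GPU differs: built on " + built.gpuName +
               ", running on " + current.gpuName;
    }
    return "";
}
```

A non-empty result would mean rebuilding the engine on the target machine rather than calling the deserializer and failing later.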

641263629 · October 22, 2020, 2:55am

I found the reason!
I need both TensorRT and libtorch. Building against the latest PyTorch requires std=c++14, but that setting causes the TensorRT error. When I dropped libtorch and built my code with std=c++11, it ran successfully!
But what can I do to resolve this conflict?
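The thread does not establish why the C++14 build fails. One thing that may be worth comparing between the working and failing builds is the language standard and the libstdc++ ABI setting each one was compiled with, since prebuilt libtorch packages have been distributed in both pre-cxx11 and cxx11 ABI variants. That this is the cause here is an assumption, not something the thread confirms. The helper below, with the hypothetical name `buildConfig`, simply reports both values for the translation unit it is compiled in.

```cpp
#include <sstream>
#include <string>

// Reports the C++ standard and the libstdc++ dual-ABI setting that this
// translation unit was compiled with.
std::string buildConfig() {
    std::ostringstream os;
    os << "__cplusplus=" << __cplusplus;
#ifdef _GLIBCXX_USE_CXX11_ABI
    // Defined by libstdc++ headers: 1 selects the new ABI, 0 the old one.
    os << " _GLIBCXX_USE_CXX11_ABI=" << _GLIBCXX_USE_CXX11_ABI;
#else
    os << " _GLIBCXX_USE_CXX11_ABI=undefined";
#endif
    return os.str();
}
```

Printing the result from the std=c++11 build and from the std=c++14 plus libtorch build shows whether the two differ in more than the language standard.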

AakankshaS · December 1, 2020, 7:57am

Hi @641263629,
I suggest using NGC containers to avoid system dependency issues.

Thanks!

332313594 · February 4, 2021, 7:14am

I hit the same error. Maybe your CUDA device is out of memory.

trenka.raul · March 28, 2024, 8:00am

Hi There,

I could still reproduce a similar situation, so I think it is recurring.
My solution:
If you have multiple CUDA libraries on the machine, especially ones that live in a Python environment and were installed along with tensorrt or other tools via pip install, then the program needs to be launched with the path to the right CUDA libraries placed first in the library search path, like so:
LD_LIBRARY_PATH=<path-to-cuda-libs>:$LD_LIBRARY_PATH ./main
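To see which copy of a CUDA or TensorRT shared library a Linux process actually loaded, one option is to scan `/proc/self/maps` from inside the program. The sketch below assumes the usual maps layout, where the file path is the last field on each line. `loadedLibs` and `loadedLibsOfThisProcess` are hypothetical helper names.

```cpp
#include <fstream>
#include <istream>
#include <set>
#include <sstream>
#include <string>

// Collects the distinct file paths in /proc/<pid>/maps-style text whose
// path contains `needle` (for example "libcudart" or "libnvinfer").
std::set<std::string> loadedLibs(std::istream& maps, const std::string& needle) {
    std::set<std::string> found;
    std::string line;
    while (std::getline(maps, line)) {
        // The earlier fields (address range, perms, offset, device, inode)
        // contain no '/', so the first '/' marks the start of the path.
        std::string::size_type slash = line.find('/');
        if (slash == std::string::npos) continue;  // anonymous mapping
        std::string path = line.substr(slash);
        if (path.find(needle) != std::string::npos) found.insert(path);
    }
    return found;
}

// Convenience wrapper for the current process (Linux only).
std::set<std::string> loadedLibsOfThisProcess(const std::string& needle) {
    std::ifstream maps("/proc/self/maps");
    return loadedLibs(maps, needle);
}
```

Calling `loadedLibsOfThisProcess("libcudart")` after the CUDA runtime has been initialized and printing the result shows whether the library came from the system CUDA install or from a pip-installed package directory.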
