What causes deserializeCudaEngine() to fail, and how do I get the error message?

tjliupeng · May 12, 2023, 11:13am

Env: RTX 3090, CUDA 11.4, TensorRT 8.2.3, Ubuntu 18.04, Docker 19.03

We have two TRT models, each larger than 200 MB. When our C++ application loads both models onto the GPU with deserializeCudaEngine(), everything is fine at first. Later, while optimizing our application, we changed some C++ code that runs on the CPU, and now one of the models randomly fails to load (the returned engine pointer is null). How can I find out what causes the deserialization to fail? How do I get the error message? My guess is that it is related to memory. See the issue below:

We tried to use Google's sanitizer to check for memory leaks. At first, with these two models, we could not even start the application; it failed every time at deserializeCudaEngine(). Then we changed the model file to another version of around 120 MB. That worked, and we could run the application to check for memory leaks.

How can we find out how much host/GPU memory the model uses?

spolisetty · May 12, 2023, 6:05pm

Hi,

deserializeCudaEngine() can fail for several reasons:

The GPU does not have enough memory to load the TRT engine.
Corrupted TRT engine file.
Incorrect engine buffer size or pointer.
The TensorRT version at runtime differs from the version used to build the engine.
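
The corrupted-file and wrong-buffer-size causes can be ruled out on the host side before deserializeCudaEngine() is ever called. Below is a minimal sketch, assuming the engine is stored in a regular file; the helper name readEngineFile is hypothetical and the code uses only the C++ standard library:

```cpp
#include <cstddef>
#include <fstream>
#include <ios>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not part of TensorRT): read a serialized engine file
// fully into host memory, and throw rather than return an empty or
// truncated buffer.
std::vector<char> readEngineFile(const std::string& path)
{
    // Open at the end so tellg() reports the file size.
    std::ifstream file(path, std::ios::binary | std::ios::ate);
    if (!file)
        throw std::runtime_error("cannot open engine file: " + path);

    const std::streamsize size = file.tellg();
    if (size <= 0)
        throw std::runtime_error("engine file is empty: " + path);

    std::vector<char> buffer(static_cast<std::size_t>(size));
    file.seekg(0, std::ios::beg);
    if (!file.read(buffer.data(), size))
        throw std::runtime_error("short read on engine file: " + path);

    return buffer;
}
```

The buffer's data() and size() are then the pointer and byte count to hand to deserializeCudaEngine(). If this step succeeds and the returned engine is still null, the remaining suspects are memory and a version mismatch.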

To get the error messages, set the TensorRT logger severity to VERBOSE. The error messages and details about memory usage will then appear in the logs.

You can also use the TRT profiler to find out how much host or GPU memory the model uses.
Please refer to the Developer Guide :: NVIDIA Deep Learning TensorRT Documentation for more info.

Thank you.

tjliupeng · May 13, 2023, 1:22am

So, where can I find the logs? Are they written to a log file or to an output stream (stdout or stderr)?

spolisetty · May 14, 2023, 4:59pm

You can find the output logs in the console (STDOUT).

tjliupeng · May 15, 2023, 2:24am

@spolisetty, thanks. Also, does the trtexec --workspace value affect GPU memory usage during TRT inference, or does the workspace size only affect the TRT conversion?

spolisetty · May 16, 2023, 5:07am

The trtexec --workspace option can affect both TRT inference GPU memory usage and TRT conversion.
TensorRT uses the workspace memory to store the intermediate results and to perform optimizations.

tjliupeng · May 16, 2023, 5:39am

@spolisetty So, if I convert an ONNX model to a TRT engine with trtexec --workspace 20480 and then use this engine in a C++ application, it will use more GPU memory than an engine built with trtexec --workspace 1024, right? Can trtexec report the GPU memory usage for TRT inference if I use trtexec to run the inference?

spolisetty · May 24, 2023, 9:57am

Hi,

Sorry if my previous response was unclear.
The amount of memory used for inference in a C++ application is not directly affected by the workspace size specified when building the engine with trtexec. The workspace size only affects the temporary GPU memory that TensorRT uses while building engines.

If we use the trtexec tool for both engine building and inference, then the workspace option will affect both, as I previously mentioned.

The trtexec logs report higher-level summaries. You can also use profiling tools to get detailed information about GPU memory usage.

Thank you.

tjliupeng · May 27, 2023, 1:56am

@spolisetty Thanks for your clarification.
