Env: RTX 3090, CUDA 11.4, TensorRT 8.2.3, Ubuntu 18.04, Docker 19.03
We have 2 TRT models, both larger than 200MB. When our C++ application loaded these 2 models onto the GPU with deserializeCudaEngine(), everything was fine at first. Later we tried to optimize our application and changed some C++ code that runs on the CPU; since then, one of the models randomly fails to load (the returned engine pointer is null). How can I find out what causes the deserialization to fail? How do I get the error message? My guess is that it is related to memory. See the following issue:
We tried to use Google sanitizer to check for memory leaks. At first, with these 2 models, we couldn't even start the application; it failed at deserializeCudaEngine() every time. Then we switched to another version of the model file, around 120MB in size. That worked, and we could run the application to check for memory leaks.
How can we find out how much host/GPU memory the model uses?
The following reasons could cause the deserializeCudaEngine() function to fail:
- The GPU does not have enough memory to load the TRT engine.
- Corrupted TRT engine file.
- Incorrect engine buffer size or pointer.
- The current TensorRT version differs from the version used to build the TRT engine.
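The corrupted-file and buffer-size causes above can often be ruled out before TensorRT is involved at all. The following is a minimal sketch, assuming the engine is stored as a plain file on disk; `readEngineFile` is a hypothetical helper name, not part of the TensorRT API. It reads the whole file into memory and reports an explicit reason if the file is missing, empty, or only partially read.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of TensorRT): read a serialized engine file
// into `blob`. On failure, returns false and describes the cause in `error`.
bool readEngineFile(const std::string& path, std::vector<char>& blob, std::string& error)
{
    blob.clear();

    // Open at the end of the file so tellg() yields the file size.
    std::ifstream file(path, std::ios::binary | std::ios::ate);
    if (!file) {
        error = "cannot open " + path;
        return false;
    }

    const std::streamoff size = file.tellg();
    if (size <= 0) {
        error = "file is empty or its size cannot be determined: " + path;
        return false;
    }

    blob.resize(static_cast<std::size_t>(size));
    file.seekg(0, std::ios::beg);
    file.read(blob.data(), size);

    // A short read means the buffer does not hold the complete engine.
    if (file.gcount() != size) {
        error = "short read: got " + std::to_string(file.gcount()) +
                " of " + std::to_string(size) + " bytes";
        blob.clear();
        return false;
    }

    error.clear();
    return true;
}
```

If the helper succeeds, `blob.data()` and `blob.size()` are the pointer and size to pass to deserializeCudaEngine(). A null engine after a successful read then points toward the memory or version causes instead.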
To get the error messages, set the TensorRT logger severity to VERBOSE (ILogger::Severity::kVERBOSE). The error messages and details about memory usage then appear in the logs.
You can also use the TRT profiler to know how much host or GPU memory is used by the model.
Please refer to the Developer Guide (NVIDIA Deep Learning TensorRT Documentation) for more info.
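For the host side of the memory question, a rough estimate is also possible without any profiler: on Linux, sample the process's resident set size before and after the engine is loaded and compare the two values. The sketch below assumes a Linux system with /proc available (as on the Ubuntu setup described above); `readVmRssKb` is a hypothetical helper name. It does not measure GPU memory, and the difference also includes host memory taken by CUDA context creation if that happens between the two samples.

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper: returns the resident set size (VmRSS) of the current
// process in kB, or -1 if /proc/self/status is unavailable (Linux only).
long readVmRssKb()
{
    std::ifstream status("/proc/self/status");
    std::string line;
    while (std::getline(status, line)) {
        // The line of interest looks like "VmRSS:     123456 kB".
        if (line.compare(0, 6, "VmRSS:") == 0) {
            std::istringstream fields(line.substr(6));
            long kb = -1;
            if (fields >> kb) {
                return kb;
            }
            return -1;
        }
    }
    return -1;
}
```

Calling `readVmRssKb()` once before and once after the deserializeCudaEngine() call gives an approximate host-memory cost for loading the model.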
So, where can I find the logs? Are they written to a log file or to an output stream, stdout or stderr?
You can find the output logs in the console (STDOUT).
@spolisetty, thanks. Besides, does the trtexec --workspace value affect GPU memory usage during TRT inference, or does the workspace size only affect the TRT conversion?
The trtexec --workspace option can affect both TRT inference GPU memory usage and TRT conversion.
TensorRT uses the workspace memory to store the intermediate results and to perform optimizations.
@spolisetty So, if I convert an ONNX model to a TRT engine with trtexec --workspace 20480, then when I use this engine in a C++ application, it will use more GPU memory than an engine built with trtexec --workspace 1024, right? Can trtexec report the GPU memory usage for TRT inference if I use trtexec to run the inference?
Sorry if my previous response was unclear.
The amount of memory used for inference in a C++ application is not directly affected by the workspace size specified when building the engine with trtexec. The workspace size only impacts the temporary GPU memory that TensorRT uses when building engines.
If we use the trtexec tool for both engine building and inference, then the workspace option will affect both, as I previously mentioned.
The trtexec logs report higher-level summaries. You can also use profiling tools to get detailed information about GPU memory usage.
@spolisetty Thanks for your clarification.