trtexec Caffe to TensorRT conversion: deserializeCudaEngine segfault

GPU type: RTX Titan
NVIDIA driver version: 415.27
CUDA version:
cuDNN version: 7.3
TensorRT version:


For the past few weeks I have been trying to convert my custom Caffe model to TensorRT. Doing this with trtexec works perfectly fine and achieves a very satisfying speedup in the trtexec profiler. However, when I try to deserialize the PLAN file in C++ using deserializeCudaEngine, I always run into a segfault.

What could be causing this segfault? The stack trace isn't too helpful, as I cannot step into the deserializeCudaEngine function.

What I have tried so far:

  1. Ensure that trtexec is run on the same machine and GPU that the C++ code runs on (I know the optimizations are device-specific).
  2. Ensure the TensorRT versions used by trtexec and by my C++ code are EXACTLY equal. Still segfaults.
  3. Try a simpler Caffe prototxt with only an input, one convolution, and an output. Still segfaults.
  4. Try the conversion using a manual Caffe-to-TensorRT C++ executable that my colleague wrote some time ago instead of trtexec. deserializeCudaEngine works correctly for my super simple input/conv/output network in this case. Sadly, my colleague's tool lacks support for some layers in the custom network that I actually want to convert. trtexec does support these layers, so I still want to use it if possible.
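For completeness, the trtexec invocation for a Caffe model looks roughly like the following. The paths and the output blob name are placeholders, and the exact flag for saving the engine has varied between TensorRT releases, so treat this as a sketch rather than a verbatim command:

```shell
# Sketch of a Caffe -> TensorRT conversion with trtexec.
# model.prototxt / model.caffemodel / "prob" / model.plan are placeholders.
trtexec --deploy=model.prototxt \
        --model=model.caffemodel \
        --output=prob \
        --batch=1 \
        --engine=model.plan   # older trtexec; newer releases use --saveEngine
```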

Because of the above points I am reasonably certain my C++ code is not the issue (the plan file created by my colleague's code works).
Phrased differently: what pitfalls are there to keep in mind when generating the plan file via trtexec and loading it later with deserializeCudaEngine? Could there, for example, be some version conflict in CUDA instead, even if the TensorRT versions are equal?

Not that I think it will be too helpful, but this is the prototxt of the super simple model I used for testing:
layer {
  name: "data_1"
  type: "Input"
  top: "data_1"
  input_param {
    shape { dim: 1 dim: 3 dim: 1440 dim: 1920 }
  }
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data_1"
  top: "conv1"
  param {
    name: "conv1_w"
  }
  convolution_param {
    num_output: 16
    bias_term: false
    pad: 0
    kernel_size: 1
    stride: 1
  }
}

I have managed to solve this. My code is part of a large codebase with integrated third-party libraries, and it turned out that cuDNN 7.3.0 was used there. I ran trtexec outside of this environment, causing it to use the system-installed version of cuDNN: 7.3.1.

Conclusion: versions are important, not only for TensorRT itself, but also for the supporting libraries.