CUDA Error in TensorRT deserializeCudaEngine()


TensorRT issues the following error when deserializing an engine on a Tesla P100 machine:

 /_src/rtSafe/resources.h (441) - Cuda Error in loadKernel: -1 (TensorRT internal error)
 INVALID_STATE: std::exception
 INVALID_CONFIG: Deserialize the cuda engine failed.

Here is the stack trace related to the error from gdb:

#0  0x00007fffb3ccabe0 in __cxa_throw () from /lib64/
#1  0x00007fffe39788cf in nvinfer1::throwCudaError(char const*, char const*, int, int, char const*) () from /usr/local/lib64/third_party/
#2  0x00007fffe37426cb in nvinfer1::rt::ArchiveReadUtils::load(nvinfer1::rt::ReadArchive&, nvinfer1::DriverKernel&, unsigned short) () from /usr/local/lib64/third_party/
#3  0x00007fffe374d474 in nvinfer1::rt::ArchiveReadUtils::load(nvinfer1::rt::ReadArchive&, nvinfer1::OptionalValue<nvinfer1::rt::cuda::PointWiseV2Runner>&, unsigned short) () from /usr/local/lib64/third_party/
#4  0x00007fffe37596c4 in ?? () from /usr/local/lib64/third_party/
#5  0x00007fffe374fa44 in nvinfer1::rt::ArchiveReadUtils::load(nvinfer1::rt::ReadArchive&, nvinfer1::OptionalValue<nvinfer1::rt::Runner>&, unsigned short) () from /usr/local/lib64/third_party/
#6  0x00007fffe39546fc in nvinfer1::rt::SafeEngine::deserializeCoreEngine(nvinfer1::rt::CoreReadArchive&, std::vector<nvinfer1::rt::EngineLayerAttribute, std::allocator<nvinfer1::rt::EngineLayerAttribute> >&) () from /usr/local/lib64/third_party/
#7  0x00007fffe36faf32 in nvinfer1::rt::Engine::deserialize(void const*, unsigned long, nvinfer1::IGpuAllocator&, nvinfer1::IPluginFactory*) () from /usr/local/lib64/third_party/
#8  0x00007fffe3704465 in nvinfer1::Runtime::deserializeCudaEngine(void const*, unsigned long, nvinfer1::IPluginFactory*) () from /usr/local/lib64/third_party/
#9  0x00000000004b758a in ITensorRTClassifier::Internals::LoadEngine(std::string const&) ()

The strange thing is that the engine was created and serialized on this very machine, yet loading it produces the error above. The same engine loads successfully on another machine with an identical environment, so this error has us mystified! Has anyone encountered a similar situation? Any guidance would be appreciated. Thanks in advance!


TensorRT Version:
GPU Type: Tesla P100
Nvidia Driver Version: 440.64.00
CUDA Version: 10.2
CUDNN Version: 8.1.0
Operating System + Version: CentOS 7.9.2009
Python Version (if applicable): N/A
TensorFlow Version (if applicable): N/A
PyTorch Version (if applicable): 1.9
Baremetal or Container (if container which image + tag): N/A

Relevant Files


Steps To Reproduce

  • Convert ONNX model to TensorRT engine
  • Serialize the engine to an engine plan file
  • Deserialize the engine in another application using the deserializeCudaEngine(...) function.
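
The deserialization step above can be sketched as follows. This is a minimal, hedged illustration assuming TensorRT 7.x (the CUDA 10.2 era API with the three-argument deserializeCudaEngine); the file name "model.plan" and the function names are placeholders, not the poster's actual code. It requires the TensorRT SDK and a GPU to build and run.

```cpp
// Minimal sketch of the deserialization path where the error surfaces.
// Assumes TensorRT 7.x headers; names here are illustrative.
#include <NvInfer.h>
#include <fstream>
#include <iostream>
#include <string>
#include <vector>

// Minimal logger required by the TensorRT runtime.
class Logger : public nvinfer1::ILogger {
    void log(Severity severity, const char* msg) override {
        if (severity <= Severity::kWARNING) std::cerr << msg << '\n';
    }
};

nvinfer1::ICudaEngine* loadEngine(const std::string& planPath,
                                  nvinfer1::IRuntime& runtime) {
    // Read the serialized plan file into memory.
    std::ifstream file(planPath, std::ios::binary | std::ios::ate);
    if (!file) return nullptr;
    const std::streamsize size = file.tellg();
    file.seekg(0, std::ios::beg);
    std::vector<char> blob(static_cast<size_t>(size));
    if (!file.read(blob.data(), size)) return nullptr;

    // This is the call that raised "Cuda Error in loadKernel": the plan's
    // precompiled kernels must match the GPU, driver, and the CUDA/TensorRT
    // libraries the dynamic loader resolves at load time.
    return runtime.deserializeCudaEngine(blob.data(), blob.size(), nullptr);
}

int main() {
    Logger logger;
    nvinfer1::IRuntime* runtime = nvinfer1::createInferRuntime(logger);
    nvinfer1::ICudaEngine* engine = loadEngine("model.plan", *runtime);
    std::cout << (engine ? "engine loaded" : "failed to load engine") << '\n';
    if (engine) engine->destroy();
    runtime->destroy();
    return engine ? 0 : 1;
}
```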

Hi @kduncan,

Could you please check the following? This error can occur for one of these reasons:

  • Are you using the same TensorRT version to deserialize the engine as the one used to create it?
    Generated plan files are not portable across platforms or TensorRT versions. Plans are specific to the exact GPU model they were built on (in addition to the platform and the TensorRT version) and must be re-targeted to the specific GPU if you want to run them on a different GPU.
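
As a quick programmatic guard for this point, one option is to compare the build-time TensorRT version against the library actually loaded at run time. A minimal sketch, assuming the classic NV_TENSORRT_VERSION encoding (major*1000 + minor*100 + patch) from NvInferVersion.h; the constants below are stand-ins for the real macros, and in a real build you would pass nvinfer1::getInferLibVersion() to the check:

```cpp
// Stand-ins for the build-time macros NV_TENSORRT_MAJOR/MINOR/PATCH from
// <NvInferVersion.h>; the values here are illustrative (TensorRT 7.0.0).
constexpr int kMajor = 7, kMinor = 0, kPatch = 0;

// TensorRT encodes its version as a single integer:
// major*1000 + minor*100 + patch.
constexpr int kBuildVersion = kMajor * 1000 + kMinor * 100 + kPatch;

// In a real application, pass nvinfer1::getInferLibVersion() here to catch
// a build/runtime mismatch before attempting to deserialize a plan.
inline bool versionsMatch(int runtimeVersion) {
    return runtimeVersion == kBuildVersion;
}
```

Refusing to deserialize when the check fails turns a cryptic loadKernel error into an actionable version-mismatch message.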

  • Mismatched versions of libraries/dependencies. Please make sure CUDA/cuDNN and the other libraries are installed correctly. Alternatively, you can avoid host-side dependency issues entirely by using our NGC Docker containers instead: NVIDIA NGC

  • Sometimes this can happen due to out-of-memory errors. You can check this easily with nvidia-smi and verify that plenty of memory is available before trying to convert/load the model.
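
For the memory check, a guarded query along these lines can be dropped into a deployment script (the query flags are standard nvidia-smi options; the guard keeps it harmless on hosts without an NVIDIA driver):

```shell
# Report per-GPU memory before converting/loading the model.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=memory.total,memory.used,memory.free --format=csv
else
    echo "nvidia-smi not available on this host"
fi
```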


Hello @spolisetty. Here are my responses to your suggestions.

  • Yes, I am using the same version of TRT. As I mentioned, the engine is serialized and de-serialized on the same machine. This is not the issue here.

  • I am currently investigating mismatched libraries, and have been checking this for the past week. I truly believe this has to be the reason; it’s just that nothing is standing out so far. All of the appropriate libraries appear to be loaded. Also, using NGC containers is not an option at the moment.

  • No, memory is not an issue. This is the only GPU-heavy application in use on the machine. I even tried lowering the amount of workspace memory for TRT when building the engine, but this had no effect.

Any other suggestions are welcome. Thanks in advance.

Hi @kduncan,

Please check whether the libraries are installed correctly. You could test in an NGC container to determine whether this is a dependency issue in your local setup, and then reinstall the required dependencies correctly.

Thank you.

After further investigation, this turned out to be an RPATH issue with the deployed executable. Once the RPATH was cleared before deployment, everything worked fine.
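
For anyone hitting the same wall: a stale RPATH/RUNPATH can be diagnosed with standard binutils, since it redirects the dynamic loader before LD_LIBRARY_PATH is consulted. A sketch, where `./classifier` is a hypothetical name standing in for the deployed executable:

```shell
# Inspect a deployed binary for RPATH/RUNPATH entries that could point the
# dynamic loader at the wrong CUDA/TensorRT libraries.
# "./classifier" is a placeholder; substitute your own binary.
BIN=${BIN:-./classifier}

if [ -f "$BIN" ]; then
    # Show any RPATH/RUNPATH entries baked into the binary.
    readelf -d "$BIN" | grep -E 'RPATH|RUNPATH' || echo "no RPATH/RUNPATH set"
    # Show which libnvinfer the loader would actually resolve.
    ldd "$BIN" | grep nvinfer || true
else
    echo "binary not found: $BIN"
fi

# To clear a stale RPATH before deployment (requires patchelf):
#   patchelf --remove-rpath "$BIN"
```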
