Description
I have a TensorRT engine file built with trtexec. It runs with trtexec, but fails to load in Python.
In Python, deserialization fails with the following errors:
[02/13/2023-08:17:42] [TRT] [E] 1: [pluginV2Runner.cpp::load::300] Error Code 1: Serialization (Serialization assertion creator failed.Cannot deserialize plugin since corresponding IPluginCreator not found in Plugin Registry)
[02/13/2023-08:17:42] [TRT] [E] 4: [runtime.cpp::deserializeCudaEngine::66] Error Code 4: Internal Error (Engine deserialization failed.)
It seems that some plugins cannot be found. However, the engine runs fine with trtexec --loadEngine=file.trt:
[02/13/2023-08:20:05] [I] Enqueue Time: min = 82.1047 ms, max = 83.3429 ms, mean = 82.7438 ms, median = 82.7836 ms, percentile(90%) = 83.0442 ms, percentile(95%) = 83.2036 ms, percentile(99%) = 83.3429 ms
[02/13/2023-08:20:05] [I] H2D Latency: min = 0.171631 ms, max = 0.201569 ms, mean = 0.175657 ms, median = 0.174988 ms, percentile(90%) = 0.177979 ms, percentile(95%) = 0.182373 ms, percentile(99%) = 0.201569 ms
[02/13/2023-08:20:05] [I] GPU Compute Time: min = 81.7808 ms, max = 83.5574 ms, mean = 82.8076 ms, median = 82.8636 ms, percentile(90%) = 83.2031 ms, percentile(95%) = 83.2399 ms, percentile(99%) = 83.5574 ms
[02/13/2023-08:20:05] [I] D2H Latency: min = 0.0336914 ms, max = 0.0388794 ms, mean = 0.0372563 ms, median = 0.037323 ms, percentile(90%) = 0.0386963 ms, percentile(95%) = 0.0388184 ms, percentile(99%) = 0.0388794 ms
[02/13/2023-08:20:05] [I] Total Host Walltime: 3.20019 s
[02/13/2023-08:20:05] [I] Total GPU Compute Time: 3.14669 s
[02/13/2023-08:20:05] [W] * Throughput may be bound by Enqueue Time rather than GPU Compute and the GPU may be under-utilized.
[02/13/2023-08:20:05] [W] If not already in use, --useCudaGraph (utilize CUDA graphs where possible) may increase the throughput.
[02/13/2023-08:20:05] [I] Explanations of the performance metrics are printed in the verbose logs.
[02/13/2023-08:20:05] [I]
&&&& PASSED TensorRT.trtexec [TensorRT v8501] # trtexec --loadEngine=file.trt
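One difference between the two paths that is worth ruling out: as far as I know, trtexec registers the built-in plugin library (libnvinfer_plugin) on startup, while a Python script has to call trt.init_libnvinfer_plugins itself before deserializing. If that call is missing, the "IPluginCreator not found in Plugin Registry" error above is what you would expect. Below is a minimal sketch of the Python loading path under that assumption. The helper names read_engine_bytes and load_engine are hypothetical. If the engine was instead built against a separately compiled plugin library (for example one built from TensorRT OSS), that library would also have to be loaded into the Python process before deserializing; the post does not say which case applies.

```python
from pathlib import Path

try:
    import tensorrt as trt
except ImportError:  # keeps read_engine_bytes usable without TensorRT installed
    trt = None

# One long-lived logger shared by plugin registration and the runtime.
TRT_LOGGER = trt.Logger(trt.Logger.WARNING) if trt is not None else None


def read_engine_bytes(path):
    """Read a serialized engine (e.g. file.trt) from disk as raw bytes."""
    data = Path(path).read_bytes()
    if not data:
        raise ValueError(f"engine file {path!r} is empty")
    return data


def load_engine(path):
    """Register TensorRT's built-in plugins, then deserialize the engine at `path`."""
    if trt is None:
        raise ImportError("the tensorrt Python package is required to load engines")
    # Add the creators from libnvinfer_plugin to the global plugin registry
    # (empty namespace). Assumption: this is the step trtexec performs on
    # startup and that a plain Python script otherwise skips.
    trt.init_libnvinfer_plugins(TRT_LOGGER, "")
    runtime = trt.Runtime(TRT_LOGGER)
    engine = runtime.deserialize_cuda_engine(read_engine_bytes(path))
    if engine is None:
        raise RuntimeError(f"failed to deserialize engine from {path!r}")
    # Return the runtime too, so it stays alive while the engine is in use.
    return engine, runtime
```

Usage would be engine, runtime = load_engine("file.trt"), run inside the same container where trtexec succeeds.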
Environment
NVIDIA Docker container 22.12 (I also tried the latest, 23.01, and the same error occurs.)
Relevant Files
My ONNX model uses multiHeadCrossAttentionPlugin and multiHeadFlashAttentionPlugin, and I used trtexec to convert it to an engine. I have checked that both plugins are present in the Python plugin registry, so my guess is that trtexec merges these plugins into some other plugin that is absent in Python.
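The registry check mentioned above can be made stricter by comparing full (name, version, namespace) triples instead of names alone. My understanding is that the plugin registry looks creators up by all three, so a creator with the right name but a different version or namespace would still produce the "not found" error. The sketch below lists every registered creator after registering the built-in plugin library. Both function names are hypothetical, and the creator names a plugin registers may differ from directory-style names such as multiHeadFlashAttentionPlugin, so any triples you compare against are placeholders you would need to fill in yourself.

```python
try:
    import tensorrt as trt
except ImportError:  # keeps missing_creators usable without TensorRT installed
    trt = None

# Long-lived logger passed to plugin registration.
TRT_LOGGER = trt.Logger(trt.Logger.WARNING) if trt is not None else None


def registered_creators():
    """Return (name, version, namespace) for every creator in the plugin registry."""
    if trt is None:
        raise ImportError("the tensorrt Python package is required")
    # Register the built-in plugin library first (assumed to match trtexec).
    trt.init_libnvinfer_plugins(TRT_LOGGER, "")
    registry = trt.get_plugin_registry()
    return [
        (creator.name, creator.plugin_version, creator.plugin_namespace)
        for creator in registry.plugin_creator_list
    ]


def missing_creators(required, available):
    """Return entries of `required` with no exact (name, version, namespace) match."""
    have = set(available)
    return [entry for entry in required if entry not in have]
```

For example, printing sorted(registered_creators()) in the Python environment and comparing it with the creators trtexec has available would show whether a version or namespace mismatch, rather than a missing name, is behind the failure.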
Here is the engine file (I'm not sure it helps, since engines are platform-specific): https://cloud.tsinghua.edu.cn/f/efa176b7463140ed9839/?dl=1