What causes this deserializeCudaEngine() failure?

We need to build a C++ application on Jetson Orin on top of a third-party middleware platform similar to ROS. The application is built as a shared library, which the platform loads with dlopen().

Now we have hit a strange problem: calling deserializeCudaEngine() to load a TensorRT engine built with QAT (quantization-aware training) causes a segmentation fault and a core dump, with the following stack trace:

```
Thread 1 "mfrlaunch" received signal SIGSEGV, Segmentation fault.
__strlen_generic () at …/sysdeps/aarch64/multiarch/…/strlen.S:98

98 …/sysdeps/aarch64/multiarch/…/strlen.S: No such file or directory.
(gdb) bt
#0 __strlen_generic () at …/sysdeps/aarch64/multiarch/…/strlen.S:98
#1 0x0000fffff721c958 in __GI__IO_puts (str=0x0) at ioputs.c:35
#2 0x0000ffffdfb337ec in ?? () from /usr/lib/aarch64-linux-gnu/libnvinfer.so.8
#3 0x0000ffffdf857638 in ?? () from /usr/lib/aarch64-linux-gnu/libnvinfer.so.8
#4 0x0000ffffdfaf5ac0 in ?? () from /usr/lib/aarch64-linux-gnu/libnvinfer.so.8
#5 0x0000ffffdcc571a0 in ?? () from /usr/lib/aarch64-linux-gnu/libnvinfer.so.8
#6 0x0000ffffdcc3ea50 in ?? () from /usr/lib/aarch64-linux-gnu/libnvinfer.so.8
#7 0x0000ffffdcc32548 in ?? () from /usr/lib/aarch64-linux-gnu/libnvinfer.so.8
#8 0x0000fffff68d2428 in nvinfer1::IRuntime::deserializeCudaEngine (this=, pluginFactory=0x0, size=, blob=0xfffeac2d4010)
```
However, an engine built without QAT loads without any problem.

In addition, if we build a standalone application that does not use the middleware platform, the same QAT engine loads successfully.

Thanks.

Hi,

Is the QAT engine built in the same environment as the one without QAT?

It looks like the failure happens while parsing the model, possibly related to a layer (or a string) used in the QAT model; frame #1 shows puts() being called with a null string (str=0x0).
Could you extract a sample that reproduces the problem so we can try it on our platform?

Thanks.

Yes. Both engines were built in the same environment, and we built and loaded them on the same board.

Another strange thing: when we build the engine in a different environment provided by another company, that engine loads fine in our environment.

Hi,

That is possible. TensorRT does not necessarily select the same algorithms every time it builds an engine.
Could you share the QAT model so we can try to locate the layer or algorithm that triggers this error?

Thanks.

Hi,

It is not convenient for us to share the model file. However, we found that changing the link order of some NVIDIA libraries in our CMakeLists.txt appears to solve this problem.
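Since reordering the link line changed the outcome, one plausible but unconfirmed explanation is that the process started by the middleware ends up with different copies, or a different load order, of the NVIDIA libraries than the standalone application does. A minimal diagnostic sketch, assuming glibc on Linux (as on Jetson), uses the GNU `dl_iterate_phdr()` call to list every shared object mapped into the current process. The helper names `loadedObjects`, `findLoaded` and `printLoaded`, and the library name fragments used with them, are hypothetical choices for illustration.

```cpp
// Diagnostic sketch: list shared objects loaded in the current process.
// Assumes glibc on Linux; dl_iterate_phdr is a GNU extension declared in <link.h>.
#include <link.h>

#include <cstddef>
#include <iostream>
#include <string>
#include <vector>

// Callback for dl_iterate_phdr: append each object's path to a vector.
static int collectObject(struct dl_phdr_info* info, std::size_t /*size*/, void* data) {
    auto* out = static_cast<std::vector<std::string>*>(data);
    // The main executable is normally reported with an empty name; skip it.
    if (info->dlpi_name != nullptr && info->dlpi_name[0] != '\0') {
        out->emplace_back(info->dlpi_name);
    }
    return 0;  // returning 0 continues the iteration
}

// Hypothetical helper: every loaded shared object, in the order dl_iterate_phdr reports them.
std::vector<std::string> loadedObjects() {
    std::vector<std::string> objects;
    dl_iterate_phdr(collectObject, &objects);
    return objects;
}

// Hypothetical helper: loaded objects whose path contains `needle`.
std::vector<std::string> findLoaded(const std::string& needle) {
    std::vector<std::string> hits;
    for (const auto& path : loadedObjects()) {
        if (path.find(needle) != std::string::npos) {
            hits.push_back(path);
        }
    }
    return hits;
}

// Hypothetical helper: print every loaded object matching any of `needles`.
void printLoaded(const std::vector<std::string>& needles) {
    for (const auto& needle : needles) {
        for (const auto& path : findLoaded(needle)) {
            std::cout << needle << ": " << path << '\n';
        }
    }
}
```

For example, calling `printLoaded({"nvinfer", "cudnn", "cublas", "cudart"})` just before deserializeCudaEngine(), once inside the middleware-loaded library and once in the standalone application, and comparing the two outputs would show whether a different or duplicate library copy is being picked up.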

Hi,

Okay. Thanks for the update.
