TensorRT 4 sometimes segfaults when loading and deserializing an engine on TX2

Beerend · November 8, 2018, 9:26am

Hi,

On a Jetson TX2, I am launching a GStreamer pipeline with multiple plugins that each load a serialized TensorRT engine from disk. I launch it with gst-launch-1.0. Sometimes this works without any problem, but sometimes I get a segmentation fault. I vaguely remember reading somewhere that it’s a bad idea to load multiple TensorRT engines concurrently. Can someone confirm this and perhaps point to where it is documented?

Also, does anyone have an idea for a workaround that works with gst-launch-1.0?
Currently the initialization is in the gst_infer_handle_sink_event function, because that is where I can read the frame width and height at runtime from a caps structure. I need those values to initialize an object that wraps the TensorRT execution context.

(In GStreamer there is also a gst_infer_init function, but it also runs under gst-inspect-1.0, so I did not want to put the expensive loading of the TensorRT engine there.)

NVES · November 8, 2018, 6:24pm

Hello,

Can you share the segfault message and any traceback you are seeing?

Beerend · November 12, 2018, 1:15pm

Hi NVES,

Sorry it took me so long to get back to you. Since the bug does not always happen, it was tedious to reproduce, and I am also working on other things, so I don’t often launch this exact pipeline.

Anyhow, here is some logging output and a backtrace from the segfault, plus a log from a run that works as it should.

Segfault:

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".
[548547813376] Init for infer_plugin_obj=0x4cf020…
[548547813376] Init for infer_plugin_obj=0x4cf1a0…
Setting pipeline to PAUSED …
[New Thread 0x7fa084a200 (LWP 16206)]
[New Thread 0x7f9bfff200 (LWP 16207)]
[New Thread 0x7f9b7ff200 (LWP 16208)]
[New Thread 0x7f9afff200 (LWP 16209)]
[New Thread 0x7f9a7ff200 (LWP 16210)]
[New Thread 0x7f99fff200 (LWP 16211)]
Pipeline is PREROLLING …
[New Thread 0x7f997ff200 (LWP 16212)]
[New Thread 0x7f83fff200 (LWP 16214)]
[New Thread 0x7f9882f200 (LWP 16213)]
[548019565056] Loading TensoRT model from …/data/object_detector.json
[547675435520] Loading TensoRT model from …/data/object_detector.json
Got caps info: 2448x2048
Got caps info: 2448x2048
Loading TRT Engine…
Loading TRT Engine…
[New Thread 0x7f82e42200 (LWP 16216)]
INFO: Glob Size is 101703552 bytes.

Thread 9 “queue1:src” received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f83fff200 (LWP 16214)]
memcpy () at …/sysdeps/aarch64/memcpy.S:159
159 …/sysdeps/aarch64/memcpy.S: No such file or directory.
(gdb) bt
#0 memcpy () at …/sysdeps/aarch64/memcpy.S:159
#1 0x0000007fb7072e48 in std::string::_M_replace_safe(unsigned long, unsigned long, char const*, unsigned long) ()
from /usr/lib/aarch64-linux-gnu/libstdc++.so.6
#2 0x0000007fb706a2b8 in std::basic_stringbuf<char, std::char_traits<char>, std::allocator<char> >::overflow(int) ()
from /usr/lib/aarch64-linux-gnu/libstdc++.so.6
#3 0x0000007fb70be6b0 in std::basic_streambuf<char, std::char_traits<char> >::xsputn(char const*, long) () from /usr/lib/aarch64-linux-gnu/libstdc++.so.6
#4 0x0000007fb70b046c in std::basic_ostream<char, std::char_traits<char> >& std::__ostream_insert<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*, long) ()
from /usr/lib/aarch64-linux-gnu/libstdc++.so.6
#5 0x0000007fb28d26e4 in nvinfer1::cudnn::Engine::addLinearBlock(unsigned long) () from /usr/lib/aarch64-linux-gnu/libnvinfer.so.4
#6 0x0000007fb28d2e94 in nvinfer1::cudnn::Engine::deserialize(void const*, unsigned long, nvinfer1::IGpuAllocator&, nvinfer1::IPluginFactory*) ()
from /usr/lib/aarch64-linux-gnu/libnvinfer.so.4
#7 0x0000007fb28c6ec8 in nvinfer1::Runtime::deserializeCudaEngine(void const*, unsigned long, nvinfer1::IPluginFactory*) ()
from /usr/lib/aarch64-linux-gnu/libnvinfer.so.4
---Type <return> to continue, or q <return> to quit---q

Good run:

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".
[548547813376] Init for infer_plugin_obj=0x4cf020…
[548547813376] Init for infer_plugin_obj=0x4cf1a0…
Setting pipeline to PAUSED …
[New Thread 0x7fa084a200 (LWP 16233)]
[New Thread 0x7f9bfff200 (LWP 16234)]
[New Thread 0x7f9b7ff200 (LWP 16235)]
[New Thread 0x7f9afff200 (LWP 16236)]
[New Thread 0x7f9a7ff200 (LWP 16237)]
[New Thread 0x7f99fff200 (LWP 16238)]
Pipeline is PREROLLING …
[New Thread 0x7f997ff200 (LWP 16239)]
[New Thread 0x7f9882f200 (LWP 16240)]
[New Thread 0x7f83fff200 (LWP 16241)]
[547675435520] Loading TensoRT model from …/data/object_detector.json
[548019565056] Loading TensoRT model from …/data/object_detector.json
Got caps info: 2448x2048
Got caps info: 2448x2048
Loading TRT Engine…
Loading TRT Engine…
[New Thread 0x7f82e42200 (LWP 16243)]
INFO: Glob Size is 101703552 bytes.
INFO: Glob Size is 101703552 bytes.
INFO: Added linear block of size 88604672
INFO: Added linear block of size 88604672
INFO: Added linear block of size 5537792
INFO: Added linear block of size 1384448
INFO: Added linear block of size 88604672
INFO: Added linear block of size 88604672
INFO: Added linear block of size 5537792
INFO: Added linear block of size 1384448
INFO: Deserialize required 2571623 microseconds.
Loading Complete!
INFO: Deserialize required 2589110 microseconds.
Loading Complete!

NVES · November 15, 2018, 4:46pm

Hello,

Question: do you reuse the same logger for parallel engine deserialization? The call stack seems to suggest that multiple threads are trying to access the same logging instance. Engineering suggests creating a separate logger for each deserialized engine, or making the logger thread-safe.
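
As a rough illustration of the "make it thread-safe" option, here is a minimal sketch of a mutex-guarded log sink. ThreadSafeSink and logFromThreads are hypothetical names, not TensorRT APIs; the assumption is that an nvinfer1::ILogger subclass would forward each message from its log() override to a shared instance of such a sink. It only protects the application's own logger and does nothing about state inside libnvinfer itself.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <sstream>
#include <string>
#include <thread>
#include <vector>

// Hypothetical helper (not part of TensorRT): a log sink whose write() may be
// called from several threads at once.
class ThreadSafeSink {
public:
    void write(const std::string& msg) {
        std::lock_guard<std::mutex> lock(mutex_);  // one writer at a time
        buffer_ << msg << '\n';
        ++count_;
    }

    std::size_t count() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return count_;
    }

private:
    mutable std::mutex mutex_;
    std::ostringstream buffer_;  // shared stream, only touched under the lock
    std::size_t count_ = 0;
};

// Demo: several threads log into one sink concurrently.
// Returns the number of messages the sink stored.
std::size_t logFromThreads(unsigned threads, unsigned perThread) {
    ThreadSafeSink sink;
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threads; ++t) {
        pool.emplace_back([&sink, perThread, t] {
            for (unsigned i = 0; i < perThread; ++i) {
                sink.write("thread " + std::to_string(t) +
                           " message " + std::to_string(i));
            }
        });
    }
    for (auto& th : pool) th.join();
    // No message may be lost when all writes go through the lock.
    assert(sink.count() == static_cast<std::size_t>(threads) * perThread);
    return sink.count();
}
```

Without the lock, concurrent writes to the shared std::ostringstream would be a data race, which is the kind of failure the call stack hints at.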

Beerend · November 16, 2018, 8:57am

Normally not:

NvLogger nvLogger;
nvinfer1::IRuntime* runtime = nvinfer1::createInferRuntime(nvLogger);
trt_engine = EnginePtr(runtime->deserializeCudaEngine(modelMem.data(), modelMem.size(), plugin_factory.get()));
runtime->destroy();

This code appears inside a member function of my class that encapsulates the TensorRT engine and context. So each instance creates its own logger object, attempts to deserialize, and then cleans up the logger and runtime.

The plugin_factory is also created uniquely for each instance.
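
Since each instance already has its own logger, runtime and plugin factory, one possible workaround (an assumption, not something confirmed in this thread) is to let only one thread deserialize at a time by routing the call through a process-wide mutex. A minimal sketch follows. The names engineLoadMutex, loadSerialized and maxConcurrentLoads are hypothetical, and the inner lambda is a stand-in for the createInferRuntime / deserializeCudaEngine / destroy sequence above, so the example compiles without TensorRT.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Process-wide lock shared by every plugin instance (hypothetical name).
inline std::mutex& engineLoadMutex() {
    static std::mutex m;  // function-local static: initialized once, thread-safely (C++11)
    return m;
}

// Runs `load` while holding the lock and returns whatever it returns.
// In the plugin, `load` would wrap the runtime creation, the
// deserializeCudaEngine call and the runtime cleanup.
template <typename Load>
auto loadSerialized(Load&& load) -> decltype(load()) {
    std::lock_guard<std::mutex> lock(engineLoadMutex());
    return std::forward<Load>(load)();
}

// Demo with a stand-in load: returns the peak number of threads that were
// inside the guarded section at the same time.
int maxConcurrentLoads(unsigned threads) {
    std::atomic<int> active{0};
    std::atomic<int> peak{0};
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            loadSerialized([&] {
                int now = ++active;
                int seen = peak.load();
                // Record a new maximum if this thread observed one.
                while (now > seen && !peak.compare_exchange_weak(seen, now)) {}
                // Pretend the load takes a while.
                std::this_thread::sleep_for(std::chrono::milliseconds(5));
                --active;
                return 0;
            });
        });
    }
    for (auto& th : pool) th.join();
    assert(active.load() == 0);
    return peak.load();
}
```

Because gst-launch-1.0 runs all elements of a pipeline in one process, a single static mutex would cover every plugin instance. The cost is that the engines load one after another instead of in parallel.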

NVES · November 16, 2018, 5:24pm

Hello,

It’d help us debug if we can get a small repro package that exhibits the symptoms you are seeing. You can DM me if you’d like.
