TensorRT 4 sometimes segfaults when loading and deserializing engines on TX2

Hi,

On a Jetson TX2, I am launching a GStreamer pipeline with multiple plugins that load serialized TensorRT engines from disk. I start it with gst-launch-1.0, and sometimes it works without any problem, but sometimes I get a segmentation fault. I vaguely remember reading somewhere that it's a bad idea to load multiple TensorRT engines concurrently, but can someone confirm this and perhaps point to where it is documented?

Also, does anyone know of a workaround for this when using gst-launch-1.0?
Currently, I put the initialization in the gst_infer_handle_sink_event function, because that is where I can read the frame width and height at runtime from a caps structure. I need those values to initialize an object that wraps the TensorRT execution context.

(GStreamer also provides a gst_infer_init function, but it is also executed by gst-inspect-1.0, so I did not want to put the expensive operation of loading the TensorRT engine there.)
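One possible workaround, if concurrent deserialization turns out to be the trigger, is to serialize engine loading with a single process-wide lock. gst-launch-1.0 runs every element of the pipeline in one process, so a function-local static mutex is shared by all instances of a plugin. The names engineLoadMutex, loadEngineSerialized and maxConcurrentLoads are hypothetical, and this is a sketch rather than a confirmed fix. Note that the static only covers instances of the same plugin library; elements built from different .so files would need a lock in a shared library.

```cpp
#include <atomic>
#include <chrono>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// One lock for the whole process, shared by every instance of this plugin.
std::mutex& engineLoadMutex() {
    static std::mutex m;  // function-local static: initialized once, thread-safely (C++11)
    return m;
}

// Runs the expensive loading callback while holding the lock, so at most
// one instance deserializes an engine at any given time.
template <typename Loader>
auto loadEngineSerialized(Loader&& loader) -> decltype(loader()) {
    std::lock_guard<std::mutex> lock(engineLoadMutex());
    return std::forward<Loader>(loader)();
}

// Demo only: simulates `instances` plugin instances initializing at the same
// time and returns the highest number of loaders seen running at once.
int maxConcurrentLoads(int instances) {
    std::atomic<int> active{0};
    std::atomic<int> peak{0};
    std::vector<std::thread> threads;
    for (int i = 0; i < instances; ++i) {
        threads.emplace_back([&] {
            loadEngineSerialized([&] {
                int now = ++active;
                int prev = peak.load();
                // Record the new peak if this loader raised it.
                while (now > prev && !peak.compare_exchange_weak(prev, now)) {
                }
                std::this_thread::sleep_for(std::chrono::milliseconds(5));  // stand-in for deserialization
                --active;
            });
        });
    }
    for (auto& t : threads) {
        t.join();
    }
    return peak.load();
}
```

In the sink-event handler, the existing deserialization call could then be wrapped, for example `trt_engine = loadEngineSerialized([&] { return EnginePtr(runtime->deserializeCudaEngine(modelMem.data(), modelMem.size(), plugin_factory.get())); });`. The cost is that engines load one after another, which only affects startup time.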

Hello,

Can you share the segfault message and any traceback you are seeing?

Hi NVES,

Sorry it took me so long to get back to you. Since the bug does not always happen, it was rather tedious to reproduce, and I'm also working on other things, so I don't launch this exact pipeline very often.

Anyway, here is some logging output and a traceback from the segfault, as well as a log from a run where it works as it should.

Segfault:

Good run:

Hello,

Question: do you reuse the same logger for parallel engine deserialization? The call stack seems to suggest that multiple threads are trying to access the same logger instance. Engineering suggests creating a separate logger for each deserialized engine or making the logger thread-safe.
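The thread-safe option can be sketched as a mutex-guarded log sink. ThreadSafeLogSink and LogLevel are hypothetical names, kept independent of NvInfer.h so the sketch compiles on its own; in the plugin, an nvinfer1::ILogger subclass would forward its log(Severity, const char*) override (the exact signature varies by TensorRT version) to one shared instance.

```cpp
#include <cstddef>
#include <iostream>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical severity scale; TensorRT's ILogger::Severity would be mapped onto it.
enum class LogLevel { kError, kWarning, kInfo };

// Every call takes the same mutex, so concurrent callers never interleave
// writes to the stored history or to std::cerr.
class ThreadSafeLogSink {
public:
    void log(LogLevel level, const char* msg) {
        std::lock_guard<std::mutex> lock(mutex_);
        history_.emplace_back(msg);
        // Only errors and warnings are echoed to stderr.
        if (level <= LogLevel::kWarning) {
            std::cerr << msg << '\n';
        }
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return history_.size();
    }

private:
    mutable std::mutex mutex_;
    std::vector<std::string> history_;
};
```

Per-engine loggers avoid the lock entirely; a shared, locked sink is the alternative when all engines should report to one place.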

No, normally not:

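// Logger and runtime are local to this call, one set per wrapper instance.
NvLogger nvLogger;
nvinfer1::IRuntime* runtime = nvinfer1::createInferRuntime(nvLogger);
// Deserialize the in-memory engine blob using this instance's plugin factory.
trt_engine = EnginePtr(runtime->deserializeCudaEngine(modelMem.data(), modelMem.size(), plugin_factory.get()));
// The runtime is released here; nvLogger goes out of scope when the function returns.
runtime->destroy();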

This code is inside a member function of my class that encapsulates the TensorRT engine and context, so each instance creates its own logger object, deserializes its engine, and then cleans up the logger and runtime.

The plugin_factory is also created separately for each instance.
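The definition of EnginePtr is not shown in the thread. One common way to build such a type, assumed here rather than taken from the original code, is a std::unique_ptr with a deleter that calls destroy(), which is how TensorRT 4-era objects such as IRuntime and ICudaEngine are released. DestroyDeleter and DestroyPtr are hypothetical names.

```cpp
#include <memory>

// Deleter for objects released through destroy() instead of delete.
struct DestroyDeleter {
    template <typename T>
    void operator()(T* obj) const {
        if (obj != nullptr) {
            obj->destroy();
        }
    }
};

// Hypothetical alias: DestroyPtr<nvinfer1::ICudaEngine> could play the role of EnginePtr.
template <typename T>
using DestroyPtr = std::unique_ptr<T, DestroyDeleter>;
```

In a wrapper class, declaring a logger member before the engine member means the logger is destroyed after the engine, because C++ destroys members in reverse declaration order. That is a conservative choice in case the engine logs during teardown, compared with a logger that goes out of scope at the end of the loading function.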

Hello,

It would help us debug if you could provide a small repro package that exhibits the symptoms you are seeing. You can DM me if you'd like.