Creating various TensorRT engines from different threads -> Segmentation fault

Hi,

When I create several TRT engines and contexts in parallel from different CPU threads, I get a segmentation fault. I have checked my code, since the problem could be on my side, but I do not see anything wrong. When I load the models and generate the engines and contexts sequentially, I don't have this problem. Is it possible that TensorRT does not support multithreading when creating the engine and context?

Thanks a lot in advance.

Inaki

I have two other TensorRT networks which I load in parallel. If I load them sequentially I have no trouble.

The output from gdb is as follows:

(gdb) where
#0  0x00007fffda5d1d25 in nvinfer1::LogStream<(nvinfer1::ILogger::Severity)3>::Buf::sync() () from /usr/lib/x86_64-linux-gnu/libnvinfer.so.4
#1  0x00007fffd879bf5e in std::ostream::flush() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#2  0x00007fffda6a982e in ?? () from /usr/lib/x86_64-linux-gnu/libnvinfer.so.4
#3  0x00007fffda6ab44b in nvinfer1::cudnn::selectFastestLayerAndDeleteOthers(nvinfer1::cudnn::EngineBuildContext&, std::vector<nvinfer1::cudnn::Layer*, std::allocator<nvinfer1::cudnn::Layer*> > const&)
    () from /usr/lib/x86_64-linux-gnu/libnvinfer.so.4
#4  0x00007fffda642d8f in nvinfer1::builder::buildSingleLayer(nvinfer1::cudnn::EngineBuildContext&, nvinfer1::builder::Node&, std::unordered_map<std::string, std::unique_ptr<nvinfer1::cudnn::Region, std::default_delete<nvinfer1::cudnn::Region> >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::unique_ptr<nvinfer1::cudnn::Region, std::default_delete<nvinfer1::cudnn::Region> > > > > const&, nvinfer1::CpuMemoryGroup&, std::unordered_map<std::string, std::vector<float, std::allocator<float> >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<float, std::allocator<float> > > > >*, bool) () from /usr/lib/x86_64-linux-gnu/libnvinfer.so.4
#5  0x00007fffda643d69 in nvinfer1::builder::EngineTacticSupply::getBestTactic(nvinfer1::builder::Node&, nvinfer1::query::Ports<nvinfer1::RegionFormatL> const&, bool) ()
   from /usr/lib/x86_64-linux-gnu/libnvinfer.so.4
#6  0x00007fffda676e1d in ?? () from /usr/lib/x86_64-linux-gnu/libnvinfer.so.4
#7  0x00007fffda67a559 in nvinfer1::builder::chooseFormatsAndTactics(nvinfer1::builder::Graph&, nvinfer1::builder::TacticSupply&, std::unordered_map<std::string, std::vector<float, std::allocator<float> >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<float, std::allocator<float> > > > >*) () from /usr/lib/x86_64-linux-gnu/libnvinfer.so.4
#8  0x00007fffda645109 in nvinfer1::builder::makeEngineFromGraph(nvinfer1::CudaEngineBuildConfig const&, nvinfer1::cudnn::HardwareContext const&, nvinfer1::builder::Graph&, std::unordered_map<std::string, std::vector<float, std::allocator<float> >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<float, std::allocator<float> > > > >*, int) ()
   from /usr/lib/x86_64-linux-gnu/libnvinfer.so.4
#9  0x00007fffda648ffd in nvinfer1::builder::buildEngine(nvinfer1::CudaEngineBuildConfig&, nvinfer1::cudnn::HardwareContext const&, nvinfer1::Network const&) ()
   from /usr/lib/x86_64-linux-gnu/libnvinfer.so.4
#10 0x00007fffda6b2201 in ?? () from /usr/lib/x86_64-linux-gnu/libnvinfer.so.4
#11 0x00000000007ef63f in adv::load_from_uff(adv::TRTNetworkConfig const&, adv::Logger) () at /home/navarro/dev/tec/technology/dnn/src/tensorrt/tensorrt.cc:259
#12 0x00000000007f10d2 in adv::TRTNetwork::TRTNetwork(adv::TRTNetworkConfig const&) () at /home/navarro/dev/tec/technology/dnn/src/tensorrt/tensorrt.cc:299
#13 0x00000000007f844d in adv::BodyPoseTensorrt::Impl::Impl (this=0x7fff84001350, net_resolution=..., frame_resolution=...) at /home/navarro/dev/tec/technology/dnn/src/tensorrt_body_pose.cc:32

From NvInfer.h:
//!
//! \class ILogger
//!
//! \brief Application-implemented logging interface for the builder, engine and runtime.
//!
//! Note that although a logger is passed on creation to each instance of a IBuilder or IRuntime interface, the logger is internally considered a singleton, and thus
//! multiple instances of IRuntime and/or IBuilder must all use the same logger.
//!

In my code, every thread defined its own logger, and that is where this problem came from.
According to the note above, all IBuilder and IRuntime instances must share the same logger, so we have to use a global or singleton logger.
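
For anyone who wants to try this, here is a rough sketch of what "one logger for all threads" could look like. It is only an illustration under some assumptions: `ILoggerLike` is a hypothetical stand-in that mirrors the shape of `nvinfer1::ILogger` so the snippet compiles without TensorRT, and `SharedLogger`, `sharedLogger()`, `buildEngineStub()` and `runParallelBuilds()` are made-up names. In real code you would derive from `nvinfer1::ILogger` and pass the same instance to every builder/runtime you create. I added a mutex inside `log()` as a precaution, since several threads may call it at once.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for nvinfer1::ILogger so this compiles without TensorRT.
// In real code, derive SharedLogger from nvinfer1::ILogger (NvInfer.h) instead.
struct ILoggerLike {
    enum class Severity { kINTERNAL_ERROR = 0, kERROR = 1, kWARNING = 2, kINFO = 3 };
    virtual void log(Severity severity, const char* msg) = 0;
    virtual ~ILoggerLike() = default;
};

// A single logger type whose log() is serialized with a mutex.
class SharedLogger : public ILoggerLike {
public:
    void log(Severity severity, const char* msg) override {
        std::lock_guard<std::mutex> lock(mutex_);
        // Keep warnings and errors; drop kINFO messages.
        if (severity <= Severity::kWARNING) {
            messages_.emplace_back(msg);
        }
    }

    std::size_t messageCount() {
        std::lock_guard<std::mutex> lock(mutex_);
        return messages_.size();
    }

private:
    std::mutex mutex_;
    std::vector<std::string> messages_;
};

// Function-local static: constructed once, and initialization is
// thread-safe since C++11. Every thread gets the same instance.
SharedLogger& sharedLogger() {
    static SharedLogger instance;
    return instance;
}

// Hypothetical placeholder for the per-thread work. Real code would create
// the builder and engine here, always passing the shared logger.
void buildEngineStub(ILoggerLike& logger, int id) {
    const std::string msg = "building engine " + std::to_string(id);
    logger.log(ILoggerLike::Severity::kWARNING, msg.c_str());
}

// Starts threadCount workers that all use the one shared logger and
// returns how many messages the logger has stored so far.
std::size_t runParallelBuilds(int threadCount) {
    assert(threadCount > 0);
    std::vector<std::thread> workers;
    for (int i = 0; i < threadCount; ++i) {
        workers.emplace_back([i] { buildEngineStub(sharedLogger(), i); });
    }
    for (auto& w : workers) {
        w.join();
    }
    return sharedLogger().messageCount();
}
```

The point is only that no thread constructs its own logger; each one asks `sharedLogger()` for the process-wide instance. Whether this alone is enough for a given TensorRT version is something to verify against the header note quoted above.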

Hi, I’ve run into the exact same issue as Inaki. When creating TensorRT engines from multiple threads in parallel, I get a segmentation fault, and the gdb backtrace is exactly the same.

Any ideas? Thanks in advance.