Terminate called after throwing an instance of 'nvinfer1::MyelinError'

Description

Occasionally, calling destroy() on an ICudaEngine throws an nvinfer1::MyelinError exception. Because the method is marked noexcept, this results in std::terminate being called, with no way to catch and handle the exception. The exception contains the following error:

myelin/myelinGraphContext.h (40) - Myelin Error in ~MyelinGraphContext: 3 ()
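A minimal, self-contained sketch of why the caller cannot catch this. RunnerStub, MyelinErrorStub and destroyStub are hypothetical stand-ins, not TensorRT types: they only mimic the shape of the trace, a destructor that throws, reached through a noexcept destroy(). Once the exception reaches the noexcept boundary, the runtime calls std::terminate before any outer catch can run. The sketch performs the teardown in a forked child (POSIX only, an assumption of this example) so the parent can observe the resulting SIGABRT; with libstdc++ the child should also print a "terminate called after throwing an instance of ..." line like the one in the title.

```cpp
#include <cassert>
#include <csignal>
#include <cstdlib>
#include <stdexcept>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical stand-in for nvinfer1::MyelinError (not the real type).
struct MyelinErrorStub : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Hypothetical stand-in for rt::MyelinRunner: its destructor opts out of the
// implicit noexcept and throws, like frame #9 of the trace.
struct RunnerStub {
    ~RunnerStub() noexcept(false) {
        throw MyelinErrorStub("Myelin Error in ~MyelinGraphContext: 3 ()");
    }
};

// Mirrors a noexcept destroy(): an exception escaping it makes the runtime call std::terminate.
void destroyStub(RunnerStub* runner) noexcept { delete runner; }

// Runs the teardown in a forked child and reports whether the child died from SIGABRT,
// which is what the default terminate handler (std::abort) raises.
bool teardownAbortsProcess() {
    pid_t pid = fork();
    if (pid < 0) return false;  // fork failed
    if (pid == 0) {
        try {
            destroyStub(new RunnerStub);
        } catch (...) {
            std::_Exit(1);  // never reached: std::terminate runs before this handler
        }
        std::_Exit(0);      // never reached either
    }
    int status = 0;
    if (waitpid(pid, &status, 0) != pid) return false;
    return WIFSIGNALED(status) && WTERMSIG(status) == SIGABRT;
}
```

Calling teardownAbortsProcess() from main returns true on Linux: the catch (...) in the child never gets a chance to run, which matches the abort() at the top of the stack trace.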

Environment

TensorRT Version: Occurs in both 7.2.1.6 and 7.2.3.4
GPU Type: GTX 1060
Nvidia Driver Version: 460.39
CUDA Version: 11.1
CUDNN Version: 8.0
Operating System + Version: Ubuntu 18.04
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):

Relevant Files

N/A

Steps To Reproduce

  • Create and destroy an ICudaEngine in a loop

Stack trace:

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x00007f7338f25921 in __GI_abort () at abort.c:79
#2  0x00007f733957a957 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#3  0x00007f7339580ae6 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4  0x00007f733957fb49 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#5  0x00007f73395804b8 in __gxx_personality_v0 () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#6  0x00007f73392e6573 in ?? () from /lib/x86_64-linux-gnu/libgcc_s.so.1
#7  0x00007f73392e6df5 in _Unwind_Resume () from /lib/x86_64-linux-gnu/libgcc_s.so.1
#8  0x00007f72e543142d in nvinfer1::throwMyelinError(char const*, char const*, int, int, char const*) () from /home/myles/core-cmake/cmake-build-debug/lib/../lib/libnvinfer.so.7
#9  0x00007f72e54279c1 in nvinfer1::rt::MyelinRunner::~MyelinRunner() () from /home/myles/core-cmake/cmake-build-debug/lib/../lib/libnvinfer.so.7
#10 0x00007f72e54279f9 in nvinfer1::rt::MyelinRunner::~MyelinRunner() () from /home/myles/core-cmake/cmake-build-debug/lib/../lib/libnvinfer.so.7
#11 0x00007f72e53b5aa6 in nvinfer1::rt::SafeEngine::~SafeEngine() () from /home/myles/core-cmake/cmake-build-debug/lib/../lib/libnvinfer.so.7
#12 0x00007f72e510ca6b in nvinfer1::rt::Engine::~Engine() () from /home/myles/core-cmake/cmake-build-debug/lib/../lib/libnvinfer.so.7
#13 0x00007f72e510cb99 in nvinfer1::rt::Engine::~Engine() () from /home/myles/core-cmake/cmake-build-debug/lib/../lib/libnvinfer.so.7
...

Hi @myles.inglis,

Could you please share a script that reproduces the issue, so we can assist you better?

Thank you.

I’ve been doing some more testing, and it seems to be a thread-safety issue with the IExecutionContexts. I’ve been trying to create a minimal example that you can run, but that has proven difficult without our full integration code and internal models.

It seems that running inference on multiple execution contexts from the same engine is thread safe, but creating (and possibly destroying?) execution contexts is not. For example:

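engine.reset(runtime->deserializeCudaEngine(engine_data.data(), engine_data.size()));
std::vector<std::thread> threads;
for (int j = 0; j < 8; j++) {
  threads.emplace_back([&engine]() {
    // Each worker creates its own execution context, so creation runs concurrently
    auto exec_context = TRTPointer<nvinfer1::IExecutionContext>(engine->createExecutionContext());
    // Do inference
  });
}
// Join before `threads` goes out of scope: destroying a joinable std::thread calls std::terminate
for (auto& t : threads) t.join();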

This causes assertion failures or std::terminate, whereas the following does not:

engine.reset(runtime->deserializeCudaEngine(engine_data.data(), engine_data.size()));
std::vector<std::thread> threads;
for (int j = 0; j < 8; j++) {
  // Contexts are created one at a time on this thread, then moved into the workers
  auto exec_context = TRTPointer<nvinfer1::IExecutionContext>(engine->createExecutionContext());
  threads.emplace_back([exec_context = std::move(exec_context)]() {
    // Do inference
  });
}
for (auto& t : threads) t.join();
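The second snippet avoids the problem by creating every context on a single thread. A sketch of the same idea that still lets each worker create its own context: guard creation (and, as a precaution given the open question about destruction, teardown) with a shared std::mutex, and run inference outside the lock. EngineStub and ContextStub are hypothetical stand-ins; with TensorRT the locked calls would presumably be engine->createExecutionContext() and the release of the context. This is an assumed workaround, not documented TensorRT behaviour.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for an execution context.
struct ContextStub {
    int id = 0;
};

// Hypothetical stand-in for an engine whose context creation mutates shared,
// unsynchronized state (the suspected behaviour, not confirmed TensorRT internals).
class EngineStub {
public:
    std::unique_ptr<ContextStub> createContext() {
        auto ctx = std::make_unique<ContextStub>();
        ctx->id = static_cast<int>(registry_.size());
        registry_.push_back(ctx->id);  // unsafe if two threads get here at once
        return ctx;
    }
    std::size_t contextsCreated() const { return registry_.size(); }

private:
    std::vector<int> registry_;
};

// Creates one context per worker under a mutex; the per-thread work runs unlocked.
std::size_t runWorkers(EngineStub& engine, int numThreads) {
    std::mutex lifecycleMutex;
    std::vector<std::thread> threads;
    for (int i = 0; i < numThreads; ++i) {
        threads.emplace_back([&engine, &lifecycleMutex]() {
            std::unique_ptr<ContextStub> ctx;
            {
                std::lock_guard<std::mutex> lock(lifecycleMutex);  // one creation at a time
                ctx = engine.createContext();
            }
            // Inference on ctx would run here, concurrently with the other workers.
            {
                std::lock_guard<std::mutex> lock(lifecycleMutex);  // serialize teardown too
                ctx.reset();
            }
        });
    }
    for (auto& t : threads) t.join();
    return engine.contextsCreated();
}
```

Only the create/destroy calls are serialized, so the concurrency of the inference itself is unchanged; if teardown proves to be safe, the second lock can simply be dropped.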

Is this expected?

Regardless, it would be preferable if these exceptions were caught inside the noexcept methods, or not thrown at all, to avoid crashing the host application.
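A sketch of the behaviour requested above, using hypothetical stand-ins (RunnerStub and EngineHandle are not TensorRT types): destroy() stays noexcept, but wraps the teardown in try/catch, reports the failure on stderr and records it in a flag, so the host application keeps running. Whether silently continuing after a failed teardown leaves TensorRT's internal state usable is an open assumption.

```cpp
#include <cassert>
#include <cstdio>
#include <exception>
#include <stdexcept>

// Hypothetical stand-in for an internal component whose destructor can throw.
struct RunnerStub {
    bool fail;
    ~RunnerStub() noexcept(false) {
        if (fail) throw std::runtime_error("Myelin Error in ~MyelinGraphContext: 3 ()");
    }
};

// Hypothetical engine-like handle: destroy() never lets an exception escape.
class EngineHandle {
public:
    explicit EngineHandle(bool failOnTeardown) : runner_(new RunnerStub{failOnTeardown}) {}
    EngineHandle(const EngineHandle&) = delete;
    EngineHandle& operator=(const EngineHandle&) = delete;
    ~EngineHandle() { destroy(); }

    // Safe to call more than once; failures are logged and flagged instead of terminating.
    void destroy() noexcept {
        try {
            delete runner_;  // the memory is still freed even if ~RunnerStub throws
        } catch (const std::exception& e) {
            teardownFailed_ = true;
            std::fprintf(stderr, "teardown error: %s\n", e.what());
        } catch (...) {
            teardownFailed_ = true;
            std::fprintf(stderr, "teardown error: unknown exception\n");
        }
        runner_ = nullptr;
    }

    bool teardownFailed() const { return teardownFailed_; }

private:
    RunnerStub* runner_;
    bool teardownFailed_ = false;
};
```

The catch handlers only set a bool and call std::fprintf, neither of which throws, so nothing inside destroy() can itself trigger std::terminate.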