Unexpected Segfault when creating ExecutionContext with TensorRT 8.0.1

Description

I am encountering a segfault when trying to create an execution context from a previously loaded ONNX model. Specifically, I have a small wrapper class (defined below) that captures the ONNX parsing logic, engine deserialization, and execution context creation. The segfault occurs when I try to create an execution context from the loaded engine member variable.

This issue only appeared after upgrading to TensorRT 8 from TensorRT 7.

Environment

TensorRT Version: 8.0.1
GPU Type: TITAN RTX
Nvidia Driver Version: 470.103.01
CUDA Version: 10.2
CUDNN Version: 8.2.1
Operating System + Version: Ubuntu 18.04

ONNX Model Info
ONNX IR version: 0.0.4
Opset version: 7
Producer name: pytorch
Producer version: 1.3
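(For reference, this metadata can be read back with the onnx Python package; a minimal sketch of my own, with a placeholder model path:)

import onnx

# Placeholder path: substitute the actual model file.
model = onnx.load("your_model.onnx")
print("ONNX IR version:", model.ir_version)
print("Opset version:", model.opset_import[0].version)
print("Producer:", model.producer_name, model.producer_version)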

Steps To Reproduce

I created a simple test file to highlight this issue:
tensorrt_model_loader_test_nvidia.cpp (4.8 KB)

Unfortunately, I can’t link the ONNX model for privacy reasons, but hopefully there is enough information in the test file to resolve this issue.

I have an ONNX model loader class, shown below. It relies on a TRTUniquePtr helper that is defined in the attached test file but omitted here; presumably it is something like the following sketch (a unique_ptr with a destroy()-calling deleter, the usual pre-TensorRT-8 pattern; the sketch is mine, not from the test file):
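struct TRTDestroy
{
  // Deleter that calls the TensorRT destroy() method (deprecated in
  // TensorRT 8, where plain delete also works).
  template <typename T>
  void operator()(T* obj) const
  {
    if (obj)
    {
      obj->destroy();
    }
  }
};

template <typename T>
using TRTUniquePtr = std::unique_ptr<T, TRTDestroy>;

The loader class itself: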

class TestModelLoader
{
  public:
    TestModelLoader(const void* model_data, size_t model_size)
    {
      // Deserialize the engine and eagerly create one execution context.
      // Note that the runtime is a constructor local here, destroyed as
      // soon as the constructor returns.
      auto runtime = TRTUniquePtr<nvinfer1::IRuntime>(nvinfer1::createInferRuntime(logger_));
      engine_.reset(runtime->deserializeCudaEngine(model_data, model_size));
      execution_context_.reset(engine_->createExecutionContext());
    }

    static std::shared_ptr<TestModelLoader> loadFromOnnx(const std::string &model_file)
    {
      BasicLogger basic_logger;
      auto builder = TRTUniquePtr<nvinfer1::IBuilder>(nvinfer1::createInferBuilder(basic_logger));
      const auto explicitBatch = 1U << static_cast<uint32_t>(nvinfer1::NetworkDefinitionCreationFlag::kEXPLICIT_BATCH);
      auto network = TRTUniquePtr<nvinfer1::INetworkDefinition>(builder->createNetworkV2(explicitBatch));

      // Parse the ONNX file directly into the network definition.
      auto parser = TRTUniquePtr<nvonnxparser::IParser>(nvonnxparser::createParser(*network, basic_logger));
      parser->parseFromFile(model_file.c_str(), 1);

      // Build a serialized engine and hand the raw bytes to the constructor,
      // which deserializes them and creates the execution context.
      auto config = TRTUniquePtr<nvinfer1::IBuilderConfig>(builder->createBuilderConfig());
      config->setMaxWorkspaceSize(64 * (1 << 20));
      builder->setMaxBatchSize(1);
      auto engine_data = TRTUniquePtr<nvinfer1::IHostMemory>(builder->buildSerializedNetwork(*network, *config));

      return std::make_shared<TestModelLoader>(engine_data->data(), engine_data->size());
    }

    std::shared_ptr<nvinfer1::IExecutionContext> getExecutionContext()
    {
      return execution_context_;
    }

    std::shared_ptr<nvinfer1::IExecutionContext> createExecutionContext()
    {
      return std::shared_ptr<nvinfer1::IExecutionContext>(engine_->createExecutionContext());
    }

  private:   
    BasicLogger logger_;
    std::shared_ptr<nvinfer1::IExecutionContext> execution_context_;
    std::shared_ptr<nvinfer1::ICudaEngine> engine_;
};

I notice that if I run the following code, I get a segfault when trying to create the execution context:

  // Test Model Loader
  {
    auto trt_state = TestModelLoader::loadFromOnnx(model_file);
    std::cerr << "Trying to load execution context from test class loader" << std::endl;
    trt_state->createExecutionContext();
    std::cerr << "Finished loading execution context from  test class loader" << std::endl;
  }

The confusing part is that the model loads successfully and I can create an execution context in the constructor of the class. In addition, I manually wrote out the steps to load a model from ONNX, and I don’t get a segfault:

  {
    std::shared_ptr<nvinfer1::ICudaEngine> engine = nullptr;
    std::shared_ptr<nvinfer1::IRuntime> runtime = nullptr;
    {
      std::shared_ptr<nvinfer1::IHostMemory> host_memory = nullptr;
      {
        BasicLogger basic_logger;
        auto builder = TRTUniquePtr<nvinfer1::IBuilder>(nvinfer1::createInferBuilder(basic_logger));
        builder->getNbDLACores();
        const auto explicitBatch = 1U << static_cast<uint32_t>(nvinfer1::NetworkDefinitionCreationFlag::kEXPLICIT_BATCH);
        auto network = TRTUniquePtr<nvinfer1::INetworkDefinition>(builder->createNetworkV2(explicitBatch));
        auto parser = TRTUniquePtr<nvonnxparser::IParser>(nvonnxparser::createParser(*network, basic_logger));
        parser->parseFromFile(model_file.c_str(), 1);
        auto config = TRTUniquePtr<nvinfer1::IBuilderConfig>(builder->createBuilderConfig());
        config->setMaxWorkspaceSize(workspace_size_mb * (1 << 20));
        builder->setMaxBatchSize(batch_size);
        host_memory.reset(builder->buildSerializedNetwork(*network, *config));
      }

      {
        BasicLogger basic_logger;
        runtime.reset(nvinfer1::createInferRuntime(basic_logger));
        engine.reset(runtime->deserializeCudaEngine(host_memory->data(), host_memory->size()));
      }
    }
    {
      std::cerr << "Manual Create Execution Context" << std::endl;
      engine->createExecutionContext();
      std::cerr << "Manual creation succeeded" << std::endl;
    }
  }
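One structural difference between the two paths stands out: in the manual version, the runtime shared_ptr lives in the outer scope and outlives the engine, whereas in TestModelLoader the runtime is a constructor local that is destroyed before createExecutionContext() is ever called again. A minimal sketch of a variant that keeps the runtime alive as a member (a hypothetical rearrangement of mine, not code from the test file):

class TestModelLoaderKeepRuntime
{
  public:
    TestModelLoaderKeepRuntime(const void* model_data, size_t model_size)
    {
      // Keep the runtime as a member so it outlives the engine it created.
      runtime_.reset(nvinfer1::createInferRuntime(logger_));
      engine_.reset(runtime_->deserializeCudaEngine(model_data, model_size));
      execution_context_.reset(engine_->createExecutionContext());
    }

  private:
    BasicLogger logger_;
    // Members declared in creation order, so destruction runs in reverse:
    // context first, then engine, then runtime.
    std::shared_ptr<nvinfer1::IRuntime> runtime_;
    std::shared_ptr<nvinfer1::ICudaEngine> engine_;
    std::shared_ptr<nvinfer1::IExecutionContext> execution_context_;
};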

Hi,
Request you to share the ONNX model and the script, if not shared already, so that we can assist you better.
Alongside, you can try a few things:
https://docs.nvidia.com/deeplearning/tensorrt/quick-start-guide/index.html#onnx-export

  1. Validate your model with the snippet below.

check_model.py

import onnx

filename = "your_model.onnx"  # placeholder: substitute your model path
model = onnx.load(filename)
onnx.checker.check_model(model)

  2. Try running your model with the trtexec command.
https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/trtexec
In case you are still facing the issue, request you to share the trtexec --verbose log for further debugging.
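For reference, a minimal invocation that produces such a log might look like this (the model path is a placeholder):

trtexec --onnx=your_model.onnx --verbose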
Thanks!

Hi,

Could you please try the latest TensorRT version, 8.4 EA:
https://developer.nvidia.com/nvidia-tensorrt-8x-download

Thank you.

I tried with TensorRT 8.2.1 and TensorRT 8.4 EA, and the test passes with both versions.

I’m looking at the release notes for TensorRT 8.2 and I can’t immediately see what changed to fix this, but we can move forward with an upgrade to TensorRT 8.2.1.

Please find the release notes here: Documentation Archives :: NVIDIA Deep Learning TensorRT Documentation