getPluginCreator could not find plugin BatchedNMS_TRT version 1

  • Name: NVIDIA Jetson
  • Type: AGX Xavier
  • Jetpack: UNKNOWN [L4T 32.2.2] (JetPack 4.3. DP)
  • GPU-Arch: 7.2
  • Libraries:
  • CUDA: 10.0.326
  • cuDNN: 7.6.3.28-1+cuda10.0
  • TensorRT: 6.0.1.5-1+cuda10.0
  • VisionWorks: NOT_INSTALLED
  • OpenCV: 4.0.0 compiled CUDA: YES

Hi,

I want to attach the BatchedNMSPlugin to my detector model.

To do this, I convert the ONNX file to a TensorRT network and append the BatchedNMSPlugin by modifying onnx2trt's main.cpp.

[main.cpp]

// set NMS parameters
      nvinfer1::plugin::NMSParameters nms_parameter;
      nms_parameter.shareLocation = true;
      nms_parameter.backgroundLabelId = -1;
      nms_parameter.numClasses = 1;
      nms_parameter.scoreThreshold = 0.1f;
      nms_parameter.iouThreshold = 0.5f;
      nms_parameter.isNormalized = false;
      nms_parameter.keepTopK = 100;

      // create nms plugin
      nvinfer1::IPluginV2* nms_plugin = createBatchedNMSPlugin(nms_parameter);

      // set nms input tensor
      // 7: outputs_bbox0
      // 8: outputs_score0
      nvinfer1::ITensor* nms_input_tensors[] = {trt_network->getOutput(7), trt_network->getOutput(8)};

      // print nms input tensor
      for (auto i = 0; i < 2; ++i) {
        auto dims = nms_input_tensors[i]->getDimensions();
        cout << i << ": " << dims.nbDims << ": ";
        for (auto j = 0; j < dims.nbDims; ++j)
          cout << dims.d[j] << ", ";
        cout << endl;
      }

      // connect to original network
      auto nms_layer = trt_network->addPluginV2(&nms_input_tensors[0], 2, *nms_plugin);

      // set nms output tensor
      for (auto i = 0; i < nms_layer->getNbOutputs(); ++i)
        trt_network->markOutput(*(nms_layer->getOutput(i)));
      
      // print nms output tensor
      for (auto i = 0; i < nms_layer->getNbOutputs(); ++i) {
        auto dims = nms_layer->getOutput(i)->getDimensions();
        cout << i << ": " << dims.nbDims << ": ";
        for (auto j = 0; j < dims.nbDims; ++j)
          cout << dims.d[j] << ", ";
        cout << endl;
      }

      // remove the original detector's outputs (indices 7 and 8);
      // after the first unmarkOutput, the old output 8 shifts to index 7,
      // so index 7 is intentionally unmarked twice
      trt_network->unmarkOutput(*(trt_network->getOutput(7)));
      trt_network->unmarkOutput(*(trt_network->getOutput(7)));

      // set nms output's name
      const char* nms_output_name[] = {"num_detections", "nmsed_boxes", "nmsed_scores", "nmsed_classes"};
      for (auto i = 0; i < 4; ++i)
        trt_network->getOutput(7 + i)->setName(nms_output_name[i]);

      // final nms engine's output
      for (auto i = 0; i < trt_network->getNbOutputs(); ++i) {
        auto nms_tensor = trt_network->getOutput(i);
        auto nms_out_dims = nms_tensor->getDimensions();

        cout << i << ": " << nms_tensor->getName() << ": ";
        for (auto j = 0; j < nms_out_dims.nbDims; ++j)
          cout << nms_out_dims.d[j] << ", ";
        cout << endl;
      }

With these changes, onnx2trt produces a TensorRT engine file.

Now I want to use this engine file in another program: deserialize the engine and run inference with it.

But in that program, deserializeCudaEngine() fails with the following error:

INVALID_ARGUMENT: getPluginCreator could not find plugin BatchedNMS_TRT version 1
safeDeserializationUtils.cpp (259) - Serialization Error in load: 0 (Cannot deserialize plugin since corresponding IPluginCreator not found in Plugin Registry)
INVALID_STATE: std::exception
INVALID_CONFIG: Deserialize the cuda engine failed.
Segmentation fault (core dumped)
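For reference, the deserialization step in the second program looks roughly like the sketch below (file name, logger, and error handling are illustrative, not my exact code). Note that a process that deserializes an engine containing built-in plugins such as BatchedNMS_TRT generally has to register those plugins itself, which in the TensorRT C++ API is done by calling initLibNvInferPlugins from NvInferPlugin.h before deserializeCudaEngine:

```cpp
#include <fstream>
#include <iostream>
#include <iterator>
#include <vector>

#include <NvInfer.h>
#include <NvInferPlugin.h>  // initLibNvInferPlugins

class Logger : public nvinfer1::ILogger {
  void log(Severity severity, const char* msg) override {
    if (severity <= Severity::kWARNING) std::cout << msg << std::endl;
  }
};

int main() {
  Logger logger;

  // Register TensorRT's built-in plugins (BatchedNMS_TRT among them)
  // with the plugin registry BEFORE deserializing. Without this call,
  // getPluginCreator cannot find the plugin and deserialization fails.
  initLibNvInferPlugins(&logger, "");

  // "model.trt" is a placeholder path for the engine written by onnx2trt.
  std::ifstream file("model.trt", std::ios::binary);
  std::vector<char> blob((std::istreambuf_iterator<char>(file)),
                         std::istreambuf_iterator<char>());

  nvinfer1::IRuntime* runtime = nvinfer1::createInferRuntime(logger);
  nvinfer1::ICudaEngine* engine =
      runtime->deserializeCudaEngine(blob.data(), blob.size(), nullptr);
  if (!engine) {
    std::cerr << "engine deserialization failed" << std::endl;
    return 1;
  }
  // ... create an IExecutionContext and run inference here ...
  engine->destroy();
  runtime->destroy();
  return 0;
}
```

The program must also be linked against libnvinfer_plugin (e.g. `-lnvinfer -lnvinfer_plugin`), since that library contains the BatchedNMS_TRT plugin creator.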

How can I fix this?

Thanks.

  • Name: NVIDIA GeForce
  • Type: GTX 1050Ti
  • Docker container: TensorFlow 19.09-py3
  • GPU-Arch: 6.1

Hi,

I hit the exact same issue in a TensorFlow container when running a TF-TRT-optimized graph that contains the CombinedNonMaxSuppression operator, even though https://docs.nvidia.com/deeplearning/frameworks/tf-trt-user-guide/index.html#tf-114 states that the operator is supported. I ran on a GTX 1050 Ti laptop using the NVIDIA NGC TensorFlow 19.09-py3 container.

2019-11-13 03:00:45.202276: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.1
2019-11-13 03:00:51.470503: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2208000000 Hz
2019-11-13 03:00:51.470996: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5079d70 executing computations on platform Host. Devices:
2019-11-13 03:00:51.471027: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): <undefined>, <undefined>
2019-11-13 03:00:51.473315: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1
2019-11-13 03:00:51.545852: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-11-13 03:00:51.546170: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5056840 executing computations on platform CUDA. Devices:
2019-11-13 03:00:51.546193: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): GeForce GTX 1050 Ti, Compute Capability 6.1
2019-11-13 03:00:51.546380: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-11-13 03:00:51.546677: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties: 
name: GeForce GTX 1050 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.62
pciBusID: 0000:01:00.0
2019-11-13 03:00:51.546743: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.1
2019-11-13 03:00:51.546786: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10
2019-11-13 03:00:51.546815: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10
2019-11-13 03:00:51.546853: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10
2019-11-13 03:00:51.548121: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10
2019-11-13 03:00:51.548738: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10
2019-11-13 03:00:51.548779: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-11-13 03:00:51.548864: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-11-13 03:00:51.549126: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-11-13 03:00:51.549319: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2019-11-13 03:00:52.084317: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-11-13 03:00:52.084352: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187]      0 
2019-11-13 03:00:52.084363: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0:   N 
2019-11-13 03:00:52.084576: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-11-13 03:00:52.084866: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-11-13 03:00:52.085080: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3055 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1050 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
2019-11-13 03:00:54.905646: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-11-13 03:00:55.886505: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10
2019-11-13 03:00:56.228054: E tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:41] DefaultLogger INVALID_ARGUMENT: getPluginCreator could not find plugin BatchedNMS_TRT version 1
2019-11-13 03:00:56.228100: E tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:41] DefaultLogger safeDeserializationUtils.cpp (259) - Serialization Error in load: 0 (Cannot deserialize plugin since corresponding IPluginCreator not found in Plugin Registry)
2019-11-13 03:00:56.229101: E tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:41] DefaultLogger INVALID_STATE: std::exception
2019-11-13 03:00:56.229151: E tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:41] DefaultLogger INVALID_CONFIG: Deserialize the cuda engine failed.
*** Aborted at 1573614056 (unix time) try "date -d @1573614056" if you are using GNU date ***
PC: @                0x0 (unknown)
*** SIGSEGV (@0x0) received by PID 2157 (TID 0x7f2857fff700) from PID 0; stack trace: ***
    @     0x7f2d22615f20 (unknown)
    @     0x7f2ae60c3ac8 tensorflow::tensorrt::TRTEngineOp::GetEngine()
    @     0x7f2ae60c9263 tensorflow::tensorrt::TRTEngineOp::ComputeAsync()
    @     0x7f2b05e49de3 tensorflow::BaseGPUDevice::ComputeAsync()
    @     0x7f2b05ea9416 tensorflow::(anonymous namespace)::ExecutorState::Process()
    @     0x7f2b05ea9b0f _ZNSt17_Function_handlerIFvvEZN10tensorflow12_GLOBAL__N_113ExecutorState13ScheduleReadyERKN4absl13InlinedVectorINS3_10TaggedNodeELm8ESaIS6_EEEPNS3_20TaggedNodeReadyQueueEEUlvE_E9_M_invokeERKSt9_Any_data
    @     0x7f2b05f501f1 Eigen::ThreadPoolTempl<>::WorkerLoop()
    @     0x7f2b05f4d0e8 _ZNSt17_Function_handlerIFvvEZN10tensorflow6thread16EigenEnvironment12CreateThreadESt8functionIS0_EEUlvE_E9_M_invokeERKSt9_Any_data
    @     0x7f2d2014466f (unknown)
    @     0x7f2d223bf6db start_thread
    @     0x7f2d226f888f clone

I am having the exact same errors as the original poster. I tried both versions 19.12 and 20.1 through pip3. I also flashed down to JetPack 4.2 with version 19.11, but I get a similar error about not finding the BatchedNMS_TRT plugin…

  • Name: NVIDIA Jetson
  • Type: AGX Xavier
  • Jetpack: UNKNOWN [L4T 32.2.3] (JetPack 4.3. DP)
  • GPU-Arch: 7.2
  • Libraries:
  • CUDA: 10.2
  • TensorRT: 6

So I had been converting a SavedModel to TRT in one Python script and then trying to load it in another. When I did everything in a single script (load the non-TRT model, convert to TRT, save the TRT model, load it back, and run inference), it worked! So it is probably something to do with converter.convert(), the default parameters, etc. Hope this helps.
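The single-script flow described above can be sketched as follows with the TF 1.x TF-TRT API shipped in the 19.xx NGC containers (directory names and the precision mode are placeholders, not values from this thread):

```python
# Sketch: convert and reload a TF-TRT SavedModel in one process.
# Converting in-process registers the TensorRT plugins in the plugin
# registry, so the subsequent load can deserialize the embedded engines.
from tensorflow.python.compiler.tensorrt import trt_convert as trt

converter = trt.TrtGraphConverter(
    input_saved_model_dir="saved_model",   # placeholder: non-TRT SavedModel
    precision_mode="FP16")                 # assumption: FP16 is acceptable
converter.convert()                        # build the TRT-optimized graph
converter.save("saved_model_trt")          # placeholder output directory

# Loading "saved_model_trt" back in this same process (e.g. via
# tf.saved_model.loader.load) then runs without the
# "getPluginCreator could not find plugin" error.
```

A separate process that loads `saved_model_trt` cold would need the TensorRT plugin library registered first, which is consistent with the errors earlier in this thread.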