getPluginCreator could not find plugin BatchedNMS_TRT version 1

  • Name: NVIDIA Jetson
  • Type: AGX Xavier
  • Jetpack: UNKNOWN [L4T 32.2.2] (JetPack 4.3. DP)
  • GPU-Arch: 7.2
  • Libraries:
  • CUDA: 10.0.326
  • cuDNN: 7.6.3.28-1+cuda10.0
  • TensorRT: 6.0.1.5-1+cuda10.0
  • VisionWorks: NOT_INSTALLED
  • OpenCV: 4.0.0 compiled CUDA: YES

Hi,

I want to connect the BatchedNMSPlugin to my detector model.

So I convert the ONNX file to a TensorRT network and connect the BatchedNMSPlugin to it.

To do this, I modified onnx2trt's main.cpp:

[main.cpp]

// set nms parameters
      nvinfer1::plugin::NMSParameters nms_parameter;
      nms_parameter.shareLocation = true;
      nms_parameter.backgroundLabelId = -1;
      nms_parameter.numClasses = 1;
      nms_parameter.scoreThreshold = 0.1f;
      nms_parameter.iouThreshold = 0.5f;
      nms_parameter.isNormalized = false;
      nms_parameter.keepTopK = 100;

      // create nms plugin
      nvinfer1::IPluginV2* nms_plugin = createBatchedNMSPlugin(nms_parameter);
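      // (note: createBatchedNMSPlugin is declared in NvInferPlugin.h and
      // implemented in libnvinfer_plugin, so onnx2trt links that library here)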

      // set nms input tensor
      // 7: outputs_bbox0
      // 8: outputs_score0
      nvinfer1::ITensor* nms_input_tensors[] = {trt_network->getOutput(7), trt_network->getOutput(8)};

      // print nms input tensor
      for (auto i = 0; i < 2; ++i) {
        auto dims = nms_input_tensors[i]->getDimensions();
        cout << i << ": " << dims.nbDims << ": ";
        for (auto j = 0; j < dims.nbDims; ++j)
          cout << dims.d[j] << ", ";
        cout << endl;
      }

      // connect to original network
      auto nms_layer = trt_network->addPluginV2(&nms_input_tensors[0], 2, *nms_plugin);

      // set nms output tensor
      for (auto i = 0; i < nms_layer->getNbOutputs(); ++i)
        trt_network->markOutput(*(nms_layer->getOutput(i)));
      
      // print nms output tensor
      for (auto i = 0; i < nms_layer->getNbOutputs(); ++i) {
        auto dims = nms_layer->getOutput(i)->getDimensions();
        cout << i << ": " << dims.nbDims << ": ";
        for (auto j = 0; j < dims.nbDims; ++j)
          cout << dims.d[j] << ", ";
        cout << endl;
      }

      // remove the original detector's outputs; after the first unmarkOutput,
      // the tensor at index 8 shifts down to index 7, so index 7 is unmarked twice
      trt_network->unmarkOutput(*(trt_network->getOutput(7)));
      trt_network->unmarkOutput(*(trt_network->getOutput(7)));

      // set nms output names
      const char* nms_output_name[] = {"num_detections", "nmsed_boxes", "nmsed_scores", "nmsed_classes"};
      for (auto i = 0; i < 4; ++i)
        trt_network->getOutput(7 + i)->setName(nms_output_name[i]);

      // final nms engine's output
      for (auto i = 0; i < trt_network->getNbOutputs(); ++i) {
        auto nms_tensor = trt_network->getOutput(i);
        auto nms_out_dims = nms_tensor->getDimensions();

        cout << i << ": " << nms_tensor->getName() << ": ";
        for (auto j = 0; j < nms_out_dims.nbDims; ++j)
          cout << nms_out_dims.d[j] << ", ";
        cout << endl;
      }

With these changes, I can get a TensorRT engine file out of onnx2trt.
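For reference, right after the block above the engine is built and written to disk in the usual way. This is a rough sketch, not the literal onnx2trt code: trt_builder stands in for the builder the tool already holds, and the output filename is illustrative.

      // build the engine from the modified network
      nvinfer1::ICudaEngine* trt_engine = trt_builder->buildCudaEngine(*trt_network);

      // serialize the engine and write it to disk (needs <fstream>)
      nvinfer1::IHostMemory* serialized_engine = trt_engine->serialize();
      std::ofstream engine_out("detector.trt", std::ios::binary);
      engine_out.write(static_cast<const char*>(serialized_engine->data()),
                       serialized_engine->size());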

I want to use this engine file in another program.

There, I want to deserialize the engine and execute it.
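My loading code in that program is essentially the following (a simplified sketch; the Logger class and the engine path are stand-ins for my actual code):

[load_engine.cpp]

#include <fstream>
#include <iostream>
#include <iterator>
#include <vector>
#include <NvInfer.h>

// minimal logger required by the TensorRT runtime
class Logger : public nvinfer1::ILogger {
  void log(Severity severity, const char* msg) override {
    std::cout << msg << std::endl;
  }
};

int main() {
  // read the serialized engine from disk
  std::ifstream engine_file("detector.trt", std::ios::binary);
  std::vector<char> engine_data((std::istreambuf_iterator<char>(engine_file)),
                                std::istreambuf_iterator<char>());

  // deserialize the engine -- this is the call that fails
  Logger logger;
  nvinfer1::IRuntime* runtime = nvinfer1::createInferRuntime(logger);
  nvinfer1::ICudaEngine* engine = runtime->deserializeCudaEngine(
      engine_data.data(), engine_data.size(), nullptr);
  return engine != nullptr ? 0 : 1;
}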

But I get the following error from deserializeCudaEngine():

INVALID_ARGUMENT: getPluginCreator could not find plugin BatchedNMS_TRT version 1
safeDeserializationUtils.cpp (259) - Serialization Error in load: 0 (Cannot deserialize plugin since corresponding IPluginCreator not found in Plugin Registry)
INVALID_STATE: std::exception
INVALID_CONFIG: Deserialize the cuda engine failed.
Segmentation fault (core dumped)

How can I fix this?

Thanks.

  • Name: NVIDIA GeForce
  • Type: GTX 1050Ti
  • Docker container: TensorFlow 19.09-py3
  • GPU-Arch: 6.1

Hi,

I hit the exact same issue inside a TensorFlow container when running a TF-TRT optimized graph that contains the CombinedNonMaxSuppression operator. However, the Accelerating Inference In TF-TRT User Guide :: NVIDIA Deep Learning Frameworks Documentation states that this operator is supported. I ran on a GTX 1050 Ti laptop using the NVIDIA NGC TensorFlow 19.09-py3 container.

2019-11-13 03:00:45.202276: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.1
2019-11-13 03:00:51.470503: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2208000000 Hz
2019-11-13 03:00:51.470996: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5079d70 executing computations on platform Host. Devices:
2019-11-13 03:00:51.471027: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): <undefined>, <undefined>
2019-11-13 03:00:51.473315: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1
2019-11-13 03:00:51.545852: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-11-13 03:00:51.546170: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5056840 executing computations on platform CUDA. Devices:
2019-11-13 03:00:51.546193: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): GeForce GTX 1050 Ti, Compute Capability 6.1
2019-11-13 03:00:51.546380: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-11-13 03:00:51.546677: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties: 
name: GeForce GTX 1050 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.62
pciBusID: 0000:01:00.0
2019-11-13 03:00:51.546743: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.1
2019-11-13 03:00:51.546786: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10
2019-11-13 03:00:51.546815: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10
2019-11-13 03:00:51.546853: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10
2019-11-13 03:00:51.548121: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10
2019-11-13 03:00:51.548738: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10
2019-11-13 03:00:51.548779: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-11-13 03:00:51.548864: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-11-13 03:00:51.549126: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-11-13 03:00:51.549319: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2019-11-13 03:00:52.084317: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-11-13 03:00:52.084352: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187]      0 
2019-11-13 03:00:52.084363: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0:   N 
2019-11-13 03:00:52.084576: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-11-13 03:00:52.084866: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-11-13 03:00:52.085080: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3055 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1050 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
2019-11-13 03:00:54.905646: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-11-13 03:00:55.886505: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10
2019-11-13 03:00:56.228054: E tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:41] DefaultLogger INVALID_ARGUMENT: getPluginCreator could not find plugin BatchedNMS_TRT version 1
2019-11-13 03:00:56.228100: E tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:41] DefaultLogger safeDeserializationUtils.cpp (259) - Serialization Error in load: 0 (Cannot deserialize plugin since corresponding IPluginCreator not found in Plugin Registry)
2019-11-13 03:00:56.229101: E tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:41] DefaultLogger INVALID_STATE: std::exception
2019-11-13 03:00:56.229151: E tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:41] DefaultLogger INVALID_CONFIG: Deserialize the cuda engine failed.
*** Aborted at 1573614056 (unix time) try "date -d @1573614056" if you are using GNU date ***
PC: @                0x0 (unknown)
*** SIGSEGV (@0x0) received by PID 2157 (TID 0x7f2857fff700) from PID 0; stack trace: ***
    @     0x7f2d22615f20 (unknown)
    @     0x7f2ae60c3ac8 tensorflow::tensorrt::TRTEngineOp::GetEngine()
    @     0x7f2ae60c9263 tensorflow::tensorrt::TRTEngineOp::ComputeAsync()
    @     0x7f2b05e49de3 tensorflow::BaseGPUDevice::ComputeAsync()
    @     0x7f2b05ea9416 tensorflow::(anonymous namespace)::ExecutorState::Process()
    @     0x7f2b05ea9b0f _ZNSt17_Function_handlerIFvvEZN10tensorflow12_GLOBAL__N_113ExecutorState13ScheduleReadyERKN4absl13InlinedVectorINS3_10TaggedNodeELm8ESaIS6_EEEPNS3_20TaggedNodeReadyQueueEEUlvE_E9_M_invokeERKSt9_Any_data
    @     0x7f2b05f501f1 Eigen::ThreadPoolTempl<>::WorkerLoop()
    @     0x7f2b05f4d0e8 _ZNSt17_Function_handlerIFvvEZN10tensorflow6thread16EigenEnvironment12CreateThreadESt8functionIS0_EEUlvE_E9_M_invokeERKSt9_Any_data
    @     0x7f2d2014466f (unknown)
    @     0x7f2d223bf6db start_thread
    @     0x7f2d226f888f clone

I am having the exact same errors as the original poster. I tried both versions 19.12 and 20.1 through pip3. I also flashed down to JetPack 4.2 with version 19.11, but I get a similar error about not finding the BatchedNMS_TRT plugin…

  • Name: NVIDIA Jetson
  • Type: AGX Xavier
  • Jetpack: UNKNOWN [L4T 32.2.3] (JetPack 4.3. DP)
  • GPU-Arch: 7.2
  • Libraries:
  • CUDA: 10.2
  • TensorRT: 6

So I had been converting a SavedModel to TRT in one Python script, then trying to load it in another. I then did both in one script: load the non-TRT model, convert to TRT, save the TRT model, load it back, and run inference. Then it worked! So it's probably something to do with converter.convert(), the default parameters, etc. Hope this helps.

No, that does not help.

Hi, please check the reference link below for a custom plugin implementation:
https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/sampleOnnxMnistCoordConvAC
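
In particular, an application that deserializes an engine containing BatchedNMS_TRT must register the built-in plugins with the plugin registry before calling deserializeCudaEngine(). A minimal sketch of the required call (here logger is your nvinfer1::ILogger instance, and the application must also link against libnvinfer_plugin):

#include <NvInferPlugin.h>

// register all built-in TensorRT plugins (including BatchedNMS_TRT)
// with the global plugin registry before any engine is deserialized
initLibNvInferPlugins(&logger, "");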

Thanks!