I want to connect the BatchedNMSPlugin to my detector model. So I convert the ONNX file to a TensorRT engine and attach the BatchedNMSPlugin during conversion, by modifying onnx2trt's main.cpp:
[main.cpp]
// set nms parameters
nvinfer1::plugin::NMSParameters nms_parameter;
nms_parameter.shareLocation = true;
nms_parameter.backgroundLabelId = -1;
nms_parameter.numClasses = 1;
nms_parameter.scoreThreshold = 0.1f;
nms_parameter.iouThreshold = 0.5f;
nms_parameter.isNormalized = false;
nms_parameter.keepTopK = 100;
// create nms plugin
nvinfer1::IPluginV2* nms_plugin = createBatchedNMSPlugin(nms_parameter);
// set nms input tensor
// 7: outputs_bbox0
// 8: outputs_score0
nvinfer1::ITensor* nms_input_tensors[] = {trt_network->getOutput(7), trt_network->getOutput(8)};
// print nms input tensor
for (auto i = 0; i < 2; ++i) {
  auto dims = nms_input_tensors[i]->getDimensions();
  cout << i << ": " << dims.nbDims << ": ";
  for (auto j = 0; j < dims.nbDims; ++j)
    cout << dims.d[j] << ", ";
  cout << endl;
}
// connect to original network
auto nms_layer = trt_network->addPluginV2(&nms_input_tensors[0], 2, *nms_plugin);
// set nms output tensor
for (auto i = 0; i < nms_layer->getNbOutputs(); ++i)
  trt_network->markOutput(*(nms_layer->getOutput(i)));
// print nms output tensor
for (auto i = 0; i < nms_layer->getNbOutputs(); ++i) {
  auto dims = nms_layer->getOutput(i)->getDimensions();
  cout << i << ": " << dims.nbDims << ": ";
  for (auto j = 0; j < dims.nbDims; ++j)
    cout << dims.d[j] << ", ";
  cout << endl;
}
// remove the original detector's outputs
// (output indices shift down after each unmark, so index 7 is used twice)
trt_network->unmarkOutput(*(trt_network->getOutput(7)));
trt_network->unmarkOutput(*(trt_network->getOutput(7)));
// set nms output's name
const char* nms_output_name[] = {"num_detections", "nmsed_boxes", "nmsed_scores", "nmsed_classes"};
for (auto i = 0; i < 4; ++i)
  trt_network->getOutput(7 + i)->setName(nms_output_name[i]);
// final nms engine's output
for (auto i = 0; i < trt_network->getNbOutputs(); ++i) {
  auto nms_tensor = trt_network->getOutput(i);
  auto nms_out_dims = nms_tensor->getDimensions();
  cout << i << ": " << nms_tensor->getName() << ": ";
  for (auto j = 0; j < nms_out_dims.nbDims; ++j)
    cout << nms_out_dims.d[j] << ", ";
  cout << endl;
}
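With the network modified, the next step in main.cpp is to build and serialize the engine. A minimal sketch of that step, assuming `trt_builder` and `trt_network` already exist as they do inside onnx2trt (the file name and workspace size are illustrative, not from the original post):

```cpp
// Build and serialize the network that now ends in the NMS plugin.
// TensorRT 6-era API (buildCudaEngine / setMaxWorkspaceSize).
trt_builder->setMaxBatchSize(1);
trt_builder->setMaxWorkspaceSize(1ULL << 30);  // 1 GiB of build workspace

nvinfer1::ICudaEngine* trt_engine = trt_builder->buildCudaEngine(*trt_network);
nvinfer1::IHostMemory* serialized = trt_engine->serialize();

// "detector_nms.trt" is a placeholder output path.
std::ofstream engine_file("detector_nms.trt", std::ios::binary);
engine_file.write(static_cast<const char*>(serialized->data()),
                  serialized->size());

serialized->destroy();
trt_engine->destroy();
```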
With this, I can get a TensorRT engine file converted by onnx2trt.
Now I want to use this engine file from another program: deserialize the engine and execute it.
But in that case, I get the following error from deserializeCudaEngine():
INVALID_ARGUMENT: getPluginCreator could not find plugin BatchedNMS_TRT version 1
safeDeserializationUtils.cpp (259) - Serialization Error in load: 0 (Cannot deserialize plugin since corresponding IPluginCreator not found in Plugin Registry)
INVALID_STATE: std::exception
INVALID_CONFIG: Deserialize the cuda engine failed.
Segmentation fault (core dumped)
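For what it's worth, this particular error usually means the deserializing process never registered TensorRT's bundled plugin creators. Linking against libnvinfer_plugin and calling initLibNvInferPlugins() before deserializeCudaEngine() is the usual remedy. A sketch under that assumption (the logger class and engine-file name are placeholders, not from the post; TensorRT 6-era signatures):

```cpp
#include <NvInfer.h>
#include <NvInferPlugin.h>  // initLibNvInferPlugins; link with -lnvinfer_plugin
#include <fstream>
#include <iostream>
#include <iterator>
#include <vector>

// Minimal logger; name and behavior are illustrative.
class Logger : public nvinfer1::ILogger {
  void log(Severity severity, const char* msg) override {
    if (severity <= Severity::kWARNING) std::cerr << msg << std::endl;
  }
};

int main() {
  Logger logger;

  // Register the bundled plugin creators (BatchedNMS_TRT among them)
  // BEFORE deserializing; skipping this is what produces
  // "getPluginCreator could not find plugin BatchedNMS_TRT version 1".
  initLibNvInferPlugins(&logger, "");

  // "detector_nms.trt" is a placeholder for the engine file from onnx2trt.
  std::ifstream in("detector_nms.trt", std::ios::binary);
  std::vector<char> blob((std::istreambuf_iterator<char>(in)),
                         std::istreambuf_iterator<char>());

  nvinfer1::IRuntime* runtime = nvinfer1::createInferRuntime(logger);
  nvinfer1::ICudaEngine* engine =
      runtime->deserializeCudaEngine(blob.data(), blob.size(), nullptr);
  // ... create an execution context and run inference ...
  engine->destroy();
  runtime->destroy();
  return 0;
}
```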
2019-11-13 03:00:45.202276: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.1
2019-11-13 03:00:51.470503: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2208000000 Hz
2019-11-13 03:00:51.470996: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5079d70 executing computations on platform Host. Devices:
2019-11-13 03:00:51.471027: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): <undefined>, <undefined>
2019-11-13 03:00:51.473315: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1
2019-11-13 03:00:51.545852: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-11-13 03:00:51.546170: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5056840 executing computations on platform CUDA. Devices:
2019-11-13 03:00:51.546193: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): GeForce GTX 1050 Ti, Compute Capability 6.1
2019-11-13 03:00:51.546380: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-11-13 03:00:51.546677: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: GeForce GTX 1050 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.62
pciBusID: 0000:01:00.0
2019-11-13 03:00:51.546743: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.1
2019-11-13 03:00:51.546786: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10
2019-11-13 03:00:51.546815: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10
2019-11-13 03:00:51.546853: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10
2019-11-13 03:00:51.548121: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10
2019-11-13 03:00:51.548738: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10
2019-11-13 03:00:51.548779: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-11-13 03:00:51.548864: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-11-13 03:00:51.549126: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-11-13 03:00:51.549319: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2019-11-13 03:00:52.084317: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-11-13 03:00:52.084352: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187] 0
2019-11-13 03:00:52.084363: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0: N
2019-11-13 03:00:52.084576: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-11-13 03:00:52.084866: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-11-13 03:00:52.085080: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3055 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1050 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
2019-11-13 03:00:54.905646: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-11-13 03:00:55.886505: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10
2019-11-13 03:00:56.228054: E tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:41] DefaultLogger INVALID_ARGUMENT: getPluginCreator could not find plugin BatchedNMS_TRT version 1
2019-11-13 03:00:56.228100: E tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:41] DefaultLogger safeDeserializationUtils.cpp (259) - Serialization Error in load: 0 (Cannot deserialize plugin since corresponding IPluginCreator not found in Plugin Registry)
2019-11-13 03:00:56.229101: E tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:41] DefaultLogger INVALID_STATE: std::exception
2019-11-13 03:00:56.229151: E tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:41] DefaultLogger INVALID_CONFIG: Deserialize the cuda engine failed.
*** Aborted at 1573614056 (unix time) try "date -d @1573614056" if you are using GNU date ***
PC: @ 0x0 (unknown)
*** SIGSEGV (@0x0) received by PID 2157 (TID 0x7f2857fff700) from PID 0; stack trace: ***
@ 0x7f2d22615f20 (unknown)
@ 0x7f2ae60c3ac8 tensorflow::tensorrt::TRTEngineOp::GetEngine()
@ 0x7f2ae60c9263 tensorflow::tensorrt::TRTEngineOp::ComputeAsync()
@ 0x7f2b05e49de3 tensorflow::BaseGPUDevice::ComputeAsync()
@ 0x7f2b05ea9416 tensorflow::(anonymous namespace)::ExecutorState::Process()
@ 0x7f2b05ea9b0f _ZNSt17_Function_handlerIFvvEZN10tensorflow12_GLOBAL__N_113ExecutorState13ScheduleReadyERKN4absl13InlinedVectorINS3_10TaggedNodeELm8ESaIS6_EEEPNS3_20TaggedNodeReadyQueueEEUlvE_E9_M_invokeERKSt9_Any_data
@ 0x7f2b05f501f1 Eigen::ThreadPoolTempl<>::WorkerLoop()
@ 0x7f2b05f4d0e8 _ZNSt17_Function_handlerIFvvEZN10tensorflow6thread16EigenEnvironment12CreateThreadESt8functionIS0_EEUlvE_E9_M_invokeERKSt9_Any_data
@ 0x7f2d2014466f (unknown)
@ 0x7f2d223bf6db start_thread
@ 0x7f2d226f888f clone
I am having the exact same errors as the original poster. I tried both versions 19.12 and 20.1 through pip3. I also flashed down to JetPack 4.2 with version 19.11, but I get a similar error about not finding the BatchedNMS_TRT plugin…
So I had been converting a SavedModel to TRT in one Python script, then trying to load it in another. When I did both in one script instead (load the non-TRT model, convert to TRT, save TRT, load TRT, run inference), it worked! So it's probably something to do with converter.convert, the default parameters, etc. Hope this helps.