Description
I retrained SSD ResNet50 V1 FPN 640x640 and exported it to ONNX successfully. When I try to convert it to TensorRT using the code from this post: Speeding Up Deep Learning Inference Using TensorFlow, ONNX, and NVIDIA TensorRT | NVIDIA Technical Blog, I get a segmentation fault.
I also tried
./trtexec --onnx=model.onnx --verbose --explicitBatch
and got the same error. Full log attached:
trt_log.txt (51.6 KB)
Using the code from the TensorRT documentation, section "Importing From ONNX Using Python", gives me the same error:
Developer Guide :: NVIDIA Deep Learning TensorRT Documentation
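For reference, this is roughly the Python path I am following (a minimal sketch combining the blog and docs code; the file name model_resnet.onnx and the exact workspace size are placeholders):

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.VERBOSE)

# BatchedNMS_TRT lives in libnvinfer_plugin, so the plugin registry
# must be initialized before parsing the ONNX graph
trt.init_libnvinfer_plugins(TRT_LOGGER, "")

EXPLICIT_BATCH = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)

with trt.Builder(TRT_LOGGER) as builder, \
     builder.create_network(EXPLICIT_BATCH) as network, \
     trt.OnnxParser(network, TRT_LOGGER) as parser:
    builder.max_workspace_size = 1 << 30  # also tried 3 << 30
    with open("model_resnet.onnx", "rb") as f:
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            raise SystemExit("ONNX parsing failed")
    engine = builder.build_cuda_engine(network)  # the crash happens around here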
Running it under gdb, I got a more specific error:
0x0000007fa957d0f8 in nvinfer1::plugin::BatchedNMSPlugin::getOutputDataType(int, nvinfer1::DataType const*, int) const ()
from /usr/lib/aarch64-linux-gnu/libnvinfer_plugin.so.7
Environment
- TensorRT Version: 7.1.3
- GPU Type: Jetson Nano
- Nvidia Driver Version: JetPack-4.5
- CUDA Version: 10.2
- CUDNN Version: 8.0.0
- Python Version: 3.6.9
- Baremetal or Container: Baremetal
Notes
- The command used to export the model to ONNX is:
python -m tf2onnx.convert --saved-model saved_model --output model_resnet.onnx --fold_const --tag serve --verbose --opset 12
- The input dtype of the ONNX model was changed to np.float32, since INT8 is not compatible with TensorRT (a sketch of how I did this is right after this list).
- The Layer + Register API was used to replace the NMS that TensorFlow uses with BatchedNMS_TRT. I don't think this is the problem, since the log I uploaded says "Successfully created plugin: BatchedNMS_TRT".
- I have tried setting builder.max_workspace_size in the code from the blog (roughly the builder sketch above) to different values, (1 << 30) and (3 << 30), with no success (it crashes even faster).
- When I ran onnx.checker.check_model() I got an error that no op is registered as BatchedNMS_TRT for domain_version 12. However, that op is apparently not a valid ONNX op anyway, so this is expected. Plus, the trtexec log says "Successfully created plugin: BatchedNMS_TRT".
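For completeness, the dtype change mentioned above was done roughly like this (a sketch using the onnx package; the input name "input_tensor" is just an example, the real name comes from the exported graph):

import onnx
from onnx import TensorProto

model = onnx.load("model_resnet.onnx")

# Rewrite the integer image input of the exported graph to float32,
# since TensorRT does not accept the original input type
for graph_input in model.graph.input:
    if graph_input.name == "input_tensor":  # example name
        graph_input.type.tensor_type.elem_type = TensorProto.FLOAT

onnx.save(model, "model_resnet.onnx")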
Possible solutions
- Using an earlier JetPack version?
- Using CombinedNonMaxSuppression in the TensorFlow model? I read in a post by @klinten that "BatchedNMSPlugin is modeled directly after the TensorFlow CombinedNMS", so I wonder if that would be more 'compatible' with TensorRT and solve my problem (see the snippet after this list).
- Optimizing the model a bit more before exporting to TensorRT? (if the problem occurs because of low memory)
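If I went the CombinedNonMaxSuppression route, the postprocessing would call TensorFlow's combined NMS directly, something like this (sketch only, with dummy inputs; the thresholds and sizes are placeholders):

import tensorflow as tf

# Dummy inputs just to show the expected shapes:
# boxes:  [batch, num_boxes, q, 4]  (q = 1 for class-agnostic boxes)
# scores: [batch, num_boxes, num_classes]
boxes = tf.random.uniform([1, 1000, 1, 4])
scores = tf.random.uniform([1, 1000, 90])

nmsed_boxes, nmsed_scores, nmsed_classes, valid_detections = \
    tf.image.combined_non_max_suppression(
        boxes,
        scores,
        max_output_size_per_class=100,
        max_total_size=100,
        iou_threshold=0.5,
        score_threshold=0.3,
    )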
So once again the question is:
How do I export my ONNX model to TensorRT?
Thanks in advance for your help :)