Problems exporting TAO ONNX model to Jetson

Please provide the following information when requesting support.

• Hardware (A100/Orin)
• Network Type (Detectnet_v2)

Hello, I am working on a project that consists of detecting multiple types of vehicles from traffic cameras. In this phase, I am trying to fine-tune detectnet_v2 (trafficcamnet) with a custom dataset. Following the tutorials, I have managed to train some models (unpruned, pruned, and quantized); the training curves look fine, and running “tao evaluate” and “tao inference” on the final weights yields good results.

The problems I am facing right now concern the export and deployment of these new models to the production environment. First I export an ONNX model with “tao export”. The ONNX model is able to find objects of interest in images; however, I do not know how to translate the raw bounding-box outputs into actual bounding boxes in image coordinates (this is not the main issue, since at this point I trust the ONNX model works, but any pointers on how to do this would be appreciated so I can fully trust this step is correct).
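For readers with the same question: DetectNet_v2 emits a per-class coverage heatmap plus per-class box offsets on a downsampled grid. A decoding sketch, assuming the typical TAO defaults of stride 16 and a bbox normalization of 35.0 (both are assumptions and should be checked against your training spec):

```python
# Hedged sketch: decode DetectNet_v2 raw outputs into image-space boxes.
# Assumed defaults (verify against your spec): stride 16, box_norm 35.0,
# 960x544 input -> 60x34 grid.
import numpy as np

STRIDE = 16        # downsampling factor: input size / grid size
BOX_NORM = 35.0    # bbox normalization used at training time (assumption)

def decode_detectnet_v2(cov, bbox, threshold=0.4):
    """cov:  (num_classes, H, W)   sigmoid coverage map
    bbox: (num_classes*4, H, W) normalized box offsets
    Returns a list of (class_id, confidence, x1, y1, x2, y2) in pixels."""
    num_classes, grid_h, grid_w = cov.shape
    # normalized grid-cell centers
    cx = (np.arange(grid_w) * STRIDE + 0.5) / BOX_NORM
    cy = (np.arange(grid_h) * STRIDE + 0.5) / BOX_NORM
    dets = []
    for c in range(num_classes):
        ys, xs = np.where(cov[c] > threshold)
        for y, x in zip(ys, xs):
            o1, o2, o3, o4 = bbox[4 * c:4 * c + 4, y, x]
            # offsets are relative to the cell center, scaled by BOX_NORM
            x1 = (cx[x] - o1) * BOX_NORM
            y1 = (cy[y] - o2) * BOX_NORM
            x2 = (cx[x] + o3) * BOX_NORM
            y2 = (cy[y] + o4) * BOX_NORM
            dets.append((c, float(cov[c, y, x]), x1, y1, x2, y2))
    return dets
```

A clustering/NMS pass (e.g. DBSCAN, as DeepStream does) would normally follow, since coverage fires on several neighboring cells per object.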

Then I copy the ONNX model to my Jetson, and here I run into multiple problems when I start detectNet (passing the labels and input/output names accordingly).

1 - The ONNX model has dynamic shapes (mainly the batch size), which results in an error. I was able to fix this by creating a new ONNX model with a fixed batch size of 1 (but I am not sure whether this causes other errors further down the pipeline). Can we do “tao export” without dynamic axes?
2 - detectNet cannot bind the inputs/outputs: even if I set the correct names in the detectNet initialization, it expects the default names “data” (for the input) and “coverage”/“bboxes” (for the outputs). Again, I was able to fix this by creating a new ONNX model from the original with renamed input/output layers, but I am not sure whether this causes other errors further down the pipeline.
3 - With fixes 1 and 2 I can run detectNet, but I get some weird detections (there are more classes than there should be), suggesting something went wrong in the process.

[tracker] dropped track -1 -> class=39 frames=0
[tracker] dropped track -1 -> class=46 frames=0
[tracker] dropped track -1 -> class=55 frames=0
[tracker] dropped track -1 -> class=8 frames=0
[tracker] dropped track -1 -> class=8 frames=0
[tracker] dropped track -1 -> class=40 frames=0
[tracker] dropped track -1 -> class=67 frames=0
[tracker] added track -1 -> class=24
[tracker] added track -1 -> class=46
[tracker] added track -1 -> class=55
[tracker] added track -1 -> class=39
[tracker] added track -1 -> class=8
[tracker] added track -1 -> class=8
[tracker] added track -1 -> class=40
[tracker] added track -1 -> class=62
[tracker] dropped track -1 -> class=24 frames=0
[tracker] dropped track -1 -> class=55 frames=0
[tracker] dropped track -1 -> class=39 frames=0
[tracker] dropped track -1 -> class=46 frames=0
[tracker] dropped track -1 -> class=8 frames=0

I am pretty sure the error is in the ONNX → TRT step (although an issue in generating the ONNX model cannot be 100% ruled out). I have tried using “trtexec” to generate a TRT engine from the ONNX model and passing that to detectNet, but I get the same result.
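The trtexec invocation looks roughly like this (a sketch; paths and the FP16 flag are assumptions). The shape flags matter for the original dynamic-batch export; I use the renamed input “data” here because the exported name “input_1:0” contains a colon, which collides with trtexec's `name:shape` syntax:

```shell
# Hedged sketch: build a TensorRT engine on the Jetson with trtexec.
# Paths are placeholders; --fp16 is optional. The shape flags pin the
# dynamic batch dimension to 1 via an optimization profile.
/usr/src/tensorrt/bin/trtexec \
    --onnx=trafficcamnet_adapted.onnx \
    --minShapes=data:1x3x544x960 \
    --optShapes=data:1x3x544x960 \
    --maxShapes=data:1x3x544x960 \
    --saveEngine=trafficcamnet_adapted.engine \
    --fp16
```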

I would appreciate help to solve this issue.

Did you refer to section “10. Model Export” of tao_tutorials/notebooks/tao_launcher_starter_kit/detectnet_v2/detectnet_v2.ipynb at main · NVIDIA/tao_tutorials · GitHub to export? Please use tf2onnx.

No, for the current detectnet_v2, export will only produce a dynamic batch size.

Can you open the ONNX file to double-check? The input name is input_1 and the output names are "output_cov/Sigmoid" and "output_bbox/BiasAdd". Refer to tao_tensorflow1_backend/nvidia_tao_tf1/cv/detectnet_v2/export/exporter.py at main · NVIDIA/tao_tensorflow1_backend · GitHub.

Please follow the above-mentioned notebook to generate a tensorrt engine and run inference to double-check.

Yes, this is an example of the command we use for exporting:

tao model detectnet_v2 export \
    -m $USER_EXPERIMENT_DIR/experiment_dir_unpruned/weights/resnet18_detector.hdf5 \
    -o $USER_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector_unpruned.onnx \
    -e $SPECS_DIR/detectnet_v2_train.txt \
    --gen_ds_config \
    --onnx_route tf2onnx \
    --verbose

Yes, I have a utility to explore ONNX models, and these are indeed the default names for the inputs and outputs. This is the information for the input/output layers of the default trafficcamnet model (downloaded as tlt, exported to ONNX with tao export, and after removing the dynamic batch size and fixing it to 1):

Model Inputs:
Name: input_1:0, Shape: [1, 3, 544, 960], Type: float32

Model Outputs:
Name: output_cov/Sigmoid:0, Shape: [1, 4, 34, 60], Type: float32
Name: output_bbox/BiasAdd:0, Shape: [1, 16, 34, 60], Type: float32

However, when I try to run the model with detectNet

net = detectNet(
    model=args.model,
    labels=args.labels,
    input_blob="input_1",
    output_cvg="output_cov/Sigmoid",
    output_bbox="output_bbox/BiasAdd",
    threshold=args.threshold,
)

I get this error

[TRT]    CUDA engine context initialized on device GPU:
[TRT]       -- layers       30
[TRT]       -- maxBatchSize 1
[TRT]       -- deviceMemory 26112000
[TRT]       -- bindings     3
[TRT]       binding 0
                -- index   0
                -- name    'input_1:0'
                -- type    FP32
                -- in/out  INPUT
                -- # dims  4
                -- dim #0  1
                -- dim #1  3
                -- dim #2  544
                -- dim #3  960
[TRT]       binding 1
                -- index   1
                -- name    'output_cov/Sigmoid:0'
                -- type    FP32
                -- in/out  OUTPUT
                -- # dims  4
                -- dim #0  1
                -- dim #1  4
                -- dim #2  34
                -- dim #3  60
[TRT]       binding 2
                -- index   2
                -- name    'output_bbox/BiasAdd:0'
                -- type    FP32
                -- in/out  OUTPUT
                -- # dims  4
                -- dim #0  1
                -- dim #1  16
                -- dim #2  34
                -- dim #3  60
[TRT]    
[TRT]    3: Cannot find binding of given name: data
[TRT]    failed to find requested input layer data in network
[TRT]    device GPU, failed to create resources for CUDA engine
[TRT]    failed to create TensorRT engine for models/trafficcamnet_adapted2.onnx, device GPU
[TRT]    detectNet -- failed to initialize.

The error is the same if I add “:0” at the end of the names.

It works if I manually change the input/output names to the default ones: “data” for the input and “coverage”/“bboxes” for the outputs.

Model Inputs:
Name: data, Shape: [1, 3, 544, 960], Type: float32

Model Outputs:
Name: coverage, Shape: [1, 4, 34, 60], Type: float32
Name: bboxes, Shape: [1, 16, 34, 60], Type: float32

Since jetson-inference uses TensorRT, I get the exact same errors in both cases (first the dynamic shapes, then the naming of the inputs/outputs, then the model does not detect anything correctly).

I am using the provided trafficcamnet model for debugging as well, not only our fine-tuned model. I would expect it to work, but as mentioned, the ONNX version of trafficcamnet does not work for us. It would be nice to have a reference notebook that showcases this process working end to end (running detectNet with trafficcamnet exported to ONNX from the tlt version).

Thanks for the help.

OK, got it. The trafficcamnet model was trained several years ago with a previous version of detectnet_v2. The input/output names may have changed.

Officially, there is no sample for running the ONNX file with onnxruntime. We suggest using a tensorrt engine.
BTW, for tlt → onnx, you can refer to this link to get the ONNX version of the trafficcamnet tlt model.

There has been no update from you for a while, so we are assuming this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.