Detectnet failed to load Resnet50 ONNX model

ART97 · January 6, 2022, 11:47am

Hi

I have downloaded a pretrained ResNet50 model and converted it to ONNX. I then tested it with trtexec and it seems to have run fine. Below are some lines from the output:

	[01/06/2022-17:00:44] [I] [TRT] [GpuLayer] (Unnamed Layer* 123) [Shuffle]
	[01/06/2022-17:00:46] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +158, GPU +123, now: CPU 477, GPU 3339 (MiB)
	[01/06/2022-17:00:49] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +241, GPU +242, now: CPU 718, GPU 3581 (MiB)
	[01/06/2022-17:00:49] [W] [TRT] Detected invalid timing cache, setup a local cache instead
	[01/06/2022-17:00:57] [I] [TRT] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
	[01/06/2022-17:02:54] [I] [TRT] Detected 1 inputs and 1 output network tensors.
	[01/06/2022-17:02:56] [I] [TRT] Total Host Persistent Memory: 130976
	[01/06/2022-17:02:56] [I] [TRT] Total Device Persistent Memory: 82422784
	[01/06/2022-17:02:56] [I] [TRT] Total Scratch Memory: 8192
	[01/06/2022-17:02:56] [I] [TRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 9 MiB, GPU 192 MiB
	[01/06/2022-17:02:56] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 970, GPU 3800 (MiB)
	[01/06/2022-17:02:56] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +1, GPU +0, now: CPU 971, GPU 3800 (MiB)
	[01/06/2022-17:02:56] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 970, GPU 3800 (MiB)
	[01/06/2022-17:02:56] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 970, GPU 3801 (MiB)
	[01/06/2022-17:02:56] [I] [TRT] [MemUsageSnapshot] Builder end: CPU 970 MiB, GPU 3801 MiB
	[01/06/2022-17:02:57] [I] [TRT] Loaded engine size: 121 MB
	[01/06/2022-17:02:57] [I] [TRT] [MemUsageSnapshot] deserializeCudaEngine begin: CPU 1091 MiB, GPU 3778 MiB
	[01/06/2022-17:02:59] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +1, now: CPU 1092, GPU 3788 (MiB)
	[01/06/2022-17:02:59] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +1, now: CPU 1092, GPU 3789 (MiB)
	[01/06/2022-17:02:59] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1092, GPU 3789 (MiB)
	[01/06/2022-17:02:59] [I] [TRT] [MemUsageSnapshot] deserializeCudaEngine end: CPU 1092 MiB, GPU 3789 MiB
	[01/06/2022-17:02:59] [I] Engine built in 138.956 sec.
	[01/06/2022-17:02:59] [I] [TRT] [MemUsageSnapshot] ExecutionContext creation begin: CPU 872 MiB, GPU 3624 MiB
	[01/06/2022-17:02:59] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 872, GPU 3624 (MiB)
	[01/06/2022-17:02:59] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +0, now: CPU 872, GPU 3624 (MiB)
	[01/06/2022-17:02:59] [I] [TRT] [MemUsageSnapshot] ExecutionContext creation end: CPU 872 MiB, GPU 3685 MiB
	[01/06/2022-17:02:59] [I] Created input binding for input with dimensions 1x3x224x224
	[01/06/2022-17:02:59] [I] Created output binding for output with dimensions 1x1000
	[01/06/2022-17:02:59] [I] Starting inference
	[01/06/2022-17:03:02] [I] Warmup completed 2 queries over 200 ms
	[01/06/2022-17:03:02] [I] Timing trace has 40 queries over 3.14926 s
	[01/06/2022-17:03:02] [I]
	[01/06/2022-17:03:02] [I] === Trace details ===
	[01/06/2022-17:03:02] [I] Trace averages of 10 runs:
	[01/06/2022-17:03:02] [I] Average on 10 runs - GPU latency: 78.9664 ms - Host latency: 79.0416 ms (end to end 79.0643 ms, enqueue 7.99818 ms)
	[01/06/2022-17:03:02] [I] Average on 10 runs - GPU latency: 78.564 ms - Host latency: 78.6388 ms (end to end 78.6496 ms, enqueue 8.79493 ms)
	[01/06/2022-17:03:02] [I] Average on 10 runs - GPU latency: 78.4611 ms - Host latency: 78.5361 ms (end to end 78.5468 ms, enqueue 8.98292 ms)
	[01/06/2022-17:03:02] [I] Average on 10 runs - GPU latency: 78.5776 ms - Host latency: 78.6523 ms (end to end 78.6625 ms, enqueue 8.94394 ms)
	[01/06/2022-17:03:02] [I]
	[01/06/2022-17:03:02] [I] === Performance summary ===
	[01/06/2022-17:03:02] [I] Throughput: 12.7014 qps
	[01/06/2022-17:03:02] [I] Latency: min = 77.8767 ms, max = 80.4874 ms, mean = 78.7172 ms, median = 78.5518 ms, percentile(99%) = 80.4874 ms
	[01/06/2022-17:03:02] [I] End-to-End Host Latency: min = 77.8877 ms, max = 80.4978 ms, mean = 78.7308 ms, median = 78.5619 ms, percentile(99%) = 80.4978 ms
	[01/06/2022-17:03:02] [I] Enqueue Time: min = 5.52032 ms, max = 11.0875 ms, mean = 8.67999 ms, median = 8.89282 ms, percentile(99%) = 11.0875 ms
	[01/06/2022-17:03:02] [I] H2D Latency: min = 0.0708008 ms, max = 0.0722961 ms, mean = 0.0713795 ms, median = 0.0712891 ms, percentile(99%) = 0.0722961 ms
	[01/06/2022-17:03:02] [I] GPU Compute Time: min = 77.801 ms, max = 80.4124 ms, mean = 78.6423 ms, median = 78.4767 ms, percentile(99%) = 80.4124 ms
	[01/06/2022-17:03:02] [I] D2H Latency: min = 0.00219727 ms, max = 0.00415039 ms, mean = 0.00356293 ms, median = 0.00360107 ms, percentile(99%) = 0.00415039 ms
	[01/06/2022-17:03:02] [I] Total Host Walltime: 3.14926 s
	[01/06/2022-17:03:02] [I] Total GPU Compute Time: 3.14569 s
	[01/06/2022-17:03:02] [I] Explanations of the performance metrics are printed in the verbose logs.
	[01/06/2022-17:03:02] [I]
	&&&& PASSED TensorRT.trtexec [TensorRT v8001] # /usr/src/tensorrt/bin/trtexec --onnx=/home/thingtrax/Documents/Conversion/resnet50.onnx
	[01/06/2022-17:03:02] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 872, GPU 3686 (MiB)

Does that mean it ran fine and the ONNX model is correct?

I then tried to run the ONNX model with detectnet and got the error below:

[TRT]    Total per-runner host memory is 131024
[TRT]    Allocated activation device memory of size 3612672
[TRT]    [MemUsageSnapshot] ExecutionContext creation end: CPU 690 MiB, GPU 3633 MiB
[TRT]    
[TRT]    CUDA engine context initialized on device GPU:
[TRT]       -- layers       58
[TRT]       -- maxBatchSize 1
[TRT]       -- deviceMemory 3612672
[TRT]       -- bindings     2
[TRT]       binding 0
				-- index   0
				-- name    'input'
				-- type    FP32
				-- in/out  INPUT
				-- # dims  4
				-- dim #0  1
				-- dim #1  3
				-- dim #2  224
				-- dim #3  224
[TRT]       binding 1
				-- index   1
				-- name    'output'
				-- type    FP32
				-- in/out  OUTPUT
				-- # dims  2
				-- dim #0  1
				-- dim #1  1000
[TRT]    
[TRT]    3: Cannot find binding of given name: data
[TRT]    failed to find requested input layer data in network
[TRT]    device GPU, failed to create resources for CUDA engine
[TRT]    failed to create TensorRT engine for resnet50.onnx, device GPU
[TRT]    detectNet -- failed to initialize.
detectnet:  failed to load detectNet model
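
The failing step in this log appears to be a name lookup: the engine exposes bindings named 'input' and 'output', while the loader asked for an input layer named 'data'. A minimal sketch of how to pull both facts out of such a log is shown here. The helper name `diagnose_bindings` and the regular expressions are illustrative assumptions based only on the log lines above, not part of jetson-inference or TensorRT.

```python
import re

# Matches binding name lines such as:  -- name    'input'
BINDING_NAME = re.compile(r"--\s+name\s+'([^']+)'")
# Matches: failed to find requested input layer data in network
MISSING_LAYER = re.compile(
    r"failed to find requested (input|output) layer (\S+) in network"
)


def diagnose_bindings(log_text):
    """Return (binding names reported by the engine, layer names not found)."""
    bindings = BINDING_NAME.findall(log_text)
    # findall yields (kind, name) tuples because the pattern has two groups
    missing = [name for _kind, name in MISSING_LAYER.findall(log_text)]
    return bindings, missing


# Excerpt of the log above, reduced to the relevant lines
sample_log = """\
[TRT]       binding 0
\t\t\t\t-- name    'input'
[TRT]       binding 1
\t\t\t\t-- name    'output'
[TRT]    3: Cannot find binding of given name: data
[TRT]    failed to find requested input layer data in network
"""

bindings, missing = diagnose_bindings(sample_log)
print("engine bindings:", bindings)        # ['input', 'output']
print("requested but missing:", missing)   # ['data']
```

If the two lists do not overlap, as here, the layer names passed to the loader probably need to be set to match the names in the ONNX file.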

Can anyone please help?

AastaLLL · January 7, 2022, 3:23am

Hi,

The TensorRT test passed, which indicates that the model can run successfully.

But please note that ResNet50 is a classifier rather than a detector.
So you should start with the imagenet sample instead:

github.com

dusty-nv/jetson-inference/blob/master/docs/imagenet-console-2.md

# Classifying Images with ImageNet
There are multiple types of deep learning networks available, including recognition, detection/localization, and semantic segmentation.  The first deep learning capability we're highlighting in this tutorial is **image recognition**, using classification networks that have been trained on large datasets to identify scenes and objects.

<img src="https://github.com/dusty-nv/jetson-inference/raw/pytorch/docs/images/imagenet.jpg" width="1000">

The [`imageNet`](../c/imageNet.h) object accepts an input image and outputs the probability for each class.  Having been trained on the ImageNet ILSVRC dataset of **[1000 objects](../data/networks/ilsvrc12_synset_words.txt)**, the GoogleNet and ResNet-18 models were automatically downloaded during the build step.  See [below](#downloading-other-classification-models) for other classification models that can be downloaded and used as well.

As an example of using the [`imageNet`](../c/imageNet.h) class, we provide sample programs for C++ and Python:

- [`imagenet.cpp`](../examples/imagenet/imagenet.cpp) (C++) 
- [`imagenet.py`](../python/examples/imagenet.py) (Python) 

These samples are able to classify images, videos, and camera feeds.  For more info about the various types of input/output streams supported, see the [Camera Streaming and Multimedia](aux-streaming.md) page.
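
For the ResNet50 ONNX model from this thread, a minimal sketch of loading it through `imageNet` from Python might look like the following. It assumes the `--model`, `--labels`, `--input_blob` and `--output_blob` options of the imagenet samples and that `imageNet` accepts them through an `argv` list. The layer names 'input' and 'output' are taken from the binding list in the log above. The file names `resnet50.onnx` and `labels.txt` and both helper functions are hypothetical placeholders. The module import is kept inside a function because it is only available on a Jetson with jetson-inference installed.

```python
def build_imagenet_argv(model_path, labels_path,
                        input_blob="input", output_blob="output"):
    """Build the argument list for a custom ONNX classifier.

    The default layer names are the binding names printed in the log
    ('input' and 'output'); adjust them if the ONNX export used others.
    """
    return [
        f"--model={model_path}",
        f"--labels={labels_path}",
        f"--input_blob={input_blob}",
        f"--output_blob={output_blob}",
    ]


def load_classifier(argv):
    """Create the imageNet object (assumes jetson-inference is installed)."""
    import jetson.inference  # imported lazily: only present on the Jetson
    return jetson.inference.imageNet(argv=argv)


# Hypothetical file names, shown for illustration only
argv = build_imagenet_argv("resnet50.onnx", "labels.txt")
print(argv)
# On the device: net = load_classifier(argv)
```

Passing the layer names explicitly should avoid the lookup of a default input layer name that does not exist in the exported model.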

Thanks.

system · February 2, 2022, 1:53am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
