Detectnet failed to load Resnet50 ONNX model

Hi

I have downloaded a pretrained ResNet50 model and converted it to ONNX. I then tested it using trtexec, and it seems to run fine. Below are some lines from the output:

[01/06/2022-17:00:44] [I] [TRT] [GpuLayer] (Unnamed Layer* 123) [Shuffle]
[01/06/2022-17:00:46] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +158, GPU +123, now: CPU 477, GPU 3339 (MiB)
[01/06/2022-17:00:49] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +241, GPU +242, now: CPU 718, GPU 3581 (MiB)
[01/06/2022-17:00:49] [W] [TRT] Detected invalid timing cache, setup a local cache instead
[01/06/2022-17:00:57] [I] [TRT] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
[01/06/2022-17:02:54] [I] [TRT] Detected 1 inputs and 1 output network tensors.
[01/06/2022-17:02:56] [I] [TRT] Total Host Persistent Memory: 130976
[01/06/2022-17:02:56] [I] [TRT] Total Device Persistent Memory: 82422784
[01/06/2022-17:02:56] [I] [TRT] Total Scratch Memory: 8192
[01/06/2022-17:02:56] [I] [TRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 9 MiB, GPU 192 MiB
[01/06/2022-17:02:56] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 970, GPU 3800 (MiB)
[01/06/2022-17:02:56] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +1, GPU +0, now: CPU 971, GPU 3800 (MiB)
[01/06/2022-17:02:56] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 970, GPU 3800 (MiB)
[01/06/2022-17:02:56] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 970, GPU 3801 (MiB)
[01/06/2022-17:02:56] [I] [TRT] [MemUsageSnapshot] Builder end: CPU 970 MiB, GPU 3801 MiB
[01/06/2022-17:02:57] [I] [TRT] Loaded engine size: 121 MB
[01/06/2022-17:02:57] [I] [TRT] [MemUsageSnapshot] deserializeCudaEngine begin: CPU 1091 MiB, GPU 3778 MiB
[01/06/2022-17:02:59] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +1, now: CPU 1092, GPU 3788 (MiB)
[01/06/2022-17:02:59] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +1, now: CPU 1092, GPU 3789 (MiB)
[01/06/2022-17:02:59] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1092, GPU 3789 (MiB)
[01/06/2022-17:02:59] [I] [TRT] [MemUsageSnapshot] deserializeCudaEngine end: CPU 1092 MiB, GPU 3789 MiB
[01/06/2022-17:02:59] [I] Engine built in 138.956 sec.
[01/06/2022-17:02:59] [I] [TRT] [MemUsageSnapshot] ExecutionContext creation begin: CPU 872 MiB, GPU 3624 MiB
[01/06/2022-17:02:59] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 872, GPU 3624 (MiB)
[01/06/2022-17:02:59] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +0, now: CPU 872, GPU 3624 (MiB)
[01/06/2022-17:02:59] [I] [TRT] [MemUsageSnapshot] ExecutionContext creation end: CPU 872 MiB, GPU 3685 MiB
[01/06/2022-17:02:59] [I] Created input binding for input with dimensions 1x3x224x224
[01/06/2022-17:02:59] [I] Created output binding for output with dimensions 1x1000
[01/06/2022-17:02:59] [I] Starting inference
[01/06/2022-17:03:02] [I] Warmup completed 2 queries over 200 ms
[01/06/2022-17:03:02] [I] Timing trace has 40 queries over 3.14926 s
[01/06/2022-17:03:02] [I]
[01/06/2022-17:03:02] [I] === Trace details ===
[01/06/2022-17:03:02] [I] Trace averages of 10 runs:
[01/06/2022-17:03:02] [I] Average on 10 runs - GPU latency: 78.9664 ms - Host latency: 79.0416 ms (end to end 79.0643 ms, enqueue 7.99818 ms)
[01/06/2022-17:03:02] [I] Average on 10 runs - GPU latency: 78.564 ms - Host latency: 78.6388 ms (end to end 78.6496 ms, enqueue 8.79493 ms)
[01/06/2022-17:03:02] [I] Average on 10 runs - GPU latency: 78.4611 ms - Host latency: 78.5361 ms (end to end 78.5468 ms, enqueue 8.98292 ms)
[01/06/2022-17:03:02] [I] Average on 10 runs - GPU latency: 78.5776 ms - Host latency: 78.6523 ms (end to end 78.6625 ms, enqueue 8.94394 ms)
[01/06/2022-17:03:02] [I]
[01/06/2022-17:03:02] [I] === Performance summary ===
[01/06/2022-17:03:02] [I] Throughput: 12.7014 qps
[01/06/2022-17:03:02] [I] Latency: min = 77.8767 ms, max = 80.4874 ms, mean = 78.7172 ms, median = 78.5518 ms, percentile(99%) = 80.4874 ms
[01/06/2022-17:03:02] [I] End-to-End Host Latency: min = 77.8877 ms, max = 80.4978 ms, mean = 78.7308 ms, median = 78.5619 ms, percentile(99%) = 80.4978 ms
[01/06/2022-17:03:02] [I] Enqueue Time: min = 5.52032 ms, max = 11.0875 ms, mean = 8.67999 ms, median = 8.89282 ms, percentile(99%) = 11.0875 ms
[01/06/2022-17:03:02] [I] H2D Latency: min = 0.0708008 ms, max = 0.0722961 ms, mean = 0.0713795 ms, median = 0.0712891 ms, percentile(99%) = 0.0722961 ms
[01/06/2022-17:03:02] [I] GPU Compute Time: min = 77.801 ms, max = 80.4124 ms, mean = 78.6423 ms, median = 78.4767 ms, percentile(99%) = 80.4124 ms
[01/06/2022-17:03:02] [I] D2H Latency: min = 0.00219727 ms, max = 0.00415039 ms, mean = 0.00356293 ms, median = 0.00360107 ms, percentile(99%) = 0.00415039 ms
[01/06/2022-17:03:02] [I] Total Host Walltime: 3.14926 s
[01/06/2022-17:03:02] [I] Total GPU Compute Time: 3.14569 s
[01/06/2022-17:03:02] [I] Explanations of the performance metrics are printed in the verbose logs.
[01/06/2022-17:03:02] [I]
&&&& PASSED TensorRT.trtexec [TensorRT v8001] # /usr/src/tensorrt/bin/trtexec --onnx=/home/thingtrax/Documents/Conversion/resnet50.onnx
[01/06/2022-17:03:02] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 872, GPU 3686 (MiB)

Does that mean it ran fine and that the ONNX model is correct?

I then tried to run the ONNX model using detectnet and got the error below:

[TRT]    Total per-runner host memory is 131024
[TRT]    Allocated activation device memory of size 3612672
[TRT]    [MemUsageSnapshot] ExecutionContext creation end: CPU 690 MiB, GPU 3633 MiB
[TRT]    
[TRT]    CUDA engine context initialized on device GPU:
[TRT]       -- layers       58
[TRT]       -- maxBatchSize 1
[TRT]       -- deviceMemory 3612672
[TRT]       -- bindings     2
[TRT]       binding 0
				-- index   0
				-- name    'input'
				-- type    FP32
				-- in/out  INPUT
				-- # dims  4
				-- dim #0  1
				-- dim #1  3
				-- dim #2  224
				-- dim #3  224
[TRT]       binding 1
				-- index   1
				-- name    'output'
				-- type    FP32
				-- in/out  OUTPUT
				-- # dims  2
				-- dim #0  1
				-- dim #1  1000
[TRT]    
[TRT]    3: Cannot find binding of given name: data
[TRT]    failed to find requested input layer data in network
[TRT]    device GPU, failed to create resources for CUDA engine
[TRT]    failed to create TensorRT engine for resnet50.onnx, device GPU
[TRT]    detectNet -- failed to initialize.
detectnet:  failed to load detectNet model

Can anyone please help?

Hi,

The trtexec test passed, which indicates TensorRT can run your model successfully, so the ONNX file itself is fine.

The error "Cannot find binding of given name: data" appears because detectnet looks for an input layer named 'data' by default, while your model's input binding is named 'input' (as shown in your own log). But more fundamentally, please note that ResNet50 is a classifier rather than a detector, so you should start with the imagenet sample instead.
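For reference, a classification run might look something like the command below. This is only a sketch: the labels file and image paths are placeholders you will need to replace, and the --input_blob/--output_blob values come from the binding names ('input' and 'output') printed in your detectnet log.

```shell
# Run the ONNX classifier with the imagenet sample instead of detectnet.
# The --input_blob/--output_blob values must match the model's actual
# bindings, which your log shows as 'input' and 'output' (not the
# 'data' name that detectnet expects by default).
# labels.txt and the image filenames below are placeholders.
imagenet --model=resnet50.onnx \
         --input_blob=input \
         --output_blob=output \
         --labels=labels.txt \
         image.jpg result.jpg
```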

Thanks.