Hi,
I downloaded a pretrained ResNet50 model and converted it to ONNX. I then tested it with trtexec and it seems to have run fine. Below are some lines from the output:
[01/06/2022-17:00:44] [I] [TRT] [GpuLayer] (Unnamed Layer* 123) [Shuffle]
[01/06/2022-17:00:46] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +158, GPU +123, now: CPU 477, GPU 3339 (MiB)
[01/06/2022-17:00:49] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +241, GPU +242, now: CPU 718, GPU 3581 (MiB)
[01/06/2022-17:00:49] [W] [TRT] Detected invalid timing cache, setup a local cache instead
[01/06/2022-17:00:57] [I] [TRT] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
[01/06/2022-17:02:54] [I] [TRT] Detected 1 inputs and 1 output network tensors.
[01/06/2022-17:02:56] [I] [TRT] Total Host Persistent Memory: 130976
[01/06/2022-17:02:56] [I] [TRT] Total Device Persistent Memory: 82422784
[01/06/2022-17:02:56] [I] [TRT] Total Scratch Memory: 8192
[01/06/2022-17:02:56] [I] [TRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 9 MiB, GPU 192 MiB
[01/06/2022-17:02:56] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 970, GPU 3800 (MiB)
[01/06/2022-17:02:56] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +1, GPU +0, now: CPU 971, GPU 3800 (MiB)
[01/06/2022-17:02:56] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 970, GPU 3800 (MiB)
[01/06/2022-17:02:56] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 970, GPU 3801 (MiB)
[01/06/2022-17:02:56] [I] [TRT] [MemUsageSnapshot] Builder end: CPU 970 MiB, GPU 3801 MiB
[01/06/2022-17:02:57] [I] [TRT] Loaded engine size: 121 MB
[01/06/2022-17:02:57] [I] [TRT] [MemUsageSnapshot] deserializeCudaEngine begin: CPU 1091 MiB, GPU 3778 MiB
[01/06/2022-17:02:59] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +1, now: CPU 1092, GPU 3788 (MiB)
[01/06/2022-17:02:59] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +1, now: CPU 1092, GPU 3789 (MiB)
[01/06/2022-17:02:59] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1092, GPU 3789 (MiB)
[01/06/2022-17:02:59] [I] [TRT] [MemUsageSnapshot] deserializeCudaEngine end: CPU 1092 MiB, GPU 3789 MiB
[01/06/2022-17:02:59] [I] Engine built in 138.956 sec.
[01/06/2022-17:02:59] [I] [TRT] [MemUsageSnapshot] ExecutionContext creation begin: CPU 872 MiB, GPU 3624 MiB
[01/06/2022-17:02:59] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 872, GPU 3624 (MiB)
[01/06/2022-17:02:59] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +0, now: CPU 872, GPU 3624 (MiB)
[01/06/2022-17:02:59] [I] [TRT] [MemUsageSnapshot] ExecutionContext creation end: CPU 872 MiB, GPU 3685 MiB
[01/06/2022-17:02:59] [I] Created input binding for input with dimensions 1x3x224x224
[01/06/2022-17:02:59] [I] Created output binding for output with dimensions 1x1000
[01/06/2022-17:02:59] [I] Starting inference
[01/06/2022-17:03:02] [I] Warmup completed 2 queries over 200 ms
[01/06/2022-17:03:02] [I] Timing trace has 40 queries over 3.14926 s
[01/06/2022-17:03:02] [I]
[01/06/2022-17:03:02] [I] === Trace details ===
[01/06/2022-17:03:02] [I] Trace averages of 10 runs:
[01/06/2022-17:03:02] [I] Average on 10 runs - GPU latency: 78.9664 ms - Host latency: 79.0416 ms (end to end 79.0643 ms, enqueue 7.99818 ms)
[01/06/2022-17:03:02] [I] Average on 10 runs - GPU latency: 78.564 ms - Host latency: 78.6388 ms (end to end 78.6496 ms, enqueue 8.79493 ms)
[01/06/2022-17:03:02] [I] Average on 10 runs - GPU latency: 78.4611 ms - Host latency: 78.5361 ms (end to end 78.5468 ms, enqueue 8.98292 ms)
[01/06/2022-17:03:02] [I] Average on 10 runs - GPU latency: 78.5776 ms - Host latency: 78.6523 ms (end to end 78.6625 ms, enqueue 8.94394 ms)
[01/06/2022-17:03:02] [I]
[01/06/2022-17:03:02] [I] === Performance summary ===
[01/06/2022-17:03:02] [I] Throughput: 12.7014 qps
[01/06/2022-17:03:02] [I] Latency: min = 77.8767 ms, max = 80.4874 ms, mean = 78.7172 ms, median = 78.5518 ms, percentile(99%) = 80.4874 ms
[01/06/2022-17:03:02] [I] End-to-End Host Latency: min = 77.8877 ms, max = 80.4978 ms, mean = 78.7308 ms, median = 78.5619 ms, percentile(99%) = 80.4978 ms
[01/06/2022-17:03:02] [I] Enqueue Time: min = 5.52032 ms, max = 11.0875 ms, mean = 8.67999 ms, median = 8.89282 ms, percentile(99%) = 11.0875 ms
[01/06/2022-17:03:02] [I] H2D Latency: min = 0.0708008 ms, max = 0.0722961 ms, mean = 0.0713795 ms, median = 0.0712891 ms, percentile(99%) = 0.0722961 ms
[01/06/2022-17:03:02] [I] GPU Compute Time: min = 77.801 ms, max = 80.4124 ms, mean = 78.6423 ms, median = 78.4767 ms, percentile(99%) = 80.4124 ms
[01/06/2022-17:03:02] [I] D2H Latency: min = 0.00219727 ms, max = 0.00415039 ms, mean = 0.00356293 ms, median = 0.00360107 ms, percentile(99%) = 0.00415039 ms
[01/06/2022-17:03:02] [I] Total Host Walltime: 3.14926 s
[01/06/2022-17:03:02] [I] Total GPU Compute Time: 3.14569 s
[01/06/2022-17:03:02] [I] Explanations of the performance metrics are printed in the verbose logs.
[01/06/2022-17:03:02] [I]
&&&& PASSED TensorRT.trtexec [TensorRT v8001] # /usr/src/tensorrt/bin/trtexec --onnx=/home/thingtrax/Documents/Conversion/resnet50.onnx
[01/06/2022-17:03:02] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 872, GPU 3686 (MiB)
Does that mean it ran fine and the ONNX model is correct?
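As far as I understand, the PASSED line only says that trtexec managed to build the engine and time inference on it, not that the outputs are numerically right. As a quick sanity check on the numbers themselves, here is a small Python sketch (my own, not part of trtexec) that looks for the PASSED marker and recomputes the throughput from the timing trace line. The log excerpt is copied from the trtexec output in this post; the variable names are just for illustration.

```python
import re

# Three lines copied verbatim from the trtexec output in this post.
log = """\
[01/06/2022-17:03:02] [I] Timing trace has 40 queries over 3.14926 s
[01/06/2022-17:03:02] [I] Throughput: 12.7014 qps
&&&& PASSED TensorRT.trtexec [TensorRT v8001] # /usr/src/tensorrt/bin/trtexec --onnx=/home/thingtrax/Documents/Conversion/resnet50.onnx
"""

# trtexec marks the overall result with a line starting with "&&&& PASSED".
passed = any(line.startswith("&&&& PASSED") for line in log.splitlines())

# Recompute throughput as (number of timed queries) / (wall time in seconds).
trace = re.search(r"Timing trace has (\d+) queries over ([\d.]+) s", log)
queries, seconds = int(trace.group(1)), float(trace.group(2))
recomputed = queries / seconds

# Throughput as reported by trtexec, for comparison.
reported = float(re.search(r"Throughput: ([\d.]+) qps", log).group(1))

print("passed:", passed)
print("recomputed qps:", recomputed, "reported qps:", reported)
```

The recomputed value agrees with the reported throughput to within rounding, so the summary is at least internally consistent.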
I tried to run the ONNX model with detectnet and got the error below:
[TRT] Total per-runner host memory is 131024
[TRT] Allocated activation device memory of size 3612672
[TRT] [MemUsageSnapshot] ExecutionContext creation end: CPU 690 MiB, GPU 3633 MiB
[TRT]
[TRT] CUDA engine context initialized on device GPU:
[TRT] -- layers 58
[TRT] -- maxBatchSize 1
[TRT] -- deviceMemory 3612672
[TRT] -- bindings 2
[TRT] binding 0
-- index 0
-- name 'input'
-- type FP32
-- in/out INPUT
-- # dims 4
-- dim #0 1
-- dim #1 3
-- dim #2 224
-- dim #3 224
[TRT] binding 1
-- index 1
-- name 'output'
-- type FP32
-- in/out OUTPUT
-- # dims 2
-- dim #0 1
-- dim #1 1000
[TRT]
[TRT] 3: Cannot find binding of given name: data
[TRT] failed to find requested input layer data in network
[TRT] device GPU, failed to create resources for CUDA engine
[TRT] failed to create TensorRT engine for resnet50.onnx, device GPU
[TRT] detectNet -- failed to initialize.
detectnet: failed to load detectNet model
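From the log, the engine only exposes bindings named 'input' and 'output', while detectnet asks for one called 'data'. To make that mismatch explicit, here is a minimal Python sketch that pulls the binding names out of the log text and compares them with the requested name. The helper names `binding_names` and `requested_binding` are hypothetical (my own, not part of jetson-inference or TensorRT), and the excerpt is copied from the detectnet output in this post.

```python
import re

# Excerpt copied from the detectnet log in this post.
log = """\
[TRT] binding 0
-- index 0
-- name 'input'
-- type FP32
-- in/out INPUT
[TRT] binding 1
-- index 1
-- name 'output'
-- type FP32
-- in/out OUTPUT
[TRT] 3: Cannot find binding of given name: data
[TRT] failed to find requested input layer data in network
"""

def binding_names(text):
    """Return the binding names listed in the engine dump, in order."""
    return re.findall(r"-- name '([^']+)'", text)

def requested_binding(text):
    """Return the binding name the loader failed to find, or None."""
    match = re.search(r"Cannot find binding of given name: (\S+)", text)
    return match.group(1) if match else None

names = binding_names(log)
wanted = requested_binding(log)

print("bindings in engine:", names)
print("requested binding:", wanted)
print("requested binding present:", wanted in names)
```

So the failure looks like a name mismatch between what detectnet expects and what the ONNX export produced, rather than a problem with the engine build itself.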
Can anyone please help?