TensorRT inference error on Jetson Nano

You told me to share the .tlt model, but to generate an engine, the .tlt model must first be converted to an .etlt model through the export task. I thought you would generate the .etlt model.
Link to download final_model.etlt :

Hi,

Thanks for the file.
We tested your model on a Nano with JetPack 4.6, and it runs inference successfully.

So the issue may come from the implementation rather than from TensorRT.
Would you mind double-checking it?

$ export key=bnJ0OG1xcHVrb3N2MGU5b21nZHR2a3ZrMXI6NTkzYjE3YjAtNzllNy00MTk3LTkyNmUtNmJhM2QxNTAyOGEw
$ ./tao-converter -k ${key} -p Input,1x3x416x416,8x3x416x416,16x3x416x416 -d 3,416,416 -o BatchedNMS -i nchw -m 16 -e trt.engine -w 1073741824 final_model.etlt
$ /usr/src/tensorrt/bin/trtexec --loadEngine=trt.engine
&&&& RUNNING TensorRT.trtexec [TensorRT v8001] # /usr/src/tensorrt/bin/trtexec --loadEngine=trt.engine
[12/21/2021-15:08:53] [I] === Model Options ===
[12/21/2021-15:08:53] [I] Format: *
[12/21/2021-15:08:53] [I] Model:
[12/21/2021-15:08:53] [I] Output:
[12/21/2021-15:08:53] [I] === Build Options ===
[12/21/2021-15:08:53] [I] Max batch: 1
[12/21/2021-15:08:53] [I] Workspace: 16 MiB
[12/21/2021-15:08:53] [I] minTiming: 1
[12/21/2021-15:08:53] [I] avgTiming: 8
[12/21/2021-15:08:53] [I] Precision: FP32
[12/21/2021-15:08:53] [I] Calibration:
[12/21/2021-15:08:53] [I] Refit: Disabled
[12/21/2021-15:08:53] [I] Sparsity: Disabled
[12/21/2021-15:08:53] [I] Safe mode: Disabled
[12/21/2021-15:08:53] [I] Restricted mode: Disabled
[12/21/2021-15:08:53] [I] Save engine:
[12/21/2021-15:08:53] [I] Load engine: trt.engine
[12/21/2021-15:08:53] [I] NVTX verbosity: 0
[12/21/2021-15:08:53] [I] Tactic sources: Using default tactic sources
[12/21/2021-15:08:53] [I] timingCacheMode: local
[12/21/2021-15:08:53] [I] timingCacheFile:
[12/21/2021-15:08:53] [I] Input(s)s format: fp32:CHW
[12/21/2021-15:08:53] [I] Output(s)s format: fp32:CHW
[12/21/2021-15:08:53] [I] Input build shapes: model
[12/21/2021-15:08:53] [I] Input calibration shapes: model
[12/21/2021-15:08:53] [I] === System Options ===
[12/21/2021-15:08:53] [I] Device: 0
[12/21/2021-15:08:53] [I] DLACore:
[12/21/2021-15:08:53] [I] Plugins:
[12/21/2021-15:08:53] [I] === Inference Options ===
[12/21/2021-15:08:53] [I] Batch: 1
[12/21/2021-15:08:53] [I] Input inference shapes: model
[12/21/2021-15:08:53] [I] Iterations: 10
[12/21/2021-15:08:53] [I] Duration: 3s (+ 200ms warm up)
[12/21/2021-15:08:53] [I] Sleep time: 0ms
[12/21/2021-15:08:53] [I] Streams: 1
[12/21/2021-15:08:53] [I] ExposeDMA: Disabled
[12/21/2021-15:08:53] [I] Data transfers: Enabled
[12/21/2021-15:08:53] [I] Spin-wait: Disabled
[12/21/2021-15:08:53] [I] Multithreading: Disabled
[12/21/2021-15:08:53] [I] CUDA Graph: Disabled
[12/21/2021-15:08:53] [I] Separate profiling: Disabled
[12/21/2021-15:08:53] [I] Time Deserialize: Disabled
[12/21/2021-15:08:53] [I] Time Refit: Disabled
[12/21/2021-15:08:53] [I] Skip inference: Disabled
[12/21/2021-15:08:53] [I] Inputs:
[12/21/2021-15:08:53] [I] === Reporting Options ===
[12/21/2021-15:08:53] [I] Verbose: Disabled
[12/21/2021-15:08:53] [I] Averages: 10 inferences
[12/21/2021-15:08:53] [I] Percentile: 99
[12/21/2021-15:08:53] [I] Dump refittable layers:Disabled
[12/21/2021-15:08:53] [I] Dump output: Disabled
[12/21/2021-15:08:53] [I] Profile: Disabled
[12/21/2021-15:08:53] [I] Export timing to JSON file:
[12/21/2021-15:08:53] [I] Export output to JSON file:
[12/21/2021-15:08:53] [I] Export profile to JSON file:
[12/21/2021-15:08:53] [I]
[12/21/2021-15:08:53] [I] === Device Information ===
[12/21/2021-15:08:53] [I] Selected Device: NVIDIA Tegra X1
[12/21/2021-15:08:53] [I] Compute Capability: 5.3
[12/21/2021-15:08:53] [I] SMs: 1
[12/21/2021-15:08:53] [I] Compute Clock Rate: 0.9216 GHz
[12/21/2021-15:08:53] [I] Device Global Memory: 3956 MiB
[12/21/2021-15:08:53] [I] Shared Memory per SM: 64 KiB
[12/21/2021-15:08:53] [I] Memory Bus Width: 64 bits (ECC disabled)
[12/21/2021-15:08:53] [I] Memory Clock Rate: 0.01275 GHz
[12/21/2021-15:08:53] [I]
[12/21/2021-15:08:53] [I] TensorRT version: 8001
[12/21/2021-15:09:00] [I] [TRT] [MemUsageChange] Init CUDA: CPU +202, GPU +0, now: CPU 686, GPU 3012 (MiB)
[12/21/2021-15:09:00] [I] [TRT] Loaded engine size: 465 MB
[12/21/2021-15:09:00] [I] [TRT] [MemUsageSnapshot] deserializeCudaEngine begin: CPU 686 MiB, GPU 3012 MiB
[12/21/2021-15:09:04] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +158, GPU +198, now: CPU 848, GPU 3299 (MiB)
[12/21/2021-15:09:06] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +240, GPU +299, now: CPU 1088, GPU 3598 (MiB)
[12/21/2021-15:09:06] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1088, GPU 3598 (MiB)
[12/21/2021-15:09:06] [I] [TRT] [MemUsageSnapshot] deserializeCudaEngine end: CPU 1088 MiB, GPU 3598 MiB
[12/21/2021-15:09:06] [I] Engine loaded in 12.6778 sec.
[12/21/2021-15:09:06] [I] [TRT] [MemUsageSnapshot] ExecutionContext creation begin: CPU 623 MiB, GPU 3132 MiB
[12/21/2021-15:09:06] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 623, GPU 3132 (MiB)
[12/21/2021-15:09:06] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +0, now: CPU 623, GPU 3132 (MiB)
[12/21/2021-15:09:07] [I] [TRT] [MemUsageSnapshot] ExecutionContext creation end: CPU 624 MiB, GPU 3640 MiB
[12/21/2021-15:09:07] [W] Dynamic dimensions required for input: Input, but no shapes were provided. Automatically overriding shape to: 1x3x416x416
[12/21/2021-15:09:07] [I] Created input binding for Input with dimensions 1x3x416x416
[12/21/2021-15:09:07] [I] Created output binding for BatchedNMS with dimensions 1x1
[12/21/2021-15:09:07] [I] Created output binding for BatchedNMS_1 with dimensions 1x200x4
[12/21/2021-15:09:07] [I] Created output binding for BatchedNMS_2 with dimensions 1x200
[12/21/2021-15:09:07] [I] Created output binding for BatchedNMS_3 with dimensions 1x200
[12/21/2021-15:09:07] [I] Starting inference
[12/21/2021-15:09:14] [I] Warmup completed 1 queries over 200 ms
[12/21/2021-15:09:14] [I] Timing trace has 10 queries over 5.445 s
[12/21/2021-15:09:14] [I]
[12/21/2021-15:09:14] [I] === Trace details ===
[12/21/2021-15:09:14] [I] Trace averages of 10 runs:
[12/21/2021-15:09:14] [I] Average on 10 runs - GPU latency: 544.267 ms - Host latency: 544.49 ms (end to end 544.5 ms, enqueue 6.6431 ms)
[12/21/2021-15:09:14] [I]
[12/21/2021-15:09:14] [I] === Performance summary ===
[12/21/2021-15:09:14] [I] Throughput: 1.83655 qps
[12/21/2021-15:09:14] [I] Latency: min = 542.125 ms, max = 546.103 ms, mean = 544.49 ms, median = 544.957 ms, percentile(99%) = 546.103 ms
[12/21/2021-15:09:14] [I] End-to-End Host Latency: min = 542.135 ms, max = 546.112 ms, mean = 544.5 ms, median = 544.966 ms, percentile(99%) = 546.112 ms
[12/21/2021-15:09:14] [I] Enqueue Time: min = 6.43506 ms, max = 7.24957 ms, mean = 6.6431 ms, median = 6.57336 ms, percentile(99%) = 7.24957 ms
[12/21/2021-15:09:14] [I] H2D Latency: min = 0.208008 ms, max = 0.27301 ms, mean = 0.215363 ms, median = 0.208984 ms, percentile(99%) = 0.27301 ms
[12/21/2021-15:09:14] [I] GPU Compute Time: min = 541.91 ms, max = 545.886 ms, mean = 544.267 ms, median = 544.74 ms, percentile(99%) = 545.886 ms
[12/21/2021-15:09:14] [I] D2H Latency: min = 0.00537109 ms, max = 0.00830078 ms, mean = 0.00697021 ms, median = 0.00695801 ms, percentile(99%) = 0.00830078 ms
[12/21/2021-15:09:14] [I] Total Host Walltime: 5.445 s
[12/21/2021-15:09:14] [I] Total GPU Compute Time: 5.44267 s
[12/21/2021-15:09:14] [I] Explanations of the performance metrics are printed in the verbose logs.
[12/21/2021-15:09:14] [I]
&&&& PASSED TensorRT.trtexec [TensorRT v8001] # /usr/src/tensorrt/bin/trtexec --loadEngine=trt.engine
[12/21/2021-15:09:14] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 641, GPU 3689 (MiB)
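For anyone adapting the sample, the four BatchedNMS output bindings in the log can be unpacked on the host with plain NumPy. This is a sketch based only on the binding shapes printed above (1x1 keep count, 1x200x4 boxes, 1x200 scores, 1x200 class indices); the corner-coordinate box layout is an assumption from the standard TAO BatchedNMS plugin, not something confirmed in this thread:

```python
import numpy as np

def unpack_batched_nms(keep_count, boxes, scores, classes, max_det=200):
    """Slice the valid detections out of flat BatchedNMS host buffers."""
    n = int(keep_count.reshape(-1)[0])                 # number of kept detections
    boxes = boxes.reshape(-1, max_det, 4)[0, :n]       # assumed x1, y1, x2, y2 per detection
    scores = scores.reshape(-1, max_det)[0, :n]
    classes = classes.reshape(-1, max_det)[0, :n].astype(int)
    return boxes, scores, classes

# Dummy host buffers shaped exactly like the bindings in the log:
keep = np.array([[2]], dtype=np.int32)                 # BatchedNMS:   1x1
b = np.zeros((1, 200, 4), dtype=np.float32)            # BatchedNMS_1: 1x200x4
b[0, 0] = [0.1, 0.1, 0.5, 0.5]
s = np.zeros((1, 200), dtype=np.float32)               # BatchedNMS_2: 1x200
s[0, :2] = [0.9, 0.8]
c = np.zeros((1, 200), dtype=np.float32)               # BatchedNMS_3: 1x200

boxes, scores, classes = unpack_batched_nms(keep, b, s, c)
print(len(boxes), classes.tolist())                    # → 2 [0, 0]
```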

Thanks.

Hi, thank you.
Could you please check my inference code? Is it correct?

What is your TensorRT version?
TAO uses TRT 7.2, but JetPack has TRT 8.
Could this problem be related to the TRT versions?

Hi,

We use the TensorRT 8 that is included in JetPack 4.6.

Just to confirm again:
you meet this issue in a JetPack 4.6 environment, is that correct?

Thanks.

Yes, I use JetPack 4.6.

Hi,

We can deploy your model with the Python sample below.

inference2_fix.py (1.8 KB)

$ ./tao-converter -k ${key} -p Input,1x3x416x416,1x3x416x416,1x3x416x416 -d 3,416,416 -o BatchedNMS -i nchw -m 1 -e trt.engine -w 1073741824 final_model.etlt
$ python3 inference2_fix.py

There was a segmentation fault when terminating (freeing the buffers).
Please check the new script above.

Thanks.

Can you conclude the root cause that raised the error?

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.