trtexec stuck on Jetson Nano while converting ONNX to TensorRT

I have converted yolov4.pth to ONNX, but when I run trtexec on the Jetson Nano, the process gets stuck for hours.

> $ trtexec --onnx=yolov4_-1_3_416_416_dynamic.onnx --minShapes=input:1x3x416x416 --optShapes=input:8x3x416x416 --maxShapes=input:8x3x416x416 --workspace=2048  --saveEngine=yolov4-uniform-dynamic-max8.engine --fp16
> &&&& RUNNING TensorRT.trtexec # trtexec --onnx=yolov4_-1_3_416_416_dynamic.onnx --minShapes=input:1x3x416x416 --optShapes=input:8x3x416x416 --maxShapes=input:8x3x416x416 --workspace=2048 --saveEngine=yolov4-uniform-dynamic-max8.engine --fp16
> [02/03/2021-13:56:38] [I] === Model Options ===
> [02/03/2021-13:56:38] [I] Format: ONNX
> [02/03/2021-13:56:38] [I] Model: yolov4_-1_3_416_416_dynamic.onnx
> [02/03/2021-13:56:38] [I] Output:
> [02/03/2021-13:56:38] [I] === Build Options ===
> [02/03/2021-13:56:38] [I] Max batch: explicit
> [02/03/2021-13:56:38] [I] Workspace: 2048 MB
> [02/03/2021-13:56:38] [I] minTiming: 1
> [02/03/2021-13:56:38] [I] avgTiming: 8
> [02/03/2021-13:56:38] [I] Precision: FP32+FP16
> [02/03/2021-13:56:38] [I] Calibration: 
> [02/03/2021-13:56:38] [I] Safe mode: Disabled
> [02/03/2021-13:56:38] [I] Save engine: yolov4-uniform-dynamic-max8.engine
> [02/03/2021-13:56:38] [I] Load engine: 
> [02/03/2021-13:56:38] [I] Builder Cache: Enabled
> [02/03/2021-13:56:38] [I] NVTX verbosity: 0
> [02/03/2021-13:56:38] [I] Inputs format: fp32:CHW
> [02/03/2021-13:56:38] [I] Outputs format: fp32:CHW
> [02/03/2021-13:56:38] [I] Input build shape: input=1x3x416x416+8x3x416x416+8x3x416x416
> [02/03/2021-13:56:38] [I] Input calibration shapes: model
> [02/03/2021-13:56:38] [I] === System Options ===
> [02/03/2021-13:56:38] [I] Device: 0
> [02/03/2021-13:56:38] [I] DLACore: 
> [02/03/2021-13:56:38] [I] Plugins:
> [02/03/2021-13:56:38] [I] === Inference Options ===
> [02/03/2021-13:56:38] [I] Batch: Explicit
> [02/03/2021-13:56:38] [I] Input inference shape: input=8x3x416x416
> [02/03/2021-13:56:38] [I] Iterations: 10
> [02/03/2021-13:56:38] [I] Duration: 3s (+ 200ms warm up)
> [02/03/2021-13:56:38] [I] Sleep time: 0ms
> [02/03/2021-13:56:38] [I] Streams: 1
> [02/03/2021-13:56:38] [I] ExposeDMA: Disabled
> [02/03/2021-13:56:38] [I] Spin-wait: Disabled
> [02/03/2021-13:56:38] [I] Multithreading: Disabled
> [02/03/2021-13:56:38] [I] CUDA Graph: Disabled
> [02/03/2021-13:56:38] [I] Skip inference: Disabled
> [02/03/2021-13:56:38] [I] Inputs:
> [02/03/2021-13:56:38] [I] === Reporting Options ===
> [02/03/2021-13:56:38] [I] Verbose: Disabled
> [02/03/2021-13:56:38] [I] Averages: 10 inferences
> [02/03/2021-13:56:38] [I] Percentile: 99
> [02/03/2021-13:56:38] [I] Dump output: Disabled
> [02/03/2021-13:56:38] [I] Profile: Disabled
> [02/03/2021-13:56:38] [I] Export timing to JSON file: 
> [02/03/2021-13:56:38] [I] Export output to JSON file: 
> [02/03/2021-13:56:38] [I] Export profile to JSON file: 
> [02/03/2021-13:56:38] [I] 
> ----------------------------------------------------------------
> Input filename:   yolov4_-1_3_416_416_dynamic.onnx
> ONNX IR version:  0.0.6
> Opset version:    11
> Producer name:    pytorch
> Producer version: 1.7
> Domain:           
> Model version:    0
> Doc string:       
> ----------------------------------------------------------------
> [02/03/2021-13:56:41] [W] [TRT] onnx2trt_utils.cpp:220: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
> [02/03/2021-13:56:41] [W] [TRT] onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
> [02/03/2021-13:56:41] [W] [TRT] onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
> [02/03/2021-13:56:41] [W] [TRT] onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
> [02/03/2021-13:56:41] [W] [TRT] onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
> [02/03/2021-13:56:42] [W] [TRT] onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
> [02/03/2021-13:56:42] [W] [TRT] onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
> [02/03/2021-13:56:42] [W] [TRT] Output type must be INT32 for shape outputs
> [02/03/2021-13:56:42] [W] [TRT] Output type must be INT32 for shape outputs
> [02/03/2021-13:56:42] [W] [TRT] Output type must be INT32 for shape outputs
> [02/03/2021-13:56:42] [W] [TRT] Output type must be INT32 for shape outputs
> [02/03/2021-14:03:36] [I] [TRT] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.

I checked with the top command, and it shows the CPU usage of trtexec as 0%:

9089 mcc 20 0 10.221g 471768 216444 S 0.0 11.6 12:46.87 trtexec

Hi,

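On the first launch, TensorRT evaluates the model and picks the fastest algorithm for each layer based on the hardware and layer information.
This procedure takes several minutes and runs on the GPU, so the CPU usage shown by top can stay close to 0% while the build is still in progress.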

You can check the GPU utilization with tegrastats.

$ sudo tegrastats
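
If you want to watch the GPU load programmatically instead of reading the raw output, here is a minimal Python sketch. It assumes the GPU load appears in each tegrastats line as a `GR3D_FREQ <percent>%` field, which is the usual layout on Jetson Nano but may differ between L4T releases. The helper name `gpu_load_percent` and the sample line are made up for illustration and are not taken from this thread.

```python
import re

# Matches the GPU load field, e.g. "GR3D_FREQ 99%@921" or "GR3D_FREQ 0%".
# Assumption: the field name and layout follow the usual Jetson Nano output.
GR3D_PATTERN = re.compile(r"GR3D_FREQ (\d+)%")


def gpu_load_percent(line):
    """Return the GPU load in percent from one tegrastats line, or None."""
    match = GR3D_PATTERN.search(line)
    return int(match.group(1)) if match else None


# Illustrative line only (hypothetical values, not a log from this thread).
sample = (
    "RAM 2448/3964MB (lfb 2x4MB) SWAP 0/1982MB (cached 0MB) "
    "CPU [2%@102,1%@102,0%@102,0%@102] EMC_FREQ 0% GR3D_FREQ 99%@921 "
    "PLL@18C CPU@21C GPU@20.5C"
)
print(gpu_load_percent(sample))
```

A GPU load that stays well above 0% while trtexec is running would suggest the engine build is still working rather than hung.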

Thanks.