Trtexec performance

Hey,
I’m currently trying to measure the inference speed of an ONNX model using the trtexec command.

The ONNX model was generated on a host computer using the retinanet-example repo on GitHub.

I’d like to see what performance it can reach on the Jetson TX2 (JetPack 4.4).
When I simply run trtexec --onnx=model.onnx, here is what I get:

(cv) nvidia@nvidia-desktop:~$ /usr/src/tensorrt/bin/trtexec --onnx=rn50crop.onnx
&&&& RUNNING TensorRT.trtexec # /usr/src/tensorrt/bin/trtexec --onnx=rn50crop.onnx
[05/26/2020-09:49:40] [I] === Model Options ===
[05/26/2020-09:49:40] [I] Format: ONNX
[05/26/2020-09:49:40] [I] Model: rn50crop.onnx
[05/26/2020-09:49:40] [I] Output:
[05/26/2020-09:49:40] [I] === Build Options ===
[05/26/2020-09:49:40] [I] Max batch: 1
[05/26/2020-09:49:40] [I] Workspace: 16 MB
[05/26/2020-09:49:40] [I] minTiming: 1
[05/26/2020-09:49:40] [I] avgTiming: 8
[05/26/2020-09:49:40] [I] Precision: FP32
[05/26/2020-09:49:40] [I] Calibration:
[05/26/2020-09:49:40] [I] Safe mode: Disabled
[05/26/2020-09:49:40] [I] Save engine:
[05/26/2020-09:49:40] [I] Load engine:
[05/26/2020-09:49:40] [I] Builder Cache: Enabled
[05/26/2020-09:49:40] [I] NVTX verbosity: 0
[05/26/2020-09:49:40] [I] Inputs format: fp32:CHW
[05/26/2020-09:49:40] [I] Outputs format: fp32:CHW
[05/26/2020-09:49:40] [I] Input build shapes: model
[05/26/2020-09:49:40] [I] Input calibration shapes: model
[05/26/2020-09:49:40] [I] === System Options ===
[05/26/2020-09:49:40] [I] Device: 0
[05/26/2020-09:49:40] [I] DLACore:
[05/26/2020-09:49:40] [I] Plugins:
[05/26/2020-09:49:40] [I] === Inference Options ===
[05/26/2020-09:49:40] [I] Batch: 1
[05/26/2020-09:49:40] [I] Input inference shapes: model
[05/26/2020-09:49:40] [I] Iterations: 10
[05/26/2020-09:49:40] [I] Duration: 3s (+ 200ms warm up)
[05/26/2020-09:49:40] [I] Sleep time: 0ms
[05/26/2020-09:49:40] [I] Streams: 1
[05/26/2020-09:49:40] [I] ExposeDMA: Disabled
[05/26/2020-09:49:40] [I] Spin-wait: Disabled
[05/26/2020-09:49:40] [I] Multithreading: Disabled
[05/26/2020-09:49:40] [I] CUDA Graph: Disabled
[05/26/2020-09:49:40] [I] Skip inference: Disabled
[05/26/2020-09:49:40] [I] Inputs:
[05/26/2020-09:49:40] [I] === Reporting Options ===
[05/26/2020-09:49:40] [I] Verbose: Disabled
[05/26/2020-09:49:40] [I] Averages: 10 inferences
[05/26/2020-09:49:40] [I] Percentile: 99
[05/26/2020-09:49:40] [I] Dump output: Disabled
[05/26/2020-09:49:40] [I] Profile: Disabled
[05/26/2020-09:49:40] [I] Export timing to JSON file:
[05/26/2020-09:49:40] [I] Export output to JSON file:
[05/26/2020-09:49:40] [I] Export profile to JSON file:
[05/26/2020-09:49:40] [I]
----------------------------------------------------------------
Input filename:   rn50crop.onnx
ONNX IR version:  0.0.4
Opset version:    10
Producer name:    pytorch
Producer version: 1.3
Domain:
Model version:    0
Doc string:
----------------------------------------------------------------
[05/26/2020-09:49:43] [W] [TRT] onnx2trt_utils.cpp:217: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[05/26/2020-09:50:13] [I] [TRT] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
[05/26/2020-09:54:37] [I] [TRT] Detected 1 inputs and 10 output network tensors.
[05/26/2020-09:54:38] [I] Starting inference threads
[05/26/2020-09:54:48] [I] Warmup completed 1 queries over 200 ms
[05/26/2020-09:54:48] [I] Timing trace has 10 queries over 9.48074 s
[05/26/2020-09:54:48] [I] Trace averages of 10 runs:
[05/26/2020-09:54:48] [I] Average on 10 runs - GPU latency: 946.792 ms - Host latency: 948.065 ms (end to end 948.073 ms)
[05/26/2020-09:54:48] [I] Host latency
[05/26/2020-09:54:48] [I] min: 938.059 ms (end to end 938.067 ms)
[05/26/2020-09:54:48] [I] max: 954.232 ms (end to end 954.24 ms)
[05/26/2020-09:54:48] [I] mean: 948.065 ms (end to end 948.073 ms)
[05/26/2020-09:54:48] [I] median: 946.417 ms (end to end 946.424 ms)
[05/26/2020-09:54:48] [I] percentile: 954.232 ms at 99% (end to end 954.24 ms at 99%)
[05/26/2020-09:54:48] [I] throughput: 1.05477 qps
[05/26/2020-09:54:48] [I] walltime: 9.48074 s
[05/26/2020-09:54:48] [I] GPU Compute
[05/26/2020-09:54:48] [I] min: 936.79 ms
[05/26/2020-09:54:48] [I] max: 952.966 ms
[05/26/2020-09:54:48] [I] mean: 946.792 ms
[05/26/2020-09:54:48] [I] median: 945.148 ms
[05/26/2020-09:54:48] [I] percentile: 952.966 ms at 99%
[05/26/2020-09:54:48] [I] total compute time: 9.46792 s
&&&& PASSED TensorRT.trtexec # /usr/src/tensorrt/bin/trtexec --onnx=rn50crop.onnx

If I understand correctly, the average latency per run is almost a full second, which is really slow.

I tried putting the Jetson in max-performance mode with nvpmodel -m 0 and running the jetson_clocks command to speed it up, but I get the same result.
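For reference, these are the standard JetPack commands I used:

$ sudo nvpmodel -m 0
$ sudo jetson_clocks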

I also tried adding the --shapes=32x3xheightxwidth option, but the result is still the same… (by the way, I can’t find anywhere whether it should be 32x3xHxW or 32x3xWxH…)

I would also like to know how to find the maximum batch size I can use on the Jetson TX2.

If anyone has an idea, I’ll take it!

Hi,

The procedure looks correct to me.
Could you share the performance you get when you run the same test on the desktop?

The batch size can be set via an argument like this:

$ /usr/src/tensorrt/bin/trtexec ... --maxBatch=1
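Combined with the other build options, for example (a sketch; --fp16, --workspace, and --saveEngine are standard trtexec flags, but please check trtexec --help on your version):

$ /usr/src/tensorrt/bin/trtexec --onnx=rn50crop.onnx --fp16 --workspace=1024 --saveEngine=rn50engine.trt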

Thanks.

Hi,
here is the result on the host for the same command, trtexec --onnx=rn50crop.onnx:

greenshield@greenshield-Precision-Tower-3620:~/annoted_images/training_cropped_img/rn50fpn$ /usr/src/tensorrt/bin/trtexec --onnx=rn50crop.onnx
&&&& RUNNING TensorRT.trtexec # /usr/src/tensorrt/bin/trtexec --onnx=rn50crop.onnx
[05/27/2020-09:45:29] [I] === Model Options ===
[05/27/2020-09:45:29] [I] Format: ONNX
[05/27/2020-09:45:29] [I] Model: rn50crop.onnx
[05/27/2020-09:45:29] [I] Output:
[05/27/2020-09:45:29] [I] === Build Options ===
[05/27/2020-09:45:29] [I] Max batch: 1
[05/27/2020-09:45:29] [I] Workspace: 16 MB
[05/27/2020-09:45:29] [I] minTiming: 1
[05/27/2020-09:45:29] [I] avgTiming: 8
[05/27/2020-09:45:29] [I] Precision: FP32
[05/27/2020-09:45:29] [I] Calibration:
[05/27/2020-09:45:29] [I] Safe mode: Disabled
[05/27/2020-09:45:29] [I] Save engine:
[05/27/2020-09:45:29] [I] Load engine:
[05/27/2020-09:45:29] [I] Inputs format: fp32:CHW
[05/27/2020-09:45:29] [I] Outputs format: fp32:CHW
[05/27/2020-09:45:29] [I] Input build shapes: model
[05/27/2020-09:45:29] [I] === System Options ===
[05/27/2020-09:45:29] [I] Device: 0
[05/27/2020-09:45:29] [I] DLACore:
[05/27/2020-09:45:29] [I] Plugins:
[05/27/2020-09:45:29] [I] === Inference Options ===
[05/27/2020-09:45:29] [I] Batch: 1
[05/27/2020-09:45:29] [I] Iterations: 10
[05/27/2020-09:45:29] [I] Duration: 3s (+ 200ms warm up)
[05/27/2020-09:45:29] [I] Sleep time: 0ms
[05/27/2020-09:45:29] [I] Streams: 1
[05/27/2020-09:45:29] [I] ExposeDMA: Disabled
[05/27/2020-09:45:29] [I] Spin-wait: Disabled
[05/27/2020-09:45:29] [I] Multithreading: Disabled
[05/27/2020-09:45:29] [I] CUDA Graph: Disabled
[05/27/2020-09:45:29] [I] Skip inference: Disabled
[05/27/2020-09:45:29] [I] Input inference shapes: model
[05/27/2020-09:45:29] [I] Inputs:
[05/27/2020-09:45:29] [I] === Reporting Options ===
[05/27/2020-09:45:29] [I] Verbose: Disabled
[05/27/2020-09:45:29] [I] Averages: 10 inferences
[05/27/2020-09:45:29] [I] Percentile: 99
[05/27/2020-09:45:29] [I] Dump output: Disabled
[05/27/2020-09:45:29] [I] Profile: Disabled
[05/27/2020-09:45:29] [I] Export timing to JSON file:
[05/27/2020-09:45:29] [I] Export output to JSON file:
[05/27/2020-09:45:29] [I] Export profile to JSON file:
[05/27/2020-09:45:29] [I]
----------------------------------------------------------------
Input filename:   rn50crop.onnx
ONNX IR version:  0.0.4
Opset version:    10
Producer name:    pytorch
Producer version: 1.3
Domain:
Model version:    0
Doc string:
----------------------------------------------------------------
[05/27/2020-09:45:30] [W] [TRT] onnx2trt_utils.cpp:198: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[… the same INT64 warning is repeated several dozen more times …]
[05/27/2020-09:45:33] [I] [TRT] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
[05/27/2020-09:48:06] [I] [TRT] Detected 1 inputs and 10 output network tensors.
[05/27/2020-09:48:06] [W] [TRT] Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
[05/27/2020-09:48:06] [W] [TRT] Explicit batch network detected and batch size specified, use enqueue without batch size instead.
[… the same explicit-batch warning is repeated for each query …]
[05/27/2020-09:48:10] [I] Warmup completed 1 queries over 200 ms
[05/27/2020-09:48:10] [I] Timing trace has 16 queries over 3.85763 s
[05/27/2020-09:48:10] [I] Trace averages of 10 runs:
[05/27/2020-09:48:10] [I] Average on 10 runs - GPU latency: 226.286 ms - Host latency: 228.955 ms (end to end 453.144 ms)
[05/27/2020-09:48:10] [I] Host latency
[05/27/2020-09:48:10] [I] min: 228.139 ms (end to end 450.754 ms)
[05/27/2020-09:48:10] [I] max: 230.185 ms (end to end 460.878 ms)
[05/27/2020-09:48:10] [I] mean: 229.05 ms (end to end 452.939 ms)
[05/27/2020-09:48:10] [I] median: 228.955 ms (end to end 452.665 ms)
[05/27/2020-09:48:10] [I] percentile: 230.185 ms at 99% (end to end 460.878 ms at 99%)
[05/27/2020-09:48:10] [I] throughput: 4.14763 qps
[05/27/2020-09:48:10] [I] walltime: 3.85763 s
[05/27/2020-09:48:10] [I] GPU Compute
[05/27/2020-09:48:10] [I] min: 225.492 ms
[05/27/2020-09:48:10] [I] max: 227.546 ms
[05/27/2020-09:48:10] [I] mean: 226.39 ms
[05/27/2020-09:48:10] [I] median: 226.316 ms
[05/27/2020-09:48:10] [I] percentile: 227.546 ms at 99%
[05/27/2020-09:48:10] [I] total compute time: 3.62224 s
&&&& PASSED TensorRT.trtexec # /usr/src/tensorrt/bin/trtexec --onnx=rn50crop.onnx

It seems I get better results on the host than on the Jetson… even though it only has a single Quadro 4000 in it.

About the batch question: I was really asking how to find the maximum batch size the Jetson can handle. Is it defined by the hardware, or can I pretty much put in any number to increase speed?

Thanks

[EDIT]:
I tested the following command and here is what I get:

nvidia@nvidia-desktop:~$ /usr/src/tensorrt/bin/trtexec --loadEngine=rn50engine.trt --fp16 --batch=64
&&&& RUNNING TensorRT.trtexec # /usr/src/tensorrt/bin/trtexec --loadEngine=rn50engine.trt --fp16 --batch=64
[05/27/2020-10:01:30] [I] === Model Options ===
[05/27/2020-10:01:30] [I] Format: *
[05/27/2020-10:01:30] [I] Model:
[05/27/2020-10:01:30] [I] Output:
[05/27/2020-10:01:30] [I] === Build Options ===
[05/27/2020-10:01:30] [I] Max batch: 64
[05/27/2020-10:01:30] [I] Workspace: 16 MB
[05/27/2020-10:01:30] [I] minTiming: 1
[05/27/2020-10:01:30] [I] avgTiming: 8
[05/27/2020-10:01:30] [I] Precision: FP32+FP16
[05/27/2020-10:01:30] [I] Calibration:
[05/27/2020-10:01:30] [I] Safe mode: Disabled
[05/27/2020-10:01:30] [I] Save engine:
[05/27/2020-10:01:30] [I] Load engine: rn50engine.trt
[05/27/2020-10:01:30] [I] Builder Cache: Enabled
[05/27/2020-10:01:30] [I] NVTX verbosity: 0
[05/27/2020-10:01:30] [I] Inputs format: fp32:CHW
[05/27/2020-10:01:30] [I] Outputs format: fp32:CHW
[05/27/2020-10:01:30] [I] Input build shapes: model
[05/27/2020-10:01:30] [I] Input calibration shapes: model
[05/27/2020-10:01:30] [I] === System Options ===
[05/27/2020-10:01:30] [I] Device: 0
[05/27/2020-10:01:30] [I] DLACore:
[05/27/2020-10:01:30] [I] Plugins:
[05/27/2020-10:01:30] [I] === Inference Options ===
[05/27/2020-10:01:30] [I] Batch: 64
[05/27/2020-10:01:30] [I] Input inference shapes: model
[05/27/2020-10:01:30] [I] Iterations: 10
[05/27/2020-10:01:30] [I] Duration: 3s (+ 200ms warm up)
[05/27/2020-10:01:30] [I] Sleep time: 0ms
[05/27/2020-10:01:30] [I] Streams: 1
[05/27/2020-10:01:30] [I] ExposeDMA: Disabled
[05/27/2020-10:01:30] [I] Spin-wait: Disabled
[05/27/2020-10:01:30] [I] Multithreading: Disabled
[05/27/2020-10:01:30] [I] CUDA Graph: Disabled
[05/27/2020-10:01:30] [I] Skip inference: Disabled
[05/27/2020-10:01:30] [I] Inputs:
[05/27/2020-10:01:30] [I] === Reporting Options ===
[05/27/2020-10:01:30] [I] Verbose: Disabled
[05/27/2020-10:01:30] [I] Averages: 10 inferences
[05/27/2020-10:01:30] [I] Percentile: 99
[05/27/2020-10:01:30] [I] Dump output: Disabled
[05/27/2020-10:01:30] [I] Profile: Disabled
[05/27/2020-10:01:30] [I] Export timing to JSON file:
[05/27/2020-10:01:30] [I] Export output to JSON file:
[05/27/2020-10:01:30] [I] Export profile to JSON file:
[05/27/2020-10:01:30] [I]
[05/27/2020-10:01:34] [W] [TRT] Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
[05/27/2020-10:01:42] [I] Starting inference threads
[05/27/2020-10:01:42] [E] [TRT] Parameter check failed at: engine.cpp::enqueue::387, condition: batchSize > 0 && batchSize <= mEngine.getMaxBatchSize(). Note: Batch size was: 64, but engine max batch size was: 1
[… the same parameter-check error is repeated for each enqueue …]
[05/27/2020-10:01:45] [I] Warmup completed 128 queries over 200 ms
[05/27/2020-10:01:45] [I] Timing trace has 2496 queries over 3.16174 s
[05/27/2020-10:01:45] [I] Trace averages of 10 runs:
[05/27/2020-10:01:45] [I] Average on 10 runs - GPU latency: 0.000634766 ms - Host latency: 81.3747 ms (end to end 81.3827 ms)
[05/27/2020-10:01:45] [I] Average on 10 runs - GPU latency: 0.000720215 ms - Host latency: 81.268 ms (end to end 81.2813 ms)
[05/27/2020-10:01:45] [I] Average on 10 runs - GPU latency: 0.000720215 ms - Host latency: 80.7876 ms (end to end 80.8008 ms)
[05/27/2020-10:01:45] [I] Host latency
[05/27/2020-10:01:45] [I] min: 80.1501 ms (end to end 80.1582 ms)
[05/27/2020-10:01:45] [I] max: 82.7965 ms (end to end 82.804 ms)
[05/27/2020-10:01:45] [I] mean: 81.0573 ms (end to end 81.0694 ms)
[05/27/2020-10:01:45] [I] median: 81.134 ms (end to end 81.1419 ms)
[05/27/2020-10:01:45] [I] percentile: 82.7965 ms at 99% (end to end 82.804 ms at 99%)
[05/27/2020-10:01:45] [I] throughput: 789.44 qps
[05/27/2020-10:01:45] [I] walltime: 3.16174 s
[05/27/2020-10:01:45] [I] GPU Compute
[05/27/2020-10:01:45] [I] min: 0.000457764 ms
[05/27/2020-10:01:45] [I] max: 0.0012207 ms
[05/27/2020-10:01:45] [I] mean: 0.000726162 ms
[05/27/2020-10:01:45] [I] median: 0.000610352 ms
[05/27/2020-10:01:45] [I] percentile: 0.0012207 ms at 99%
[05/27/2020-10:01:45] [I] total compute time: 2.83203e-05 s
&&&& PASSED TensorRT.trtexec # /usr/src/tensorrt/bin/trtexec --loadEngine=rn50engine.trt --fp16 --batch=64

This time I loaded the .trt engine file and set the batch size to 64 (but if I use --batch=1 I get the same old result of ~10 s for 10 runs).

Hi,

Increasing the batch size improves SIMD utilization, which is good for the GPU.
Is this performance good enough for you now?

Thanks.

Hey,
the last result, with a host latency of about 81 ms, is quite good. I just wonder whether I can keep this performance in an overall system… (grabbing an image, sending it through the network, getting the box coordinates back, etc.)

There is just one point I don’t get: the input shape (the --shapes option). What order is it in?
For example, does 1x3x244x244 mean batch x channel x height x width, or are height and width swapped?

I think I need to build the .trt file with all the right options to optimize it.

But once I have the .trt file, is there a way to call it from a Python script? For example, a function that grabs an image from a camera, runs it through the network, and gets the bounding-box coordinates back?

Thanks

Hi,

Sorry for the late update.

To get an optimized end-to-end pipeline, we recommend using our DeepStream SDK.

We also have a Python example for running a TensorRT engine.
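In the meantime, here is a minimal sketch of running a serialized engine from Python. It assumes the TensorRT Python bindings and pycuda (both ship with JetPack), an engine built with static shapes, the rn50engine.trt filename from your earlier test, and that binding 0 is the input; decoding the raw outputs into boxes is model-specific and still needs the retinanet-example post-processing:

import numpy as np
import pycuda.autoinit  # noqa: F401 -- creates a default CUDA context
import pycuda.driver as cuda
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

# Deserialize the engine saved earlier with trtexec --saveEngine=rn50engine.trt
with open("rn50engine.trt", "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

# One page-locked host buffer and one device buffer per binding
host_bufs, dev_bufs, bindings = [], [], []
for i in range(engine.num_bindings):
    size = trt.volume(engine.get_binding_shape(i))
    dtype = trt.nptype(engine.get_binding_dtype(i))
    host_bufs.append(cuda.pagelocked_empty(size, dtype))
    dev_bufs.append(cuda.mem_alloc(host_bufs[-1].nbytes))
    bindings.append(int(dev_bufs[-1]))
stream = cuda.Stream()

def infer(image_chw):
    # image_chw: float32 array already resized/normalized to the input shape
    np.copyto(host_bufs[0], image_chw.ravel())
    cuda.memcpy_htod_async(dev_bufs[0], host_bufs[0], stream)
    context.execute_async_v2(bindings=bindings, stream_handle=stream.handle)
    for host, dev in zip(host_bufs[1:], dev_bufs[1:]):
        cuda.memcpy_dtoh_async(host, dev, stream)
    stream.synchronize()
    return host_bufs[1:]  # raw network outputs

For a camera pipeline you would call infer() on each captured frame; turning the outputs into final box coordinates still requires the model’s own decoding/NMS step.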

The input format in TensorRT is NCHW, i.e. batch x channel x height x width, so the height comes before the width.
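With trtexec the shape is given per named input in that same order, for example (hypothetical input name; replace "input" with your model’s actual input tensor name, and note that --shapes only takes effect for engines built with dynamic shapes):

$ /usr/src/tensorrt/bin/trtexec --onnx=rn50crop.onnx --shapes=input:1x3x224x224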
Thanks.