The jetson_benchmarks suite from GitHub cannot run on the Xavier NX 16G SoM

We use jetson_benchmarks to test the performance of the Xavier NX 8G SoM on our custom board, and it works fine, but it fails on the Xavier NX 16G SoM.

The BSP is JetPack 4.6.1.

You mean the script from GitHub - NVIDIA-AI-IOT/jetson_benchmarks?

Yes

Let me ask the internal team to investigate this issue. Thanks.

Hi! Could you please share the logs and errors?

&&&& RUNNING TensorRT.trtexec [TensorRT v8201] # ./trtexec --output=prob --deploy=/home/aim/jetson_benchmarks/models/inception_v4.prototxt --batch=1 --int8 --allowGPUFallback --useDLACore=0 --workspace=1024 --avgRuns=100 --duration=180 --loadEngine=/home/aim/jetson_benchmarks/models/inception_v4_b1_ws1024_dla1.engine
[04/14/2022-14:10:32] [I] === Model Options ===
[04/14/2022-14:10:32] [I] Format: Caffe
[04/14/2022-14:10:32] [I] Model:
[04/14/2022-14:10:32] [I] Prototxt: /home/aim/jetson_benchmarks/models/inception_v4.prototxt
[04/14/2022-14:10:32] [I] Output: prob
[04/14/2022-14:10:32] [I] === Build Options ===
[04/14/2022-14:10:32] [I] Max batch: 1
[04/14/2022-14:10:32] [I] Workspace: 1024 MiB
[04/14/2022-14:10:32] [I] minTiming: 1
[04/14/2022-14:10:32] [I] avgTiming: 8
[04/14/2022-14:10:32] [I] Precision: FP32+INT8
[04/14/2022-14:10:32] [I] Calibration: Dynamic
[04/14/2022-14:10:32] [I] Refit: Disabled
[04/14/2022-14:10:32] [I] Sparsity: Disabled
[04/14/2022-14:10:32] [I] Safe mode: Disabled
[04/14/2022-14:10:32] [I] DirectIO mode: Disabled
[04/14/2022-14:10:32] [I] Restricted mode: Disabled
[04/14/2022-14:10:32] [I] Save engine:
[04/14/2022-14:10:32] [I] Load engine: /home/aim/jetson_benchmarks/models/inception_v4_b1_ws1024_dla1.engine
[04/14/2022-14:10:32] [I] Profiling verbosity: 0
[04/14/2022-14:10:32] [I] Tactic sources: Using default tactic sources
[04/14/2022-14:10:32] [I] timingCacheMode: local
[04/14/2022-14:10:32] [I] timingCacheFile:
[04/14/2022-14:10:32] [I] Input(s)s format: fp32:CHW
[04/14/2022-14:10:32] [I] Output(s)s format: fp32:CHW
[04/14/2022-14:10:32] [I] Input build shapes: model
[04/14/2022-14:10:32] [I] Input calibration shapes: model
[04/14/2022-14:10:32] [I] === System Options ===
[04/14/2022-14:10:32] [I] Device: 0
[04/14/2022-14:10:32] [I] DLACore: 0(With GPU fallback)
[04/14/2022-14:10:32] [I] Plugins:
[04/14/2022-14:10:32] [I] === Inference Options ===
[04/14/2022-14:10:32] [I] Batch: 1
[04/14/2022-14:10:32] [I] Input inference shapes: model
[04/14/2022-14:10:32] [I] Iterations: 10
[04/14/2022-14:10:32] [I] Duration: 180s (+ 200ms warm up)
[04/14/2022-14:10:32] [I] Sleep time: 0ms
[04/14/2022-14:10:32] [I] Idle time: 0ms
[04/14/2022-14:10:32] [I] Streams: 1
[04/14/2022-14:10:32] [I] ExposeDMA: Disabled
[04/14/2022-14:10:32] [I] Data transfers: Enabled
[04/14/2022-14:10:32] [I] Spin-wait: Disabled
[04/14/2022-14:10:32] [I] Multithreading: Disabled
[04/14/2022-14:10:32] [I] CUDA Graph: Disabled
[04/14/2022-14:10:32] [I] Separate profiling: Disabled
[04/14/2022-14:10:32] [I] Time Deserialize: Disabled
[04/14/2022-14:10:32] [I] Time Refit: Disabled
[04/14/2022-14:10:32] [I] Skip inference: Disabled
[04/14/2022-14:10:32] [I] Inputs:
[04/14/2022-14:10:32] [I] === Reporting Options ===
[04/14/2022-14:10:32] [I] Verbose: Disabled
[04/14/2022-14:10:32] [I] Averages: 100 inferences
[04/14/2022-14:10:32] [I] Percentile: 99
[04/14/2022-14:10:32] [I] Dump refittable layers:Disabled
[04/14/2022-14:10:32] [I] Dump output: Disabled
[04/14/2022-14:10:32] [I] Profile: Disabled
[04/14/2022-14:10:32] [I] Export timing to JSON file:
[04/14/2022-14:10:32] [I] Export output to JSON file:
[04/14/2022-14:10:32] [I] Export profile to JSON file:
[04/14/2022-14:10:32] [I]
[04/14/2022-14:10:32] [I] === Device Information ===
[04/14/2022-14:10:32] [I] Selected Device: Xavier
[04/14/2022-14:10:32] [I] Compute Capability: 7.2
[04/14/2022-14:10:32] [I] SMs: 6
[04/14/2022-14:10:32] [I] Compute Clock Rate: 1.109 GHz
[04/14/2022-14:10:32] [I] Device Global Memory: 15817 MiB
[04/14/2022-14:10:32] [I] Shared Memory per SM: 96 KiB
[04/14/2022-14:10:32] [I] Memory Bus Width: 256 bits (ECC disabled)
[04/14/2022-14:10:32] [I] Memory Clock Rate: 1.109 GHz
[04/14/2022-14:10:32] [I]
[04/14/2022-14:10:32] [I] TensorRT version: 8.2.1
[04/14/2022-14:10:32] [E] Error opening engine file: /home/aim/jetson_benchmarks/models/inception_v4_b1_ws1024_dla1.engine
[04/14/2022-14:10:32] [E] Failed to create engine from model.
[04/14/2022-14:10:32] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec [TensorRT v8201] # ./trtexec --output=prob --deploy=/home/aim/jetson_benchmarks/models/inception_v4.prototxt --batch=1 --int8 --allowGPUFallback --useDLACore=0 --workspace=1024 --avgRuns=100 --duration=180 --loadEngine=/home/aim/jetson_benchmarks/models/inception_v4_b1_ws1024_dla1.engine

&&&& RUNNING TensorRT.trtexec [TensorRT v8201] # ./trtexec --output=prob --deploy=/home/aim/jetson_benchmarks/models/inception_v4.prototxt --batch=1 --int8 --allowGPUFallback --useDLACore=1 --workspace=1024 --avgRuns=100 --duration=180 --loadEngine=/home/aim/jetson_benchmarks/models/inception_v4_b1_ws1024_dla2.engine
[04/14/2022-14:10:42] [I] === Model Options ===
[04/14/2022-14:10:42] [I] Format: Caffe
[04/14/2022-14:10:42] [I] Model:
[04/14/2022-14:10:42] [I] Prototxt: /home/aim/jetson_benchmarks/models/inception_v4.prototxt
[04/14/2022-14:10:42] [I] Output: prob
[04/14/2022-14:10:42] [I] === Build Options ===
[04/14/2022-14:10:42] [I] Max batch: 1
[04/14/2022-14:10:42] [I] Workspace: 1024 MiB
[04/14/2022-14:10:42] [I] minTiming: 1
[04/14/2022-14:10:42] [I] avgTiming: 8
[04/14/2022-14:10:42] [I] Precision: FP32+INT8
[04/14/2022-14:10:42] [I] Calibration: Dynamic
[04/14/2022-14:10:42] [I] Refit: Disabled
[04/14/2022-14:10:42] [I] Sparsity: Disabled
[04/14/2022-14:10:42] [I] Safe mode: Disabled
[04/14/2022-14:10:42] [I] DirectIO mode: Disabled
[04/14/2022-14:10:42] [I] Restricted mode: Disabled
[04/14/2022-14:10:42] [I] Save engine:
[04/14/2022-14:10:42] [I] Load engine: /home/aim/jetson_benchmarks/models/inception_v4_b1_ws1024_dla2.engine
[04/14/2022-14:10:42] [I] Profiling verbosity: 0
[04/14/2022-14:10:42] [I] Tactic sources: Using default tactic sources
[04/14/2022-14:10:42] [I] timingCacheMode: local
[04/14/2022-14:10:42] [I] timingCacheFile:
[04/14/2022-14:10:42] [I] Input(s)s format: fp32:CHW
[04/14/2022-14:10:42] [I] Output(s)s format: fp32:CHW
[04/14/2022-14:10:42] [I] Input build shapes: model
[04/14/2022-14:10:42] [I] Input calibration shapes: model
[04/14/2022-14:10:42] [I] === System Options ===
[04/14/2022-14:10:42] [I] Device: 0
[04/14/2022-14:10:42] [I] DLACore: 1(With GPU fallback)
[04/14/2022-14:10:42] [I] Plugins:
[04/14/2022-14:10:42] [I] === Inference Options ===
[04/14/2022-14:10:42] [I] Batch: 1
[04/14/2022-14:10:42] [I] Input inference shapes: model
[04/14/2022-14:10:42] [I] Iterations: 10
[04/14/2022-14:10:42] [I] Duration: 180s (+ 200ms warm up)
[04/14/2022-14:10:42] [I] Sleep time: 0ms
[04/14/2022-14:10:42] [I] Idle time: 0ms
[04/14/2022-14:10:42] [I] Streams: 1
[04/14/2022-14:10:42] [I] ExposeDMA: Disabled
[04/14/2022-14:10:42] [I] Data transfers: Enabled
[04/14/2022-14:10:42] [I] Spin-wait: Disabled
[04/14/2022-14:10:42] [I] Multithreading: Disabled
[04/14/2022-14:10:42] [I] CUDA Graph: Disabled
[04/14/2022-14:10:42] [I] Separate profiling: Disabled
[04/14/2022-14:10:42] [I] Time Deserialize: Disabled
[04/14/2022-14:10:42] [I] Time Refit: Disabled
[04/14/2022-14:10:42] [I] Skip inference: Disabled
[04/14/2022-14:10:42] [I] Inputs:
[04/14/2022-14:10:42] [I] === Reporting Options ===
[04/14/2022-14:10:42] [I] Verbose: Disabled
[04/14/2022-14:10:42] [I] Averages: 100 inferences
[04/14/2022-14:10:42] [I] Percentile: 99
[04/14/2022-14:10:42] [I] Dump refittable layers:Disabled
[04/14/2022-14:10:42] [I] Dump output: Disabled
[04/14/2022-14:10:42] [I] Profile: Disabled
[04/14/2022-14:10:42] [I] Export timing to JSON file:
[04/14/2022-14:10:42] [I] Export output to JSON file:
[04/14/2022-14:10:42] [I] Export profile to JSON file:
[04/14/2022-14:10:42] [I]
[04/14/2022-14:10:42] [I] === Device Information ===
[04/14/2022-14:10:42] [I] Selected Device: Xavier
[04/14/2022-14:10:42] [I] Compute Capability: 7.2
[04/14/2022-14:10:42] [I] SMs: 6
[04/14/2022-14:10:42] [I] Compute Clock Rate: 1.109 GHz
[04/14/2022-14:10:42] [I] Device Global Memory: 15817 MiB
[04/14/2022-14:10:42] [I] Shared Memory per SM: 96 KiB
[04/14/2022-14:10:42] [I] Memory Bus Width: 256 bits (ECC disabled)
[04/14/2022-14:10:42] [I] Memory Clock Rate: 1.109 GHz
[04/14/2022-14:10:42] [I]
[04/14/2022-14:10:42] [I] TensorRT version: 8.2.1
[04/14/2022-14:10:42] [E] Error opening engine file: /home/aim/jetson_benchmarks/models/inception_v4_b1_ws1024_dla2.engine
[04/14/2022-14:10:42] [E] Failed to create engine from model.
[04/14/2022-14:10:42] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec [TensorRT v8201] # ./trtexec --output=prob --deploy=/home/aim/jetson_benchmarks/models/inception_v4.prototxt --batch=1 --int8 --allowGPUFallback --useDLACore=1 --workspace=1024 --avgRuns=100 --duration=180 --loadEngine=/home/aim/jetson_benchmarks/models/inception_v4_b1_ws1024_dla2.engine

&&&& RUNNING TensorRT.trtexec [TensorRT v8201] # ./trtexec --output=prob --deploy=/home/aim/jetson_benchmarks/models/inception_v4.prototxt --batch=2 --int8 --workspace=2048 --avgRuns=100 --duration=180 --loadEngine=/home/aim/jetson_benchmarks/models/inception_v4_b2_ws2048_gpu.engine
[04/14/2022-14:10:22] [I] === Model Options ===
[04/14/2022-14:10:22] [I] Format: Caffe
[04/14/2022-14:10:22] [I] Model:
[04/14/2022-14:10:22] [I] Prototxt: /home/aim/jetson_benchmarks/models/inception_v4.prototxt
[04/14/2022-14:10:22] [I] Output: prob
[04/14/2022-14:10:22] [I] === Build Options ===
[04/14/2022-14:10:22] [I] Max batch: 2
[04/14/2022-14:10:22] [I] Workspace: 2048 MiB
[04/14/2022-14:10:22] [I] minTiming: 1
[04/14/2022-14:10:22] [I] avgTiming: 8
[04/14/2022-14:10:22] [I] Precision: FP32+INT8
[04/14/2022-14:10:22] [I] Calibration: Dynamic
[04/14/2022-14:10:22] [I] Refit: Disabled
[04/14/2022-14:10:22] [I] Sparsity: Disabled
[04/14/2022-14:10:22] [I] Safe mode: Disabled
[04/14/2022-14:10:22] [I] DirectIO mode: Disabled
[04/14/2022-14:10:22] [I] Restricted mode: Disabled
[04/14/2022-14:10:22] [I] Save engine:
[04/14/2022-14:10:22] [I] Load engine: /home/aim/jetson_benchmarks/models/inception_v4_b2_ws2048_gpu.engine
[04/14/2022-14:10:22] [I] Profiling verbosity: 0
[04/14/2022-14:10:22] [I] Tactic sources: Using default tactic sources
[04/14/2022-14:10:22] [I] timingCacheMode: local
[04/14/2022-14:10:22] [I] timingCacheFile:
[04/14/2022-14:10:22] [I] Input(s)s format: fp32:CHW
[04/14/2022-14:10:22] [I] Output(s)s format: fp32:CHW
[04/14/2022-14:10:22] [I] Input build shapes: model
[04/14/2022-14:10:22] [I] Input calibration shapes: model
[04/14/2022-14:10:22] [I] === System Options ===
[04/14/2022-14:10:22] [I] Device: 0
[04/14/2022-14:10:22] [I] DLACore:
[04/14/2022-14:10:22] [I] Plugins:
[04/14/2022-14:10:22] [I] === Inference Options ===
[04/14/2022-14:10:22] [I] Batch: 2
[04/14/2022-14:10:22] [I] Input inference shapes: model
[04/14/2022-14:10:22] [I] Iterations: 10
[04/14/2022-14:10:22] [I] Duration: 180s (+ 200ms warm up)
[04/14/2022-14:10:22] [I] Sleep time: 0ms
[04/14/2022-14:10:22] [I] Idle time: 0ms
[04/14/2022-14:10:22] [I] Streams: 1
[04/14/2022-14:10:22] [I] ExposeDMA: Disabled
[04/14/2022-14:10:22] [I] Data transfers: Enabled
[04/14/2022-14:10:22] [I] Spin-wait: Disabled
[04/14/2022-14:10:22] [I] Multithreading: Disabled
[04/14/2022-14:10:22] [I] CUDA Graph: Disabled
[04/14/2022-14:10:22] [I] Separate profiling: Disabled
[04/14/2022-14:10:22] [I] Time Deserialize: Disabled
[04/14/2022-14:10:22] [I] Time Refit: Disabled
[04/14/2022-14:10:22] [I] Skip inference: Disabled
[04/14/2022-14:10:22] [I] Inputs:
[04/14/2022-14:10:22] [I] === Reporting Options ===
[04/14/2022-14:10:22] [I] Verbose: Disabled
[04/14/2022-14:10:22] [I] Averages: 100 inferences
[04/14/2022-14:10:22] [I] Percentile: 99
[04/14/2022-14:10:22] [I] Dump refittable layers:Disabled
[04/14/2022-14:10:22] [I] Dump output: Disabled
[04/14/2022-14:10:22] [I] Profile: Disabled
[04/14/2022-14:10:22] [I] Export timing to JSON file:
[04/14/2022-14:10:22] [I] Export output to JSON file:
[04/14/2022-14:10:22] [I] Export profile to JSON file:
[04/14/2022-14:10:22] [I]
[04/14/2022-14:10:22] [I] === Device Information ===
[04/14/2022-14:10:22] [I] Selected Device: Xavier
[04/14/2022-14:10:22] [I] Compute Capability: 7.2
[04/14/2022-14:10:22] [I] SMs: 6
[04/14/2022-14:10:22] [I] Compute Clock Rate: 1.109 GHz
[04/14/2022-14:10:22] [I] Device Global Memory: 15817 MiB
[04/14/2022-14:10:22] [I] Shared Memory per SM: 96 KiB
[04/14/2022-14:10:22] [I] Memory Bus Width: 256 bits (ECC disabled)
[04/14/2022-14:10:22] [I] Memory Clock Rate: 1.109 GHz
[04/14/2022-14:10:22] [I]
[04/14/2022-14:10:22] [I] TensorRT version: 8.2.1
[04/14/2022-14:10:22] [E] Error opening engine file: /home/aim/jetson_benchmarks/models/inception_v4_b2_ws2048_gpu.engine
[04/14/2022-14:10:22] [E] Failed to create engine from model.
[04/14/2022-14:10:22] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec [TensorRT v8201] # ./trtexec --output=prob --deploy=/home/aim/jetson_benchmarks/models/inception_v4.prototxt --batch=2 --int8 --workspace=2048 --avgRuns=100 --duration=180 --loadEngine=/home/aim/jetson_benchmarks/models/inception_v4_b2_ws2048_gpu.engine

Could you please confirm that --save_dir (used while downloading the models) and --model_dir (used while running them) contain absolute paths?
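
For reference, this is the usage I would expect based on the repository README (the /home/aim/... paths below are taken from your logs; adjust if yours differ):

$ cd /home/aim/jetson_benchmarks
# download the models, passing an absolute path to --save_dir
$ python3 utils/download_models.py --all \
    --csv_file_path /home/aim/jetson_benchmarks/benchmark_csv/nx-benchmarks.csv \
    --save_dir /home/aim/jetson_benchmarks/models
# run the benchmarks, passing the same absolute path to --model_dir
$ sudo python3 benchmark.py --all \
    --csv_file_path /home/aim/jetson_benchmarks/benchmark_csv/nx-benchmarks.csv \
    --model_dir /home/aim/jetson_benchmarks/models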

Yes. The system image I used was cloned with flash.sh.
This image works normally on the 8G SoM.

I also tried reinstalling the tool (jetson_benchmarks) on the 16G SoM, but it still doesn't work.

To confirm: do both the 8G and 16G SoMs have the same JetPack and TensorRT versions?
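
For example, you can check the versions on each module with the standard Jetson commands:

$ cat /etc/nv_tegra_release                    # L4T release; R32.7.1 corresponds to JetPack 4.6.1
$ dpkg -l | grep -E 'nvidia-l4t-core|nvinfer'  # L4T core and TensorRT package versions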

Yes

Is this a problem with all models or just inception_v4?

All models failed

Could you please try setting the workspace size to either 1024 or 512 in benchmark_csv/nx-benchmarks.csv and running it again?
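
If it is quicker, you can also re-run a single model after editing the CSV; the README documents a --model_name option for this (paths as in the earlier example):

$ sudo python3 benchmark.py --model_name inception_v4 \
    --csv_file_path /home/aim/jetson_benchmarks/benchmark_csv/nx-benchmarks.csv \
    --model_dir /home/aim/jetson_benchmarks/models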

I'm on vacation now, and it will take two days before I can run this test. I think it would be faster for you to try it yourself.

It makes sense to do it on your Jetson since we are debugging why it isn’t running with the configuration on your device. There is no rush from our side, so please feel free to try it at your convenience.
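
In the meantime, note that "Error opening engine file" usually just means the .engine file does not exist at that path, i.e. the engine build step failed or was skipped, rather than inference itself failing. As a quick check (a sketch using the paths from your logs), you could verify whether any engines were generated and try one manual build to surface the underlying error:

# check whether the benchmark generated any engine files at all
$ ls -lh /home/aim/jetson_benchmarks/models/*.engine
# build one engine by hand to see the real build error, if any
$ /usr/src/tensorrt/bin/trtexec \
    --deploy=/home/aim/jetson_benchmarks/models/inception_v4.prototxt \
    --output=prob --batch=1 --int8 --workspace=1024 \
    --saveEngine=/tmp/inception_v4_b1_test.engine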

It still fails.

Please see the log:
&&&& RUNNING TensorRT.trtexec [TensorRT v8201] # ./trtexec --output=prob --deploy=/home/aim/jetson_benchmarks/models/inception_v4.prototxt --batch=1 --int8 --allowGPUFallback --useDLACore=0 --workspace=512 --avgRuns=100 --duration=180 --loadEngine=/home/aim/jetson_benchmarks/models/inception_v4_b1_ws512_dla1.engine
[04/25/2022-08:49:29] [I] === Model Options ===
[04/25/2022-08:49:29] [I] Format: Caffe
[04/25/2022-08:49:29] [I] Model:
[04/25/2022-08:49:29] [I] Prototxt: /home/aim/jetson_benchmarks/models/inception_v4.prototxt
[04/25/2022-08:49:29] [I] Output: prob
[04/25/2022-08:49:29] [I] === Build Options ===
[04/25/2022-08:49:29] [I] Max batch: 1
[04/25/2022-08:49:29] [I] Workspace: 512 MiB
[04/25/2022-08:49:29] [I] minTiming: 1
[04/25/2022-08:49:29] [I] avgTiming: 8
[04/25/2022-08:49:29] [I] Precision: FP32+INT8
[04/25/2022-08:49:29] [I] Calibration: Dynamic
[04/25/2022-08:49:29] [I] Refit: Disabled
[04/25/2022-08:49:29] [I] Sparsity: Disabled
[04/25/2022-08:49:29] [I] Safe mode: Disabled
[04/25/2022-08:49:29] [I] DirectIO mode: Disabled
[04/25/2022-08:49:29] [I] Restricted mode: Disabled
[04/25/2022-08:49:29] [I] Save engine:
[04/25/2022-08:49:29] [I] Load engine: /home/aim/jetson_benchmarks/models/inception_v4_b1_ws512_dla1.engine
[04/25/2022-08:49:29] [I] Profiling verbosity: 0
[04/25/2022-08:49:29] [I] Tactic sources: Using default tactic sources
[04/25/2022-08:49:29] [I] timingCacheMode: local
[04/25/2022-08:49:29] [I] timingCacheFile:
[04/25/2022-08:49:29] [I] Input(s)s format: fp32:CHW
[04/25/2022-08:49:29] [I] Output(s)s format: fp32:CHW
[04/25/2022-08:49:29] [I] Input build shapes: model
[04/25/2022-08:49:29] [I] Input calibration shapes: model
[04/25/2022-08:49:29] [I] === System Options ===
[04/25/2022-08:49:29] [I] Device: 0
[04/25/2022-08:49:29] [I] DLACore: 0(With GPU fallback)
[04/25/2022-08:49:29] [I] Plugins:
[04/25/2022-08:49:29] [I] === Inference Options ===
[04/25/2022-08:49:29] [I] Batch: 1
[04/25/2022-08:49:29] [I] Input inference shapes: model
[04/25/2022-08:49:29] [I] Iterations: 10
[04/25/2022-08:49:29] [I] Duration: 180s (+ 200ms warm up)
[04/25/2022-08:49:29] [I] Sleep time: 0ms
[04/25/2022-08:49:29] [I] Idle time: 0ms
[04/25/2022-08:49:29] [I] Streams: 1
[04/25/2022-08:49:29] [I] ExposeDMA: Disabled
[04/25/2022-08:49:29] [I] Data transfers: Enabled
[04/25/2022-08:49:29] [I] Spin-wait: Disabled
[04/25/2022-08:49:29] [I] Multithreading: Disabled
[04/25/2022-08:49:29] [I] CUDA Graph: Disabled
[04/25/2022-08:49:29] [I] Separate profiling: Disabled
[04/25/2022-08:49:29] [I] Time Deserialize: Disabled
[04/25/2022-08:49:29] [I] Time Refit: Disabled
[04/25/2022-08:49:29] [I] Skip inference: Disabled
[04/25/2022-08:49:29] [I] Inputs:
[04/25/2022-08:49:29] [I] === Reporting Options ===
[04/25/2022-08:49:29] [I] Verbose: Disabled
[04/25/2022-08:49:29] [I] Averages: 100 inferences
[04/25/2022-08:49:29] [I] Percentile: 99
[04/25/2022-08:49:29] [I] Dump refittable layers:Disabled
[04/25/2022-08:49:29] [I] Dump output: Disabled
[04/25/2022-08:49:29] [I] Profile: Disabled
[04/25/2022-08:49:29] [I] Export timing to JSON file:
[04/25/2022-08:49:29] [I] Export output to JSON file:
[04/25/2022-08:49:29] [I] Export profile to JSON file:
[04/25/2022-08:49:29] [I]
[04/25/2022-08:49:29] [I] === Device Information ===
[04/25/2022-08:49:29] [I] Selected Device: Xavier
[04/25/2022-08:49:29] [I] Compute Capability: 7.2
[04/25/2022-08:49:29] [I] SMs: 6
[04/25/2022-08:49:29] [I] Compute Clock Rate: 1.109 GHz
[04/25/2022-08:49:29] [I] Device Global Memory: 15825 MiB
[04/25/2022-08:49:29] [I] Shared Memory per SM: 96 KiB
[04/25/2022-08:49:29] [I] Memory Bus Width: 256 bits (ECC disabled)
[04/25/2022-08:49:29] [I] Memory Clock Rate: 1.109 GHz
[04/25/2022-08:49:29] [I]
[04/25/2022-08:49:29] [I] TensorRT version: 8.2.1
[04/25/2022-08:49:29] [E] Error opening engine file: /home/aim/jetson_benchmarks/models/inception_v4_b1_ws512_dla1.engine
[04/25/2022-08:49:29] [E] Failed to create engine from model.
[04/25/2022-08:49:29] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec [TensorRT v8201] # ./trtexec --output=prob --deploy=/home/aim/jetson_benchmarks/models/inception_v4.prototxt --batch=1 --int8 --allowGPUFallback --useDLACore=0 --workspace=512 --avgRuns=100 --duration=180 --loadEngine=/home/aim/jetson_benchmarks/models/inception_v4_b1_ws512_dla1.engine

&&&& RUNNING TensorRT.trtexec [TensorRT v8201] # ./trtexec --output=prob --deploy=/home/aim/jetson_benchmarks/models/inception_v4.prototxt --batch=2 --int8 --workspace=1024 --avgRuns=100 --duration=180 --loadEngine=/home/aim/jetson_benchmarks/models/inception_v4_b2_ws1024_gpu.engine
[04/25/2022-08:52:37] [I] === Model Options ===
[04/25/2022-08:52:37] [I] Format: Caffe
[04/25/2022-08:52:37] [I] Model:
[04/25/2022-08:52:37] [I] Prototxt: /home/aim/jetson_benchmarks/models/inception_v4.prototxt
[04/25/2022-08:52:37] [I] Output: prob
[04/25/2022-08:52:37] [I] === Build Options ===
[04/25/2022-08:52:37] [I] Max batch: 2
[04/25/2022-08:52:37] [I] Workspace: 1024 MiB
[04/25/2022-08:52:37] [I] minTiming: 1
[04/25/2022-08:52:37] [I] avgTiming: 8
[04/25/2022-08:52:37] [I] Precision: FP32+INT8
[04/25/2022-08:52:37] [I] Calibration: Dynamic
[04/25/2022-08:52:37] [I] Refit: Disabled
[04/25/2022-08:52:37] [I] Sparsity: Disabled
[04/25/2022-08:52:37] [I] Safe mode: Disabled
[04/25/2022-08:52:37] [I] DirectIO mode: Disabled
[04/25/2022-08:52:37] [I] Restricted mode: Disabled
[04/25/2022-08:52:37] [I] Save engine:
[04/25/2022-08:52:37] [I] Load engine: /home/aim/jetson_benchmarks/models/inception_v4_b2_ws1024_gpu.engine
[04/25/2022-08:52:37] [I] Profiling verbosity: 0
[04/25/2022-08:52:37] [I] Tactic sources: Using default tactic sources
[04/25/2022-08:52:37] [I] timingCacheMode: local
[04/25/2022-08:52:37] [I] timingCacheFile:
[04/25/2022-08:52:37] [I] Input(s)s format: fp32:CHW
[04/25/2022-08:52:37] [I] Output(s)s format: fp32:CHW
[04/25/2022-08:52:37] [I] Input build shapes: model
[04/25/2022-08:52:37] [I] Input calibration shapes: model
[04/25/2022-08:52:37] [I] === System Options ===
[04/25/2022-08:52:37] [I] Device: 0
[04/25/2022-08:52:37] [I] DLACore:
[04/25/2022-08:52:37] [I] Plugins:
[04/25/2022-08:52:37] [I] === Inference Options ===
[04/25/2022-08:52:37] [I] Batch: 2
[04/25/2022-08:52:37] [I] Input inference shapes: model
[04/25/2022-08:52:37] [I] Iterations: 10
[04/25/2022-08:52:37] [I] Duration: 180s (+ 200ms warm up)
[04/25/2022-08:52:37] [I] Sleep time: 0ms
[04/25/2022-08:52:37] [I] Idle time: 0ms
[04/25/2022-08:52:37] [I] Streams: 1
[04/25/2022-08:52:37] [I] ExposeDMA: Disabled
[04/25/2022-08:52:37] [I] Data transfers: Enabled
[04/25/2022-08:52:37] [I] Spin-wait: Disabled
[04/25/2022-08:52:37] [I] Multithreading: Disabled
[04/25/2022-08:52:37] [I] CUDA Graph: Disabled
[04/25/2022-08:52:37] [I] Separate profiling: Disabled
[04/25/2022-08:52:37] [I] Time Deserialize: Disabled
[04/25/2022-08:52:37] [I] Time Refit: Disabled
[04/25/2022-08:52:37] [I] Skip inference: Disabled
[04/25/2022-08:52:37] [I] Inputs:
[04/25/2022-08:52:37] [I] === Reporting Options ===
[04/25/2022-08:52:37] [I] Verbose: Disabled
[04/25/2022-08:52:37] [I] Averages: 100 inferences
[04/25/2022-08:52:37] [I] Percentile: 99
[04/25/2022-08:52:37] [I] Dump refittable layers:Disabled
[04/25/2022-08:52:37] [I] Dump output: Disabled
[04/25/2022-08:52:37] [I] Profile: Disabled
[04/25/2022-08:52:37] [I] Export timing to JSON file:
[04/25/2022-08:52:37] [I] Export output to JSON file:
[04/25/2022-08:52:37] [I] Export profile to JSON file:
[04/25/2022-08:52:37] [I]
[04/25/2022-08:52:37] [I] === Device Information ===
[04/25/2022-08:52:37] [I] Selected Device: Xavier
[04/25/2022-08:52:37] [I] Compute Capability: 7.2
[04/25/2022-08:52:37] [I] SMs: 6
[04/25/2022-08:52:37] [I] Compute Clock Rate: 1.109 GHz
[04/25/2022-08:52:37] [I] Device Global Memory: 15825 MiB
[04/25/2022-08:52:37] [I] Shared Memory per SM: 96 KiB
[04/25/2022-08:52:37] [I] Memory Bus Width: 256 bits (ECC disabled)
[04/25/2022-08:52:37] [I] Memory Clock Rate: 1.109 GHz
[04/25/2022-08:52:37] [I]
[04/25/2022-08:52:37] [I] TensorRT version: 8.2.1
[04/25/2022-08:52:37] [E] Error opening engine file: /home/aim/jetson_benchmarks/models/inception_v4_b2_ws1024_gpu.engine
[04/25/2022-08:52:37] [E] Failed to create engine from model.
[04/25/2022-08:52:37] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec [TensorRT v8201] # ./trtexec --output=prob --deploy=/home/aim/jetson_benchmarks/models/inception_v4.prototxt --batch=2 --int8 --workspace=1024 --avgRuns=100 --duration=180 --loadEngine=/home/aim/jetson_benchmarks/models/inception_v4_b2_ws1024_gpu.engine
&&&& RUNNING TensorRT.trtexec [TensorRT v8201] # ./trtexec --output=prob --deploy=/home/aim/jetson_benchmarks/models/inception_v4.prototxt --batch=2 --int8 --workspace=512 --avgRuns=100 --duration=180 --loadEngine=/home/aim/jetson_benchmarks/models/inception_v4_b2_ws512_gpu.engine
[04/25/2022-08:56:06] [I] === Model Options ===
[04/25/2022-08:56:06] [I] Format: Caffe
[04/25/2022-08:56:06] [I] Model:
[04/25/2022-08:56:06] [I] Prototxt: /home/aim/jetson_benchmarks/models/inception_v4.prototxt
[04/25/2022-08:56:06] [I] Output: prob
[04/25/2022-08:56:06] [I] === Build Options ===
[04/25/2022-08:56:06] [I] Max batch: 2
[04/25/2022-08:56:06] [I] Workspace: 512 MiB
[04/25/2022-08:56:06] [I] minTiming: 1
[04/25/2022-08:56:06] [I] avgTiming: 8
[04/25/2022-08:56:06] [I] Precision: FP32+INT8
[04/25/2022-08:56:06] [I] Calibration: Dynamic
[04/25/2022-08:56:06] [I] Refit: Disabled
[04/25/2022-08:56:06] [I] Sparsity: Disabled
[04/25/2022-08:56:06] [I] Safe mode: Disabled
[04/25/2022-08:56:06] [I] DirectIO mode: Disabled
[04/25/2022-08:56:06] [I] Restricted mode: Disabled
[04/25/2022-08:56:06] [I] Save engine:
[04/25/2022-08:56:06] [I] Load engine: /home/aim/jetson_benchmarks/models/inception_v4_b2_ws512_gpu.engine
[04/25/2022-08:56:06] [I] Profiling verbosity: 0
[04/25/2022-08:56:06] [I] Tactic sources: Using default tactic sources
[04/25/2022-08:56:06] [I] timingCacheMode: local
[04/25/2022-08:56:06] [I] timingCacheFile:
[04/25/2022-08:56:06] [I] Input(s)s format: fp32:CHW
[04/25/2022-08:56:06] [I] Output(s)s format: fp32:CHW
[04/25/2022-08:56:06] [I] Input build shapes: model
[04/25/2022-08:56:06] [I] Input calibration shapes: model
[04/25/2022-08:56:06] [I] === System Options ===
[04/25/2022-08:56:06] [I] Device: 0
[04/25/2022-08:56:06] [I] DLACore:
[04/25/2022-08:56:06] [I] Plugins:
[04/25/2022-08:56:06] [I] === Inference Options ===
[04/25/2022-08:56:06] [I] Batch: 2
[04/25/2022-08:56:06] [I] Input inference shapes: model
[04/25/2022-08:56:06] [I] Iterations: 10
[04/25/2022-08:56:06] [I] Duration: 180s (+ 200ms warm up)
[04/25/2022-08:56:06] [I] Sleep time: 0ms
[04/25/2022-08:56:06] [I] Idle time: 0ms
[04/25/2022-08:56:06] [I] Streams: 1
[04/25/2022-08:56:06] [I] ExposeDMA: Disabled
[04/25/2022-08:56:06] [I] Data transfers: Enabled
[04/25/2022-08:56:06] [I] Spin-wait: Disabled
[04/25/2022-08:56:06] [I] Multithreading: Disabled
[04/25/2022-08:56:06] [I] CUDA Graph: Disabled
[04/25/2022-08:56:06] [I] Separate profiling: Disabled
[04/25/2022-08:56:06] [I] Time Deserialize: Disabled
[04/25/2022-08:56:06] [I] Time Refit: Disabled
[04/25/2022-08:56:06] [I] Skip inference: Disabled
[04/25/2022-08:56:06] [I] Inputs:
[04/25/2022-08:56:06] [I] === Reporting Options ===
[04/25/2022-08:56:06] [I] Verbose: Disabled
[04/25/2022-08:56:06] [I] Averages: 100 inferences
[04/25/2022-08:56:06] [I] Percentile: 99
[04/25/2022-08:56:06] [I] Dump refittable layers:Disabled
[04/25/2022-08:56:06] [I] Dump output: Disabled
[04/25/2022-08:56:06] [I] Profile: Disabled
[04/25/2022-08:56:06] [I] Export timing to JSON file:
[04/25/2022-08:56:06] [I] Export output to JSON file:
[04/25/2022-08:56:06] [I] Export profile to JSON file:
[04/25/2022-08:56:06] [I]
[04/25/2022-08:56:06] [I] === Device Information ===
[04/25/2022-08:56:06] [I] Selected Device: Xavier
[04/25/2022-08:56:06] [I] Compute Capability: 7.2
[04/25/2022-08:56:06] [I] SMs: 6
[04/25/2022-08:56:06] [I] Compute Clock Rate: 1.109 GHz
[04/25/2022-08:56:06] [I] Device Global Memory: 15825 MiB
[04/25/2022-08:56:06] [I] Shared Memory per SM: 96 KiB
[04/25/2022-08:56:06] [I] Memory Bus Width: 256 bits (ECC disabled)
[04/25/2022-08:56:06] [I] Memory Clock Rate: 1.109 GHz
[04/25/2022-08:56:06] [I]
[04/25/2022-08:56:06] [I] TensorRT version: 8.2.1
[04/25/2022-08:56:06] [E] Error opening engine file: /home/aim/jetson_benchmarks/models/inception_v4_b2_ws512_gpu.engine
[04/25/2022-08:56:06] [E] Failed to create engine from model.
[04/25/2022-08:56:06] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec [TensorRT v8201] # ./trtexec --output=prob --deploy=/home/aim/jetson_benchmarks/models/inception_v4.prototxt --batch=2 --int8 --workspace=512 --avgRuns=100 --duration=180 --loadEngine=/home/aim/jetson_benchmarks/models/inception_v4_b2_ws512_gpu.engine

Attached is a log from the 8G SoM, where the same run passes:

&&&& RUNNING TensorRT.trtexec [TensorRT v8201] # ./trtexec --output=prob --deploy=/home/aim/jetson_benchmarks/models/inception_v4.prototxt --batch=1 --int8 --allowGPUFallback --useDLACore=0 --workspace=1024 --avgRuns=100 --duration=180 --loadEngine=/home/aim/jetson_benchmarks/models/inception_v4_b1_ws1024_dla1.engine
[04/14/2022-14:16:50] [I] === Model Options ===
[04/14/2022-14:16:50] [I] Format: Caffe
[04/14/2022-14:16:50] [I] Model:
[04/14/2022-14:16:50] [I] Prototxt: /home/aim/jetson_benchmarks/models/inception_v4.prototxt
[04/14/2022-14:16:50] [I] Output: prob
[04/14/2022-14:16:50] [I] === Build Options ===
[04/14/2022-14:16:50] [I] Max batch: 1
[04/14/2022-14:16:50] [I] Workspace: 1024 MiB
[04/14/2022-14:16:50] [I] minTiming: 1
[04/14/2022-14:16:50] [I] avgTiming: 8
[04/14/2022-14:16:50] [I] Precision: FP32+INT8
[04/14/2022-14:16:50] [I] Calibration: Dynamic
[04/14/2022-14:16:50] [I] Refit: Disabled
[04/14/2022-14:16:50] [I] Sparsity: Disabled
[04/14/2022-14:16:50] [I] Safe mode: Disabled
[04/14/2022-14:16:50] [I] DirectIO mode: Disabled
[04/14/2022-14:16:50] [I] Restricted mode: Disabled
[04/14/2022-14:16:50] [I] Save engine:
[04/14/2022-14:16:50] [I] Load engine: /home/aim/jetson_benchmarks/models/inception_v4_b1_ws1024_dla1.engine
[04/14/2022-14:16:50] [I] Profiling verbosity: 0
[04/14/2022-14:16:50] [I] Tactic sources: Using default tactic sources
[04/14/2022-14:16:50] [I] timingCacheMode: local
[04/14/2022-14:16:50] [I] timingCacheFile:
[04/14/2022-14:16:50] [I] Input(s)s format: fp32:CHW
[04/14/2022-14:16:50] [I] Output(s)s format: fp32:CHW
[04/14/2022-14:16:50] [I] Input build shapes: model
[04/14/2022-14:16:50] [I] Input calibration shapes: model
[04/14/2022-14:16:50] [I] === System Options ===
[04/14/2022-14:16:50] [I] Device: 0
[04/14/2022-14:16:50] [I] DLACore: 0(With GPU fallback)
[04/14/2022-14:16:50] [I] Plugins:
[04/14/2022-14:16:50] [I] === Inference Options ===
[04/14/2022-14:16:50] [I] Batch: 1
[04/14/2022-14:16:50] [I] Input inference shapes: model
[04/14/2022-14:16:50] [I] Iterations: 10
[04/14/2022-14:16:50] [I] Duration: 180s (+ 200ms warm up)
[04/14/2022-14:16:50] [I] Sleep time: 0ms
[04/14/2022-14:16:50] [I] Idle time: 0ms
[04/14/2022-14:16:50] [I] Streams: 1
[04/14/2022-14:16:50] [I] ExposeDMA: Disabled
[04/14/2022-14:16:50] [I] Data transfers: Enabled
[04/14/2022-14:16:50] [I] Spin-wait: Disabled
[04/14/2022-14:16:50] [I] Multithreading: Disabled
[04/14/2022-14:16:50] [I] CUDA Graph: Disabled
[04/14/2022-14:16:50] [I] Separate profiling: Disabled
[04/14/2022-14:16:50] [I] Time Deserialize: Disabled
[04/14/2022-14:16:50] [I] Time Refit: Disabled
[04/14/2022-14:16:50] [I] Skip inference: Disabled
[04/14/2022-14:16:50] [I] Inputs:
[04/14/2022-14:16:50] [I] === Reporting Options ===
[04/14/2022-14:16:50] [I] Verbose: Disabled
[04/14/2022-14:16:50] [I] Averages: 100 inferences
[04/14/2022-14:16:50] [I] Percentile: 99
[04/14/2022-14:16:50] [I] Dump refittable layers:Disabled
[04/14/2022-14:16:50] [I] Dump output: Disabled
[04/14/2022-14:16:50] [I] Profile: Disabled
[04/14/2022-14:16:50] [I] Export timing to JSON file:
[04/14/2022-14:16:50] [I] Export output to JSON file:
[04/14/2022-14:16:50] [I] Export profile to JSON file:
[04/14/2022-14:16:50] [I]
[04/14/2022-14:16:50] [I] === Device Information ===
[04/14/2022-14:16:50] [I] Selected Device: Xavier
[04/14/2022-14:16:50] [I] Compute Capability: 7.2
[04/14/2022-14:16:50] [I] SMs: 6
[04/14/2022-14:16:50] [I] Compute Clock Rate: 1.109 GHz
[04/14/2022-14:16:50] [I] Device Global Memory: 7765 MiB
[04/14/2022-14:16:50] [I] Shared Memory per SM: 96 KiB
[04/14/2022-14:16:50] [I] Memory Bus Width: 256 bits (ECC disabled)
[04/14/2022-14:16:50] [I] Memory Clock Rate: 1.109 GHz
[04/14/2022-14:16:50] [I]
[04/14/2022-14:16:50] [I] TensorRT version: 8.2.1
[04/14/2022-14:17:02] [I] [TRT] [MemUsageChange] Init CUDA: CPU +362, GPU +0, now: CPU 422, GPU 5916 (MiB)
[04/14/2022-14:17:02] [I] [TRT] Loaded engine size: 41 MiB
[04/14/2022-14:17:10] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +226, GPU +661, now: CPU 691, GPU 6623 (MiB)
[04/14/2022-14:17:22] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +307, GPU +836, now: CPU 998, GPU 7459 (MiB)
[04/14/2022-14:17:22] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +41, GPU +0, now: CPU 41, GPU 0 (MiB)
[04/14/2022-14:17:22] [I] Engine loaded in 31.3501 sec.
[04/14/2022-14:17:22] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +3, now: CPU 957, GPU 7421 (MiB)
[04/14/2022-14:17:22] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +0, now: CPU 957, GPU 7421 (MiB)
[04/14/2022-14:17:22] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 41, GPU 0 (MiB)
[04/14/2022-14:17:22] [I] Using random values for input data
[04/14/2022-14:17:22] [I] Created input binding for data with dimensions 3x299x299
[04/14/2022-14:17:22] [I] Using random values for output prob
[04/14/2022-14:17:22] [I] Created output binding for prob with dimensions 1000x1x1
[04/14/2022-14:17:22] [I] Starting inference
[04/14/2022-14:20:22] [I] Warmup completed 14 queries over 200 ms
[04/14/2022-14:20:22] [I] Timing trace has 11825 queries over 180.032 s
[04/14/2022-14:20:22] [I]
[04/14/2022-14:20:22] [I] === Trace details ===
[04/14/2022-14:20:22] [I] Trace averages of 100 runs:
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 12.7801 ms - Host latency: 12.8346 ms (end to end 14.2416 ms, enqueue 12.797 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 13.0468 ms - Host latency: 13.1039 ms (end to end 14.5095 ms, enqueue 12.5749 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 13.0648 ms - Host latency: 13.1248 ms (end to end 14.5306 ms, enqueue 12.4942 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 12.992 ms - Host latency: 13.0557 ms (end to end 14.5674 ms, enqueue 12.5021 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 13.0226 ms - Host latency: 13.0889 ms (end to end 14.6039 ms, enqueue 12.5721 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 12.736 ms - Host latency: 12.8093 ms (end to end 14.3889 ms, enqueue 12.8561 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 12.6468 ms - Host latency: 12.7236 ms (end to end 14.2387 ms, enqueue 13.0713 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 12.6971 ms - Host latency: 12.7744 ms (end to end 14.2694 ms, enqueue 13.1626 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 12.812 ms - Host latency: 12.8904 ms (end to end 14.289 ms, enqueue 13.0269 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 13.005 ms - Host latency: 13.0947 ms (end to end 14.6112 ms, enqueue 12.9078 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 12.8799 ms - Host latency: 12.9769 ms (end to end 14.6993 ms, enqueue 13.3133 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 12.8853 ms - Host latency: 12.9967 ms (end to end 14.5459 ms, enqueue 13.5546 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 12.8543 ms - Host latency: 12.9655 ms (end to end 14.5223 ms, enqueue 13.6839 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 13.1129 ms - Host latency: 13.2397 ms (end to end 14.8653 ms, enqueue 13.3527 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 13.4038 ms - Host latency: 13.5573 ms (end to end 15.4049 ms, enqueue 13.4677 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 13.5325 ms - Host latency: 13.7517 ms (end to end 15.933 ms, enqueue 13.9918 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 13.7309 ms - Host latency: 13.9496 ms (end to end 15.9822 ms, enqueue 13.2874 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 13.7197 ms - Host latency: 13.9387 ms (end to end 15.9518 ms, enqueue 13.9767 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 13.717 ms - Host latency: 13.9357 ms (end to end 15.9612 ms, enqueue 13.4906 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 13.5917 ms - Host latency: 13.7863 ms (end to end 15.7106 ms, enqueue 13.7488 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 13.3081 ms - Host latency: 13.4553 ms (end to end 15.1803 ms, enqueue 13.6161 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 13.321 ms - Host latency: 13.468 ms (end to end 15.1941 ms, enqueue 13.585 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 13.2905 ms - Host latency: 13.4376 ms (end to end 15.161 ms, enqueue 13.6349 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 13.1303 ms - Host latency: 13.2775 ms (end to end 14.9924 ms, enqueue 14.01 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 13.3105 ms - Host latency: 13.4578 ms (end to end 15.1791 ms, enqueue 13.4607 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 13.3475 ms - Host latency: 13.4949 ms (end to end 15.2276 ms, enqueue 13.3161 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 13.2989 ms - Host latency: 13.4458 ms (end to end 15.1474 ms, enqueue 13.4241 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 13.5076 ms - Host latency: 13.6858 ms (end to end 15.5203 ms, enqueue 13.4641 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 13.7401 ms - Host latency: 13.9589 ms (end to end 15.9589 ms, enqueue 13.6739 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 13.7387 ms - Host latency: 13.9577 ms (end to end 15.9939 ms, enqueue 13.4528 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 13.6919 ms - Host latency: 13.9103 ms (end to end 15.9528 ms, enqueue 13.3743 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 13.6203 ms - Host latency: 13.822 ms (end to end 15.794 ms, enqueue 13.2448 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 13.6791 ms - Host latency: 13.8919 ms (end to end 15.9062 ms, enqueue 13.0709 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 13.6289 ms - Host latency: 13.8275 ms (end to end 15.7747 ms, enqueue 13.328 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 13.6599 ms - Host latency: 13.8741 ms (end to end 15.8873 ms, enqueue 13.1837 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 13.6917 ms - Host latency: 13.8926 ms (end to end 15.881 ms, enqueue 13.1366 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 13.7113 ms - Host latency: 13.9229 ms (end to end 15.9161 ms, enqueue 13.1087 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 13.6052 ms - Host latency: 13.7993 ms (end to end 15.7137 ms, enqueue 13.2352 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 13.6772 ms - Host latency: 13.8863 ms (end to end 15.8789 ms, enqueue 13.2791 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 13.6794 ms - Host latency: 13.8857 ms (end to end 15.8606 ms, enqueue 12.9484 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 13.7027 ms - Host latency: 13.9116 ms (end to end 15.8903 ms, enqueue 13.1641 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 13.7271 ms - Host latency: 13.9362 ms (end to end 15.9451 ms, enqueue 13.1595 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 13.6519 ms - Host latency: 13.8552 ms (end to end 15.8006 ms, enqueue 13.2253 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 13.6131 ms - Host latency: 13.8091 ms (end to end 15.7777 ms, enqueue 13.259 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 13.6742 ms - Host latency: 13.8885 ms (end to end 15.9152 ms, enqueue 13.1859 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 13.6639 ms - Host latency: 13.8591 ms (end to end 15.8022 ms, enqueue 13.2034 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 13.7008 ms - Host latency: 13.9156 ms (end to end 15.9502 ms, enqueue 13.0182 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 13.7339 ms - Host latency: 13.9471 ms (end to end 15.9673 ms, enqueue 13.0555 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 13.7473 ms - Host latency: 13.9627 ms (end to end 15.9716 ms, enqueue 12.9942 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 13.6455 ms - Host latency: 13.8459 ms (end to end 15.796 ms, enqueue 12.8902 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 13.7483 ms - Host latency: 13.9615 ms (end to end 15.9543 ms, enqueue 13.0986 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 13.6746 ms - Host latency: 13.8818 ms (end to end 15.8815 ms, enqueue 13.3135 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 13.6308 ms - Host latency: 13.8489 ms (end to end 15.8941 ms, enqueue 13.7077 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 13.7146 ms - Host latency: 13.9331 ms (end to end 15.9764 ms, enqueue 13.3324 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 13.7316 ms - Host latency: 13.9481 ms (end to end 15.9949 ms, enqueue 13.0582 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 13.7118 ms - Host latency: 13.929 ms (end to end 15.9613 ms, enqueue 13.0872 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 13.7593 ms - Host latency: 13.9771 ms (end to end 15.9739 ms, enqueue 13.2453 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 13.7276 ms - Host latency: 13.9412 ms (end to end 15.9505 ms, enqueue 13.2121 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 13.7369 ms - Host latency: 13.9551 ms (end to end 15.9761 ms, enqueue 13.1177 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 13.7904 ms - Host latency: 14.0082 ms (end to end 16.0196 ms, enqueue 13.0977 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 13.7377 ms - Host latency: 13.9557 ms (end to end 15.9966 ms, enqueue 13.1316 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 13.7815 ms - Host latency: 14.0005 ms (end to end 16.0232 ms, enqueue 12.9839 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 13.6789 ms - Host latency: 13.897 ms (end to end 16.0873 ms, enqueue 13.0141 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 13.7238 ms - Host latency: 13.9421 ms (end to end 15.9627 ms, enqueue 13.0264 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 13.6816 ms - Host latency: 13.8966 ms (end to end 16.1532 ms, enqueue 13.0913 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 13.7746 ms - Host latency: 13.9874 ms (end to end 15.9984 ms, enqueue 12.9648 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 13.7418 ms - Host latency: 13.9598 ms (end to end 15.9987 ms, enqueue 12.9255 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 13.796 ms - Host latency: 14.0132 ms (end to end 16.1446 ms, enqueue 13.0156 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 14.1451 ms - Host latency: 14.3839 ms (end to end 16.7914 ms, enqueue 13.2415 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 14.9194 ms - Host latency: 15.3058 ms (end to end 18.8218 ms, enqueue 14.0735 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 14.5932 ms - Host latency: 14.8802 ms (end to end 17.857 ms, enqueue 13.408 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 13.6719 ms - Host latency: 13.8882 ms (end to end 15.9639 ms, enqueue 13.0984 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 14.2298 ms - Host latency: 14.4578 ms (end to end 17.2447 ms, enqueue 14.3208 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 15.0164 ms - Host latency: 15.4026 ms (end to end 19.4354 ms, enqueue 13.9257 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 14.893 ms - Host latency: 15.2764 ms (end to end 19.1198 ms, enqueue 13.1629 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 15.0992 ms - Host latency: 15.4872 ms (end to end 19.4213 ms, enqueue 14.0691 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 15.7193 ms - Host latency: 16.0993 ms (end to end 20.6677 ms, enqueue 13.5432 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 14.701 ms - Host latency: 15.0401 ms (end to end 18.512 ms, enqueue 13.8929 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 13.6888 ms - Host latency: 13.898 ms (end to end 15.9096 ms, enqueue 13.061 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 13.3512 ms - Host latency: 13.4983 ms (end to end 15.2288 ms, enqueue 13.0693 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 13.3435 ms - Host latency: 13.495 ms (end to end 15.2332 ms, enqueue 13.2127 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 13.7502 ms - Host latency: 13.9692 ms (end to end 16.0113 ms, enqueue 12.8234 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 13.7452 ms - Host latency: 13.9635 ms (end to end 16.0124 ms, enqueue 13.0553 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 13.7773 ms - Host latency: 13.9963 ms (end to end 16.0381 ms, enqueue 13.1098 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 13.857 ms - Host latency: 14.0758 ms (end to end 16.4848 ms, enqueue 13.0592 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 13.4545 ms - Host latency: 13.6725 ms (end to end 15.708 ms, enqueue 13.4611 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 13.7742 ms - Host latency: 13.9952 ms (end to end 16.0223 ms, enqueue 12.9358 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 14.3009 ms - Host latency: 14.5842 ms (end to end 17.4802 ms, enqueue 13.5662 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 14.2103 ms - Host latency: 14.5627 ms (end to end 17.3512 ms, enqueue 13.7048 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 13.9567 ms - Host latency: 14.3489 ms (end to end 15.7391 ms, enqueue 13.813 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 13.66 ms - Host latency: 14.0666 ms (end to end 14.545 ms, enqueue 13.6005 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 13.8112 ms - Host latency: 14.2073 ms (end to end 15.5367 ms, enqueue 14.0531 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 13.3659 ms - Host latency: 13.7506 ms (end to end 14.2414 ms, enqueue 13.6748 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 13.2272 ms - Host latency: 13.5744 ms (end to end 13.6248 ms, enqueue 13.167 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 12.6709 ms - Host latency: 12.8661 ms (end to end 12.9489 ms, enqueue 12.5569 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 12.4748 ms - Host latency: 12.6203 ms (end to end 12.6434 ms, enqueue 12.3 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 12.4892 ms - Host latency: 12.6336 ms (end to end 13.0292 ms, enqueue 12.728 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 12.3263 ms - Host latency: 12.4459 ms (end to end 12.4911 ms, enqueue 12.1997 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 12.4177 ms - Host latency: 12.5352 ms (end to end 12.712 ms, enqueue 12.4386 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 12.2848 ms - Host latency: 12.4028 ms (end to end 12.4181 ms, enqueue 12.1402 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 12.4164 ms - Host latency: 12.5469 ms (end to end 12.6234 ms, enqueue 12.3478 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 12.4802 ms - Host latency: 12.6369 ms (end to end 12.7323 ms, enqueue 12.4 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 12.4267 ms - Host latency: 12.5784 ms (end to end 12.8928 ms, enqueue 12.565 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 12.5014 ms - Host latency: 12.6538 ms (end to end 13.037 ms, enqueue 12.7172 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 12.4783 ms - Host latency: 12.6287 ms (end to end 12.7897 ms, enqueue 12.4803 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 12.4 ms - Host latency: 12.5416 ms (end to end 12.9663 ms, enqueue 12.6625 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 12.3414 ms - Host latency: 12.4514 ms (end to end 12.9717 ms, enqueue 12.7208 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 12.2519 ms - Host latency: 12.3408 ms (end to end 12.8367 ms, enqueue 12.6156 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 12.2323 ms - Host latency: 12.3116 ms (end to end 12.7517 ms, enqueue 12.5655 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 12.2867 ms - Host latency: 12.3681 ms (end to end 12.5727 ms, enqueue 12.3895 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 12.2252 ms - Host latency: 12.3016 ms (end to end 12.7333 ms, enqueue 12.5419 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 12.2091 ms - Host latency: 12.2823 ms (end to end 12.6898 ms, enqueue 12.5122 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 12.225 ms - Host latency: 12.3022 ms (end to end 12.7244 ms, enqueue 12.5453 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 12.222 ms - Host latency: 12.2975 ms (end to end 12.7156 ms, enqueue 12.5375 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 12.1802 ms - Host latency: 12.247 ms (end to end 12.6245 ms, enqueue 12.4498 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 12.1987 ms - Host latency: 12.2647 ms (end to end 12.6073 ms, enqueue 12.4294 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 12.2195 ms - Host latency: 12.2905 ms (end to end 12.3764 ms, enqueue 12.1689 ms)
[04/14/2022-14:20:22] [I] Average on 100 runs - GPU latency: 12.112 ms - Host latency: 12.185 ms (end to end 12.2011 ms, enqueue 12.0231 ms)
[04/14/2022-14:20:22] [I]
[04/14/2022-14:20:22] [I] === Performance summary ===
[04/14/2022-14:20:22] [I] Throughput: 65.6827 qps
[04/14/2022-14:20:22] [I] Latency: min = 12.1406 ms, max = 29.7188 ms, mean = 13.5695 ms, median = 13.6719 ms, percentile(99%) = 17.0859 ms
[04/14/2022-14:20:22] [I] End-to-End Host Latency: min = 12.1406 ms, max = 29.7656 ms, mean = 15.2208 ms, median = 15.4238 ms, percentile(99%) = 21.4219 ms
[04/14/2022-14:20:22] [I] Enqueue Time: min = 3.00391 ms, max = 34.0781 ms, mean = 13.1174 ms, median = 12.8242 ms, percentile(99%) = 17.0625 ms
[04/14/2022-14:20:22] [I] H2D Latency: min = 0.0491333 ms, max = 0.515625 ms, mean = 0.181473 ms, median = 0.203125 ms, percentile(99%) = 0.40625 ms
[04/14/2022-14:20:22] [I] GPU Compute Time: min = 12.0781 ms, max = 29.3125 ms, mean = 13.3806 ms, median = 13.4707 ms, percentile(99%) = 16.6953 ms
[04/14/2022-14:20:22] [I] D2H Latency: min = 0 ms, max = 0.03125 ms, mean = 0.00748486 ms, median = 0.0078125 ms, percentile(99%) = 0.015625 ms
[04/14/2022-14:20:22] [I] Total Host Walltime: 180.032 s
[04/14/2022-14:20:22] [I] Total GPU Compute Time: 158.225 s
[04/14/2022-14:20:22] [W] * Throughput may be bound by Enqueue Time rather than GPU Compute and the GPU may be under-utilized.
[04/14/2022-14:20:22] [W] If not already in use, --useCudaGraph (utilize CUDA graphs where possible) may increase the throughput.
[04/14/2022-14:20:22] [I] Explanations of the performance metrics are printed in the verbose logs.
[04/14/2022-14:20:22] [I]
&&&& PASSED TensorRT.trtexec [TensorRT v8201] # ./trtexec --output=prob --deploy=/home/aim/jetson_benchmarks/models/inception_v4.prototxt --batch=1 --int8 --allowGPUFallback --useDLACore=0 --workspace=1024 --avgRuns=100 --duration=180 --loadEngine=/home/aim/jetson_benchmarks/models/inception_v4_b1_ws1024_dla1.engine

Thanks. Could you please check that RAM usage is below 500 MB before running the benchmarks on the Jetson? To do this, you can run "free -m" and look under the "used" column.
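
For example (illustrative numbers only; yours will differ):

$ free -m
              total        used        free      shared  buff/cache   available
Mem:          15825         430       14211          52        1184       15050
Swap:          7912           0        7912

The value to watch is the "used" column in the Mem row; it should stay below roughly 500 MB before you launch the benchmark.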