DLA cudlaSubmitTask function takes too long (12ms+)

The time taken by the cudlaSubmitTask function varies greatly depending on the idleTime parameter passed to trtexec:

nsys profile -t cuda,nvtx,nvmedia \
 --cuda-flush-interval=30000 \
 --accelerator-trace=nvmedia --show-output=true \
 --force-overwrite=true --output=/usr/src/tensorrt/tmp/normal \
 trtexec \
 --loadEngine=example.engine \
 --fp16 --useDLACore=1 --allowGPUFallback

nsys profile -t cuda,nvtx,nvmedia \
 --cuda-flush-interval=30000 \
 --accelerator-trace=nvmedia --show-output=true \
 --force-overwrite=true --output=/usr/src/tensorrt/tmp/normal \
 trtexec \
 --loadEngine=example.engine \
 --fp16 --useDLACore=1 --allowGPUFallback \
 --idleTime=2000

When idleTime is greater than 500 ms, the time spent in cudlaSubmitTask tends to increase. For example, with idleTime=100 the cudlaSubmitTask call takes about 80 us, whereas with idleTime=2000 it takes about 12 ms.
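To see where the submit time starts to grow, the same profiling command can be repeated over several idleTime values. The following is only a sketch: the list of idle values, the `idle_<N>` report names, and the `ENGINE`, `OUT_DIR`, `IDLE_VALUES` and `RUN` variables are assumptions made for illustration, while the nsys and trtexec flags are the ones used above.

```shell
#!/bin/sh
# Sketch: sweep --idleTime and keep one nsys report per value.
# ENGINE, OUT_DIR and IDLE_VALUES are placeholders; adjust them to your setup.
ENGINE="${ENGINE:-example.engine}"
OUT_DIR="${OUT_DIR:-/usr/src/tensorrt/tmp}"
IDLE_VALUES="${IDLE_VALUES:-0 100 500 1000 2000}"

# Print the profiling command for one idleTime value (in ms).
build_cmd() {
    printf 'nsys profile -t cuda,nvtx,nvmedia --cuda-flush-interval=30000 '
    printf -- '--accelerator-trace=nvmedia --show-output=true --force-overwrite=true '
    printf -- '--output=%s/idle_%s ' "$OUT_DIR" "$1"
    printf 'trtexec --loadEngine=%s --fp16 --useDLACore=1 --allowGPUFallback --idleTime=%s\n' "$ENGINE" "$1"
}

# Dry run by default: only print each command. Set RUN=1 to execute them.
for idle in $IDLE_VALUES; do
    cmd=$(build_cmd "$idle")
    if [ "${RUN:-0}" = "1" ]; then
        sh -c "$cmd"
    else
        echo "$cmd"
    fi
done
```

Writing each run to its own report file avoids overwriting the previous result, so the cudlaSubmitTask durations can be compared across reports afterwards.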

Hi,

Could you maximize the device’s performance and try it again?

$ sudo nvpmodel -m 0
$ sudo jetson_clocks

If the same behavior is observed, please share the log you get with/without --idleTime.

Thanks.

The device is already in MAXN mode, and jetson_clocks is enabled.

case: idleTime=0

case: idleTime=2000

case: idleTime=0

WARNING: --trace=nvmedia is deprecated, use tegra-accelerators in the future.
WARNING: Deprecated --accelerator-trace argument: nvmedia.
Value 'tegra-accelerators' should be used in the future.
&&&& RUNNING TensorRT.trtexec [TensorRT v8502] # /usr/bin/trtexec --loadEngine=example.engine --plugins=/usr/src/tensorrt/libplugin.so --fp16 --useDLACore=1 --allowGPUFallback
[03/27/2024-11:27:18] [I] === Model Options ===
[03/27/2024-11:27:18] [I] Format: *
[03/27/2024-11:27:18] [I] Model: 
[03/27/2024-11:27:18] [I] Output:
[03/27/2024-11:27:18] [I] === Build Options ===
[03/27/2024-11:27:18] [I] Max batch: 1
[03/27/2024-11:27:18] [I] Memory Pools: workspace: default, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default
[03/27/2024-11:27:18] [I] minTiming: 1
[03/27/2024-11:27:18] [I] avgTiming: 8
[03/27/2024-11:27:18] [I] Precision: FP32+FP16
[03/27/2024-11:27:18] [I] LayerPrecisions: 
[03/27/2024-11:27:18] [I] Calibration: 
[03/27/2024-11:27:18] [I] Refit: Disabled
[03/27/2024-11:27:18] [I] Sparsity: Disabled
[03/27/2024-11:27:18] [I] Safe mode: Disabled
[03/27/2024-11:27:18] [I] DirectIO mode: Disabled
[03/27/2024-11:27:18] [I] Restricted mode: Disabled
[03/27/2024-11:27:18] [I] Build only: Disabled
[03/27/2024-11:27:18] [I] Save engine: 
[03/27/2024-11:27:18] [I] Load engine: example.engine
[03/27/2024-11:27:18] [I] Profiling verbosity: 0
[03/27/2024-11:27:18] [I] Tactic sources: Using default tactic sources
[03/27/2024-11:27:18] [I] timingCacheMode: local
[03/27/2024-11:27:18] [I] timingCacheFile: 
[03/27/2024-11:27:18] [I] Heuristic: Disabled
[03/27/2024-11:27:18] [I] Preview Features: Use default preview flags.
[03/27/2024-11:27:18] [I] Input(s)s format: fp32:CHW
[03/27/2024-11:27:18] [I] Output(s)s format: fp32:CHW
[03/27/2024-11:27:18] [I] Input build shapes: model
[03/27/2024-11:27:18] [I] Input calibration shapes: model
[03/27/2024-11:27:18] [I] === System Options ===
[03/27/2024-11:27:18] [I] Device: 0
[03/27/2024-11:27:18] [I] DLACore: 1(With GPU fallback)
[03/27/2024-11:27:18] [I] Plugins: /usr/src/tensorrt/libplugin.so
[03/27/2024-11:27:18] [I] === Inference Options ===
[03/27/2024-11:27:18] [I] Batch: 1
[03/27/2024-11:27:18] [I] Input inference shapes: model
[03/27/2024-11:27:18] [I] Iterations: 10
[03/27/2024-11:27:18] [I] Duration: 3s (+ 200ms warm up)
[03/27/2024-11:27:18] [I] Sleep time: 0ms
[03/27/2024-11:27:18] [I] Idle time: 0ms
[03/27/2024-11:27:18] [I] Streams: 1
[03/27/2024-11:27:18] [I] ExposeDMA: Disabled
[03/27/2024-11:27:18] [I] Data transfers: Enabled
[03/27/2024-11:27:18] [I] Spin-wait: Disabled
[03/27/2024-11:27:18] [I] Multithreading: Disabled
[03/27/2024-11:27:18] [I] CUDA Graph: Disabled
[03/27/2024-11:27:18] [I] Separate profiling: Disabled
[03/27/2024-11:27:18] [I] Time Deserialize: Disabled
[03/27/2024-11:27:18] [I] Time Refit: Disabled
[03/27/2024-11:27:18] [I] NVTX verbosity: 0
[03/27/2024-11:27:18] [I] Persistent Cache Ratio: 0
[03/27/2024-11:27:18] [I] Inputs:
[03/27/2024-11:27:18] [I] === Reporting Options ===
[03/27/2024-11:27:18] [I] Verbose: Disabled
[03/27/2024-11:27:18] [I] Averages: 10 inferences
[03/27/2024-11:27:18] [I] Percentiles: 90,95,99
[03/27/2024-11:27:18] [I] Dump refittable layers:Disabled
[03/27/2024-11:27:18] [I] Dump output: Disabled
[03/27/2024-11:27:18] [I] Profile: Disabled
[03/27/2024-11:27:18] [I] Export timing to JSON file: 
[03/27/2024-11:27:18] [I] Export output to JSON file: 
[03/27/2024-11:27:18] [I] Export profile to JSON file: 
[03/27/2024-11:27:18] [I] 
[03/27/2024-11:27:18] [I] === Device Information ===
[03/27/2024-11:27:18] [I] Selected Device: Orin
[03/27/2024-11:27:18] [I] Compute Capability: 8.7
[03/27/2024-11:27:18] [I] SMs: 16
[03/27/2024-11:27:18] [I] Compute Clock Rate: 1.3 GHz
[03/27/2024-11:27:18] [I] Device Global Memory: 62800 MiB
[03/27/2024-11:27:18] [I] Shared Memory per SM: 164 KiB
[03/27/2024-11:27:18] [I] Memory Bus Width: 256 bits (ECC disabled)
[03/27/2024-11:27:18] [I] Memory Clock Rate: 1.3 GHz
[03/27/2024-11:27:18] [I] 
[03/27/2024-11:27:18] [I] TensorRT version: 8.5.2
[03/27/2024-11:27:18] [I] Loading supplied plugin library: /usr/src/tensorrt/libplugin.so
[03/27/2024-11:27:18] [I] Engine loaded in 0.00136703 sec.
[03/27/2024-11:27:19] [I] [TRT] Loaded engine size: 1 MiB
[03/27/2024-11:27:20] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +954, GPU +905, now: CPU 1424, GPU 19035 (MiB)
[03/27/2024-11:27:20] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +130, GPU +121, now: CPU 1554, GPU 19156 (MiB)
[03/27/2024-11:27:20] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +1, GPU +0, now: CPU 1, GPU 0 (MiB)
[03/27/2024-11:27:20] [I] Engine deserialized in 1.61347 sec.
[03/27/2024-11:27:20] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1554, GPU 19156 (MiB)
[03/27/2024-11:27:20] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +0, now: CPU 1554, GPU 19156 (MiB)
[03/27/2024-11:27:20] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +2, now: CPU 1, GPU 2 (MiB)
[03/27/2024-11:27:20] [I] Setting persistentCacheLimit to 0 bytes.
[03/27/2024-11:27:20] [I] Using random values for input input.1
[03/27/2024-11:27:20] [I] Created input binding for input.1 with dimensions 1x540x960x3
[03/27/2024-11:27:20] [I] Using random values for output weather
[03/27/2024-11:27:20] [I] Created output binding for weather with dimensions 1x5
[03/27/2024-11:27:20] [I] Using random values for output light
[03/27/2024-11:27:20] [I] Created output binding for light with dimensions 1x3
[03/27/2024-11:27:20] [I] Using random values for output scene
[03/27/2024-11:27:20] [I] Created output binding for scene with dimensions 1x3
[03/27/2024-11:27:20] [I] Using random values for output road
[03/27/2024-11:27:20] [I] Created output binding for road with dimensions 1x5
[03/27/2024-11:27:20] [I] Starting inference
[03/27/2024-11:27:23] [I] Warmup completed 105 queries over 200 ms
[03/27/2024-11:27:23] [I] Timing trace has 2036 queries over 3.00307 s
[03/27/2024-11:27:23] [I] 
[03/27/2024-11:27:23] [I] === Trace details ===
[03/27/2024-11:27:23] [I] Trace averages of 10 runs:
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45706 ms - Host latency: 1.66639 ms (enqueue 0.47675 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45776 ms - Host latency: 1.66772 ms (enqueue 0.466039 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45652 ms - Host latency: 1.66585 ms (enqueue 0.466191 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45803 ms - Host latency: 1.66949 ms (enqueue 0.481042 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.4561 ms - Host latency: 1.66429 ms (enqueue 0.469135 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45657 ms - Host latency: 1.66697 ms (enqueue 0.486496 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45768 ms - Host latency: 1.66899 ms (enqueue 0.486575 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45733 ms - Host latency: 1.66782 ms (enqueue 0.488028 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45942 ms - Host latency: 1.67298 ms (enqueue 0.504929 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.46137 ms - Host latency: 1.67337 ms (enqueue 0.485886 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45649 ms - Host latency: 1.66755 ms (enqueue 0.487012 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.46046 ms - Host latency: 1.67445 ms (enqueue 0.498984 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45723 ms - Host latency: 1.66705 ms (enqueue 0.4802 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45939 ms - Host latency: 1.67005 ms (enqueue 0.485437 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45798 ms - Host latency: 1.66714 ms (enqueue 0.481595 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45874 ms - Host latency: 1.66895 ms (enqueue 0.468042 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45668 ms - Host latency: 1.66455 ms (enqueue 0.471426 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45996 ms - Host latency: 1.66993 ms (enqueue 0.484088 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45968 ms - Host latency: 1.67003 ms (enqueue 0.501218 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.46142 ms - Host latency: 1.67396 ms (enqueue 0.496765 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.56406 ms - Host latency: 1.77726 ms (enqueue 0.501459 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.82438 ms - Host latency: 2.04652 ms (enqueue 0.63291 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.59962 ms - Host latency: 1.81921 ms (enqueue 0.531757 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45901 ms - Host latency: 1.67736 ms (enqueue 0.553259 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.46077 ms - Host latency: 1.67183 ms (enqueue 0.485278 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45812 ms - Host latency: 1.66689 ms (enqueue 0.490466 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45506 ms - Host latency: 1.66547 ms (enqueue 0.478394 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.46006 ms - Host latency: 1.67153 ms (enqueue 0.521082 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45583 ms - Host latency: 1.66489 ms (enqueue 0.495251 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45727 ms - Host latency: 1.66743 ms (enqueue 0.487988 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45734 ms - Host latency: 1.66744 ms (enqueue 0.481171 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45948 ms - Host latency: 1.67023 ms (enqueue 0.483136 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45713 ms - Host latency: 1.66778 ms (enqueue 0.483295 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45564 ms - Host latency: 1.66667 ms (enqueue 0.484235 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45834 ms - Host latency: 1.66791 ms (enqueue 0.485638 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.47556 ms - Host latency: 1.68592 ms (enqueue 0.489648 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.48019 ms - Host latency: 1.69281 ms (enqueue 0.488275 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45974 ms - Host latency: 1.67141 ms (enqueue 0.494995 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.4544 ms - Host latency: 1.66859 ms (enqueue 0.594019 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.458 ms - Host latency: 1.67133 ms (enqueue 0.53092 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.46442 ms - Host latency: 1.68181 ms (enqueue 0.583917 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.46835 ms - Host latency: 1.68158 ms (enqueue 0.499084 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.47944 ms - Host latency: 1.70254 ms (enqueue 0.591467 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45739 ms - Host latency: 1.67037 ms (enqueue 0.489404 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45323 ms - Host latency: 1.66199 ms (enqueue 0.47699 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.54336 ms - Host latency: 1.754 ms (enqueue 0.473907 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.7148 ms - Host latency: 1.93453 ms (enqueue 0.524701 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.5236 ms - Host latency: 1.73427 ms (enqueue 0.498602 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45869 ms - Host latency: 1.66777 ms (enqueue 0.466077 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45506 ms - Host latency: 1.66352 ms (enqueue 0.48208 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45502 ms - Host latency: 1.66366 ms (enqueue 0.47652 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45558 ms - Host latency: 1.66385 ms (enqueue 0.478119 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45875 ms - Host latency: 1.66883 ms (enqueue 0.474854 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45533 ms - Host latency: 1.66528 ms (enqueue 0.478094 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45394 ms - Host latency: 1.66253 ms (enqueue 0.473767 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45709 ms - Host latency: 1.66808 ms (enqueue 0.47254 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45736 ms - Host latency: 1.66793 ms (enqueue 0.480383 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45654 ms - Host latency: 1.66676 ms (enqueue 0.4755 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45703 ms - Host latency: 1.66835 ms (enqueue 0.478296 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.459 ms - Host latency: 1.66917 ms (enqueue 0.4745 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.63291 ms - Host latency: 1.8491 ms (enqueue 0.511633 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45565 ms - Host latency: 1.66605 ms (enqueue 0.487891 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45441 ms - Host latency: 1.66503 ms (enqueue 0.465417 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45896 ms - Host latency: 1.66953 ms (enqueue 0.488782 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45763 ms - Host latency: 1.66766 ms (enqueue 0.471228 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45824 ms - Host latency: 1.66936 ms (enqueue 0.481482 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45768 ms - Host latency: 1.66804 ms (enqueue 0.473047 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45566 ms - Host latency: 1.66555 ms (enqueue 0.477014 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.452 ms - Host latency: 1.66136 ms (enqueue 0.47052 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45592 ms - Host latency: 1.66499 ms (enqueue 0.47876 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45688 ms - Host latency: 1.665 ms (enqueue 0.469421 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45754 ms - Host latency: 1.66586 ms (enqueue 0.474622 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45774 ms - Host latency: 1.66655 ms (enqueue 0.475293 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45361 ms - Host latency: 1.66295 ms (enqueue 0.467932 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45659 ms - Host latency: 1.66539 ms (enqueue 0.477295 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45902 ms - Host latency: 1.66943 ms (enqueue 0.490906 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45443 ms - Host latency: 1.66179 ms (enqueue 0.475793 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45515 ms - Host latency: 1.66475 ms (enqueue 0.478699 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45426 ms - Host latency: 1.66222 ms (enqueue 0.472131 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.46008 ms - Host latency: 1.671 ms (enqueue 0.482166 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.4579 ms - Host latency: 1.66593 ms (enqueue 0.472522 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45355 ms - Host latency: 1.66353 ms (enqueue 0.46709 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45862 ms - Host latency: 1.66764 ms (enqueue 0.476929 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45439 ms - Host latency: 1.66353 ms (enqueue 0.463464 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45436 ms - Host latency: 1.66407 ms (enqueue 0.474683 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45669 ms - Host latency: 1.66666 ms (enqueue 0.463672 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45632 ms - Host latency: 1.66394 ms (enqueue 0.468921 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45801 ms - Host latency: 1.66885 ms (enqueue 0.524609 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.76981 ms - Host latency: 1.98521 ms (enqueue 0.557227 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.6629 ms - Host latency: 1.87385 ms (enqueue 0.484692 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.56219 ms - Host latency: 1.77168 ms (enqueue 0.477441 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45667 ms - Host latency: 1.66538 ms (enqueue 0.473987 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45847 ms - Host latency: 1.66862 ms (enqueue 0.464929 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45548 ms - Host latency: 1.66418 ms (enqueue 0.463745 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.46462 ms - Host latency: 1.67582 ms (enqueue 0.474634 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45648 ms - Host latency: 1.66547 ms (enqueue 0.476489 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45417 ms - Host latency: 1.66174 ms (enqueue 0.472717 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45558 ms - Host latency: 1.6656 ms (enqueue 0.473364 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45618 ms - Host latency: 1.6647 ms (enqueue 0.471826 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45366 ms - Host latency: 1.66173 ms (enqueue 0.470654 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45596 ms - Host latency: 1.6647 ms (enqueue 0.473462 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45854 ms - Host latency: 1.66739 ms (enqueue 0.472644 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45297 ms - Host latency: 1.66151 ms (enqueue 0.464917 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.49756 ms - Host latency: 1.70978 ms (enqueue 0.474036 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45776 ms - Host latency: 1.66554 ms (enqueue 0.475549 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45088 ms - Host latency: 1.6615 ms (enqueue 0.474377 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45724 ms - Host latency: 1.66624 ms (enqueue 0.490613 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45989 ms - Host latency: 1.67108 ms (enqueue 0.484131 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.46218 ms - Host latency: 1.67249 ms (enqueue 0.4677 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.46472 ms - Host latency: 1.67485 ms (enqueue 0.468408 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45635 ms - Host latency: 1.66803 ms (enqueue 0.476123 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45375 ms - Host latency: 1.6639 ms (enqueue 0.472083 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45615 ms - Host latency: 1.66486 ms (enqueue 0.465405 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45972 ms - Host latency: 1.66921 ms (enqueue 0.481421 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.62207 ms - Host latency: 1.83123 ms (enqueue 0.471436 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.64185 ms - Host latency: 1.86047 ms (enqueue 0.474023 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.49277 ms - Host latency: 1.70242 ms (enqueue 0.495557 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45742 ms - Host latency: 1.66683 ms (enqueue 0.477576 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45446 ms - Host latency: 1.66201 ms (enqueue 0.473682 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.4558 ms - Host latency: 1.66407 ms (enqueue 0.468433 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.46182 ms - Host latency: 1.67128 ms (enqueue 0.501709 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45786 ms - Host latency: 1.66842 ms (enqueue 0.496069 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45775 ms - Host latency: 1.66913 ms (enqueue 0.472668 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.4589 ms - Host latency: 1.67365 ms (enqueue 0.531397 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.4564 ms - Host latency: 1.66946 ms (enqueue 0.485535 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45564 ms - Host latency: 1.66764 ms (enqueue 0.498865 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45657 ms - Host latency: 1.66716 ms (enqueue 0.481299 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45889 ms - Host latency: 1.67002 ms (enqueue 0.495337 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.4553 ms - Host latency: 1.66724 ms (enqueue 0.496826 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45786 ms - Host latency: 1.67043 ms (enqueue 0.504858 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.4604 ms - Host latency: 1.66895 ms (enqueue 0.473828 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45742 ms - Host latency: 1.67043 ms (enqueue 0.504517 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.46152 ms - Host latency: 1.67476 ms (enqueue 0.536597 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45701 ms - Host latency: 1.66807 ms (enqueue 0.489917 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45562 ms - Host latency: 1.66465 ms (enqueue 0.47605 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45811 ms - Host latency: 1.67019 ms (enqueue 0.493066 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45627 ms - Host latency: 1.66541 ms (enqueue 0.497485 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45466 ms - Host latency: 1.66682 ms (enqueue 0.471265 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45408 ms - Host latency: 1.66565 ms (enqueue 0.486035 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45825 ms - Host latency: 1.67256 ms (enqueue 0.502588 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.46331 ms - Host latency: 1.67866 ms (enqueue 0.529346 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.4623 ms - Host latency: 1.67185 ms (enqueue 0.477759 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.46909 ms - Host latency: 1.68564 ms (enqueue 0.536157 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.46609 ms - Host latency: 1.68064 ms (enqueue 0.513696 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.4571 ms - Host latency: 1.67412 ms (enqueue 0.550146 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.46013 ms - Host latency: 1.67385 ms (enqueue 0.519824 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45952 ms - Host latency: 1.66926 ms (enqueue 0.493164 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45923 ms - Host latency: 1.67485 ms (enqueue 0.560913 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.4573 ms - Host latency: 1.66802 ms (enqueue 0.483008 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45876 ms - Host latency: 1.6666 ms (enqueue 0.475854 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45359 ms - Host latency: 1.66338 ms (enqueue 0.472754 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.46372 ms - Host latency: 1.67756 ms (enqueue 0.500537 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45732 ms - Host latency: 1.66978 ms (enqueue 0.499756 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.4572 ms - Host latency: 1.66516 ms (enqueue 0.475513 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.4585 ms - Host latency: 1.67014 ms (enqueue 0.465063 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45857 ms - Host latency: 1.67244 ms (enqueue 0.546851 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.74458 ms - Host latency: 1.96262 ms (enqueue 0.476416 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.59404 ms - Host latency: 1.80503 ms (enqueue 0.477295 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.54126 ms - Host latency: 1.75359 ms (enqueue 0.475513 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.46367 ms - Host latency: 1.6752 ms (enqueue 0.476343 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.46001 ms - Host latency: 1.66965 ms (enqueue 0.465259 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45498 ms - Host latency: 1.66638 ms (enqueue 0.510474 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45784 ms - Host latency: 1.66733 ms (enqueue 0.472217 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45601 ms - Host latency: 1.6656 ms (enqueue 0.467407 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45542 ms - Host latency: 1.66416 ms (enqueue 0.461255 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45645 ms - Host latency: 1.66399 ms (enqueue 0.465625 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.4575 ms - Host latency: 1.66516 ms (enqueue 0.464331 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45449 ms - Host latency: 1.6626 ms (enqueue 0.473071 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45593 ms - Host latency: 1.6666 ms (enqueue 0.462793 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.46064 ms - Host latency: 1.67078 ms (enqueue 0.470996 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45759 ms - Host latency: 1.66782 ms (enqueue 0.513257 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.49551 ms - Host latency: 1.71338 ms (enqueue 0.522656 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45674 ms - Host latency: 1.6655 ms (enqueue 0.47395 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45381 ms - Host latency: 1.66326 ms (enqueue 0.467383 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45659 ms - Host latency: 1.66633 ms (enqueue 0.469873 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45776 ms - Host latency: 1.66899 ms (enqueue 0.473486 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.46179 ms - Host latency: 1.67683 ms (enqueue 0.499561 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.46445 ms - Host latency: 1.67759 ms (enqueue 0.493604 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45872 ms - Host latency: 1.67036 ms (enqueue 0.489014 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.4572 ms - Host latency: 1.66514 ms (enqueue 0.470483 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45098 ms - Host latency: 1.66558 ms (enqueue 0.507397 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.55698 ms - Host latency: 1.76982 ms (enqueue 0.480151 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.68042 ms - Host latency: 1.89053 ms (enqueue 0.478198 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45718 ms - Host latency: 1.66716 ms (enqueue 0.471143 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45596 ms - Host latency: 1.66648 ms (enqueue 0.476001 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45593 ms - Host latency: 1.66328 ms (enqueue 0.482397 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45979 ms - Host latency: 1.66958 ms (enqueue 0.472827 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.4551 ms - Host latency: 1.66274 ms (enqueue 0.472119 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45889 ms - Host latency: 1.66904 ms (enqueue 0.469775 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45627 ms - Host latency: 1.66467 ms (enqueue 0.474683 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45603 ms - Host latency: 1.66785 ms (enqueue 0.511938 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45471 ms - Host latency: 1.66287 ms (enqueue 0.47395 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45632 ms - Host latency: 1.66428 ms (enqueue 0.4646 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45886 ms - Host latency: 1.66816 ms (enqueue 0.479492 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45942 ms - Host latency: 1.6708 ms (enqueue 0.465381 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45959 ms - Host latency: 1.6678 ms (enqueue 0.508154 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.46235 ms - Host latency: 1.6728 ms (enqueue 0.47124 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45552 ms - Host latency: 1.66609 ms (enqueue 0.476514 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45388 ms - Host latency: 1.66428 ms (enqueue 0.478101 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45781 ms - Host latency: 1.6667 ms (enqueue 0.484229 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45544 ms - Host latency: 1.6658 ms (enqueue 0.514893 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45823 ms - Host latency: 1.66675 ms (enqueue 0.478516 ms)
[03/27/2024-11:27:23] [I] Average on 10 runs - GPU latency: 1.45745 ms - Host latency: 1.66667 ms (enqueue 0.487988 ms)
[03/27/2024-11:27:23] [I] 
[03/27/2024-11:27:23] [I] === Performance summary ===
[03/27/2024-11:27:23] [I] Throughput: 677.972 qps
[03/27/2024-11:27:23] [I] Latency: min = 1.64441 ms, max = 3.78528 ms, mean = 1.68401 ms, median = 1.66721 ms, percentile(90%) = 1.68317 ms, percentile(95%) = 1.69019 ms, percentile(99%) = 2.2627 ms
[03/27/2024-11:27:23] [I] Enqueue Time: min = 0.436035 ms, max = 1.42065 ms, mean = 0.487267 ms, median = 0.479507 ms, percentile(90%) = 0.518921 ms, percentile(95%) = 0.548828 ms, percentile(99%) = 0.657715 ms
[03/27/2024-11:27:23] [I] H2D Latency: min = 0.187988 ms, max = 0.260437 ms, mean = 0.194701 ms, median = 0.193359 ms, percentile(90%) = 0.199707 ms, percentile(95%) = 0.203674 ms, percentile(99%) = 0.215332 ms
[03/27/2024-11:27:23] [I] GPU Compute Time: min = 1.43701 ms, max = 3.57422 ms, mean = 1.47317 ms, median = 1.45654 ms, percentile(90%) = 1.47079 ms, percentile(95%) = 1.47504 ms, percentile(99%) = 2.05078 ms
[03/27/2024-11:27:23] [I] D2H Latency: min = 0.00830078 ms, max = 0.0198975 ms, mean = 0.0161383 ms, median = 0.0167847 ms, percentile(90%) = 0.0185547 ms, percentile(95%) = 0.0187988 ms, percentile(99%) = 0.0193481 ms
[03/27/2024-11:27:23] [I] Total Host Walltime: 3.00307 s
[03/27/2024-11:27:23] [I] Total GPU Compute Time: 2.99937 s
[03/27/2024-11:27:23] [W] * GPU compute time is unstable, with coefficient of variance = 8.41969%.
[03/27/2024-11:27:23] [W]   If not already in use, locking GPU clock frequency or adding --useSpinWait may improve the stability.
[03/27/2024-11:27:23] [I] Explanations of the performance metrics are printed in the verbose logs.
[03/27/2024-11:27:23] [I] 
&&&& PASSED TensorRT.trtexec [TensorRT v8502] # /usr/bin/trtexec --loadEngine=example.engine --fp16 --useDLACore=1 --allowGPUFallback

case: idleTime=2000

WARNING: --trace=nvmedia is deprecated, use tegra-accelerators in the future.
WARNING: Deprecated --accelerator-trace argument: nvmedia.
Value 'tegra-accelerators' should be used in the future.
&&&& RUNNING TensorRT.trtexec [TensorRT v8502] # /usr/bin/trtexec --loadEngine=example.engine --plugins=/usr/src/tensorrt/libplugin.so --fp16 --useDLACore=1 --allowGPUFallback --idleTime=2000
[03/27/2024-11:27:30] [I] === Model Options ===
[03/27/2024-11:27:30] [I] Format: *
[03/27/2024-11:27:30] [I] Model: 
[03/27/2024-11:27:30] [I] Output:
[03/27/2024-11:27:30] [I] === Build Options ===
[03/27/2024-11:27:30] [I] Max batch: 1
[03/27/2024-11:27:30] [I] Memory Pools: workspace: default, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default
[03/27/2024-11:27:30] [I] minTiming: 1
[03/27/2024-11:27:30] [I] avgTiming: 8
[03/27/2024-11:27:30] [I] Precision: FP32+FP16
[03/27/2024-11:27:30] [I] LayerPrecisions: 
[03/27/2024-11:27:30] [I] Calibration: 
[03/27/2024-11:27:30] [I] Refit: Disabled
[03/27/2024-11:27:30] [I] Sparsity: Disabled
[03/27/2024-11:27:30] [I] Safe mode: Disabled
[03/27/2024-11:27:30] [I] DirectIO mode: Disabled
[03/27/2024-11:27:30] [I] Restricted mode: Disabled
[03/27/2024-11:27:30] [I] Build only: Disabled
[03/27/2024-11:27:30] [I] Save engine: 
[03/27/2024-11:27:30] [I] Load engine: example.engine
[03/27/2024-11:27:30] [I] Profiling verbosity: 0
[03/27/2024-11:27:30] [I] Tactic sources: Using default tactic sources
[03/27/2024-11:27:30] [I] timingCacheMode: local
[03/27/2024-11:27:30] [I] timingCacheFile: 
[03/27/2024-11:27:30] [I] Heuristic: Disabled
[03/27/2024-11:27:30] [I] Preview Features: Use default preview flags.
[03/27/2024-11:27:30] [I] Input(s)s format: fp32:CHW
[03/27/2024-11:27:30] [I] Output(s)s format: fp32:CHW
[03/27/2024-11:27:30] [I] Input build shapes: model
[03/27/2024-11:27:30] [I] Input calibration shapes: model
[03/27/2024-11:27:30] [I] === System Options ===
[03/27/2024-11:27:30] [I] Device: 0
[03/27/2024-11:27:30] [I] DLACore: 1(With GPU fallback)
[03/27/2024-11:27:30] [I] Plugins: /usr/src/tensorrt/libplugin.so
[03/27/2024-11:27:30] [I] === Inference Options ===
[03/27/2024-11:27:30] [I] Batch: 1
[03/27/2024-11:27:30] [I] Input inference shapes: model
[03/27/2024-11:27:30] [I] Iterations: 10
[03/27/2024-11:27:30] [I] Duration: 3s (+ 200ms warm up)
[03/27/2024-11:27:30] [I] Sleep time: 0ms
[03/27/2024-11:27:30] [I] Idle time: 2000ms
[03/27/2024-11:27:30] [I] Streams: 1
[03/27/2024-11:27:30] [I] ExposeDMA: Disabled
[03/27/2024-11:27:30] [I] Data transfers: Enabled
[03/27/2024-11:27:30] [I] Spin-wait: Disabled
[03/27/2024-11:27:30] [I] Multithreading: Disabled
[03/27/2024-11:27:30] [I] CUDA Graph: Disabled
[03/27/2024-11:27:30] [I] Separate profiling: Disabled
[03/27/2024-11:27:30] [I] Time Deserialize: Disabled
[03/27/2024-11:27:30] [I] Time Refit: Disabled
[03/27/2024-11:27:30] [I] NVTX verbosity: 0
[03/27/2024-11:27:30] [I] Persistent Cache Ratio: 0
[03/27/2024-11:27:30] [I] Inputs:
[03/27/2024-11:27:30] [I] === Reporting Options ===
[03/27/2024-11:27:30] [I] Verbose: Disabled
[03/27/2024-11:27:30] [I] Averages: 10 inferences
[03/27/2024-11:27:30] [I] Percentiles: 90,95,99
[03/27/2024-11:27:30] [I] Dump refittable layers:Disabled
[03/27/2024-11:27:30] [I] Dump output: Disabled
[03/27/2024-11:27:30] [I] Profile: Disabled
[03/27/2024-11:27:30] [I] Export timing to JSON file: 
[03/27/2024-11:27:30] [I] Export output to JSON file: 
[03/27/2024-11:27:30] [I] Export profile to JSON file: 
[03/27/2024-11:27:30] [I] 
[03/27/2024-11:27:30] [I] === Device Information ===
[03/27/2024-11:27:30] [I] Selected Device: Orin
[03/27/2024-11:27:30] [I] Compute Capability: 8.7
[03/27/2024-11:27:30] [I] SMs: 16
[03/27/2024-11:27:30] [I] Compute Clock Rate: 1.3 GHz
[03/27/2024-11:27:30] [I] Device Global Memory: 62800 MiB
[03/27/2024-11:27:30] [I] Shared Memory per SM: 164 KiB
[03/27/2024-11:27:30] [I] Memory Bus Width: 256 bits (ECC disabled)
[03/27/2024-11:27:30] [I] Memory Clock Rate: 1.3 GHz
[03/27/2024-11:27:30] [I] 
[03/27/2024-11:27:30] [I] TensorRT version: 8.5.2
[03/27/2024-11:27:30] [I] Loading supplied plugin library: /usr/src/tensorrt/libplugin.so
[03/27/2024-11:27:31] [I] Engine loaded in 0.00143199 sec.
[03/27/2024-11:27:31] [I] [TRT] Loaded engine size: 1 MiB
[03/27/2024-11:27:32] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +954, GPU +908, now: CPU 1424, GPU 19043 (MiB)
[03/27/2024-11:27:32] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +130, GPU +114, now: CPU 1554, GPU 19157 (MiB)
[03/27/2024-11:27:32] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +1, GPU +0, now: CPU 1, GPU 0 (MiB)
[03/27/2024-11:27:32] [I] Engine deserialized in 1.60375 sec.
[03/27/2024-11:27:32] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1554, GPU 19157 (MiB)
[03/27/2024-11:27:32] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +0, now: CPU 1554, GPU 19157 (MiB)
[03/27/2024-11:27:32] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +2, now: CPU 1, GPU 2 (MiB)
[03/27/2024-11:27:32] [I] Setting persistentCacheLimit to 0 bytes.
[03/27/2024-11:27:32] [I] Using random values for input input.1
[03/27/2024-11:27:32] [I] Created input binding for input.1 with dimensions 1x540x960x3
[03/27/2024-11:27:32] [I] Using random values for output weather
[03/27/2024-11:27:32] [I] Created output binding for weather with dimensions 1x5
[03/27/2024-11:27:32] [I] Using random values for output light
[03/27/2024-11:27:32] [I] Created output binding for light with dimensions 1x3
[03/27/2024-11:27:32] [I] Using random values for output scene
[03/27/2024-11:27:32] [I] Created output binding for scene with dimensions 1x3
[03/27/2024-11:27:32] [I] Using random values for output road
[03/27/2024-11:27:32] [I] Created output binding for road with dimensions 1x5
[03/27/2024-11:27:32] [I] Starting inference
[03/27/2024-11:27:51] [I] Warmup completed 101 queries over 200 ms
[03/27/2024-11:27:51] [I] Timing trace has 10 queries over 16.1657 s
[03/27/2024-11:27:51] [I] 
[03/27/2024-11:27:51] [I] === Trace details ===
[03/27/2024-11:27:51] [I] Trace averages of 10 runs:
[03/27/2024-11:27:51] [I] Average on 10 runs - GPU latency: 16.8283 ms - Host latency: 17.0586 ms (enqueue 16.0207 ms)
[03/27/2024-11:27:51] [I] 
[03/27/2024-11:27:51] [I] === Performance summary ===
[03/27/2024-11:27:51] [I] Throughput: 0.618592 qps
[03/27/2024-11:27:51] [I] Latency: min = 1.62723 ms, max = 32.5337 ms, mean = 17.0586 ms, median = 14.0352 ms, percentile(90%) = 32.4321 ms, percentile(95%) = 32.5337 ms, percentile(99%) = 32.5337 ms
[03/27/2024-11:27:51] [I] Enqueue Time: min = 0.465164 ms, max = 31.6631 ms, mean = 16.0207 ms, median = 12.8262 ms, percentile(90%) = 31.5857 ms, percentile(95%) = 31.6631 ms, percentile(99%) = 31.6631 ms
[03/27/2024-11:27:51] [I] H2D Latency: min = 0.192154 ms, max = 0.236328 ms, mean = 0.220796 ms, median = 0.226562 ms, percentile(90%) = 0.228027 ms, percentile(95%) = 0.236328 ms, percentile(99%) = 0.236328 ms
[03/27/2024-11:27:51] [I] GPU Compute Time: min = 1.42679 ms, max = 32.2993 ms, mean = 16.8283 ms, median = 13.7993 ms, percentile(90%) = 32.1978 ms, percentile(95%) = 32.2993 ms, percentile(99%) = 32.2993 ms
[03/27/2024-11:27:51] [I] D2H Latency: min = 0.00828552 ms, max = 0.0159912 ms, mean = 0.00955658 ms, median = 0.00878906 ms, percentile(90%) = 0.00976562 ms, percentile(95%) = 0.0159912 ms, percentile(99%) = 0.0159912 ms
[03/27/2024-11:27:51] [I] Total Host Walltime: 16.1657 s
[03/27/2024-11:27:51] [I] Total GPU Compute Time: 0.168283 s
[03/27/2024-11:27:51] [W] * Throughput may be bound by Enqueue Time rather than GPU Compute and the GPU may be under-utilized.
[03/27/2024-11:27:51] [W]   If not already in use, --useCudaGraph (utilize CUDA graphs where possible) may increase the throughput.
[03/27/2024-11:27:51] [W] * GPU compute time is unstable, with coefficient of variance = 65.855%.
[03/27/2024-11:27:51] [W]   If not already in use, locking GPU clock frequency or adding --useSpinWait may improve the stability.
[03/27/2024-11:27:51] [I] Explanations of the performance metrics are printed in the verbose logs.
[03/27/2024-11:27:51] [I] 
&&&& PASSED TensorRT.trtexec [TensorRT v8502] # /usr/bin/trtexec --loadEngine=example.engine --plugins=/usr/src/tensorrt/libplugin.so --fp16 --useDLACore=1 --allowGPUFallback --idleTime=2000
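
To put the two runs side by side, the "Enqueue Time" summary lines can be parsed with a short script. This is only a sketch: the two sample strings are shortened copies of the lines in the logs above, and `parse_summary` is a hypothetical helper name, not part of trtexec.

```python
import re

# Matches "min = 1.234 ms" style pairs in a trtexec summary line.
# The percentile(...) fields are deliberately not captured.
STAT_RE = re.compile(r"\b(min|max|mean|median) = ([0-9.]+) ms")


def parse_summary(line):
    """Return the min/max/mean/median values of one summary line, in ms."""
    return {name: float(value) for name, value in STAT_RE.findall(line)}


# Enqueue Time lines copied (without percentiles) from the two logs above.
idle_0 = ("[I] Enqueue Time: min = 0.436035 ms, max = 1.42065 ms, "
          "mean = 0.487267 ms, median = 0.479507 ms")
idle_2000 = ("[I] Enqueue Time: min = 0.465164 ms, max = 31.6631 ms, "
             "mean = 16.0207 ms, median = 12.8262 ms")

base = parse_summary(idle_0)
slow = parse_summary(idle_2000)
ratio = slow["mean"] / base["mean"]

print(f"mean enqueue: {base['mean']:.3f} ms vs {slow['mean']:.3f} ms "
      f"({ratio:.1f}x)")
print(f"min enqueue:  {base['min']:.3f} ms vs {slow['min']:.3f} ms")
```

The minimum enqueue time is almost the same in both runs, so the slowdown shows up in the mean, median, and maximum rather than in every submission. Note also that the idleTime=2000 trace contains only 10 queries, so its percentiles are coarse.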

Hi,

Sorry for the late update.
Please find the source code below:

The sleep (idleTime) is called before the synchronization, so it lengthens the DLA submission time.

Thanks.
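
To see how the enqueue time scales with the idle interval, the same run can be repeated over several `--idleTime` values. A minimal sketch, assuming `trtexec` is on the PATH and the engine file is `example.engine` as in the commands above; `build_cmd` is a hypothetical helper, and the list of intervals is only an example.

```python
import shlex


def build_cmd(engine, idle_ms, dla_core=1):
    """Build the trtexec argument list for one run with the given idle time (ms)."""
    cmd = [
        "trtexec",
        f"--loadEngine={engine}",
        "--fp16",
        f"--useDLACore={dla_core}",
        "--allowGPUFallback",
    ]
    # Leave the flag out entirely for the baseline run.
    if idle_ms > 0:
        cmd.append(f"--idleTime={idle_ms}")
    return cmd


# Idle intervals to compare; 100 and 2000 are the values quoted in the question.
for idle_ms in (0, 100, 500, 1000, 2000):
    print(shlex.join(build_cmd("example.engine", idle_ms)))
```

Each printed line can be run in a shell (or passed to `subprocess.run`), and the "Enqueue Time" line of each performance summary compared across runs.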
