Sure, here are my results on the 3080. I would expect Orin inference to be much better than what I get here, but that is not the case. What do you think could be the reason?
&&&& RUNNING TensorRT.trtexec [TensorRT v8601] # /home/anis/cv_base/installs/TensorRT-8.6.1.6/targets/x86_64-linux-gnu/bin/trtexec --loadEngine=real_model_trt.engine
[10/26/2023-10:31:47] [I] === Model Options ===
[10/26/2023-10:31:47] [I] Format: *
[10/26/2023-10:31:47] [I] Model:
[10/26/2023-10:31:47] [I] Output:
[10/26/2023-10:31:47] [I] === Build Options ===
[10/26/2023-10:31:47] [I] Max batch: 1
[10/26/2023-10:31:47] [I] Memory Pools: workspace: default, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default
[10/26/2023-10:31:47] [I] minTiming: 1
[10/26/2023-10:31:47] [I] avgTiming: 8
[10/26/2023-10:31:47] [I] Precision: FP32
[10/26/2023-10:31:47] [I] LayerPrecisions:
[10/26/2023-10:31:47] [I] Layer Device Types:
[10/26/2023-10:31:47] [I] Calibration:
[10/26/2023-10:31:47] [I] Refit: Disabled
[10/26/2023-10:31:47] [I] Version Compatible: Disabled
[10/26/2023-10:31:47] [I] TensorRT runtime: full
[10/26/2023-10:31:47] [I] Lean DLL Path:
[10/26/2023-10:31:47] [I] Tempfile Controls: { in_memory: allow, temporary: allow }
[10/26/2023-10:31:47] [I] Exclude Lean Runtime: Disabled
[10/26/2023-10:31:47] [I] Sparsity: Disabled
[10/26/2023-10:31:47] [I] Safe mode: Disabled
[10/26/2023-10:31:47] [I] Build DLA standalone loadable: Disabled
[10/26/2023-10:31:47] [I] Allow GPU fallback for DLA: Disabled
[10/26/2023-10:31:47] [I] DirectIO mode: Disabled
[10/26/2023-10:31:47] [I] Restricted mode: Disabled
[10/26/2023-10:31:47] [I] Skip inference: Disabled
[10/26/2023-10:31:47] [I] Save engine:
[10/26/2023-10:31:47] [I] Load engine: real_model_trt.engine
[10/26/2023-10:31:47] [I] Profiling verbosity: 0
[10/26/2023-10:31:47] [I] Tactic sources: Using default tactic sources
[10/26/2023-10:31:47] [I] timingCacheMode: local
[10/26/2023-10:31:47] [I] timingCacheFile:
[10/26/2023-10:31:47] [I] Heuristic: Disabled
[10/26/2023-10:31:47] [I] Preview Features: Use default preview flags.
[10/26/2023-10:31:47] [I] MaxAuxStreams: -1
[10/26/2023-10:31:47] [I] BuilderOptimizationLevel: -1
[10/26/2023-10:31:47] [I] Input(s)s format: fp32:CHW
[10/26/2023-10:31:47] [I] Output(s)s format: fp32:CHW
[10/26/2023-10:31:47] [I] Input build shapes: model
[10/26/2023-10:31:47] [I] Input calibration shapes: model
[10/26/2023-10:31:47] [I] === System Options ===
[10/26/2023-10:31:47] [I] Device: 0
[10/26/2023-10:31:47] [I] DLACore:
[10/26/2023-10:31:47] [I] Plugins:
[10/26/2023-10:31:47] [I] setPluginsToSerialize:
[10/26/2023-10:31:47] [I] dynamicPlugins:
[10/26/2023-10:31:47] [I] ignoreParsedPluginLibs: 0
[10/26/2023-10:31:47] [I]
[10/26/2023-10:31:47] [I] === Inference Options ===
[10/26/2023-10:31:47] [I] Batch: 1
[10/26/2023-10:31:47] [I] Input inference shapes: model
[10/26/2023-10:31:47] [I] Iterations: 10
[10/26/2023-10:31:47] [I] Duration: 3s (+ 200ms warm up)
[10/26/2023-10:31:47] [I] Sleep time: 0ms
[10/26/2023-10:31:47] [I] Idle time: 0ms
[10/26/2023-10:31:47] [I] Inference Streams: 1
[10/26/2023-10:31:47] [I] ExposeDMA: Disabled
[10/26/2023-10:31:47] [I] Data transfers: Enabled
[10/26/2023-10:31:47] [I] Spin-wait: Disabled
[10/26/2023-10:31:47] [I] Multithreading: Disabled
[10/26/2023-10:31:47] [I] CUDA Graph: Disabled
[10/26/2023-10:31:47] [I] Separate profiling: Disabled
[10/26/2023-10:31:47] [I] Time Deserialize: Disabled
[10/26/2023-10:31:47] [I] Time Refit: Disabled
[10/26/2023-10:31:47] [I] NVTX verbosity: 0
[10/26/2023-10:31:47] [I] Persistent Cache Ratio: 0
[10/26/2023-10:31:47] [I] Inputs:
[10/26/2023-10:31:47] [I] === Reporting Options ===
[10/26/2023-10:31:47] [I] Verbose: Disabled
[10/26/2023-10:31:47] [I] Averages: 10 inferences
[10/26/2023-10:31:47] [I] Percentiles: 90,95,99
[10/26/2023-10:31:47] [I] Dump refittable layers:Disabled
[10/26/2023-10:31:47] [I] Dump output: Disabled
[10/26/2023-10:31:47] [I] Profile: Disabled
[10/26/2023-10:31:47] [I] Export timing to JSON file:
[10/26/2023-10:31:47] [I] Export output to JSON file:
[10/26/2023-10:31:47] [I] Export profile to JSON file:
[10/26/2023-10:31:47] [I]
[10/26/2023-10:31:48] [I] === Device Information ===
[10/26/2023-10:31:48] [I] Selected Device: NVIDIA GeForce RTX 3080 Ti Laptop GPU
[10/26/2023-10:31:48] [I] Compute Capability: 8.6
[10/26/2023-10:31:48] [I] SMs: 58
[10/26/2023-10:31:48] [I] Device Global Memory: 16116 MiB
[10/26/2023-10:31:48] [I] Shared Memory per SM: 100 KiB
[10/26/2023-10:31:48] [I] Memory Bus Width: 256 bits (ECC disabled)
[10/26/2023-10:31:48] [I] Application Compute Clock Rate: 1.545 GHz
[10/26/2023-10:31:48] [I] Application Memory Clock Rate: 8.001 GHz
[10/26/2023-10:31:48] [I]
[10/26/2023-10:31:48] [I] Note: The application clock rates do not reflect the actual clock rates that the GPU is currently running at.
[10/26/2023-10:31:48] [I]
[10/26/2023-10:31:48] [I] TensorRT version: 8.6.1
[10/26/2023-10:31:48] [I] Loading standard plugins
[10/26/2023-10:31:49] [I] Engine loaded in 0.164344 sec.
[10/26/2023-10:31:49] [I] [TRT] Loaded engine size: 192 MiB
[10/26/2023-10:31:49] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +188, now: CPU 0, GPU 188 (MiB)
[10/26/2023-10:31:49] [I] Engine deserialized in 0.412792 sec.
[10/26/2023-10:31:49] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +1, GPU +20, now: CPU 1, GPU 208 (MiB)
[10/26/2023-10:31:49] [W] [TRT] CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage and speed up TensorRT initialization. See “Lazy Loading” section of CUDA documentation CUDA C Programming Guide
[10/26/2023-10:31:49] [I] Setting persistentCacheLimit to 0 bytes.
[10/26/2023-10:31:49] [I] Using random values for input input_0
[10/26/2023-10:31:49] [I] Input binding for input_0 with dimensions 1x3x384x640 is created.
[10/26/2023-10:31:49] [I] Output binding for output_0 with dimensions 1x5040x6 is created.
[10/26/2023-10:31:49] [I] Starting inference
[10/26/2023-10:31:52] [I] Warmup completed 43 queries over 200 ms
[10/26/2023-10:31:52] [I] Timing trace has 631 queries over 3.01371 s
[10/26/2023-10:31:52] [I]
[10/26/2023-10:31:52] [I] === Trace details ===
[10/26/2023-10:31:52] [I] Trace averages of 10 runs:
[10/26/2023-10:31:52] [I] Average on 10 runs - GPU latency: 4.72258 ms - Host latency: 4.99298 ms (enqueue 1.50741 ms)
[10/26/2023-10:31:52] [I] Average on 10 runs - GPU latency: 4.7232 ms - Host latency: 4.99966 ms (enqueue 1.43652 ms)
[10/26/2023-10:31:52] [I] Average on 10 runs - GPU latency: 4.72433 ms - Host latency: 4.99382 ms (enqueue 1.54504 ms)
[10/26/2023-10:31:52] [I] Average on 10 runs - GPU latency: 4.73324 ms - Host latency: 5.00919 ms (enqueue 1.4942 ms)
[10/26/2023-10:31:52] [I] Average on 10 runs - GPU latency: 4.73836 ms - Host latency: 5.01987 ms (enqueue 1.53353 ms)
[10/26/2023-10:31:52] [I] Average on 10 runs - GPU latency: 4.73672 ms - Host latency: 5.00632 ms (enqueue 1.4737 ms)
[10/26/2023-10:31:52] [I] Average on 10 runs - GPU latency: 4.73579 ms - Host latency: 5.00628 ms (enqueue 1.51228 ms)
[10/26/2023-10:31:52] [I] Average on 10 runs - GPU latency: 4.73734 ms - Host latency: 5.00536 ms (enqueue 1.44769 ms)
[10/26/2023-10:31:52] [I] Average on 10 runs - GPU latency: 4.73763 ms - Host latency: 5.00671 ms (enqueue 1.52356 ms)
[10/26/2023-10:31:52] [I] Average on 10 runs - GPU latency: 4.73887 ms - Host latency: 5.02597 ms (enqueue 1.53574 ms)
[10/26/2023-10:31:52] [I] Average on 10 runs - GPU latency: 4.73817 ms - Host latency: 5.00743 ms (enqueue 1.50223 ms)
[10/26/2023-10:31:52] [I] Average on 10 runs - GPU latency: 4.73702 ms - Host latency: 5.00533 ms (enqueue 1.52804 ms)
[10/26/2023-10:31:52] [I] Average on 10 runs - GPU latency: 4.73755 ms - Host latency: 5.00848 ms (enqueue 1.4395 ms)
[10/26/2023-10:31:52] [I] Average on 10 runs - GPU latency: 4.73704 ms - Host latency: 5.00766 ms (enqueue 1.45115 ms)
[10/26/2023-10:31:52] [I] Average on 10 runs - GPU latency: 4.8684 ms - Host latency: 5.13648 ms (enqueue 1.56238 ms)
[10/26/2023-10:31:52] [I] Average on 10 runs - GPU latency: 4.85784 ms - Host latency: 5.12993 ms (enqueue 1.49645 ms)
[10/26/2023-10:31:52] [I] Average on 10 runs - GPU latency: 4.83175 ms - Host latency: 5.10137 ms (enqueue 1.56773 ms)
[10/26/2023-10:31:52] [I] Average on 10 runs - GPU latency: 4.8043 ms - Host latency: 5.07369 ms (enqueue 1.52651 ms)
[10/26/2023-10:31:52] [I] Average on 10 runs - GPU latency: 4.78649 ms - Host latency: 5.06724 ms (enqueue 1.49972 ms)
[10/26/2023-10:31:52] [I] Average on 10 runs - GPU latency: 4.76908 ms - Host latency: 5.03754 ms (enqueue 1.56487 ms)
[10/26/2023-10:31:52] [I] Average on 10 runs - GPU latency: 4.76509 ms - Host latency: 5.04608 ms (enqueue 1.55416 ms)
[10/26/2023-10:31:52] [I] Average on 10 runs - GPU latency: 4.75299 ms - Host latency: 5.02311 ms (enqueue 1.39222 ms)
[10/26/2023-10:31:52] [I] Average on 10 runs - GPU latency: 4.74756 ms - Host latency: 5.02847 ms (enqueue 1.51088 ms)
[10/26/2023-10:31:52] [I] Average on 10 runs - GPU latency: 4.7441 ms - Host latency: 5.02529 ms (enqueue 1.43589 ms)
[10/26/2023-10:31:52] [I] Average on 10 runs - GPU latency: 4.75104 ms - Host latency: 5.0443 ms (enqueue 1.48708 ms)
[10/26/2023-10:31:52] [I] Average on 10 runs - GPU latency: 4.74368 ms - Host latency: 5.01317 ms (enqueue 1.52688 ms)
[10/26/2023-10:31:52] [I] Average on 10 runs - GPU latency: 4.74326 ms - Host latency: 5.01222 ms (enqueue 1.46608 ms)
[10/26/2023-10:31:52] [I] Average on 10 runs - GPU latency: 4.74431 ms - Host latency: 5.01556 ms (enqueue 1.54716 ms)
[10/26/2023-10:31:52] [I] Average on 10 runs - GPU latency: 4.74412 ms - Host latency: 5.01187 ms (enqueue 1.49382 ms)
[10/26/2023-10:31:52] [I] Average on 10 runs - GPU latency: 4.74561 ms - Host latency: 5.01483 ms (enqueue 1.55328 ms)
[10/26/2023-10:31:52] [I] Average on 10 runs - GPU latency: 4.7657 ms - Host latency: 5.03463 ms (enqueue 1.48347 ms)
[10/26/2023-10:31:52] [I] Average on 10 runs - GPU latency: 4.7522 ms - Host latency: 5.02009 ms (enqueue 1.44824 ms)
[10/26/2023-10:31:52] [I] Average on 10 runs - GPU latency: 4.74883 ms - Host latency: 5.02306 ms (enqueue 1.54489 ms)
[10/26/2023-10:31:52] [I] Average on 10 runs - GPU latency: 4.78022 ms - Host latency: 5.0602 ms (enqueue 1.39436 ms)
[10/26/2023-10:31:52] [I] Average on 10 runs - GPU latency: 4.81821 ms - Host latency: 5.08734 ms (enqueue 1.37657 ms)
[10/26/2023-10:31:52] [I] Average on 10 runs - GPU latency: 4.82599 ms - Host latency: 5.10791 ms (enqueue 1.56145 ms)
[10/26/2023-10:31:52] [I] Average on 10 runs - GPU latency: 4.82363 ms - Host latency: 5.09242 ms (enqueue 1.55054 ms)
[10/26/2023-10:31:52] [I] Average on 10 runs - GPU latency: 4.80361 ms - Host latency: 5.08021 ms (enqueue 1.5431 ms)
[10/26/2023-10:31:52] [I] Average on 10 runs - GPU latency: 4.7874 ms - Host latency: 5.05627 ms (enqueue 1.47452 ms)
[10/26/2023-10:31:52] [I] Average on 10 runs - GPU latency: 4.78599 ms - Host latency: 5.05696 ms (enqueue 1.54534 ms)
[10/26/2023-10:31:52] [I] Average on 10 runs - GPU latency: 4.7699 ms - Host latency: 5.03936 ms (enqueue 1.54199 ms)
[10/26/2023-10:31:52] [I] Average on 10 runs - GPU latency: 4.7668 ms - Host latency: 5.03613 ms (enqueue 1.47878 ms)
[10/26/2023-10:31:52] [I] Average on 10 runs - GPU latency: 4.76887 ms - Host latency: 5.04919 ms (enqueue 1.41929 ms)
[10/26/2023-10:31:52] [I] Average on 10 runs - GPU latency: 4.76907 ms - Host latency: 5.04937 ms (enqueue 1.52446 ms)
[10/26/2023-10:31:52] [I] Average on 10 runs - GPU latency: 4.7698 ms - Host latency: 5.06228 ms (enqueue 1.55574 ms)
[10/26/2023-10:31:52] [I] Average on 10 runs - GPU latency: 4.76877 ms - Host latency: 5.0469 ms (enqueue 1.42532 ms)
[10/26/2023-10:31:52] [I] Average on 10 runs - GPU latency: 4.76836 ms - Host latency: 5.04028 ms (enqueue 1.37715 ms)
[10/26/2023-10:31:52] [I] Average on 10 runs - GPU latency: 4.76436 ms - Host latency: 5.03508 ms (enqueue 1.4637 ms)
[10/26/2023-10:31:52] [I] Average on 10 runs - GPU latency: 4.76123 ms - Host latency: 5.02927 ms (enqueue 1.60452 ms)
[10/26/2023-10:31:52] [I] Average on 10 runs - GPU latency: 4.76121 ms - Host latency: 5.03076 ms (enqueue 1.51987 ms)
[10/26/2023-10:31:52] [I] Average on 10 runs - GPU latency: 4.77354 ms - Host latency: 5.04363 ms (enqueue 1.52141 ms)
[10/26/2023-10:31:52] [I] Average on 10 runs - GPU latency: 4.76453 ms - Host latency: 5.03586 ms (enqueue 1.47712 ms)
[10/26/2023-10:31:52] [I] Average on 10 runs - GPU latency: 4.76694 ms - Host latency: 5.03901 ms (enqueue 1.52253 ms)
[10/26/2023-10:31:52] [I] Average on 10 runs - GPU latency: 4.76855 ms - Host latency: 5.05759 ms (enqueue 1.38064 ms)
[10/26/2023-10:31:52] [I] Average on 10 runs - GPU latency: 4.77751 ms - Host latency: 5.03938 ms (enqueue 1.43896 ms)
[10/26/2023-10:31:52] [I] Average on 10 runs - GPU latency: 4.77976 ms - Host latency: 5.04946 ms (enqueue 1.42031 ms)
[10/26/2023-10:31:52] [I] Average on 10 runs - GPU latency: 4.78242 ms - Host latency: 5.06536 ms (enqueue 1.52871 ms)
[10/26/2023-10:31:52] [I] Average on 10 runs - GPU latency: 4.78118 ms - Host latency: 5.05344 ms (enqueue 1.59441 ms)
[10/26/2023-10:31:52] [I] Average on 10 runs - GPU latency: 4.78259 ms - Host latency: 5.05225 ms (enqueue 1.61548 ms)
[10/26/2023-10:31:52] [I] Average on 10 runs - GPU latency: 4.78081 ms - Host latency: 5.05215 ms (enqueue 1.43298 ms)
[10/26/2023-10:31:52] [I] Average on 10 runs - GPU latency: 4.78037 ms - Host latency: 5.04861 ms (enqueue 1.44202 ms)
[10/26/2023-10:31:52] [I] Average on 10 runs - GPU latency: 4.7842 ms - Host latency: 5.06677 ms (enqueue 1.55566 ms)
[10/26/2023-10:31:52] [I] Average on 10 runs - GPU latency: 4.76714 ms - Host latency: 5.03625 ms (enqueue 1.51907 ms)
[10/26/2023-10:31:52] [I]
[10/26/2023-10:31:52] [I] === Performance summary ===
[10/26/2023-10:31:52] [I] Throughput: 209.376 qps
[10/26/2023-10:31:52] [I] Latency: min = 4.98651 ms, max = 5.21228 ms, mean = 5.04054 ms, median = 5.03833 ms, percentile(90%) = 5.09277 ms, percentile(95%) = 5.12927 ms, percentile(99%) = 5.16223 ms
[10/26/2023-10:31:52] [I] Enqueue Time: min = 0.954346 ms, max = 1.86316 ms, mean = 1.49854 ms, median = 1.48975 ms, percentile(90%) = 1.67883 ms, percentile(95%) = 1.71997 ms, percentile(99%) = 1.78662 ms
[10/26/2023-10:31:52] [I] H2D Latency: min = 0.240845 ms, max = 0.38501 ms, mean = 0.258606 ms, median = 0.254639 ms, percentile(90%) = 0.258179 ms, percentile(95%) = 0.269653 ms, percentile(99%) = 0.374023 ms
[10/26/2023-10:31:52] [I] GPU Compute Time: min = 4.71759 ms, max = 4.87933 ms, mean = 4.76711 ms, median = 4.76074 ms, percentile(90%) = 4.81995 ms, percentile(95%) = 4.82825 ms, percentile(99%) = 4.87219 ms
[10/26/2023-10:31:52] [I] D2H Latency: min = 0.012207 ms, max = 0.0400391 ms, mean = 0.0148242 ms, median = 0.0145264 ms, percentile(90%) = 0.0153809 ms, percentile(95%) = 0.015625 ms, percentile(99%) = 0.0290527 ms
[10/26/2023-10:31:52] [I] Total Host Walltime: 3.01371 s
[10/26/2023-10:31:52] [I] Total GPU Compute Time: 3.00805 s
[10/26/2023-10:31:52] [I] Explanations of the performance metrics are printed in the verbose logs.
[10/26/2023-10:31:52] [I]
&&&& PASSED TensorRT.trtexec [TensorRT v8601] # /home/anis/cv_base/installs/TensorRT-8.6.1.6/targets/x86_64-linux-gnu/bin/trtexec --loadEngine=real_model_trt.engine
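
As a sanity check on how to read the summary above, here is a minimal sketch that recomputes two of the reported figures from the other numbers in the same log. It assumes that throughput is the number of queries divided by the total host walltime, and that the mean latency is the sum of the mean H2D, GPU compute, and D2H times. The variable names are mine, and every value is copied by hand from the log.

```python
# Values copied from the "Timing trace" and "Performance summary" lines of the log above.
queries = 631                      # "Timing trace has 631 queries over 3.01371 s"
host_walltime_s = 3.01371          # "Total Host Walltime"
reported_qps = 209.376             # "Throughput"

h2d_mean_ms = 0.258606             # "H2D Latency" mean
gpu_mean_ms = 4.76711              # "GPU Compute Time" mean
d2h_mean_ms = 0.0148242            # "D2H Latency" mean
reported_latency_mean_ms = 5.04054 # "Latency" mean

# Assumption: throughput = queries / total host walltime.
computed_qps = queries / host_walltime_s

# Assumption: latency = H2D + GPU compute + D2H.
computed_latency_ms = h2d_mean_ms + gpu_mean_ms + d2h_mean_ms

# Share of the per-query latency spent on host<->device copies rather than compute.
transfer_share = (h2d_mean_ms + d2h_mean_ms) / computed_latency_ms

print(f"throughput: computed {computed_qps:.3f} qps, reported {reported_qps} qps")
print(f"latency:    computed {computed_latency_ms:.5f} ms, reported {reported_latency_mean_ms} ms")
print(f"copies account for {transfer_share:.1%} of the mean latency")
```

Both recomputed values agree with the reported ones to within rounding, so the GPU Compute Time line looks like the right figure to compare against the same line in the Orin log, since it excludes the host-to-device and device-to-host copies.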