Hi @SivaRamaKrishnaNV,
I ran the command line you posted to test on my board and had no problem. Here are the lines printed during the conversion from the ONNX file to the TRT file:
./trtexec --onnx=/home/monitor/Documents/multiprocessing/weights/ONNX_models/onnx_model.onnx --saveEngine=/home/monitor/Documents/multiprocessing/weights/TensorRT_models/TensorRT_model.trt --inputIOFormats=fp16:chw --outputIOFormats=fp16:chw --fp16
&&&& RUNNING TensorRT.trtexec [TensorRT v8602] # ./trtexec --onnx=/home/monitor/Documents/multiprocessing/weights/ONNX_models/onnx_model.onnx --saveEngine=/home/monitor/Documents/multiprocessing/weights/TensorRT_models/TensorRT_model.trt --inputIOFormats=fp16:chw --outputIOFormats=fp16:chw --fp16
[09/25/2024-10:05:50] [I] === Model Options ===
[09/25/2024-10:05:50] [I] Format: ONNX
[09/25/2024-10:05:50] [I] Model: /home/monitor/Documents/multiprocessing/weights/ONNX_models/onnx_model.onnx
[09/25/2024-10:05:50] [I] Output:
[09/25/2024-10:05:50] [I] === Build Options ===
[09/25/2024-10:05:50] [I] Max batch: explicit batch
[09/25/2024-10:05:50] [I] Memory Pools: workspace: default, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default
[09/25/2024-10:05:50] [I] minTiming: 1
[09/25/2024-10:05:50] [I] avgTiming: 8
[09/25/2024-10:05:50] [I] Precision: FP32+FP16
[09/25/2024-10:05:50] [I] LayerPrecisions:
[09/25/2024-10:05:50] [I] Layer Device Types:
[09/25/2024-10:05:50] [I] Calibration:
[09/25/2024-10:05:50] [I] Refit: Disabled
[09/25/2024-10:05:50] [I] Version Compatible: Disabled
[09/25/2024-10:05:50] [I] ONNX Native InstanceNorm: Disabled
[09/25/2024-10:05:50] [I] TensorRT runtime: full
[09/25/2024-10:05:50] [I] Lean DLL Path:
[09/25/2024-10:05:50] [I] Tempfile Controls: { in_memory: allow, temporary: allow }
[09/25/2024-10:05:50] [I] Exclude Lean Runtime: Disabled
[09/25/2024-10:05:50] [I] Sparsity: Disabled
[09/25/2024-10:05:50] [I] Safe mode: Disabled
[09/25/2024-10:05:50] [I] Build DLA standalone loadable: Disabled
[09/25/2024-10:05:50] [I] Allow GPU fallback for DLA: Disabled
[09/25/2024-10:05:50] [I] DirectIO mode: Disabled
[09/25/2024-10:05:50] [I] Restricted mode: Disabled
[09/25/2024-10:05:50] [I] Skip inference: Disabled
[09/25/2024-10:05:50] [I] Save engine: /home/monitor/Documents/multiprocessing/weights/TensorRT_models/TensorRT_model.trt
[09/25/2024-10:05:50] [I] Load engine:
[09/25/2024-10:05:50] [I] Profiling verbosity: 0
[09/25/2024-10:05:50] [I] Tactic sources: Using default tactic sources
[09/25/2024-10:05:50] [I] timingCacheMode: local
[09/25/2024-10:05:50] [I] timingCacheFile:
[09/25/2024-10:05:50] [I] Heuristic: Disabled
[09/25/2024-10:05:50] [I] Preview Features: Use default preview flags.
[09/25/2024-10:05:50] [I] MaxAuxStreams: -1
[09/25/2024-10:05:50] [I] BuilderOptimizationLevel: -1
[09/25/2024-10:05:50] [I] Input(s): fp16:chw
[09/25/2024-10:05:50] [I] Output(s): fp16:chw
[09/25/2024-10:05:50] [I] Input build shapes: model
[09/25/2024-10:05:50] [I] Input calibration shapes: model
[09/25/2024-10:05:50] [I] === System Options ===
[09/25/2024-10:05:50] [I] Device: 0
[09/25/2024-10:05:50] [I] DLACore:
[09/25/2024-10:05:50] [I] Plugins:
[09/25/2024-10:05:50] [I] setPluginsToSerialize:
[09/25/2024-10:05:50] [I] dynamicPlugins:
[09/25/2024-10:05:50] [I] ignoreParsedPluginLibs: 0
[09/25/2024-10:05:50] [I]
[09/25/2024-10:05:50] [I] === Inference Options ===
[09/25/2024-10:05:50] [I] Batch: Explicit
[09/25/2024-10:05:50] [I] Input inference shapes: model
[09/25/2024-10:05:50] [I] Iterations: 10
[09/25/2024-10:05:50] [I] Duration: 3s (+ 200ms warm up)
[09/25/2024-10:05:50] [I] Sleep time: 0ms
[09/25/2024-10:05:50] [I] Idle time: 0ms
[09/25/2024-10:05:50] [I] Inference Streams: 1
[09/25/2024-10:05:50] [I] ExposeDMA: Disabled
[09/25/2024-10:05:50] [I] Data transfers: Enabled
[09/25/2024-10:05:50] [I] Spin-wait: Disabled
[09/25/2024-10:05:50] [I] Multithreading: Disabled
[09/25/2024-10:05:50] [I] CUDA Graph: Disabled
[09/25/2024-10:05:50] [I] Separate profiling: Disabled
[09/25/2024-10:05:50] [I] Time Deserialize: Disabled
[09/25/2024-10:05:50] [I] Time Refit: Disabled
[09/25/2024-10:05:50] [I] NVTX verbosity: 0
[09/25/2024-10:05:50] [I] Persistent Cache Ratio: 0
[09/25/2024-10:05:50] [I] Inputs:
[09/25/2024-10:05:50] [I] === Reporting Options ===
[09/25/2024-10:05:50] [I] Verbose: Disabled
[09/25/2024-10:05:50] [I] Averages: 10 inferences
[09/25/2024-10:05:50] [I] Percentiles: 90,95,99
[09/25/2024-10:05:50] [I] Dump refittable layers:Disabled
[09/25/2024-10:05:50] [I] Dump output: Disabled
[09/25/2024-10:05:50] [I] Profile: Disabled
[09/25/2024-10:05:50] [I] Export timing to JSON file:
[09/25/2024-10:05:50] [I] Export output to JSON file:
[09/25/2024-10:05:50] [I] Export profile to JSON file:
[09/25/2024-10:05:50] [I]
[09/25/2024-10:05:50] [I] === Device Information ===
[09/25/2024-10:05:50] [I] Selected Device: Orin
[09/25/2024-10:05:50] [I] Compute Capability: 8.7
[09/25/2024-10:05:50] [I] SMs: 8
[09/25/2024-10:05:50] [I] Device Global Memory: 62841 MiB
[09/25/2024-10:05:50] [I] Shared Memory per SM: 164 KiB
[09/25/2024-10:05:50] [I] Memory Bus Width: 256 bits (ECC disabled)
[09/25/2024-10:05:50] [I] Application Compute Clock Rate: 1.3 GHz
[09/25/2024-10:05:50] [I] Application Memory Clock Rate: 0.612 GHz
[09/25/2024-10:05:50] [I]
[09/25/2024-10:05:50] [I] Note: The application clock rates do not reflect the actual clock rates that the GPU is currently running at.
[09/25/2024-10:05:50] [I]
[09/25/2024-10:05:50] [I] TensorRT version: 8.6.2
[09/25/2024-10:05:50] [I] Loading standard plugins
[09/25/2024-10:05:50] [I] [TRT] [MemUsageChange] Init CUDA: CPU +2, GPU +0, now: CPU 33, GPU 5442 (MiB)
[09/25/2024-10:05:55] [I] [TRT] [MemUsageChange] Init builder kernel library: CPU +1154, GPU +1115, now: CPU 1223, GPU 6595 (MiB)
[09/25/2024-10:05:55] [I] Start parsing network model.
[09/25/2024-10:05:55] [I] [TRT] ----------------------------------------------------------------
[09/25/2024-10:05:55] [I] [TRT] Input filename: /home/monitor/Documents/multiprocessing/weights/ONNX_models/onnx_model.onnx
[09/25/2024-10:05:55] [I] [TRT] ONNX IR version: 0.0.4
[09/25/2024-10:05:55] [I] [TRT] Opset version: 9
[09/25/2024-10:05:55] [I] [TRT] Producer name: pytorch
[09/25/2024-10:05:55] [I] [TRT] Producer version: 1.1
[09/25/2024-10:05:55] [I] [TRT] Domain:
[09/25/2024-10:05:55] [I] [TRT] Model version: 0
[09/25/2024-10:05:55] [I] [TRT] Doc string:
[09/25/2024-10:05:55] [I] [TRT] ----------------------------------------------------------------
[09/25/2024-10:05:55] [W] [TRT] onnx2trt_utils.cpp:372: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[09/25/2024-10:05:55] [I] Finished parsing network model. Parse time: 0.0173678
[09/25/2024-10:05:55] [I] [TRT] Graph optimization time: 0.00515499 seconds.
[09/25/2024-10:05:55] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored.
[09/25/2024-10:07:32] [I] [TRT] Detected 1 inputs and 1 output network tensors.
[09/25/2024-10:07:32] [I] [TRT] Total Host Persistent Memory: 90896
[09/25/2024-10:07:32] [I] [TRT] Total Device Persistent Memory: 0
[09/25/2024-10:07:32] [I] [TRT] Total Scratch Memory: 0
[09/25/2024-10:07:32] [I] [TRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 6 MiB, GPU 12 MiB
[09/25/2024-10:07:32] [I] [TRT] [BlockAssignment] Started assigning block shifts. This will take 31 steps to complete.
[09/25/2024-10:07:32] [I] [TRT] [BlockAssignment] Algorithm ShiftNTopDown took 0.479671ms to assign 3 blocks to 31 nodes requiring 504320 bytes.
[09/25/2024-10:07:32] [I] [TRT] Total Activation Memory: 504320
[09/25/2024-10:07:33] [W] [TRT] TensorRT encountered issues when converting weights between types and that could affect accuracy.
[09/25/2024-10:07:33] [W] [TRT] If this is not the desired behavior, please modify the weights or retrain with regularization to adjust the magnitude of the weights.
[09/25/2024-10:07:33] [W] [TRT] Check verbose logs for the list of affected weights.
[09/25/2024-10:07:33] [W] [TRT] - 9 weights are affected by this issue: Detected subnormal FP16 values.
[09/25/2024-10:07:33] [W] [TRT] - 4 weights are affected by this issue: Detected values less than smallest positive FP16 subnormal value and converted them to the FP16 minimum subnormalized value.
[09/25/2024-10:07:33] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +3, GPU +4, now: CPU 3, GPU 4 (MiB)
[09/25/2024-10:07:33] [I] Engine built in 102.7 sec.
[09/25/2024-10:07:33] [I] [TRT] Loaded engine size: 4 MiB
[09/25/2024-10:07:33] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +3, now: CPU 0, GPU 3 (MiB)
[09/25/2024-10:07:33] [I] Engine deserialized in 0.0207004 sec.
[09/25/2024-10:07:33] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 0, GPU 3 (MiB)
[09/25/2024-10:07:33] [I] Setting persistentCacheLimit to 0 bytes.
[09/25/2024-10:07:33] [I] Using random values for input 0
[09/25/2024-10:07:33] [I] Input binding for 0 with dimensions 1x1x48x48 is created.
[09/25/2024-10:07:33] [I] Output binding for 97 with dimensions 1x7 is created.
[09/25/2024-10:07:33] [I] Starting inference
[09/25/2024-10:07:36] [I] Warmup completed 386 queries over 200 ms
[09/25/2024-10:07:36] [I] Timing trace has 7406 queries over 3.00106 s
[09/25/2024-10:07:36] [I]
[09/25/2024-10:07:36] [I] === Trace details ===
[09/25/2024-10:07:36] [I] Trace averages of 10 runs:
[09/25/2024-10:07:36] [I] Average on 10 runs - GPU latency: 0.393741 ms - Host latency: 0.418539 ms (enqueue 0.209541 ms)
[09/25/2024-10:07:36] [I] Average on 10 runs - GPU latency: 0.389307 ms - Host latency: 0.410782 ms (enqueue 0.204434 ms)
[09/25/2024-10:07:36] [I] Average on 10 runs - GPU latency: 0.3912 ms - Host latency: 0.412581 ms (enqueue 0.20701 ms)
[09/25/2024-10:07:36] [I] Average on 10 runs - GPU latency: 0.390039 ms - Host latency: 0.410907 ms (enqueue 0.210999 ms)
[09/25/2024-10:07:36] [I] Average on 10 runs - GPU latency: 0.388818 ms - Host latency: 0.408838 ms (enqueue 0.178931 ms)
[09/25/2024-10:07:36] [I] Average on 10 runs - GPU latency: 0.393164 ms - Host latency: 0.411987 ms (enqueue 0.180518 ms)
[09/25/2024-10:07:36] [I]
[09/25/2024-10:07:36] [I] === Performance summary ===
[09/25/2024-10:07:36] [I] Throughput: 2467.8 qps
[09/25/2024-10:07:36] [I] Latency: min = 0.397827 ms, max = 4.04852 ms, mean = 0.420603 ms, median = 0.409973 ms, percentile(90%) = 0.417969 ms, percentile(95%) = 0.420898 ms, percentile(99%) = 0.965729 ms
[09/25/2024-10:07:36] [I] Enqueue Time: min = 0.153687 ms, max = 0.939789 ms, mean = 0.186502 ms, median = 0.181824 ms, percentile(90%) = 0.204895 ms, percentile(95%) = 0.217773 ms, percentile(99%) = 0.244507 ms
[09/25/2024-10:07:36] [I] H2D Latency: min = 0.00512695 ms, max = 1.36267 ms, mean = 0.0125914 ms, median = 0.0106201 ms, percentile(90%) = 0.015564 ms, percentile(95%) = 0.0166016 ms, percentile(99%) = 0.0217896 ms
[09/25/2024-10:07:36] [I] GPU Compute Time: min = 0.382202 ms, max = 4.03241 ms, mean = 0.400718 ms, median = 0.391357 ms, percentile(90%) = 0.397217 ms, percentile(95%) = 0.399536 ms, percentile(99%) = 0.942352 ms
[09/25/2024-10:07:36] [I] D2H Latency: min = 0.00485229 ms, max = 0.0106506 ms, mean = 0.00729279 ms, median = 0.00720215 ms, percentile(90%) = 0.00845337 ms, percentile(95%) = 0.00878906 ms, percentile(99%) = 0.00927734 ms
[09/25/2024-10:07:36] [I] Total Host Walltime: 3.00106 s
[09/25/2024-10:07:36] [I] Total GPU Compute Time: 2.96772 s
[09/25/2024-10:07:36] [W] * GPU compute time is unstable, with coefficient of variance = 28.686%.
[09/25/2024-10:07:36] [W] If not already in use, locking GPU clock frequency or adding --useSpinWait may improve the stability.
[09/25/2024-10:07:36] [I] Explanations of the performance metrics are printed in the verbose logs.
[09/25/2024-10:07:36] [I]
&&&& PASSED TensorRT.trtexec [TensorRT v8602] # ./trtexec --onnx=/home/monitor/Documents/multiprocessing/weights/ONNX_models/onnx_model.onnx --saveEngine=/home/monitor/Documents/multiprocessing/weights/TensorRT_models/TensorRT_model.trt --inputIOFormats=fp16:chw --outputIOFormats=fp16:chw --fp16
Then, I checked the .trt model with the command ./trtexec --loadEngine=/home/monitor/Documents/multiprocessing/weights/TensorRT_models/TensorRT_model.trt --inputIOFormats=fp16:chw --outputIOFormats=fp16:chw --fp16. I got the following lines, which suggest there was no problem at this stage either:
&&&& RUNNING TensorRT.trtexec [TensorRT v8602] # ./trtexec --loadEngine=/home/monitor/Documents/multiprocessing/weights/TensorRT_models/TensorRT_model.trt --inputIOFormats=fp16:chw --outputIOFormats=fp16:chw --fp16
[09/25/2024-10:11:48] [I] === Model Options ===
[09/25/2024-10:11:48] [I] Format: *
[09/25/2024-10:11:48] [I] Model:
[09/25/2024-10:11:48] [I] Output:
[09/25/2024-10:11:48] [I] === Build Options ===
[09/25/2024-10:11:48] [I] Max batch: 1
[09/25/2024-10:11:48] [I] Memory Pools: workspace: default, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default
[09/25/2024-10:11:48] [I] minTiming: 1
[09/25/2024-10:11:48] [I] avgTiming: 8
[09/25/2024-10:11:48] [I] Precision: FP32+FP16
[09/25/2024-10:11:48] [I] LayerPrecisions:
[09/25/2024-10:11:48] [I] Layer Device Types:
[09/25/2024-10:11:48] [I] Calibration:
[09/25/2024-10:11:48] [I] Refit: Disabled
[09/25/2024-10:11:48] [I] Version Compatible: Disabled
[09/25/2024-10:11:48] [I] ONNX Native InstanceNorm: Disabled
[09/25/2024-10:11:48] [I] TensorRT runtime: full
[09/25/2024-10:11:48] [I] Lean DLL Path:
[09/25/2024-10:11:48] [I] Tempfile Controls: { in_memory: allow, temporary: allow }
[09/25/2024-10:11:48] [I] Exclude Lean Runtime: Disabled
[09/25/2024-10:11:48] [I] Sparsity: Disabled
[09/25/2024-10:11:48] [I] Safe mode: Disabled
[09/25/2024-10:11:48] [I] Build DLA standalone loadable: Disabled
[09/25/2024-10:11:48] [I] Allow GPU fallback for DLA: Disabled
[09/25/2024-10:11:48] [I] DirectIO mode: Disabled
[09/25/2024-10:11:48] [I] Restricted mode: Disabled
[09/25/2024-10:11:48] [I] Skip inference: Disabled
[09/25/2024-10:11:48] [I] Save engine:
[09/25/2024-10:11:48] [I] Load engine: /home/monitor/Documents/multiprocessing/weights/TensorRT_models/TensorRT_model.trt
[09/25/2024-10:11:48] [I] Profiling verbosity: 0
[09/25/2024-10:11:48] [I] Tactic sources: Using default tactic sources
[09/25/2024-10:11:48] [I] timingCacheMode: local
[09/25/2024-10:11:48] [I] timingCacheFile:
[09/25/2024-10:11:48] [I] Heuristic: Disabled
[09/25/2024-10:11:48] [I] Preview Features: Use default preview flags.
[09/25/2024-10:11:48] [I] MaxAuxStreams: -1
[09/25/2024-10:11:48] [I] BuilderOptimizationLevel: -1
[09/25/2024-10:11:48] [I] Input(s): fp16:chw
[09/25/2024-10:11:48] [I] Output(s): fp16:chw
[09/25/2024-10:11:48] [I] Input build shapes: model
[09/25/2024-10:11:48] [I] Input calibration shapes: model
[09/25/2024-10:11:48] [I] === System Options ===
[09/25/2024-10:11:48] [I] Device: 0
[09/25/2024-10:11:48] [I] DLACore:
[09/25/2024-10:11:48] [I] Plugins:
[09/25/2024-10:11:48] [I] setPluginsToSerialize:
[09/25/2024-10:11:48] [I] dynamicPlugins:
[09/25/2024-10:11:48] [I] ignoreParsedPluginLibs: 0
[09/25/2024-10:11:48] [I]
[09/25/2024-10:11:48] [I] === Inference Options ===
[09/25/2024-10:11:48] [I] Batch: 1
[09/25/2024-10:11:48] [I] Input inference shapes: model
[09/25/2024-10:11:48] [I] Iterations: 10
[09/25/2024-10:11:48] [I] Duration: 3s (+ 200ms warm up)
[09/25/2024-10:11:48] [I] Sleep time: 0ms
[09/25/2024-10:11:48] [I] Idle time: 0ms
[09/25/2024-10:11:48] [I] Inference Streams: 1
[09/25/2024-10:11:48] [I] ExposeDMA: Disabled
[09/25/2024-10:11:48] [I] Data transfers: Enabled
[09/25/2024-10:11:48] [I] Spin-wait: Disabled
[09/25/2024-10:11:48] [I] Multithreading: Disabled
[09/25/2024-10:11:48] [I] CUDA Graph: Disabled
[09/25/2024-10:11:48] [I] Separate profiling: Disabled
[09/25/2024-10:11:48] [I] Time Deserialize: Disabled
[09/25/2024-10:11:48] [I] Time Refit: Disabled
[09/25/2024-10:11:48] [I] NVTX verbosity: 0
[09/25/2024-10:11:48] [I] Persistent Cache Ratio: 0
[09/25/2024-10:11:48] [I] Inputs:
[09/25/2024-10:11:48] [I] === Reporting Options ===
[09/25/2024-10:11:48] [I] Verbose: Disabled
[09/25/2024-10:11:48] [I] Averages: 10 inferences
[09/25/2024-10:11:48] [I] Percentiles: 90,95,99
[09/25/2024-10:11:48] [I] Dump refittable layers:Disabled
[09/25/2024-10:11:48] [I] Dump output: Disabled
[09/25/2024-10:11:48] [I] Profile: Disabled
[09/25/2024-10:11:48] [I] Export timing to JSON file:
[09/25/2024-10:11:48] [I] Export output to JSON file:
[09/25/2024-10:11:48] [I] Export profile to JSON file:
[09/25/2024-10:11:48] [I]
[09/25/2024-10:11:48] [I] === Device Information ===
[09/25/2024-10:11:48] [I] Selected Device: Orin
[09/25/2024-10:11:48] [I] Compute Capability: 8.7
[09/25/2024-10:11:48] [I] SMs: 8
[09/25/2024-10:11:48] [I] Device Global Memory: 62841 MiB
[09/25/2024-10:11:48] [I] Shared Memory per SM: 164 KiB
[09/25/2024-10:11:48] [I] Memory Bus Width: 256 bits (ECC disabled)
[09/25/2024-10:11:48] [I] Application Compute Clock Rate: 1.3 GHz
[09/25/2024-10:11:48] [I] Application Memory Clock Rate: 0.612 GHz
[09/25/2024-10:11:48] [I]
[09/25/2024-10:11:48] [I] Note: The application clock rates do not reflect the actual clock rates that the GPU is currently running at.
[09/25/2024-10:11:48] [I]
[09/25/2024-10:11:48] [I] TensorRT version: 8.6.2
[09/25/2024-10:11:48] [I] Loading standard plugins
[09/25/2024-10:11:48] [I] Engine loaded in 0.00542974 sec.
[09/25/2024-10:11:48] [I] [TRT] Loaded engine size: 4 MiB
[09/25/2024-10:11:48] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +3, now: CPU 0, GPU 3 (MiB)
[09/25/2024-10:11:48] [I] Engine deserialized in 0.0467724 sec.
[09/25/2024-10:11:48] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 0, GPU 3 (MiB)
[09/25/2024-10:11:48] [I] Setting persistentCacheLimit to 0 bytes.
[09/25/2024-10:11:48] [I] Using random values for input 0
[09/25/2024-10:11:48] [I] Input binding for 0 with dimensions 1x1x48x48 is created.
[09/25/2024-10:11:48] [I] Output binding for 97 with dimensions 1x7 is created.
[09/25/2024-10:11:48] [I] Starting inference
[09/25/2024-10:11:51] [I] Warmup completed 399 queries over 200 ms
[09/25/2024-10:11:51] [I] Timing trace has 7605 queries over 3.00136 s
[09/25/2024-10:11:51] [I]
[09/25/2024-10:11:51] [I] === Trace details ===
[09/25/2024-10:11:51] [I] Trace averages of 10 runs:
[09/25/2024-10:11:51] [I] Average on 10 runs - GPU latency: 0.387817 ms - Host latency: 0.411005 ms (enqueue 0.214142 ms)
[09/25/2024-10:11:51] [I] Average on 10 runs - GPU latency: 0.388913 ms - Host latency: 0.411009 ms (enqueue 0.206746 ms)
[09/25/2024-10:11:51] [I] Average on 10 runs - GPU latency: 0.386169 ms - Host latency: 0.407826 ms (enqueue 0.204134 ms)
[09/25/2024-10:11:51] [I] Average on 10 runs - GPU latency: 0.389526 ms - Host latency: 0.408374 ms (enqueue 0.173853 ms)
[09/25/2024-10:11:51] [I] Average on 10 runs - GPU latency: 0.385791 ms - Host latency: 0.403564 ms (enqueue 0.175391 ms)
[09/25/2024-10:11:51] [I] Average on 10 runs - GPU latency: 0.389258 ms - Host latency: 0.408618 ms (enqueue 0.178979 ms)
[09/25/2024-10:11:51] [I] Average on 10 runs - GPU latency: 0.387964 ms - Host latency: 0.406104 ms (enqueue 0.172656 ms)
[09/25/2024-10:11:51] [I]
[09/25/2024-10:11:51] [I] === Performance summary ===
[09/25/2024-10:11:51] [I] Throughput: 2533.85 qps
[09/25/2024-10:11:51] [I] Latency: min = 0.393188 ms, max = 4.01794 ms, mean = 0.408715 ms, median = 0.405273 ms, percentile(90%) = 0.411865 ms, percentile(95%) = 0.41394 ms, percentile(99%) = 0.418106 ms
[09/25/2024-10:11:51] [I] Enqueue Time: min = 0.147705 ms, max = 0.466797 ms, mean = 0.181245 ms, median = 0.179443 ms, percentile(90%) = 0.190552 ms, percentile(95%) = 0.198486 ms, percentile(99%) = 0.221649 ms
[09/25/2024-10:11:51] [I] H2D Latency: min = 0.00683594 ms, max = 0.0393066 ms, mean = 0.0116387 ms, median = 0.0107422 ms, percentile(90%) = 0.0150757 ms, percentile(95%) = 0.0159912 ms, percentile(99%) = 0.0204163 ms
[09/25/2024-10:11:51] [I] GPU Compute Time: min = 0.377686 ms, max = 3.99356 ms, mean = 0.389839 ms, median = 0.386475 ms, percentile(90%) = 0.391663 ms, percentile(95%) = 0.393555 ms, percentile(99%) = 0.397217 ms
[09/25/2024-10:11:51] [I] D2H Latency: min = 0.00500488 ms, max = 0.0100098 ms, mean = 0.00723687 ms, median = 0.00708008 ms, percentile(90%) = 0.00854492 ms, percentile(95%) = 0.00878906 ms, percentile(99%) = 0.00915527 ms
[09/25/2024-10:11:51] [I] Total Host Walltime: 3.00136 s
[09/25/2024-10:11:51] [I] Total GPU Compute Time: 2.96473 s
[09/25/2024-10:11:51] [W] * GPU compute time is unstable, with coefficient of variance = 22.4159%.
[09/25/2024-10:11:51] [W] If not already in use, locking GPU clock frequency or adding --useSpinWait may improve the stability.
[09/25/2024-10:11:51] [I] Explanations of the performance metrics are printed in the verbose logs.
[09/25/2024-10:11:51] [I]
&&&& PASSED TensorRT.trtexec [TensorRT v8602] # ./trtexec --loadEngine=/home/monitor/Documents/multiprocessing/weights/TensorRT_models/TensorRT_model.trt --inputIOFormats=fp16:chw --outputIOFormats=fp16:chw --fp16
So I continued with section 5 of the notebook "Using PyTorch through ONNX.ipynb" found in the NVIDIA/TensorRT GitHub repository, with the following lines:
import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit  # creates and activates a CUDA context

# Read the serialized engine and deserialize it into a runnable engine
with open("/home/monitor/Documents/multiprocessing/weights/TensorRT_models/TensorRT_model.trt", "rb") as f:
    runtime = trt.Runtime(trt.Logger(trt.Logger.WARNING))
    engine = runtime.deserialize_cuda_engine(f.read())
context = engine.create_execution_context()
and I get the following errors on the line engine = runtime.deserialize_cuda_engine(f.read()):
[09/25/2024-10:19:42] [TRT] [E] 1: [defaultAllocator.cpp::deallocate::61] Error Code 1: Cuda Runtime (invalid argument)
[09/25/2024-10:19:42] [TRT] [E] 1: [defaultAllocator.cpp::deallocate::61] Error Code 1: Cuda Runtime (invalid argument)
Segmentation fault (core dumped)
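In case it matters: a TensorRT engine is only valid for the TensorRT version that serialized it, and a mismatch between the trtexec build (8.6.2 above) and the Python tensorrt wheel is a known cause of crashes during deserialization. A small helper I used to sanity-check this (versions_match is my own hypothetical name, not a TensorRT API; the runtime version would come from tensorrt.__version__):

```python
def versions_match(build_version: str, runtime_version: str) -> bool:
    # Engines are tied to the exact TensorRT release that built them;
    # compare major.minor.patch of the trtexec build vs. the Python wheel.
    return build_version.split(".")[:3] == runtime_version.split(".")[:3]

# Usage (runtime side would pass tensorrt.__version__):
print(versions_match("8.6.2", "8.6.2"))   # True
print(versions_match("8.6.2", "10.0.1"))  # False
```

On my board both report 8.6.2, so I do not think this is the cause, but I mention it for completeness.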
Here is the .trt file I generated.
Thanks for your help!
TensorRT_model.zip (3.9 MB)