Error loading .trt model

Hi everyone,
I’m a beginner with TensorRT.
I successfully converted a .onnx model to a .trt model using the example given at the link GitHub - NVIDIA/TensorRT: NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
Now, I want to load the .trt file for the inference phase,
but I get the following errors at the line engine = runtime.deserialize_cuda_engine(f.read()):
[09/18/2024-14:11:42] [TRT] [E] 1: [defaultAllocator.cpp::deallocate::61] Error Code 1: Cuda Runtime (invalid argument)
[09/18/2024-14:11:42] [TRT] [E] 1: [defaultAllocator.cpp::deallocate::61] Error Code 1: Cuda Runtime (invalid argument)
Segmentation fault (core dumped).

I have CUDA 12.2 and TensorRT 8.6.2.

How can I resolve this issue?

Thanks a lot !

Dear @cap_nvidia,
JetPack 6.0 has CUDA 12.2 and TRT 8.6.2. Can you quickly verify the ONNX → TRT conversion and model loading using trtexec? Is it possible to share the model here or via private message?

Hi,
Yes, I’m on JetPack 6.0.

I don’t have access to the .trt file because I only work on the Jetson board one day a week, but I have the .pt and .onnx files.

How can I check the ONNX → TRT conversion?

To convert the .onnx file to .trt, I went to this folder on the Jetson: "/usr/src/tensorrt/bin/trtexec/", and ran the following command line:
trtexec --onnx=onnx_model.onnx --saveEngine=tensorRT.trt --inputIOFormats=fp16:chw --outp

I had no error during the conversion.

I also tried starting from the PyTorch file, converting it to a .onnx file and then to a .trt file with the same command line mentioned above.
Here are the PyTorch and ONNX files.

Thanks a lot for your answer !
weights_files_pytorch_onnx.zip (11.6 MB)

Dear @cap_nvidia,

The command looks incomplete. Are you generating an FP32 or FP16 model?
I don’t see any issue with trtexec when loading the model.

# Prepare the FP16 model
./trtexec --onnx=/home/nvidia/onnx_model.onnx --saveEngine=/home/nvidia/out.trt --inputIOFormats=fp16:chw --outputIOFormats=fp16:chw --fp16
# Load the model to test with random inputs
./trtexec --loadEngine=/home/nvidia/out.trt --inputIOFormats=fp16:chw --outputIOFormats=fp16:chw --fp16

Hi @SivaRamaKrishnaNV,

I ran the command lines you posted on my board and had no problem. Here are the lines printed during the conversion from the ONNX to the TRT file:
./trtexec --onnx=/home/monitor/Documents/multiprocessing/weights/ONNX_models/onnx_model.onnx --saveEngine=/home/monitor/Documents/multiprocessing/weights/TensorRT_models/TensorRT_model.trt --inputIOFormats=fp16:chw --outputIOFormats=fp16:chw --fp16
&&&& RUNNING TensorRT.trtexec [TensorRT v8602] # ./trtexec --onnx=/home/monitor/Documents/multiprocessing/weights/ONNX_models/onnx_model.onnx --saveEngine=/home/monitor/Documents/multiprocessing/weights/TensorRT_models/TensorRT_model.trt --inputIOFormats=fp16:chw --outputIOFormats=fp16:chw --fp16
[09/25/2024-10:05:50] [I] === Model Options ===
[09/25/2024-10:05:50] [I] Format: ONNX
[09/25/2024-10:05:50] [I] Model: /home/monitor/Documents/multiprocessing/weights/ONNX_models/onnx_model.onnx
[09/25/2024-10:05:50] [I] Output:
[09/25/2024-10:05:50] [I] === Build Options ===
[09/25/2024-10:05:50] [I] Max batch: explicit batch
[09/25/2024-10:05:50] [I] Memory Pools: workspace: default, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default
[09/25/2024-10:05:50] [I] minTiming: 1
[09/25/2024-10:05:50] [I] avgTiming: 8
[09/25/2024-10:05:50] [I] Precision: FP32+FP16
[09/25/2024-10:05:50] [I] LayerPrecisions:
[09/25/2024-10:05:50] [I] Layer Device Types:
[09/25/2024-10:05:50] [I] Calibration:
[09/25/2024-10:05:50] [I] Refit: Disabled
[09/25/2024-10:05:50] [I] Version Compatible: Disabled
[09/25/2024-10:05:50] [I] ONNX Native InstanceNorm: Disabled
[09/25/2024-10:05:50] [I] TensorRT runtime: full
[09/25/2024-10:05:50] [I] Lean DLL Path:
[09/25/2024-10:05:50] [I] Tempfile Controls: { in_memory: allow, temporary: allow }
[09/25/2024-10:05:50] [I] Exclude Lean Runtime: Disabled
[09/25/2024-10:05:50] [I] Sparsity: Disabled
[09/25/2024-10:05:50] [I] Safe mode: Disabled
[09/25/2024-10:05:50] [I] Build DLA standalone loadable: Disabled
[09/25/2024-10:05:50] [I] Allow GPU fallback for DLA: Disabled
[09/25/2024-10:05:50] [I] DirectIO mode: Disabled
[09/25/2024-10:05:50] [I] Restricted mode: Disabled
[09/25/2024-10:05:50] [I] Skip inference: Disabled
[09/25/2024-10:05:50] [I] Save engine: /home/monitor/Documents/multiprocessing/weights/TensorRT_models/TensorRT_model.trt
[09/25/2024-10:05:50] [I] Load engine:
[09/25/2024-10:05:50] [I] Profiling verbosity: 0
[09/25/2024-10:05:50] [I] Tactic sources: Using default tactic sources
[09/25/2024-10:05:50] [I] timingCacheMode: local
[09/25/2024-10:05:50] [I] timingCacheFile:
[09/25/2024-10:05:50] [I] Heuristic: Disabled
[09/25/2024-10:05:50] [I] Preview Features: Use default preview flags.
[09/25/2024-10:05:50] [I] MaxAuxStreams: -1
[09/25/2024-10:05:50] [I] BuilderOptimizationLevel: -1
[09/25/2024-10:05:50] [I] Input(s): fp16:chw
[09/25/2024-10:05:50] [I] Output(s): fp16:chw
[09/25/2024-10:05:50] [I] Input build shapes: model
[09/25/2024-10:05:50] [I] Input calibration shapes: model
[09/25/2024-10:05:50] [I] === System Options ===
[09/25/2024-10:05:50] [I] Device: 0
[09/25/2024-10:05:50] [I] DLACore:
[09/25/2024-10:05:50] [I] Plugins:
[09/25/2024-10:05:50] [I] setPluginsToSerialize:
[09/25/2024-10:05:50] [I] dynamicPlugins:
[09/25/2024-10:05:50] [I] ignoreParsedPluginLibs: 0
[09/25/2024-10:05:50] [I]
[09/25/2024-10:05:50] [I] === Inference Options ===
[09/25/2024-10:05:50] [I] Batch: Explicit
[09/25/2024-10:05:50] [I] Input inference shapes: model
[09/25/2024-10:05:50] [I] Iterations: 10
[09/25/2024-10:05:50] [I] Duration: 3s (+ 200ms warm up)
[09/25/2024-10:05:50] [I] Sleep time: 0ms
[09/25/2024-10:05:50] [I] Idle time: 0ms
[09/25/2024-10:05:50] [I] Inference Streams: 1
[09/25/2024-10:05:50] [I] ExposeDMA: Disabled
[09/25/2024-10:05:50] [I] Data transfers: Enabled
[09/25/2024-10:05:50] [I] Spin-wait: Disabled
[09/25/2024-10:05:50] [I] Multithreading: Disabled
[09/25/2024-10:05:50] [I] CUDA Graph: Disabled
[09/25/2024-10:05:50] [I] Separate profiling: Disabled
[09/25/2024-10:05:50] [I] Time Deserialize: Disabled
[09/25/2024-10:05:50] [I] Time Refit: Disabled
[09/25/2024-10:05:50] [I] NVTX verbosity: 0
[09/25/2024-10:05:50] [I] Persistent Cache Ratio: 0
[09/25/2024-10:05:50] [I] Inputs:
[09/25/2024-10:05:50] [I] === Reporting Options ===
[09/25/2024-10:05:50] [I] Verbose: Disabled
[09/25/2024-10:05:50] [I] Averages: 10 inferences
[09/25/2024-10:05:50] [I] Percentiles: 90,95,99
[09/25/2024-10:05:50] [I] Dump refittable layers:Disabled
[09/25/2024-10:05:50] [I] Dump output: Disabled
[09/25/2024-10:05:50] [I] Profile: Disabled
[09/25/2024-10:05:50] [I] Export timing to JSON file:
[09/25/2024-10:05:50] [I] Export output to JSON file:
[09/25/2024-10:05:50] [I] Export profile to JSON file:
[09/25/2024-10:05:50] [I]
[09/25/2024-10:05:50] [I] === Device Information ===
[09/25/2024-10:05:50] [I] Selected Device: Orin
[09/25/2024-10:05:50] [I] Compute Capability: 8.7
[09/25/2024-10:05:50] [I] SMs: 8
[09/25/2024-10:05:50] [I] Device Global Memory: 62841 MiB
[09/25/2024-10:05:50] [I] Shared Memory per SM: 164 KiB
[09/25/2024-10:05:50] [I] Memory Bus Width: 256 bits (ECC disabled)
[09/25/2024-10:05:50] [I] Application Compute Clock Rate: 1.3 GHz
[09/25/2024-10:05:50] [I] Application Memory Clock Rate: 0.612 GHz
[09/25/2024-10:05:50] [I]
[09/25/2024-10:05:50] [I] Note: The application clock rates do not reflect the actual clock rates that the GPU is currently running at.
[09/25/2024-10:05:50] [I]
[09/25/2024-10:05:50] [I] TensorRT version: 8.6.2
[09/25/2024-10:05:50] [I] Loading standard plugins
[09/25/2024-10:05:50] [I] [TRT] [MemUsageChange] Init CUDA: CPU +2, GPU +0, now: CPU 33, GPU 5442 (MiB)
[09/25/2024-10:05:55] [I] [TRT] [MemUsageChange] Init builder kernel library: CPU +1154, GPU +1115, now: CPU 1223, GPU 6595 (MiB)
[09/25/2024-10:05:55] [I] Start parsing network model.
[09/25/2024-10:05:55] [I] [TRT] ----------------------------------------------------------------
[09/25/2024-10:05:55] [I] [TRT] Input filename: /home/monitor/Documents/multiprocessing/weights/ONNX_models/onnx_model.onnx
[09/25/2024-10:05:55] [I] [TRT] ONNX IR version: 0.0.4
[09/25/2024-10:05:55] [I] [TRT] Opset version: 9
[09/25/2024-10:05:55] [I] [TRT] Producer name: pytorch
[09/25/2024-10:05:55] [I] [TRT] Producer version: 1.1
[09/25/2024-10:05:55] [I] [TRT] Domain:
[09/25/2024-10:05:55] [I] [TRT] Model version: 0
[09/25/2024-10:05:55] [I] [TRT] Doc string:
[09/25/2024-10:05:55] [I] [TRT] ----------------------------------------------------------------
[09/25/2024-10:05:55] [W] [TRT] onnx2trt_utils.cpp:372: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[09/25/2024-10:05:55] [I] Finished parsing network model. Parse time: 0.0173678
[09/25/2024-10:05:55] [I] [TRT] Graph optimization time: 0.00515499 seconds.
[09/25/2024-10:05:55] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored.
[09/25/2024-10:07:32] [I] [TRT] Detected 1 inputs and 1 output network tensors.
[09/25/2024-10:07:32] [I] [TRT] Total Host Persistent Memory: 90896
[09/25/2024-10:07:32] [I] [TRT] Total Device Persistent Memory: 0
[09/25/2024-10:07:32] [I] [TRT] Total Scratch Memory: 0
[09/25/2024-10:07:32] [I] [TRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 6 MiB, GPU 12 MiB
[09/25/2024-10:07:32] [I] [TRT] [BlockAssignment] Started assigning block shifts. This will take 31 steps to complete.
[09/25/2024-10:07:32] [I] [TRT] [BlockAssignment] Algorithm ShiftNTopDown took 0.479671ms to assign 3 blocks to 31 nodes requiring 504320 bytes.
[09/25/2024-10:07:32] [I] [TRT] Total Activation Memory: 504320
[09/25/2024-10:07:33] [W] [TRT] TensorRT encountered issues when converting weights between types and that could affect accuracy.
[09/25/2024-10:07:33] [W] [TRT] If this is not the desired behavior, please modify the weights or retrain with regularization to adjust the magnitude of the weights.
[09/25/2024-10:07:33] [W] [TRT] Check verbose logs for the list of affected weights.
[09/25/2024-10:07:33] [W] [TRT] - 9 weights are affected by this issue: Detected subnormal FP16 values.
[09/25/2024-10:07:33] [W] [TRT] - 4 weights are affected by this issue: Detected values less than smallest positive FP16 subnormal value and converted them to the FP16 minimum subnormalized value.
[09/25/2024-10:07:33] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +3, GPU +4, now: CPU 3, GPU 4 (MiB)
[09/25/2024-10:07:33] [I] Engine built in 102.7 sec.
[09/25/2024-10:07:33] [I] [TRT] Loaded engine size: 4 MiB
[09/25/2024-10:07:33] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +3, now: CPU 0, GPU 3 (MiB)
[09/25/2024-10:07:33] [I] Engine deserialized in 0.0207004 sec.
[09/25/2024-10:07:33] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 0, GPU 3 (MiB)
[09/25/2024-10:07:33] [I] Setting persistentCacheLimit to 0 bytes.
[09/25/2024-10:07:33] [I] Using random values for input 0
[09/25/2024-10:07:33] [I] Input binding for 0 with dimensions 1x1x48x48 is created.
[09/25/2024-10:07:33] [I] Output binding for 97 with dimensions 1x7 is created.
[09/25/2024-10:07:33] [I] Starting inference
[09/25/2024-10:07:36] [I] Warmup completed 386 queries over 200 ms
[09/25/2024-10:07:36] [I] Timing trace has 7406 queries over 3.00106 s
[09/25/2024-10:07:36] [I]
[09/25/2024-10:07:36] [I] === Trace details ===
[09/25/2024-10:07:36] [I] Trace averages of 10 runs:
[09/25/2024-10:07:36] [I] Average on 10 runs - GPU latency: 0.393741 ms - Host latency: 0.418539 ms (enqueue 0.209541 ms)
[09/25/2024-10:07:36] [I] Average on 10 runs - GPU latency: 0.389307 ms - Host latency: 0.410782 ms (enqueue 0.204434 ms)
[09/25/2024-10:07:36] [I] Average on 10 runs - GPU latency: 0.3912 ms - Host latency: 0.412581 ms (enqueue 0.20701 ms)
[09/25/2024-10:07:36] [I] Average on 10 runs - GPU latency: 0.390039 ms - Host latency: 0.410907 ms (enqueue 0.210999 ms)

[09/25/2024-10:07:36] [I] Average on 10 runs - GPU latency: 0.388818 ms - Host latency: 0.408838 ms (enqueue 0.178931 ms)
[09/25/2024-10:07:36] [I] Average on 10 runs - GPU latency: 0.393164 ms - Host latency: 0.411987 ms (enqueue 0.180518 ms)
[09/25/2024-10:07:36] [I]
[09/25/2024-10:07:36] [I] === Performance summary ===
[09/25/2024-10:07:36] [I] Throughput: 2467.8 qps
[09/25/2024-10:07:36] [I] Latency: min = 0.397827 ms, max = 4.04852 ms, mean = 0.420603 ms, median = 0.409973 ms, percentile(90%) = 0.417969 ms, percentile(95%) = 0.420898 ms, percentile(99%) = 0.965729 ms
[09/25/2024-10:07:36] [I] Enqueue Time: min = 0.153687 ms, max = 0.939789 ms, mean = 0.186502 ms, median = 0.181824 ms, percentile(90%) = 0.204895 ms, percentile(95%) = 0.217773 ms, percentile(99%) = 0.244507 ms
[09/25/2024-10:07:36] [I] H2D Latency: min = 0.00512695 ms, max = 1.36267 ms, mean = 0.0125914 ms, median = 0.0106201 ms, percentile(90%) = 0.015564 ms, percentile(95%) = 0.0166016 ms, percentile(99%) = 0.0217896 ms
[09/25/2024-10:07:36] [I] GPU Compute Time: min = 0.382202 ms, max = 4.03241 ms, mean = 0.400718 ms, median = 0.391357 ms, percentile(90%) = 0.397217 ms, percentile(95%) = 0.399536 ms, percentile(99%) = 0.942352 ms
[09/25/2024-10:07:36] [I] D2H Latency: min = 0.00485229 ms, max = 0.0106506 ms, mean = 0.00729279 ms, median = 0.00720215 ms, percentile(90%) = 0.00845337 ms, percentile(95%) = 0.00878906 ms, percentile(99%) = 0.00927734 ms
[09/25/2024-10:07:36] [I] Total Host Walltime: 3.00106 s
[09/25/2024-10:07:36] [I] Total GPU Compute Time: 2.96772 s
[09/25/2024-10:07:36] [W] * GPU compute time is unstable, with coefficient of variance = 28.686%.
[09/25/2024-10:07:36] [W] If not already in use, locking GPU clock frequency or adding --useSpinWait may improve the stability.
[09/25/2024-10:07:36] [I] Explanations of the performance metrics are printed in the verbose logs.
[09/25/2024-10:07:36] [I]
&&&& PASSED TensorRT.trtexec [TensorRT v8602] # ./trtexec --onnx=/home/monitor/Documents/multiprocessing/weights/ONNX_models/onnx_model.onnx --saveEngine=/home/monitor/Documents/multiprocessing/weights/TensorRT_models/TensorRT_model.trt --inputIOFormats=fp16:chw --outputIOFormats=fp16:chw --fp16

Then, I checked the .trt model with the command ./trtexec --loadEngine=/home/monitor/Documents/multiprocessing/weights/TensorRT_models/TensorRT_model.trt --inputIOFormats=fp16:chw --outputIOFormats=fp16:chw --fp16, and the following output suggests there was no problem in this phase either:

&&&& RUNNING TensorRT.trtexec [TensorRT v8602] # ./trtexec --loadEngine=/home/monitor/Documents/multiprocessing/weights/TensorRT_models/TensorRT_model.trt --inputIOFormats=fp16:chw --outputIOFormats=fp16:chw --fp16
[09/25/2024-10:11:48] [I] === Model Options ===
[09/25/2024-10:11:48] [I] Format: *
[09/25/2024-10:11:48] [I] Model:
[09/25/2024-10:11:48] [I] Output:
[09/25/2024-10:11:48] [I] === Build Options ===
[09/25/2024-10:11:48] [I] Max batch: 1
[09/25/2024-10:11:48] [I] Memory Pools: workspace: default, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default
[09/25/2024-10:11:48] [I] minTiming: 1
[09/25/2024-10:11:48] [I] avgTiming: 8
[09/25/2024-10:11:48] [I] Precision: FP32+FP16
[09/25/2024-10:11:48] [I] LayerPrecisions:
[09/25/2024-10:11:48] [I] Layer Device Types:
[09/25/2024-10:11:48] [I] Calibration:
[09/25/2024-10:11:48] [I] Refit: Disabled
[09/25/2024-10:11:48] [I] Version Compatible: Disabled
[09/25/2024-10:11:48] [I] ONNX Native InstanceNorm: Disabled
[09/25/2024-10:11:48] [I] TensorRT runtime: full
[09/25/2024-10:11:48] [I] Lean DLL Path:
[09/25/2024-10:11:48] [I] Tempfile Controls: { in_memory: allow, temporary: allow }
[09/25/2024-10:11:48] [I] Exclude Lean Runtime: Disabled
[09/25/2024-10:11:48] [I] Sparsity: Disabled
[09/25/2024-10:11:48] [I] Safe mode: Disabled
[09/25/2024-10:11:48] [I] Build DLA standalone loadable: Disabled
[09/25/2024-10:11:48] [I] Allow GPU fallback for DLA: Disabled
[09/25/2024-10:11:48] [I] DirectIO mode: Disabled
[09/25/2024-10:11:48] [I] Restricted mode: Disabled
[09/25/2024-10:11:48] [I] Skip inference: Disabled
[09/25/2024-10:11:48] [I] Save engine:
[09/25/2024-10:11:48] [I] Load engine: /home/monitor/Documents/multiprocessing/weights/TensorRT_models/TensorRT_model.trt
[09/25/2024-10:11:48] [I] Profiling verbosity: 0
[09/25/2024-10:11:48] [I] Tactic sources: Using default tactic sources
[09/25/2024-10:11:48] [I] timingCacheMode: local
[09/25/2024-10:11:48] [I] timingCacheFile:
[09/25/2024-10:11:48] [I] Heuristic: Disabled
[09/25/2024-10:11:48] [I] Preview Features: Use default preview flags.
[09/25/2024-10:11:48] [I] MaxAuxStreams: -1
[09/25/2024-10:11:48] [I] BuilderOptimizationLevel: -1
[09/25/2024-10:11:48] [I] Input(s): fp16:chw
[09/25/2024-10:11:48] [I] Output(s): fp16:chw
[09/25/2024-10:11:48] [I] Input build shapes: model
[09/25/2024-10:11:48] [I] Input calibration shapes: model
[09/25/2024-10:11:48] [I] === System Options ===
[09/25/2024-10:11:48] [I] Device: 0
[09/25/2024-10:11:48] [I] DLACore:
[09/25/2024-10:11:48] [I] Plugins:
[09/25/2024-10:11:48] [I] setPluginsToSerialize:
[09/25/2024-10:11:48] [I] dynamicPlugins:
[09/25/2024-10:11:48] [I] ignoreParsedPluginLibs: 0
[09/25/2024-10:11:48] [I]
[09/25/2024-10:11:48] [I] === Inference Options ===
[09/25/2024-10:11:48] [I] Batch: 1
[09/25/2024-10:11:48] [I] Input inference shapes: model
[09/25/2024-10:11:48] [I] Iterations: 10
[09/25/2024-10:11:48] [I] Duration: 3s (+ 200ms warm up)
[09/25/2024-10:11:48] [I] Sleep time: 0ms
[09/25/2024-10:11:48] [I] Idle time: 0ms
[09/25/2024-10:11:48] [I] Inference Streams: 1
[09/25/2024-10:11:48] [I] ExposeDMA: Disabled
[09/25/2024-10:11:48] [I] Data transfers: Enabled
[09/25/2024-10:11:48] [I] Spin-wait: Disabled
[09/25/2024-10:11:48] [I] Multithreading: Disabled
[09/25/2024-10:11:48] [I] CUDA Graph: Disabled
[09/25/2024-10:11:48] [I] Separate profiling: Disabled
[09/25/2024-10:11:48] [I] Time Deserialize: Disabled
[09/25/2024-10:11:48] [I] Time Refit: Disabled
[09/25/2024-10:11:48] [I] NVTX verbosity: 0
[09/25/2024-10:11:48] [I] Persistent Cache Ratio: 0
[09/25/2024-10:11:48] [I] Inputs:
[09/25/2024-10:11:48] [I] === Reporting Options ===
[09/25/2024-10:11:48] [I] Verbose: Disabled
[09/25/2024-10:11:48] [I] Averages: 10 inferences
[09/25/2024-10:11:48] [I] Percentiles: 90,95,99
[09/25/2024-10:11:48] [I] Dump refittable layers:Disabled
[09/25/2024-10:11:48] [I] Dump output: Disabled
[09/25/2024-10:11:48] [I] Profile: Disabled
[09/25/2024-10:11:48] [I] Export timing to JSON file:
[09/25/2024-10:11:48] [I] Export output to JSON file:
[09/25/2024-10:11:48] [I] Export profile to JSON file:
[09/25/2024-10:11:48] [I]
[09/25/2024-10:11:48] [I] === Device Information ===
[09/25/2024-10:11:48] [I] Selected Device: Orin
[09/25/2024-10:11:48] [I] Compute Capability: 8.7
[09/25/2024-10:11:48] [I] SMs: 8
[09/25/2024-10:11:48] [I] Device Global Memory: 62841 MiB
[09/25/2024-10:11:48] [I] Shared Memory per SM: 164 KiB
[09/25/2024-10:11:48] [I] Memory Bus Width: 256 bits (ECC disabled)
[09/25/2024-10:11:48] [I] Application Compute Clock Rate: 1.3 GHz
[09/25/2024-10:11:48] [I] Application Memory Clock Rate: 0.612 GHz
[09/25/2024-10:11:48] [I]
[09/25/2024-10:11:48] [I] Note: The application clock rates do not reflect the actual clock rates that the GPU is currently running at.
[09/25/2024-10:11:48] [I]
[09/25/2024-10:11:48] [I] TensorRT version: 8.6.2
[09/25/2024-10:11:48] [I] Loading standard plugins
[09/25/2024-10:11:48] [I] Engine loaded in 0.00542974 sec.
[09/25/2024-10:11:48] [I] [TRT] Loaded engine size: 4 MiB
[09/25/2024-10:11:48] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +3, now: CPU 0, GPU 3 (MiB)
[09/25/2024-10:11:48] [I] Engine deserialized in 0.0467724 sec.
[09/25/2024-10:11:48] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 0, GPU 3 (MiB)
[09/25/2024-10:11:48] [I] Setting persistentCacheLimit to 0 bytes.
[09/25/2024-10:11:48] [I] Using random values for input 0
[09/25/2024-10:11:48] [I] Input binding for 0 with dimensions 1x1x48x48 is created.
[09/25/2024-10:11:48] [I] Output binding for 97 with dimensions 1x7 is created.
[09/25/2024-10:11:48] [I] Starting inference
[09/25/2024-10:11:51] [I] Warmup completed 399 queries over 200 ms
[09/25/2024-10:11:51] [I] Timing trace has 7605 queries over 3.00136 s
[09/25/2024-10:11:51] [I]
[09/25/2024-10:11:51] [I] === Trace details ===
[09/25/2024-10:11:51] [I] Trace averages of 10 runs:
[09/25/2024-10:11:51] [I] Average on 10 runs - GPU latency: 0.387817 ms - Host latency: 0.411005 ms (enqueue 0.214142 ms)
[09/25/2024-10:11:51] [I] Average on 10 runs - GPU latency: 0.388913 ms - Host latency: 0.411009 ms (enqueue 0.206746 ms)
[09/25/2024-10:11:51] [I] Average on 10 runs - GPU latency: 0.386169 ms - Host latency: 0.407826 ms (enqueue 0.204134 ms)

[09/25/2024-10:11:51] [I] Average on 10 runs - GPU latency: 0.389526 ms - Host latency: 0.408374 ms (enqueue 0.173853 ms)
[09/25/2024-10:11:51] [I] Average on 10 runs - GPU latency: 0.385791 ms - Host latency: 0.403564 ms (enqueue 0.175391 ms)
[09/25/2024-10:11:51] [I] Average on 10 runs - GPU latency: 0.389258 ms - Host latency: 0.408618 ms (enqueue 0.178979 ms)
[09/25/2024-10:11:51] [I] Average on 10 runs - GPU latency: 0.387964 ms - Host latency: 0.406104 ms (enqueue 0.172656 ms)
[09/25/2024-10:11:51] [I]
[09/25/2024-10:11:51] [I] === Performance summary ===
[09/25/2024-10:11:51] [I] Throughput: 2533.85 qps
[09/25/2024-10:11:51] [I] Latency: min = 0.393188 ms, max = 4.01794 ms, mean = 0.408715 ms, median = 0.405273 ms, percentile(90%) = 0.411865 ms, percentile(95%) = 0.41394 ms, percentile(99%) = 0.418106 ms
[09/25/2024-10:11:51] [I] Enqueue Time: min = 0.147705 ms, max = 0.466797 ms, mean = 0.181245 ms, median = 0.179443 ms, percentile(90%) = 0.190552 ms, percentile(95%) = 0.198486 ms, percentile(99%) = 0.221649 ms
[09/25/2024-10:11:51] [I] H2D Latency: min = 0.00683594 ms, max = 0.0393066 ms, mean = 0.0116387 ms, median = 0.0107422 ms, percentile(90%) = 0.0150757 ms, percentile(95%) = 0.0159912 ms, percentile(99%) = 0.0204163 ms
[09/25/2024-10:11:51] [I] GPU Compute Time: min = 0.377686 ms, max = 3.99356 ms, mean = 0.389839 ms, median = 0.386475 ms, percentile(90%) = 0.391663 ms, percentile(95%) = 0.393555 ms, percentile(99%) = 0.397217 ms
[09/25/2024-10:11:51] [I] D2H Latency: min = 0.00500488 ms, max = 0.0100098 ms, mean = 0.00723687 ms, median = 0.00708008 ms, percentile(90%) = 0.00854492 ms, percentile(95%) = 0.00878906 ms, percentile(99%) = 0.00915527 ms
[09/25/2024-10:11:51] [I] Total Host Walltime: 3.00136 s
[09/25/2024-10:11:51] [I] Total GPU Compute Time: 2.96473 s
[09/25/2024-10:11:51] [W] * GPU compute time is unstable, with coefficient of variance = 22.4159%.
[09/25/2024-10:11:51] [W] If not already in use, locking GPU clock frequency or adding --useSpinWait may improve the stability.
[09/25/2024-10:11:51] [I] Explanations of the performance metrics are printed in the verbose logs.
[09/25/2024-10:11:51] [I]
&&&& PASSED TensorRT.trtexec [TensorRT v8602] # ./trtexec --loadEngine=/home/monitor/Documents/multiprocessing/weights/TensorRT_models/TensorRT_model.trt --inputIOFormats=fp16:chw --outputIOFormats=fp16:chw --fp16

So I continued with section 5 of the notebook “Using PyTorch through ONNX.ipynb” found on the TensorRT GitHub page (GitHub - NVIDIA/TensorRT: NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.), which contains the lines:
import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit

f = open("/home/monitor/Documents/multiprocessing/weights/TensorRT_models/TensorRT_model.trt", "rb")
runtime = trt.Runtime(trt.Logger(trt.Logger.WARNING))

engine = runtime.deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

and I get the following errors at the line engine = runtime.deserialize_cuda_engine(f.read()):
[09/25/2024-10:19:42] [TRT] [E] 1: [defaultAllocator.cpp::deallocate::61] Error Code 1: Cuda Runtime (invalid argument)
[09/25/2024-10:19:42] [TRT] [E] 1: [defaultAllocator.cpp::deallocate::61] Error Code 1: Cuda Runtime (invalid argument)
Segmentation fault (core dumped)
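(Editorial aside: one low-cost sanity check, offered here as a generic sketch rather than a confirmed fix for this particular crash, is to verify that the engine file on disk is intact and fully read before handing the bytes to deserialize_cuda_engine — a truncated or partially copied engine file can make the deserializer crash instead of failing cleanly. The helper name read_engine_bytes is hypothetical, not part of the TensorRT API.)

```python
import os

def read_engine_bytes(path):
    """Read a serialized engine file and verify the read is complete.

    Checks that the number of bytes actually read matches the size the
    filesystem reports, which catches truncated or partially transferred
    engine files before they reach deserialize_cuda_engine.
    """
    expected = os.path.getsize(path)
    with open(path, "rb") as f:  # binary mode is essential
        data = f.read()
    if expected == 0 or len(data) != expected:
        raise IOError(
            f"engine file {path!r} looks truncated: "
            f"read {len(data)} of {expected} bytes"
        )
    return data

# Hypothetical usage with the snippet above:
# engine = runtime.deserialize_cuda_engine(read_engine_bytes(engine_path))
```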

Here is the .trt file I generated.

Thanks for your help!
TensorRT_model.zip (3.9 MB)
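(Editorial aside: in the TensorRT 8.x Python API, deserialize_cuda_engine returns None when the engine cannot be deserialized — for example, when the engine was built with a different TensorRT version than the one loading it — rather than raising. A small wrapper, sketched below with a hypothetical helper name, turns that silent None into an explicit error, which is easier to debug than a downstream crash:)

```python
def deserialize_or_fail(runtime, engine_bytes):
    """Deserialize a TensorRT engine, raising a clear error on failure.

    `runtime` is expected to behave like tensorrt.Runtime: its
    deserialize_cuda_engine(bytes) returns an engine object on success
    and None on failure. This wrapper converts None into an exception.
    """
    engine = runtime.deserialize_cuda_engine(engine_bytes)
    if engine is None:
        raise RuntimeError(
            "deserialize_cuda_engine returned None; the engine may have "
            "been built with a different TensorRT version or be corrupted"
        )
    return engine
```

In the snippet above, this would replace the bare engine = runtime.deserialize_cuda_engine(f.read()) call.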

Dear @cap_nvidia,
Could you share your Python code used to test the TRT model?

There has been no update from you for a while, so we assume this is no longer an issue.
Hence, we are closing this topic. If you need further support, please open a new one.
Thanks

Is this still an issue that needs support? Are there any results that can be shared?