Drive AGX Orin TensorRT inference failed

0xdeadbeef · August 10, 2023, 3:09am

Software Version: DRIVE OS 6.0.6
Target Operating System: Linux

Hi,

I am trying to run inference with a TensorRT model and I am hitting this error:

[07-08-2023 16:22:55] CameraClient: Notification received from pipeline index:0 of type: NOTIF_WARN_ICP_FRAME_DROP
[07-08-2023 16:22:55] SIPLDeviceBlockNotificationHandler: Queue timeout
[07-08-2023 16:22:55] CameraClient: Notification received from pipeline index:0 of type: NOTIF_WARN_ICP_FRAME_DROP
[07-08-2023 16:22:55] CameraClient: Notification received from pipeline index:0 of type: NOTIF_WARN_ICP_FRAME_DROP
[07-08-2023 16:22:55] CameraClient: Notification received from pipeline index:0 of type: NOTIF_WARN_ICP_FRAME_DROP
[07-08-2023 16:22:55] CameraClient: Notification received from pipeline index:0 of type: NOTIF_WARN_ICP_FRAME_DROP
[07-08-2023 16:22:55] CameraClient: Notification received from pipeline index:0 of type: NOTIF_WARN_ICP_FRAME_DROP
[07-08-2023 16:22:55] SIPLClient: ImageQueue timeout
[07-08-2023 16:22:55] SIPLClient: ImageQueue timeout
[07-08-2023 16:22:55] CameraClient: Notification received from pipeline index:0 of type: NOTIF_WARN_ICP_FRAME_DROP
[07-08-2023 16:22:55] CameraClient: Notification received from pipeline index:0 of type: NOTIF_WARN_ICP_FRAME_DROP
[07-08-2023 16:22:55] CameraClient: Notification received from pipeline index:0 of type: NOTIF_WARN_ICP_FRAME_DROP
[07-08-2023 16:22:55] CameraClient: Notification received from pipeline index:0 of type: NOTIF_WARN_ICP_FRAME_DROP
[07-08-2023 16:22:55] CameraClient: Notification received from pipeline index:0 of type: NOTIF_WARN_ICP_FRAME_DROP
input1[0] = 0.356
interpretOutput completed
size= 0
[07-08-2023 16:22:55] CameraClient: raw bit type is missing or unexpected in virtual channel info, meta info might be incomplete
[07-08-2023 16:22:55] CameraClient: raw bit type is missing or unexpected in virtual channel info, meta info might be incomplete
[07-08-2023 16:22:55] CameraClient: Notification received from pipeline index:0 of type: NOTIF_WARN_ICP_FRAME_DROP
[07-08-2023 16:22:55] DNN: run infer internal batchsize 1 is not needed after 8.4.10.4.
[07-08-2023 16:22:55] 1: [convBaseRunner.cpp::execute::271] Error Code 1: Cask (Cask convolution execution)
[07-08-2023 16:22:55] Driveworks exception thrown: DW_INTERNAL_ERROR: TensorRT: Inference failed.

terminate called after throwing an instance of 'std::runtime_error'
what(): [2023-08-07 16:22:55] DW Error DW_INTERNAL_ERROR executing DW function:
dwDNN_inferRaw(m_dnnOutputsDevice, &m_dnnInputDevice, 1U, m_dnn)
at /usr/local/driveworks/samples/src/dnn/sample_object_detector_tracker/main.cpp:421
Aborted

The TensorRT model binary was created on the Orin target using the tensorRT_optimization tool. Here is its output:

xdeadbeef@tegra-ubuntu:/usr/local/driveworks-5.10/tools/dnn$ sudo ./tensorRT_optimization --modelType=onnx --onnxFile=/home/xdeadbeef/yolov3.onnx --out=/home/xdeadbeef/yolo3_fin_RT.bin
[sudo] password for xdeadbeef:
[08-08-2023 01:25:01] WARNING: ExplicitBatch is enabled by default for ONNX models.
[08-08-2023 01:25:01] WARNING: --batchSize is ignored for explicit batch ONNX models for TRT versions after 6.2.0.3
Use model.batch[BATCH_SIZE].onnx (eg. model.batch8.onnx with ONNX batch size 8) with accompanying batch_sizes in configuration.json instead.
[08-08-2023 01:25:02] DNNGenerator: Initializing TensorRT generation on model /home/xdeadbeef/yolov3.onnx.
[08-08-2023 01:25:03] DNNGenerator: Input "000_net": 64x3x608x608
[08-08-2023 01:25:03] DNNGenerator: Output "082_convolutional": 64x255x19x19
[08-08-2023 01:25:03] DNNGenerator: Output "094_convolutional": 64x255x38x38
[08-08-2023 01:25:03] DNNGenerator: Output "106_convolutional": 64x255x76x76
[08-08-2023 01:57:30] DNNValidator: Iteration 0: 902.346008 ms.
[08-08-2023 01:57:32] DNNValidator: Iteration 1: 902.481750 ms.
[08-08-2023 01:57:34] DNNValidator: Iteration 2: 902.334473 ms.
[08-08-2023 01:57:35] DNNValidator: Iteration 3: 902.335693 ms.
[08-08-2023 01:57:37] DNNValidator: Iteration 4: 902.326233 ms.
[08-08-2023 01:57:39] DNNValidator: Iteration 5: 902.309937 ms.
[08-08-2023 01:57:41] DNNValidator: Iteration 6: 902.167969 ms.
[08-08-2023 01:57:43] DNNValidator: Iteration 7: 902.109741 ms.
[08-08-2023 01:57:44] DNNValidator: Iteration 8: 902.585754 ms.
[08-08-2023 01:57:46] DNNValidator: Iteration 9: 902.262268 ms.
[08-08-2023 01:57:46] DNNValidator: Average over 10 runs is 902.326050 ms.
[08-08-2023 01:57:47] Releasing Driveworks SDK Context
xdeadbeef@tegra-ubuntu:/usr/local/driveworks-5.10/tools/dnn$

I already looked at the thread "Sample object detector tracker YOLOV3 model error inference" (DRIVE AGX Xavier / DRIVE AGX Xavier General, NVIDIA Developer Forums) and implemented the changes Siva suggested in its solution, but I still see the inference failure shown above.

Please let me know which logs I should provide to debug this issue.
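One detail in the optimization log that may be worth checking: the input and all three outputs report a leading dimension of 64, while the failing call passes `1U` as the third argument of `dwDNN_inferRaw`. Whether that mismatch is related to the failure is not established in this thread. The sketch below only works out the buffer sizes implied by the logged shapes, assuming FP32 storage (4 bytes per element); `tensor_bytes` is a hypothetical helper, not part of DriveWorks or TensorRT.

```python
from math import prod

BYTES_PER_FP32 = 4  # assumption: tensors are stored as 32-bit floats


def tensor_bytes(shape: str, bytes_per_element: int = BYTES_PER_FP32) -> int:
    """Return the buffer size in bytes for a shape string such as '64x3x608x608'."""
    dims = [int(d) for d in shape.split("x")]
    return prod(dims) * bytes_per_element


# Shapes copied from the DNNGenerator lines in the log above.
shapes = {
    "000_net": "64x3x608x608",
    "082_convolutional": "64x255x19x19",
    "094_convolutional": "64x255x38x38",
    "106_convolutional": "64x255x76x76",
}

for name, shape in shapes.items():
    size = tensor_bytes(shape)
    batch = int(shape.split("x")[0])  # leading dimension as reported in the log
    print(f"{name}: {shape} -> {size / 2**20:.2f} MiB total, "
          f"{size / batch / 2**20:.2f} MiB per batch item")
```

If the application allocates its input and output buffers for a single image, comparing those allocations against the totals printed here is a quick sanity check.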

SivaRamaKrishnaNV · August 10, 2023, 3:56am

Dear @0xdeadbeef,
Could you share the sample code, ONNX model, and input video?

0xdeadbeef · August 10, 2023, 4:01am

Siva,

The sample is a modification of sample_object_detector_tracker. The input is a GMSL camera, either a Valeo or an Entron camera (both show the same behavior). Please let me know how I can share the model; I am not able to upload more than 100 MB here.

0xdeadbeef · August 10, 2023, 4:06am

sudo ./sample_object_detector_tracker --camera-group=d --input-type=camera --camera-index=0 --camera-type=V1SIM728S2RU4070HB20 --tensorRT_model=/home/xdeadbeef/yolo3_RT.bin

This is my command.

SivaRamaKrishnaNV · August 10, 2023, 4:10am

Dear @0xdeadbeef,
You can upload the source code, model, and input video to Google Drive (or any shared drive) and share the link here.

0xdeadbeef · August 10, 2023, 4:24am

Here you go

The input video is a GMSL camera feed.

0xdeadbeef · August 10, 2023, 10:24am

Hey Siva, any pointers?

SivaRamaKrishnaNV · August 10, 2023, 11:17am

Dear @0xdeadbeef,
I could not open the drive. Could you check the permissions?

Can you try generating the TRT model and loading the engine with the TensorRT trtexec tool to see if you hit any issue? From the error, it looks like there is an issue with TRT engine loading.

0xdeadbeef · August 10, 2023, 1:03pm

Siva, is there anything specific that we need to pass to the trtexec tool on Orin for the dGPU, or any other parameters?

Can you please give me a reference example of using this tool on Orin?

SivaRamaKrishnaNV · August 10, 2023, 1:38pm

Dear @0xdeadbeef,
Please check https://github.com/NVIDIA/TensorRT/tree/release/8.6/samples/trtexec for usage of the tool. Note that the TRT model is generated for the system (host or target) on which you run the trtexec tool.
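As a minimal sketch of the two steps suggested here (build an engine from the ONNX file, then load it back), the helper below only assembles and prints the command lines. It uses just the `--onnx`, `--saveEngine`, and `--loadEngine` flags that appear in the trtexec run posted later in this thread. The trtexec path is taken from that same run and may differ on other installs, and the engine output path is a hypothetical placeholder. No Orin-specific flags are assumed.

```python
import shlex
from typing import List

# Path as it appears in the trtexec log in this thread; adjust for your install.
TRTEXEC = "/usr/src/tensorrt/bin/trtexec"


def build_engine_cmd(onnx_path: str, engine_path: str) -> List[str]:
    """Command that parses the ONNX model and serializes an engine to engine_path."""
    return [TRTEXEC, f"--onnx={onnx_path}", f"--saveEngine={engine_path}"]


def load_engine_cmd(engine_path: str) -> List[str]:
    """Command that deserializes a previously saved engine and times inference."""
    return [TRTEXEC, f"--loadEngine={engine_path}"]


onnx = "/home/xdeadbeef/yolov3.onnx"
engine = "/home/xdeadbeef/yolov3_trtexec.bin"  # hypothetical output path

# Print the commands so they can be copied into a shell on the target.
for cmd in (build_engine_cmd(onnx, engine), load_engine_cmd(engine)):
    print(shlex.join(cmd))
```

Both commands should be run on the same system that will later load the engine, in line with the note above.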

0xdeadbeef · August 10, 2023, 1:45pm

Is there an example for the builder (using C++) as well?

0xdeadbeef · August 10, 2023, 4:27pm

Hey Siva, you should be able to access it now.

0xdeadbeef · August 10, 2023, 9:46pm

Hi Siva,
trtexec seems to help, but we see considerable frame loss with just one camera. I believe there could be an issue in the rendering; maybe rendering is not keeping up with 30 fps, though I am not sure why. Now that you have access to my code, can you provide some pointers on why this could be happening?
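Besides rendering, it may be worth comparing the camera frame period against the inference time already measured in this thread. The sketch below uses the 30 fps figure from the post and the DNNValidator average (902.326 ms) from the earlier optimization log. It assumes inference runs once per frame in the same loop that consumes camera frames, which the thread does not confirm, and the engine rebuilt with trtexec may time differently.

```python
def frame_period_ms(fps: float) -> float:
    """Time between consecutive camera frames, in milliseconds."""
    return 1000.0 / fps


def frames_per_inference(inference_ms: float, fps: float) -> float:
    """How many camera frames arrive while one inference call is running."""
    return inference_ms / frame_period_ms(fps)


CAMERA_FPS = 30.0       # frame rate mentioned in the post
INFERENCE_MS = 902.326  # DNNValidator average from the optimization log
BATCH = 64              # leading dimension of the model input in that log

print(f"frame period: {frame_period_ms(CAMERA_FPS):.2f} ms")
print(f"frames arriving during one inference: "
      f"{frames_per_inference(INFERENCE_MS, CAMERA_FPS):.1f}")
print(f"inference time per batch item: {INFERENCE_MS / BATCH:.2f} ms")
```

If the number of frames arriving per inference is well above one, dropped frames would be expected regardless of how fast rendering is.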

SivaRamaKrishnaNV · August 11, 2023, 5:54am

Dear @0xdeadbeef,
Just to update you, I could run the model using trtexec on the target. I will verify the engine loading issue on the DW side and update you.


nvidia@tegra-ubuntu:~/siva$ /usr/src/tensorrt/bin/trtexec --onnx=/home/nvidia/siva/yolov3.onnx --saveEngine=/home/nvidia/siva/yolov3.bin
&&&& RUNNING TensorRT.trtexec [TensorRT v8510] # /usr/src/tensorrt/bin/trtexec --onnx=/home/nvidia/siva/yolov3.onnx --saveEngine=/home/nvidia/siva/yolov3.bin
[08/11/2023-04:54:05] [I] === Model Options ===
[08/11/2023-04:54:05] [I] Format: ONNX
[08/11/2023-04:54:05] [I] Model: /home/nvidia/siva/yolov3.onnx
[08/11/2023-04:54:05] [I] Output:
[08/11/2023-04:54:05] [I] === Build Options ===
[08/11/2023-04:54:05] [I] Max batch: explicit batch
[08/11/2023-04:54:05] [I] Memory Pools: workspace: default, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default
[08/11/2023-04:54:05] [I] minTiming: 1
[08/11/2023-04:54:05] [I] avgTiming: 8
[08/11/2023-04:54:05] [I] Precision: FP32
[08/11/2023-04:54:05] [I] LayerPrecisions:
[08/11/2023-04:54:05] [I] Layer Device Types:
[08/11/2023-04:54:05] [I] Calibration:
[08/11/2023-04:54:05] [I] Refit: Disabled
[08/11/2023-04:54:05] [I] Sparsity: Disabled
[08/11/2023-04:54:05] [I] Safe mode: Disabled
[08/11/2023-04:54:05] [I] DirectIO mode: Disabled
[08/11/2023-04:54:05] [I] Restricted mode: Disabled
[08/11/2023-04:54:05] [I] Build only: Disabled
[08/11/2023-04:54:05] [I] Save engine: /home/nvidia/siva/yolov3.bin
[08/11/2023-04:54:05] [I] Load engine:
[08/11/2023-04:54:05] [I] Profiling verbosity: 0
[08/11/2023-04:54:05] [I] Tactic sources: Using default tactic sources
[08/11/2023-04:54:05] [I] timingCacheMode: local
[08/11/2023-04:54:05] [I] timingCacheFile:
[08/11/2023-04:54:05] [I] Heuristic: Disabled
[08/11/2023-04:54:05] [I] Preview Features: Use default preview flags.
[08/11/2023-04:54:05] [I] Input(s)s format: fp32:CHW
[08/11/2023-04:54:05] [I] Output(s)s format: fp32:CHW
[08/11/2023-04:54:05] [I] Input build shapes: model
[08/11/2023-04:54:05] [I] Input calibration shapes: model
[08/11/2023-04:54:05] [I] === System Options ===
[08/11/2023-04:54:05] [I] Device: 0
[08/11/2023-04:54:05] [I] DLACore:
[08/11/2023-04:54:05] [I] Plugins:
[08/11/2023-04:54:05] [I] === Inference Options ===
[08/11/2023-04:54:05] [I] Batch: Explicit
[08/11/2023-04:54:05] [I] Input inference shapes: model
[08/11/2023-04:54:05] [I] Iterations: 10
[08/11/2023-04:54:05] [I] Duration: 3s (+ 200ms warm up)
[08/11/2023-04:54:05] [I] Sleep time: 0ms
[08/11/2023-04:54:05] [I] Idle time: 0ms
[08/11/2023-04:54:05] [I] Streams: 1
[08/11/2023-04:54:05] [I] ExposeDMA: Disabled
[08/11/2023-04:54:05] [I] Data transfers: Enabled
[08/11/2023-04:54:05] [I] Spin-wait: Disabled
[08/11/2023-04:54:05] [I] Multithreading: Disabled
[08/11/2023-04:54:05] [I] CUDA Graph: Disabled
[08/11/2023-04:54:05] [I] Separate profiling: Disabled
[08/11/2023-04:54:05] [I] Time Deserialize: Disabled
[08/11/2023-04:54:05] [I] Time Refit: Disabled
[08/11/2023-04:54:05] [I] NVTX verbosity: 0
[08/11/2023-04:54:05] [I] Persistent Cache Ratio: 0
[08/11/2023-04:54:05] [I] Inputs:
[08/11/2023-04:54:05] [I] === Reporting Options ===
[08/11/2023-04:54:05] [I] Verbose: Disabled
[08/11/2023-04:54:05] [I] Averages: 10 inferences
[08/11/2023-04:54:05] [I] Percentiles: 90,95,99
[08/11/2023-04:54:05] [I] Dump refittable layers:Disabled
[08/11/2023-04:54:05] [I] Dump output: Disabled
[08/11/2023-04:54:05] [I] Profile: Disabled
[08/11/2023-04:54:05] [I] Export timing to JSON file:
[08/11/2023-04:54:05] [I] Export output to JSON file:
[08/11/2023-04:54:05] [I] Export profile to JSON file:
[08/11/2023-04:54:05] [I]
[08/11/2023-04:54:05] [I] === Device Information ===
[08/11/2023-04:54:05] [I] Selected Device: Orin
[08/11/2023-04:54:05] [I] Compute Capability: 8.7
[08/11/2023-04:54:05] [I] SMs: 16
[08/11/2023-04:54:05] [I] Compute Clock Rate: 1.275 GHz
[08/11/2023-04:54:05] [I] Device Global Memory: 28458 MiB
[08/11/2023-04:54:05] [I] Shared Memory per SM: 164 KiB
[08/11/2023-04:54:05] [I] Memory Bus Width: 128 bits (ECC disabled)
[08/11/2023-04:54:05] [I] Memory Clock Rate: 1.275 GHz
[08/11/2023-04:54:05] [I]
[08/11/2023-04:54:05] [I] TensorRT version: 8.5.10
[08/11/2023-04:54:06] [I] [TRT] [MemUsageChange] Init CUDA: CPU +269, GPU +0, now: CPU 298, GPU 5475 (MiB)
[08/11/2023-04:54:07] [I] [TRT] [MemUsageChange] Init builder kernel library: CPU +266, GPU +252, now: CPU 583, GPU 5744 (MiB)
[08/11/2023-04:54:07] [I] Start parsing network model
[08/11/2023-04:54:07] [I] [TRT] ----------------------------------------------------------------
[08/11/2023-04:54:07] [I] [TRT] Input filename:   /home/nvidia/siva/yolov3.onnx
[08/11/2023-04:54:07] [I] [TRT] ONNX IR version:  0.0.8
[08/11/2023-04:54:07] [I] [TRT] Opset version:    17
[08/11/2023-04:54:07] [I] [TRT] Producer name:    NVIDIA TensorRT sample
[08/11/2023-04:54:07] [I] [TRT] Producer version:
[08/11/2023-04:54:07] [I] [TRT] Domain:
[08/11/2023-04:54:07] [I] [TRT] Model version:    0
[08/11/2023-04:54:07] [I] [TRT] Doc string:
[08/11/2023-04:54:07] [I] [TRT] ----------------------------------------------------------------
[08/11/2023-04:54:07] [I] Finish parsing network model
[08/11/2023-04:54:07] [I] [TRT] ---------- Layers Running on DLA ----------
[08/11/2023-04:54:07] [I] [TRT] No layer is running on DLA
[08/11/2023-04:54:07] [I] [TRT] ---------- Layers Running on GPU ----------
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 001_convolutional + 001_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(001_convolutional_lrelu)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 002_convolutional + 002_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(002_convolutional_lrelu)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 003_convolutional + 003_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(003_convolutional_lrelu)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 004_convolutional + 004_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(PWN(004_convolutional_lrelu), 005_shortcut)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 006_convolutional + 006_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(006_convolutional_lrelu)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 007_convolutional + 007_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(007_convolutional_lrelu)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 008_convolutional + 008_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(PWN(008_convolutional_lrelu), 009_shortcut)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 010_convolutional + 010_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(010_convolutional_lrelu)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 011_convolutional + 011_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(PWN(011_convolutional_lrelu), 012_shortcut)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 013_convolutional + 013_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(013_convolutional_lrelu)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 014_convolutional + 014_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(014_convolutional_lrelu)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 015_convolutional + 015_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(PWN(015_convolutional_lrelu), 016_shortcut)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 017_convolutional + 017_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(017_convolutional_lrelu)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 018_convolutional + 018_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(PWN(018_convolutional_lrelu), 019_shortcut)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 020_convolutional + 020_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(020_convolutional_lrelu)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 021_convolutional + 021_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(PWN(021_convolutional_lrelu), 022_shortcut)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 023_convolutional + 023_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(023_convolutional_lrelu)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 024_convolutional + 024_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(PWN(024_convolutional_lrelu), 025_shortcut)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 026_convolutional + 026_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(026_convolutional_lrelu)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 027_convolutional + 027_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(PWN(027_convolutional_lrelu), 028_shortcut)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 029_convolutional + 029_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(029_convolutional_lrelu)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 030_convolutional + 030_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(PWN(030_convolutional_lrelu), 031_shortcut)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 032_convolutional + 032_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(032_convolutional_lrelu)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 033_convolutional + 033_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(PWN(033_convolutional_lrelu), 034_shortcut)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 035_convolutional + 035_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(035_convolutional_lrelu)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 036_convolutional + 036_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(PWN(036_convolutional_lrelu), 037_shortcut)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 038_convolutional + 038_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(038_convolutional_lrelu)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 039_convolutional + 039_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(039_convolutional_lrelu)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 040_convolutional + 040_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(PWN(040_convolutional_lrelu), 041_shortcut)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 042_convolutional + 042_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(042_convolutional_lrelu)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 043_convolutional + 043_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(PWN(043_convolutional_lrelu), 044_shortcut)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 045_convolutional + 045_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(045_convolutional_lrelu)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 046_convolutional + 046_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(PWN(046_convolutional_lrelu), 047_shortcut)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 048_convolutional + 048_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(048_convolutional_lrelu)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 049_convolutional + 049_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(PWN(049_convolutional_lrelu), 050_shortcut)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 051_convolutional + 051_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(051_convolutional_lrelu)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 052_convolutional + 052_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(PWN(052_convolutional_lrelu), 053_shortcut)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 054_convolutional + 054_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(054_convolutional_lrelu)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 055_convolutional + 055_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(PWN(055_convolutional_lrelu), 056_shortcut)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 057_convolutional + 057_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(057_convolutional_lrelu)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 058_convolutional + 058_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(PWN(058_convolutional_lrelu), 059_shortcut)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 060_convolutional + 060_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(060_convolutional_lrelu)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 061_convolutional + 061_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(PWN(061_convolutional_lrelu), 062_shortcut)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 063_convolutional + 063_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(063_convolutional_lrelu)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 064_convolutional + 064_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(064_convolutional_lrelu)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 065_convolutional + 065_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(PWN(065_convolutional_lrelu), 066_shortcut)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 067_convolutional + 067_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(067_convolutional_lrelu)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 068_convolutional + 068_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(PWN(068_convolutional_lrelu), 069_shortcut)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 070_convolutional + 070_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(070_convolutional_lrelu)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 071_convolutional + 071_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(PWN(071_convolutional_lrelu), 072_shortcut)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 073_convolutional + 073_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(073_convolutional_lrelu)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 074_convolutional + 074_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(PWN(074_convolutional_lrelu), 075_shortcut)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 076_convolutional + 076_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(076_convolutional_lrelu)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 077_convolutional + 077_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(077_convolutional_lrelu)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 078_convolutional + 078_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(078_convolutional_lrelu)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 079_convolutional + 079_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(079_convolutional_lrelu)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 080_convolutional + 080_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(080_convolutional_lrelu)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 081_convolutional + 081_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(081_convolutional_lrelu)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 082_convolutional
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 085_convolutional + 085_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(085_convolutional_lrelu)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] RESIZE: 086_upsample
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] COPY: 086_upsample copy
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 088_convolutional + 088_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(088_convolutional_lrelu)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 089_convolutional + 089_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(089_convolutional_lrelu)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 090_convolutional + 090_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(090_convolutional_lrelu)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 091_convolutional + 091_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(091_convolutional_lrelu)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 092_convolutional + 092_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(092_convolutional_lrelu)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 093_convolutional + 093_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(093_convolutional_lrelu)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 094_convolutional
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 097_convolutional + 097_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(097_convolutional_lrelu)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] RESIZE: 098_upsample
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] COPY: 098_upsample copy
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 100_convolutional + 100_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(100_convolutional_lrelu)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 101_convolutional + 101_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(101_convolutional_lrelu)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 102_convolutional + 102_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(102_convolutional_lrelu)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 103_convolutional + 103_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(103_convolutional_lrelu)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 104_convolutional + 104_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(104_convolutional_lrelu)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 105_convolutional + 105_convolutional_bn
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] POINTWISE: PWN(105_convolutional_lrelu)
[08/11/2023-04:54:07] [I] [TRT] [GpuLayer] CONVOLUTION: 106_convolutional
[08/11/2023-04:54:09] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +536, GPU +512, now: CPU 1592, GPU 6701 (MiB)
[08/11/2023-04:54:09] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +83, GPU +77, now: CPU 1675, GPU 6778 (MiB)
[08/11/2023-04:54:09] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored.


[08/11/2023-05:41:21] [I] [TRT] Total Activation Memory: 73230150144
[08/11/2023-05:41:21] [I] [TRT] Detected 1 inputs and 3 output network tensors.
[08/11/2023-05:41:21] [I] [TRT] Total Host Persistent Memory: 264672
[08/11/2023-05:41:21] [I] [TRT] Total Device Persistent Memory: 0
[08/11/2023-05:41:21] [I] [TRT] Total Scratch Memory: 0
[08/11/2023-05:41:21] [I] [TRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 252 MiB, GPU 13731 MiB
[08/11/2023-05:41:21] [I] [TRT] [BlockAssignment] Started assigning block shifts. This will take 237 steps to complete.
[08/11/2023-05:41:21] [I] [TRT] [BlockAssignment] Algorithm ShiftNTopDown took 10.8696ms to assign 6 blocks to 237 nodes requiring 7854621184 bytes.
[08/11/2023-05:41:21] [I] [TRT] Total Activation Memory: 7854621184
[08/11/2023-05:41:21] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +234, GPU +256, now: CPU 234, GPU 256 (MiB)
[08/11/2023-05:41:22] [I] Engine built in 2836.78 sec.
[08/11/2023-05:41:22] [I] [TRT] Loaded engine size: 237 MiB
[08/11/2023-05:41:22] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +236, now: CPU 0, GPU 236 (MiB)
[08/11/2023-05:41:22] [I] Engine deserialized in 0.0545005 sec.
[08/11/2023-05:41:24] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +7490, now: CPU 0, GPU 7726 (MiB)
[08/11/2023-05:41:24] [I] Setting persistentCacheLimit to 0 bytes.
[08/11/2023-05:41:24] [I] Using random values for input 000_net
[08/11/2023-05:41:24] [I] Created input binding for 000_net with dimensions 64x3x608x608
[08/11/2023-05:41:24] [I] Using random values for output 082_convolutional
[08/11/2023-05:41:24] [I] Created output binding for 082_convolutional with dimensions 64x255x19x19
[08/11/2023-05:41:24] [I] Using random values for output 094_convolutional
[08/11/2023-05:41:24] [I] Created output binding for 094_convolutional with dimensions 64x255x38x38
[08/11/2023-05:41:24] [I] Using random values for output 106_convolutional
[08/11/2023-05:41:25] [I] Created output binding for 106_convolutional with dimensions 64x255x76x76
[08/11/2023-05:41:25] [I] Starting inference
[08/11/2023-05:41:35] [I] Warmup completed 1 queries over 200 ms
[08/11/2023-05:41:35] [I] Timing trace has 10 queries over 10.7294 s
[08/11/2023-05:41:35] [I]
[08/11/2023-05:41:35] [I] === Trace details ===
[08/11/2023-05:41:35] [I] Trace averages of 10 runs:
[08/11/2023-05:41:35] [I] Average on 10 runs - GPU latency: 974.784 ms - Host latency: 1031.78 ms (enqueue 0.981995 ms)
[08/11/2023-05:41:35] [I]
[08/11/2023-05:41:35] [I] === Performance summary ===
[08/11/2023-05:41:35] [I] Throughput: 0.932016 qps
[08/11/2023-05:41:35] [I] Latency: min = 1012.07 ms, max = 1056.42 ms, mean = 1031.78 ms, median = 1027.12 ms, percentile(90%) = 1055.78 ms, percentile(95%) = 1056.42 ms, percentile(99%) = 1056.42 ms
[08/11/2023-05:41:35] [I] Enqueue Time: min = 0.8136 ms, max = 1.31689 ms, mean = 0.981995 ms, median = 0.953003 ms, percentile(90%) = 1.05591 ms, percentile(95%) = 1.31689 ms, percentile(99%) = 1.31689 ms
[08/11/2023-05:41:35] [I] H2D Latency: min = 17.4687 ms, max = 28.2749 ms, mean = 25.3344 ms, median = 25.6876 ms, percentile(90%) = 28.1725 ms, percentile(95%) = 28.2749 ms, percentile(99%) = 28.2749 ms
[08/11/2023-05:41:35] [I] GPU Compute Time: min = 960.146 ms, max = 993.742 ms, mean = 974.784 ms, median = 970.541 ms, percentile(90%) = 993.602 ms, percentile(95%) = 993.742 ms, percentile(99%) = 993.742 ms
[08/11/2023-05:41:35] [I] D2H Latency: min = 13.7354 ms, max = 37.5093 ms, mean = 31.6631 ms, median = 32.4792 ms, percentile(90%) = 37.4163 ms, percentile(95%) = 37.5093 ms, percentile(99%) = 37.5093 ms
[08/11/2023-05:41:35] [I] Total Host Walltime: 10.7294 s
[08/11/2023-05:41:35] [I] Total GPU Compute Time: 9.74784 s
[08/11/2023-05:41:35] [W] * GPU compute time is unstable, with coefficient of variance = 1.44341%.
[08/11/2023-05:41:35] [W]   If not already in use, locking GPU clock frequency or adding --useSpinWait may improve the stability.
[08/11/2023-05:41:35] [I] Explanations of the performance metrics are printed in the verbose logs.
[08/11/2023-05:41:35] [I]
&&&& PASSED TensorRT.trtexec [TensorRT v8510] # /usr/src/tensorrt/bin/trtexec --onnx=/home/nvidia/siva/yolov3.onnx --saveEngine=/home/nvidia/siva/yolov3.bin
nvidia@tegra-ubuntu:~/siva$
nvidia@tegra-ubuntu:~/siva$
nvidia@tegra-ubuntu:~/siva$ /usr/src/tensorrt/bin/trtexec --loadEngine=/home/nvidia/siva/yolov3.bin
&&&& RUNNING TensorRT.trtexec [TensorRT v8510] # /usr/src/tensorrt/bin/trtexec --loadEngine=/home/nvidia/siva/yolov3.bin
[08/11/2023-05:51:03] [I] === Model Options ===
[08/11/2023-05:51:03] [I] Format: *
[08/11/2023-05:51:03] [I] Model:
[08/11/2023-05:51:03] [I] Output:
[08/11/2023-05:51:03] [I] === Build Options ===
[08/11/2023-05:51:03] [I] Max batch: 1
[08/11/2023-05:51:03] [I] Memory Pools: workspace: default, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default
[08/11/2023-05:51:03] [I] minTiming: 1
[08/11/2023-05:51:03] [I] avgTiming: 8
[08/11/2023-05:51:03] [I] Precision: FP32
[08/11/2023-05:51:03] [I] LayerPrecisions:
[08/11/2023-05:51:03] [I] Layer Device Types:
[08/11/2023-05:51:03] [I] Calibration:
[08/11/2023-05:51:03] [I] Refit: Disabled
[08/11/2023-05:51:03] [I] Sparsity: Disabled
[08/11/2023-05:51:03] [I] Safe mode: Disabled
[08/11/2023-05:51:03] [I] DirectIO mode: Disabled
[08/11/2023-05:51:03] [I] Restricted mode: Disabled
[08/11/2023-05:51:03] [I] Build only: Disabled
[08/11/2023-05:51:03] [I] Save engine:
[08/11/2023-05:51:03] [I] Load engine: /home/nvidia/siva/yolov3.bin
[08/11/2023-05:51:03] [I] Profiling verbosity: 0
[08/11/2023-05:51:03] [I] Tactic sources: Using default tactic sources
[08/11/2023-05:51:03] [I] timingCacheMode: local
[08/11/2023-05:51:03] [I] timingCacheFile:
[08/11/2023-05:51:03] [I] Heuristic: Disabled
[08/11/2023-05:51:03] [I] Preview Features: Use default preview flags.
[08/11/2023-05:51:03] [I] Input(s)s format: fp32:CHW
[08/11/2023-05:51:03] [I] Output(s)s format: fp32:CHW
[08/11/2023-05:51:03] [I] Input build shapes: model
[08/11/2023-05:51:03] [I] Input calibration shapes: model
[08/11/2023-05:51:03] [I] === System Options ===
[08/11/2023-05:51:03] [I] Device: 0
[08/11/2023-05:51:03] [I] DLACore:
[08/11/2023-05:51:03] [I] Plugins:
[08/11/2023-05:51:03] [I] === Inference Options ===
[08/11/2023-05:51:03] [I] Batch: 1
[08/11/2023-05:51:03] [I] Input inference shapes: model
[08/11/2023-05:51:03] [I] Iterations: 10
[08/11/2023-05:51:03] [I] Duration: 3s (+ 200ms warm up)
[08/11/2023-05:51:03] [I] Sleep time: 0ms
[08/11/2023-05:51:03] [I] Idle time: 0ms
[08/11/2023-05:51:03] [I] Streams: 1
[08/11/2023-05:51:03] [I] ExposeDMA: Disabled
[08/11/2023-05:51:03] [I] Data transfers: Enabled
[08/11/2023-05:51:03] [I] Spin-wait: Disabled
[08/11/2023-05:51:03] [I] Multithreading: Disabled
[08/11/2023-05:51:03] [I] CUDA Graph: Disabled
[08/11/2023-05:51:03] [I] Separate profiling: Disabled
[08/11/2023-05:51:03] [I] Time Deserialize: Disabled
[08/11/2023-05:51:03] [I] Time Refit: Disabled
[08/11/2023-05:51:03] [I] NVTX verbosity: 0
[08/11/2023-05:51:03] [I] Persistent Cache Ratio: 0
[08/11/2023-05:51:03] [I] Inputs:
[08/11/2023-05:51:03] [I] === Reporting Options ===
[08/11/2023-05:51:03] [I] Verbose: Disabled
[08/11/2023-05:51:03] [I] Averages: 10 inferences
[08/11/2023-05:51:03] [I] Percentiles: 90,95,99
[08/11/2023-05:51:03] [I] Dump refittable layers:Disabled
[08/11/2023-05:51:03] [I] Dump output: Disabled
[08/11/2023-05:51:03] [I] Profile: Disabled
[08/11/2023-05:51:03] [I] Export timing to JSON file:
[08/11/2023-05:51:03] [I] Export output to JSON file:
[08/11/2023-05:51:03] [I] Export profile to JSON file:
[08/11/2023-05:51:03] [I]
[08/11/2023-05:51:03] [I] === Device Information ===
[08/11/2023-05:51:03] [I] Selected Device: Orin
[08/11/2023-05:51:03] [I] Compute Capability: 8.7
[08/11/2023-05:51:03] [I] SMs: 16
[08/11/2023-05:51:03] [I] Compute Clock Rate: 1.275 GHz
[08/11/2023-05:51:03] [I] Device Global Memory: 28458 MiB
[08/11/2023-05:51:03] [I] Shared Memory per SM: 164 KiB
[08/11/2023-05:51:03] [I] Memory Bus Width: 128 bits (ECC disabled)
[08/11/2023-05:51:03] [I] Memory Clock Rate: 1.275 GHz
[08/11/2023-05:51:03] [I]
[08/11/2023-05:51:03] [I] TensorRT version: 8.5.10
[08/11/2023-05:51:03] [I] Engine loaded in 0.258738 sec.
[08/11/2023-05:51:03] [I] [TRT] Loaded engine size: 237 MiB
[08/11/2023-05:51:03] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +236, now: CPU 0, GPU 236 (MiB)
[08/11/2023-05:51:03] [I] Engine deserialized in 0.541286 sec.
[08/11/2023-05:51:05] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +7490, now: CPU 0, GPU 7726 (MiB)
[08/11/2023-05:51:05] [I] Setting persistentCacheLimit to 0 bytes.
[08/11/2023-05:51:05] [I] Using random values for input 000_net
[08/11/2023-05:51:06] [I] Created input binding for 000_net with dimensions 64x3x608x608
[08/11/2023-05:51:06] [I] Using random values for output 082_convolutional
[08/11/2023-05:51:06] [I] Created output binding for 082_convolutional with dimensions 64x255x19x19
[08/11/2023-05:51:06] [I] Using random values for output 094_convolutional
[08/11/2023-05:51:06] [I] Created output binding for 094_convolutional with dimensions 64x255x38x38
[08/11/2023-05:51:06] [I] Using random values for output 106_convolutional
[08/11/2023-05:51:06] [I] Created output binding for 106_convolutional with dimensions 64x255x76x76
[08/11/2023-05:51:06] [I] Starting inference
[08/11/2023-05:51:16] [I] Warmup completed 1 queries over 200 ms
[08/11/2023-05:51:16] [I] Timing trace has 10 queries over 10.3387 s
[08/11/2023-05:51:16] [I]
[08/11/2023-05:51:16] [I] === Trace details ===
[08/11/2023-05:51:16] [I] Trace averages of 10 runs:
[08/11/2023-05:51:16] [I] Average on 10 runs - GPU latency: 941.794 ms - Host latency: 1000.48 ms (enqueue 0.871793 ms)
[08/11/2023-05:51:16] [I]
[08/11/2023-05:51:16] [I] === Performance summary ===
[08/11/2023-05:51:16] [I] Throughput: 0.967236 qps
[08/11/2023-05:51:16] [I] Latency: min = 963.053 ms, max = 1066.47 ms, mean = 1000.48 ms, median = 977.212 ms, percentile(90%) = 1065.4 ms, percentile(95%) = 1066.47 ms, percentile(99%) = 1066.47 ms
[08/11/2023-05:51:16] [I] Enqueue Time: min = 0.766602 ms, max = 1.02936 ms, mean = 0.871793 ms, median = 0.848633 ms, percentile(90%) = 0.984832 ms, percentile(95%) = 1.02936 ms, percentile(99%) = 1.02936 ms
[08/11/2023-05:51:16] [I] H2D Latency: min = 16.2759 ms, max = 28.3885 ms, mean = 24.8341 ms, median = 24.6426 ms, percentile(90%) = 28.3569 ms, percentile(95%) = 28.3885 ms, percentile(99%) = 28.3885 ms
[08/11/2023-05:51:16] [I] GPU Compute Time: min = 914.301 ms, max = 1000.3 ms, mean = 941.794 ms, median = 919.586 ms, percentile(90%) = 999.095 ms, percentile(95%) = 1000.3 ms, percentile(99%) = 1000.3 ms
[08/11/2023-05:51:16] [I] D2H Latency: min = 13.7393 ms, max = 42.5956 ms, mean = 33.8542 ms, median = 33.4032 ms, percentile(90%) = 41.625 ms, percentile(95%) = 42.5956 ms, percentile(99%) = 42.5956 ms
[08/11/2023-05:51:16] [I] Total Host Walltime: 10.3387 s
[08/11/2023-05:51:16] [I] Total GPU Compute Time: 9.41794 s
[08/11/2023-05:51:16] [W] * GPU compute time is unstable, with coefficient of variance = 4.00771%.
[08/11/2023-05:51:16] [W]   If not already in use, locking GPU clock frequency or adding --useSpinWait may improve the stability.
[08/11/2023-05:51:16] [I] Explanations of the performance metrics are printed in the verbose logs.
[08/11/2023-05:51:16] [I]
&&&& PASSED TensorRT.trtexec [TensorRT v8510] # /usr/src/tensorrt/bin/trtexec --loadEngine=/home/nvidia/siva/yolov3.bin
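
One way to read the summary above: the bindings are created with a leading dimension of 64 (for example 64x3x608x608 for 000_net), so the roughly 1 s latency appears to be the cost of a whole 64-image batch rather than of a single frame. The short sketch below only redoes that arithmetic with numbers copied from the log. It assumes each trtexec query processes the full 64-image batch, which the binding shapes suggest but the log does not state outright.

```python
# Numbers copied from the trtexec --loadEngine run above.
BATCH = 64                # leading dimension of input binding 000_net (64x3x608x608)
GPU_MEAN_MS = 941.794     # "GPU Compute Time: mean"
HOST_MEAN_MS = 1000.48    # "Latency: mean" (includes H2D and D2H copies)
QPS = 0.967236            # "Throughput" in queries per second


def per_image_ms(batch_ms: float, batch: int) -> float:
    """Average cost of one image, assuming one query covers the whole batch."""
    return batch_ms / batch


gpu_per_image = per_image_ms(GPU_MEAN_MS, BATCH)
host_per_image = per_image_ms(HOST_MEAN_MS, BATCH)
images_per_s = QPS * BATCH

print(f"GPU time per image : {gpu_per_image:.2f} ms")
print(f"Host time per image: {host_per_image:.2f} ms")
print(f"Images per second  : {images_per_s:.1f}")
```

Under that assumption the engine itself is not slow per image; the long latency comes from the batch size baked into the ONNX model.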

0xdeadbeef · August 11, 2023, 1:08pm

Hi Siva, yes, trtexec is helping to move things forward, but the significant frame drop is still a problem.

Were you able to run a camera setup with this?

SivaRamaKrishnaNV · August 11, 2023, 2:36pm

Dear @0xdeadbeef,
I don’t have a camera connected to the target for testing.
First, are you able to run the DW camera sample with a live camera? That should help determine whether there is an issue with the camera module.

0xdeadbeef · August 11, 2023, 3:39pm

Hey Siva,
Yes, I am able to get sample_camera working with this camera. With a single camera we get 30 FPS.
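
For context, the 30 FPS figure can be set against the trtexec latency reported earlier in the thread. The sketch below is only a back-of-the-envelope comparison. It assumes the application runs one full engine execution (about 1000 ms host latency for the batch-64 engine) per camera frame, which the thread does not confirm.

```python
# Assumption: one full engine execution is issued per camera frame.
CAMERA_FPS = 30.0            # frame rate reported for sample_camera
HOST_LATENCY_MS = 1000.48    # mean "Latency" from the trtexec --loadEngine run

# Time available per frame before the next one arrives.
frame_period_ms = 1000.0 / CAMERA_FPS

# Number of new frames that arrive while one inference is still running.
frames_per_inference = HOST_LATENCY_MS / frame_period_ms

print(f"Frame period: {frame_period_ms:.2f} ms")
print(f"Frames arriving during one inference: {frames_per_inference:.1f}")
```

If that assumption holds, about 30 frames arrive during each inference, which would be consistent with the frame-drop warnings in the first post.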

SivaRamaKrishnaNV · August 14, 2023, 5:15am

Dear @0xdeadbeef,
Did you test with a recorded video as input to your sample? That should confirm whether the model is working.

0xdeadbeef · August 14, 2023, 10:25am

Hey Siva,

The model works perfectly fine in Docker with a USB camera.

0xdeadbeef · August 14, 2023, 3:51pm

Hi Siva,

Can you check internally what the difference is between the tensorRT_optimization tool and trtexec?
Is there a builder script available for the Orin target to convert ONNX to a TensorRT bin?

The ONNX model in this case fails when used with the tensorRT_optimization tool and works only partially with trtexec (with significant frame loss), so I want to understand the real difference between the two.
