TensorRT model uses too much memory on DRIVE Orin

Please provide the following info (tick the boxes after creating this topic):
Software Version
DRIVE OS 6.0.5
DRIVE OS 6.0.4 (rev. 1)
DRIVE OS 6.0.4 SDK
other

Target Operating System
Linux
QNX
other

Hardware Platform
DRIVE AGX Orin Developer Kit (940-63710-0010-D00)
DRIVE AGX Orin Developer Kit (940-63710-0010-C00)
DRIVE AGX Orin Developer Kit (not sure of its number)
other

SDK Manager Version
1.9.1.10844
other

Host Machine Version
native Ubuntu Linux 20.04 Host installed with SDK Manager
native Ubuntu Linux 20.04 Host installed with DRIVE OS Docker Containers
native Ubuntu Linux 18.04 Host installed with DRIVE OS Docker Containers
other

I deployed a model on the cuda11.4-trt8.4.11-int8-drive_orin-gpu platform, but the memory footprint is up to 3.7 GB. Even if I cut down the network structure and the output branch, I can't effectively reduce it. Meanwhile, another model of similar size with the same input shape has a memory footprint of only 700 MB. When I set "tactic_sources" to "CUBLAS", the memory footprint dropped from 3.7 GB to 2.2 GB, but this strategy doesn't work with normal models.
Do you have any idea why the model uses so much memory and how to reduce it?
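
For reference, the tactic-source restriction mentioned above can also be applied at build time through the builder API; a minimal C++ sketch, assuming the TensorRT 8.4 API and an already-created IBuilderConfig (the helper name is only illustrative):

#include "NvInfer.h"

// Illustrative helper: allow only cuBLAS tactics, so the engine does not pull
// in cuDNN / cuBLASLt kernels at build time.
void restrictTacticsToCublas(nvinfer1::IBuilderConfig* config)
{
    // TacticSources is a bitmask; keep only the kCUBLAS bit set.
    nvinfer1::TacticSources const sources =
        1U << static_cast<uint32_t>(nvinfer1::TacticSource::kCUBLAS);
    config->setTacticSources(sources);
}

The corresponding trtexec option is --tacticSources (e.g. --tacticSources=+CUBLAS,-CUBLAS_LT,-CUDNN); the exact accepted spelling may vary between TensorRT versions.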

Dear @liuhaomin1,
Do you mean the memory consumption is higher when building the model using trtexec, or when running the final engine file?

I mean the model should not take up so much memory during inference on the cuda11.4-trt8.4.11-int8-drive_orin-gpu platform. Even when I cut the model down to a small size, halve the input size, and output nothing, the consumption is still high. So there must be something taking up memory that has nothing to do with the model structure.
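
One way to check this is to snapshot device memory right after the first CUDA call and again after deserializing the engine, which separates the CUDA context/library overhead from what the network itself needs; a rough sketch using only the CUDA runtime API (on Orin the numbers refer to the shared system memory, and the function/tag names are just for illustration):

#include <cuda_runtime.h>
#include <cstdio>

// Print the device memory currently in use under a given tag.
static void printUsedMem(const char* tag)
{
    size_t freeBytes = 0, totalBytes = 0;
    cudaMemGetInfo(&freeBytes, &totalBytes);
    std::printf("[%s] used: %.1f MiB\n", tag,
                (totalBytes - freeBytes) / (1024.0 * 1024.0));
}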

Dear @liuhaomin1,
Do you have any idea that why the model uses so much memory and how to reduce it?

Does Developer Guide :: NVIDIA Deep Learning TensorRT Documentation help?
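
One knob from that guide worth trying is the workspace memory-pool limit; a minimal sketch, assuming the TensorRT 8.4 API, where the 256 MiB value is only an example:

#include "NvInfer.h"

// Cap the scratch memory TensorRT may reserve for layer implementations.
// A smaller workspace can lower the runtime footprint, but it may exclude
// some tactics and make the engine slower.
void capWorkspace(nvinfer1::IBuilderConfig* config)
{
    config->setMemoryPoolLimit(nvinfer1::MemoryPoolType::kWORKSPACE,
                               256ULL << 20); // 256 MiB, illustrative only
}

With trtexec, the equivalent is the --memPoolSize option (e.g. --memPoolSize=workspace:256, in MiB), where your TensorRT build supports it.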


I got another error this time when using this command:

trtexec --onnx=xxx.onnx ......   --useDLACore=1 ....

as you can see from the image. Does this mean my device only has 1 DLA core?
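
The number of DLA cores visible to TensorRT can be queried directly; a small standalone sketch, assuming the TensorRT 8.4 C++ API (the Logger class is just the minimal boilerplate the API requires):

#include "NvInfer.h"
#include <cstdio>

// Minimal logger required by the TensorRT API.
class Logger : public nvinfer1::ILogger
{
    void log(Severity severity, const char* msg) noexcept override
    {
        if (severity <= Severity::kWARNING) std::printf("%s\n", msg);
    }
};

int main()
{
    Logger logger;
    nvinfer1::IBuilder* builder = nvinfer1::createInferBuilder(logger);
    // Orin has two DLA cores (indices 0 and 1); if this prints 1, only
    // core 0 exists and --useDLACore=1 would fail.
    std::printf("DLA cores available: %d\n", builder->getNbDLACores());
    delete builder;
    return 0;
}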

There has been no update from you for a while, so we are assuming this is no longer an issue.
Hence we are closing this topic. If you need further support, please open a new one.
Thanks

Dear @liuhaomin1,
May I know if you are using the DRIVE AGX Orin devkit? I could use --useDLACore=1 on my target.

[10/21/2022-10:37:21] [V] Enqueue Time: the host latency to enqueue a query. If this is longer than GPU Compute Time, the GPU may be under-utilized.
[10/21/2022-10:37:21] [V] H2D Latency: the latency for host-to-device data transfers for input tensors of a single query.
[10/21/2022-10:37:21] [V] D2H Latency: the latency for device-to-host data transfers for output tensors of a single query.
[10/21/2022-10:37:21] [V] Latency: the summation of H2D Latency, GPU Compute Time, and D2H Latency. This is the latency to infer a single query.
[10/21/2022-10:37:21] [I]
&&&& PASSED TensorRT.trtexec [TensorRT v8412] # ./trtexec --onnx=/tensorrt/data/mnist.onnx --useDLACore=1 --verbose --allowGPUFallback
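
For completeness, the DLA selection and GPU fallback used in that trtexec run would look roughly like this at the builder-API level (a sketch assuming TensorRT 8.4; the function name is illustrative):

#include "NvInfer.h"

// Equivalent of trtexec --useDLACore=1 --allowGPUFallback: prefer DLA core 1
// and let layers that DLA cannot run fall back to the GPU.
void configureDla(nvinfer1::IBuilderConfig* config)
{
    config->setDefaultDeviceType(nvinfer1::DeviceType::kDLA);
    config->setDLACore(1);
    config->setFlag(nvinfer1::BuilderFlag::kGPU_FALLBACK);
}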

Please use text instead of images in posts so that others in the community can search them. Thank you