Memory Usage Discrepancy with TensorRT 8.6 and 8.2


I am trying to run inference with a SAMNet model (GitHub - yun-liu/FastSaliency: code for "SAMNet: Stereoscopically Attentive Multi-scale Network for Lightweight Salient Object Detection" and "Lightweight Salient Object Detection via Hierarchical Visual Perception Learning") converted to a TensorRT engine. On my desktop, with TensorRT 8.6.1 and CUDA 12.0, I see a memory usage of 0.4 GB in htop and 87 MB in nvidia-smi while running my code. When I run the same code on the TX2 NX (with the model converted for the TX2), with TensorRT 8.2.1 and CUDA 10.2, about 1.6 GB of memory is used according to top.

Is this the expected behavior? I have at most about 1 GB available to run my model in production; is there a way to reduce the memory used? This is my first time using TensorRT and CUDA, so am I doing something wrong in my code?

I am grateful for any help! Thanks

I did some more digging and found out that trtexec builds the model using cuDNN and cuBLAS on the TX2 but not on my desktop. Lazy loading is also used on my desktop but not on the TX2, since the TX2 is limited to CUDA 10.2.

I rebuilt the engine with cuBLAS disabled and reduced the memory usage to 1.2 GB. Any other tips to reduce memory?
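For reference, one way to do this rebuild is trtexec's `--tacticSources` flag (available in TensorRT 8.x), which removes tactic sources with a `-` prefix; the file paths below are placeholders:

```shell
# Rebuild the engine with the cuDNN/cuBLAS/cuBLASLt tactic sources masked out,
# so TensorRT only considers its own built-in kernels. This typically lowers
# memory use on Jetson, possibly at some cost in inference speed.
trtexec --onnx=samnet.onnx \
        --saveEngine=samnet.engine \
        --tacticSources=-CUDNN,-CUBLAS,-CUBLAS_LT
```

Note that the set of sources that can be disabled depends on the TensorRT version, and some layers may still require a given library; check the trtexec log for warnings after rebuilding.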


TensorRT Version: 8.2.1
GPU Type: NVIDIA Pascal™ architecture GPU with 256 CUDA cores on Jetson TX2 NX
CUDA Version: 10.2
Operating System + Version: L4T R32

Relevant Files

cpp file to run the model:
samnetTest.tar.gz (4.7 MB)
onnx model:
samnet.onnx.tar.gz (3.5 MB)

Hi @theo.engels ,
I believe the Jetson TX2 forum should be able to provide better support on this topic.

Moving it.



Yes, we started supporting lazy module loading in CUDA 11.8, so memory usage is expected to be lower in newer CUDA environments.

For older CUDA versions, you can try running TensorRT without cuDNN to save some memory.
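If you build the engine from the C++ API rather than with trtexec, the same effect comes from masking tactic sources on the builder config. A minimal sketch, assuming a TensorRT 8.x `nvinfer1::IBuilderConfig*` obtained from your builder (names here are illustrative, not from the attached cpp file):

```cpp
#include <cstdint>
#include "NvInfer.h"

// Sketch: clear the cuDNN (and optionally cuBLAS) bits from the tactic-source
// mask before calling buildSerializedNetwork(), so TensorRT does not load
// those libraries at build or run time.
void disableCudnnTactics(nvinfer1::IBuilderConfig* config)
{
    // Current mask of enabled tactic sources (one bit per TacticSource value).
    uint32_t sources = config->getTacticSources();

    // Drop cuDNN; drop the cuBLAS sources too if you want the smallest footprint.
    sources &= ~(1U << static_cast<uint32_t>(nvinfer1::TacticSource::kCUDNN));
    sources &= ~(1U << static_cast<uint32_t>(nvinfer1::TacticSource::kCUBLAS));
    sources &= ~(1U << static_cast<uint32_t>(nvinfer1::TacticSource::kCUBLAS_LT));

    config->setTacticSources(sources);
}
```

The engine must be rebuilt after changing the mask; an already-serialized engine keeps the tactic choices it was built with.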


This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.