Description
I am trying to run inference with a SAMNet model (GitHub - yun-liu/FastSaliency: Code for "SAMNet: Stereoscopically Attentive Multi-scale Network for Lightweight Salient Object Detection" and "Lightweight Salient Object Detection via Hierarchical Visual Perception Learning") converted to a TensorRT engine. On my desktop, with TensorRT 8.6.1 and CUDA 12.0, running my code shows about 0.4 GB of memory usage in htop and 87 MB in nvidia-smi. When I run the same code on the Jetson TX2 NX (with the model converted on the TX2), with TensorRT 8.2.1 and CUDA 10.2, top reports about 1.6 GB of memory used.
Is this the expected behavior? I have at most about 1 GB available to run the model in production; is there a way to reduce the memory usage? This is my first time using TensorRT and CUDA, so am I doing something wrong in my code?
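For context, my code follows the standard deserialize-and-execute pattern; below is a trimmed sketch of that flow, not the exact attached code (the engine path "samnet.trt" is a placeholder and the buffer handling is omitted; the full source is in samnetTest.tar.gz).

```cpp
#include <NvInfer.h>
#include <cstdio>
#include <fstream>
#include <iterator>
#include <vector>

// Minimal logger required by the TensorRT runtime.
class Logger : public nvinfer1::ILogger
{
    void log(Severity severity, const char* msg) noexcept override
    {
        if (severity <= Severity::kWARNING)
            std::printf("%s\n", msg);
    }
};

int main()
{
    Logger logger;

    // Load the serialized engine produced by trtexec (placeholder path).
    std::ifstream file("samnet.trt", std::ios::binary);
    std::vector<char> blob((std::istreambuf_iterator<char>(file)),
                           std::istreambuf_iterator<char>());

    // Deserializing the engine and creating the execution context is where
    // the runtime memory (weights plus activation workspace) gets allocated.
    nvinfer1::IRuntime* runtime = nvinfer1::createInferRuntime(logger);
    nvinfer1::ICudaEngine* engine =
        runtime->deserializeCudaEngine(blob.data(), blob.size());
    nvinfer1::IExecutionContext* context = engine->createExecutionContext();

    // ... cudaMalloc the input/output bindings, copy the preprocessed image in,
    // run context->enqueueV2(bindings, stream, nullptr), and copy the result back ...

    return 0;
}
```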
I am grateful for any help! Thanks.
I did some more digging and found that it is because trtexec builds the engine using cuDNN and cuBLAS on the TX2 but not on my desktop. Lazy loading is also used on my desktop but not on the TX2, since the TX2 is limited to CUDA 10.2.
I rebuilt the engine with cuBLAS disabled (roughly as sketched below) and reduced the memory usage to about 1.2 GB. Any other tips to reduce memory?
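For reference, this is roughly how the library tactic sources can be dropped when building the engine through the C++ API instead of trtexec (the function name disableLibraryTactics is just an illustration). With trtexec the equivalent is the --tacticSources flag, e.g. --tacticSources=-CUBLAS,-CUBLAS_LT,-CUDNN.

```cpp
#include <NvInfer.h>
#include <cstdint>

// Remove the cuBLAS / cuBLAS_LT / cuDNN tactic sources from a builder config
// so the generated engine does not need to load those libraries at runtime.
// "config" is assumed to come from builder->createBuilderConfig().
void disableLibraryTactics(nvinfer1::IBuilderConfig& config)
{
    nvinfer1::TacticSources sources = config.getTacticSources();
    sources &= ~(1U << static_cast<std::uint32_t>(nvinfer1::TacticSource::kCUBLAS));
    sources &= ~(1U << static_cast<std::uint32_t>(nvinfer1::TacticSource::kCUBLAS_LT));
    sources &= ~(1U << static_cast<std::uint32_t>(nvinfer1::TacticSource::kCUDNN));
    config.setTacticSources(sources);
}
```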
Environment
TensorRT Version: 8.2.1
GPU Type: NVIDIA Pascal™ architecture GPU with 256 CUDA cores (Jetson TX2 NX)
CUDA Version: 10.2
Operating System + Version: L4T R32
Relevant Files
C++ file used to run the model:
samnetTest.tar.gz (4.7 MB)
ONNX model:
samnet.onnx.tar.gz (3.5 MB)