Memory usage when loading unet for inference on jetson nano

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU) Jetson
• DeepStream Version 5.1
• JetPack Version (valid for Jetson only) 4.5.1
• TensorRT Version 7.1.3.0-1+cuda10.2
• NVIDIA GPU Driver Version (valid for GPU only)
• Issue Type (questions, new requirements, bugs) questions
• How to reproduce the issue ? (This is for bugs. Including which sample app is using, the configuration files content, the command line used and other details for reproducing)
• Requirement details( This is for new requirement. Including the module name-for which plugin or for which sample application, the function description)

Hi, I am trying to run inference with a custom UNet model on my Jetson Nano. I trained it with TLT, then exported it and created an engine file on the device. The engine file is around 37 MB. I use the deepstream-segmentation example from deepstream-python-apps to run a DeepStream pipeline with this model. When loading the engine, memory usage goes from 1.5 GB idle to 3.8 GB, so the system almost freezes. This happens before any actual inference takes place, during the model loading stage.

When I try dstest_segmentation_config_industrial.txt instead, memory consumption only goes up to 2.7 GB from the 1.5 GB idle. I checked the .engine file for this config and it is 25 MB. So I have two questions:

  1. Why would a model that weighs 37 MB almost cause OOM while a 25 MB one does not?
  2. Why is the memory consumed in the range of GBs while the model size is around 20-30 MB?

Would there be any comment on this? Our project development is currently stalled because we do not know how we should optimize the model to reach the performance stated in NVIDIA's benchmarks.
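For what it is worth, system-wide numbers (e.g. from `free` or tegrastats) also include cache, so it can help to measure the resident memory of the DeepStream process itself around the engine-load step. A minimal, Linux-only sketch; `rss_mib` is a hypothetical helper, not part of DeepStream:

```python
# Hypothetical helper: read this process's resident set size from /proc
# so the engine-load cost can be measured in isolation (Linux only).
def rss_mib() -> float:
    """Return the current resident set size of this process in MiB."""
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1]) / 1024.0  # value is in kB
    return 0.0

before = rss_mib()
# ... build the pipeline and load the engine here ...
after = rss_mib()
print(f"engine load added ~{after - before:.0f} MiB to this process")
```

Comparing this per-process delta against the system-wide growth shows how much of the jump is really attributable to the pipeline.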


The memory usage depends on the algorithms TensorRT selects when building the engine.
Moreover, loading the TensorRT/cuDNN libraries themselves also takes some memory (>600 MB).
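As a back-of-envelope illustration of why a ~37 MB engine file can cost GBs at run time: the file stores only the weights, while at run time TensorRT must also hold intermediate activation buffers (large for UNet-style models) and scratch workspace for its algorithms, on top of the library load cost above. All sizes below are illustrative assumptions, not measurements of this model:

```python
# Back-of-envelope estimate: engine file size vs. runtime memory.
# Every number here is an illustrative assumption, not a measurement.
MiB = 1024 * 1024

weights   = 37 * MiB    # roughly what the .engine file stores
libraries = 600 * MiB   # TensorRT/cuDNN load cost, loaded once

# UNet keeps large intermediate feature maps; a single fp32 buffer of
# 512x512 spatial size with 64 channels already costs:
act_one = 512 * 512 * 64 * 4       # bytes for one activation tensor
activations = 20 * act_one         # assume ~20 such buffers alive at once

workspace = 1024 * MiB             # scratch space for TensorRT tactics

total = weights + libraries + activations + workspace
print(f"one activation buffer: {act_one // MiB} MiB")   # 64 MiB
print(f"estimated total:       {total // MiB} MiB")     # 2941 MiB
```

Under these assumed numbers the activations and workspace, not the weights, dominate the footprint, which is also why a slightly larger engine can tip a 4 GB Nano toward OOM.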

You can create the engine file with a different workspace size to limit which algorithms TensorRT can use:

/usr/src/tensorrt/bin/trtexec --workspace=1024 ...
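One way to pick the `--workspace` value (trtexec takes it in MiB on TensorRT 7) is to budget against the Nano's 4 GB of shared memory, using the numbers reported in this thread as rough inputs. A sketch; the idle and headroom figures are assumptions:

```python
# Rough workspace budget for a 4 GB Jetson Nano, using the numbers
# reported in this thread (illustrative, not exact).
total_ram  = 4 * 1024   # MiB; the Nano shares 4 GB between CPU and GPU
idle_usage = 1536       # ~1.5 GB used before the pipeline starts
libraries  = 600        # TensorRT/cuDNN load cost mentioned above
headroom   = 512        # assumed margin for activations and the rest of the app

budget = total_ram - idle_usage - libraries - headroom
print(f"try --workspace={budget} or lower")  # prints: try --workspace=1448 or lower
```

A smaller workspace restricts TensorRT to less memory-hungry algorithms, so peak memory drops, possibly at some cost in throughput; it is worth trying a few values (e.g. 256, 512, 1024) and measuring both memory and FPS.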

