I am measuring the RAM used during inference with a classification model (Inception_v1), TensorRT and PyCUDA.
I allocate the tensors in page-locked (pinned) host memory.
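For reference, the allocation looks roughly like this (a minimal sketch; `engine` is the ICudaEngine deserialized from the .trt file, and the helper name is just for illustration):

```python
import pycuda.autoinit          # creates and manages the CUDA context
import pycuda.driver as cuda
import tensorrt as trt

def allocate_buffers(engine):
    """Allocate one pinned host buffer and one device buffer per binding."""
    host_bufs, dev_bufs = [], []
    for binding in engine:
        shape = engine.get_binding_shape(binding)
        dtype = trt.nptype(engine.get_binding_dtype(binding))
        # page-locked (pinned) host memory, so H2D/D2H copies can be fast/async
        host_mem = cuda.pagelocked_empty(trt.volume(shape), dtype)
        dev_mem = cuda.mem_alloc(host_mem.nbytes)
        host_bufs.append(host_mem)
        dev_bufs.append(dev_mem)
    return host_bufs, dev_bufs
```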
To measure the memory used, I read /proc/meminfo, sampling it after the predictions have been copied back from the GPU to the CPU (as you can see in the image).
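This is roughly how I sample it (a sketch; I take used = MemTotal - MemAvailable, and the values in /proc/meminfo are in kB). I call it right after the device-to-host copy, e.g. after `cuda.memcpy_dtoh(...)` and `stream.synchronize()`:

```python
def used_ram_mb():
    """Return used system RAM in MB, sampled from /proc/meminfo."""
    info = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, value = line.split(":", 1)
            info[key] = int(value.split()[0])   # first field is the kB value
    return (info["MemTotal"] - info["MemAvailable"]) / 1024.0
```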
The TX2 is using around 3.5 GB of RAM during inference; I made a plot of it.
- Why is that amount of memory being used?
- Could it be less? Maybe if I had used unified memory instead (see the sketch after the trtexec command below)?
- Is it related to the amount of RAM used while optimizing the .onnx file with “trtexec”? I used this command:
/usr/src/tensorrt/bin/trtexec --onnx=inception_v1_2016_08_28_frozen.onnx --saveEngine=inception_v1_2016_08_28_fp16.trt --workspace=4096 --fp16
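For reference, this is what I mean by unified memory in the PyCUDA case (a hypothetical sketch, not what I currently run): a single managed allocation that both CPU and GPU can access, which would replace the separate pinned-host buffer plus device buffer.

```python
import numpy as np
import pycuda.autoinit
import pycuda.driver as cuda

# One managed (unified) allocation instead of a pinned host buffer plus a
# separate device buffer; Inception_v1 input assumed to be 1x3x224x224 (NCHW).
inp = cuda.managed_empty(1 * 3 * 224 * 224, np.float32,
                         mem_flags=cuda.mem_attach_flags.GLOBAL)
inp[:] = 0.0   # writable from the CPU like a normal NumPy array
# The same allocation would then be handed to the TensorRT execution context
# as the input binding, with no explicit memcpy_htod before inference.
```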