Hi! I'm using a Jetson Orin Nano (8GB RAM) to accelerate inference of a PyTorch ResNet50 model. I noticed that the used memory reported by jtop is about 2.5GB, while the GPU Shared RAM is only about 1.4GB. I'm confused about why the used memory is so much higher than the GPU Shared RAM. Do you know why this is the case? Also, why does ResNet50 use so much memory?
Please refer to the attached Python file.
During inference, some memory is required just to load the CUDA/cuDNN/TensorRT binaries, in addition to the model weights and activations themselves.
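You can see this split directly: the memory PyTorch tracks for its own tensors is much smaller than the whole process footprint. A minimal sketch (assuming torchvision and psutil are installed; not your attached script):

```python
import psutil
import torch
import torchvision.models as models

# Load ResNet50 and run one inference pass so the CUDA context,
# cuDNN kernels, and weights are all resident in memory.
model = models.resnet50(weights="IMAGENET1K_V1").cuda().eval()
x = torch.randn(1, 3, 224, 224, device="cuda")
with torch.no_grad():
    model(x)
torch.cuda.synchronize()

# Memory PyTorch has allocated for tensors (weights + activations).
print(f"tensors allocated: {torch.cuda.memory_allocated() / 2**20:.0f} MiB")
# Memory PyTorch's caching allocator has reserved from the system.
print(f"tensors reserved:  {torch.cuda.memory_reserved() / 2**20:.0f} MiB")
# Whole-process footprint: additionally counts the CUDA/cuDNN/TensorRT
# binaries and the CUDA context. On Jetson, CPU and GPU share physical
# RAM, so this is close to what jtop reports for the process.
print(f"process RSS:       {psutil.Process().memory_info().rss / 2**20:.0f} MiB")
```

The gap between the last number and the first two is mostly the CUDA context plus the loaded kernel binaries, which jtop counts as used memory but the allocator statistics do not.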
To reduce memory usage, please upgrade to JetPack 6, which ships with a newer CUDA. CUDA 11.8 introduced a lazy loading feature: it loads only the kernel binaries that are actually needed, rather than everything up front, which can significantly reduce memory usage.
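Lazy loading is controlled by the `CUDA_MODULE_LOADING` environment variable, which must be set before the process initializes CUDA. A minimal sketch:

```python
import os

# Must be set before the first CUDA call in this process,
# otherwise it has no effect.
os.environ["CUDA_MODULE_LOADING"] = "LAZY"

import torch

# CUDA initializes here, with lazy module loading enabled.
x = torch.randn(8, device="cuda")
print(torch.cuda.memory_allocated())
```

Equivalently, you can launch the script as `CUDA_MODULE_LOADING=LAZY python3 your_script.py`. On CUDA 12.2 and newer, lazy loading is the default and no variable is needed.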