Hi,
We tested this on our Nano with JetPack 4.5.1 and PyTorch v1.8.0 installed from this topic. Here is the tegrastats output during the run:
RAM 1052/3964MB (lfb 91x4MB) ...
RAM 1052/3964MB (lfb 91x4MB) ...
RAM 1052/3964MB (lfb 91x4MB) ...
RAM 1130/3964MB (lfb 91x4MB) ...
RAM 1240/3964MB (lfb 91x4MB) ...
RAM 1376/3964MB (lfb 91x4MB) ...
RAM 1513/3964MB (lfb 91x4MB) ...
RAM 1665/3964MB (lfb 88x4MB) ...
RAM 1821/3964MB (lfb 80x4MB) ...
RAM 1999/3964MB (lfb 74x4MB) ...
RAM 2146/3964MB (lfb 36x4MB) ...
RAM 2315/3964MB (lfb 55x1MB) ...
RAM 2386/3964MB (lfb 15x1MB) ...
RAM 2386/3964MB (lfb 15x1MB) ...
RAM 2386/3964MB (lfb 15x1MB) ...
The memory usage increases by around 1GiB in our experiment.
Is this similar to your observation?
Usually, the underlying libraries are loaded lazily when they are first used.
Creating a GPU buffer may trigger the loading of CUDA-related libraries, e.g. cuDNN.
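For reference, a minimal sketch of this kind of test (our assumption; the tensor size and sleep times are arbitrary) is below. Run tegrastats in another terminal to compare the RAM value before and after the first CUDA allocation.

import time
import torch

print("before CUDA init")
time.sleep(5)                       # read the idle RAM value from tegrastats

x = torch.zeros(1, device="cuda")   # first GPU buffer: creates the CUDA context and loads CUDA-related libraries (e.g. cuDNN)
torch.cuda.synchronize()

print("after CUDA init")
time.sleep(5)                       # RAM reported by tegrastats should now be roughly 1GiB higher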
Thanks.