Hello, I have a few questions regarding RAM usage. We are operating on the Jetson Xavier NX with JetPack 4.6.
We have two projects running PyTorch models, one of which produces artifacts that are a dependency for the other. Both are hungry for memory, so we are in the process of working through resource management. What I am observing is that after running inference with either model, the GPU memory allocation stays elevated: when I create the model and load the weights it takes approx. 500 MB of GPU memory, when I run inference it grows to 1.1 GB, and after inference it remains at 1.1 GB until I shut the container down. I had been assuming that I had retained references to tensors somewhere in the code that I needed to dereference, so I wrote a method to find the tensors in memory and delete the ones that are not model parameters, yet the GPU allocation remains. So I have a couple of questions:
- Is there an explicit way to determine all tensors in memory and whether they are model parameters? I'm currently examining all objects tracked by the garbage collector, determining if they are tensors, then checking whether they are of type torch.nn.Parameter or torch.device and deleting them if neither (my current approach is sketched at the end of this post).
- Is there perhaps a setting similar to PYTORCH_NO_CUDA_MEMORY_CACHING that I should be defining? My understanding is that this is bad practice in production due to the added allocation latency, but is it reasonable in a resource-limited environment where multiple containers need to pass GPU resources back and forth, or is there some other method of preventing memory allocated at inference from being retained? (The second snippet at the end of this post shows what I have in mind.)
- When running jtop I see that the memory allocated to each process doesn't add up to the total memory used. Below is an example of what I am seeing from jtop (I assume ~2 GB of it is the OS):
a. Baseline (nvargus-daemon & symbot_server): 30 MB CPU and 300 MB GPU each – Total RAM used 2.7 GB
b. Baseline + Project 1 model loaded (0.7 GB CPU, 0.7 GB GPU) – Total RAM used 5.1 GB
c. Baseline + Project 2 model loaded (0.7 GB CPU, 0.9 GB GPU) – Total RAM used 4.8 GB
d. Baseline + Project 1 & 2 models loaded (1.1 GB CPU, 1.5 GB GPU) – Total RAM used 7 GB
e. Baseline + both models loaded and running inference – RAM overflow. Running docker stats lists memory usage that is different from, and often smaller than, what jtop reports; jtop agrees with tegrastats.
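For reference, the inspection/cleanup method mentioned in the first question looks roughly like this (a simplified sketch; the size accounting and printout are just for illustration, and torch.cuda.memory_allocated/memory_reserved are the standard PyTorch accounting calls I'm using to compare against jtop):

```python
import gc
import torch

def report_cuda_tensors():
    """Walk everything the garbage collector can see and flag which CUDA
    tensors are model parameters vs. ordinary (presumably inference) tensors."""
    for obj in gc.get_objects():
        try:
            if not torch.is_tensor(obj) or not obj.is_cuda:
                continue
        except Exception:
            continue
        kind = "parameter" if isinstance(obj, torch.nn.Parameter) else "tensor"
        size_mb = obj.element_size() * obj.nelement() / 1e6
        print(f"{kind:9s} shape={tuple(obj.shape)} size={size_mb:.1f} MB")

# After inference: drop my own references, run a GC pass, then ask PyTorch to
# hand its cached-but-unused blocks back to the driver.
gc.collect()
torch.cuda.empty_cache()

# allocated = memory backing live tensors; reserved = what the caching
# allocator is still holding from the driver.
print(f"allocated: {torch.cuda.memory_allocated() / 1e6:.1f} MB")
print(f"reserved:  {torch.cuda.memory_reserved() / 1e6:.1f} MB")
report_cuda_tensors()
```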
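And for the second question, this is the kind of thing I have in mind when I talk about disabling the caching allocator; my assumption (please correct me if this is wrong) is that the variable has to be in the environment before PyTorch touches CUDA for it to take effect:

```python
import os

# Debug-oriented setting: with caching disabled, every allocation goes straight
# to cudaMalloc/cudaFree, so inference gets slower, but freed memory should be
# returned to the driver immediately. Must be set before PyTorch initializes
# CUDA (e.g. passed into the container via `docker run -e ...`).
os.environ["PYTORCH_NO_CUDA_MEMORY_CACHING"] = "1"

import torch  # imported after the variable is set, on purpose
```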