I am trying to figure out per-process memory usage on Xavier. Using the top utility in Linux, I can see my process has RES = 2.68g while SHR is 2.07g, but I know I have allocated around 4 GB of GPU memory using cudaHostAlloc. I thought there was only one unified memory on Xavier, so this RES number seems too small. How should I interpret these numbers in the case of Xavier?
Hi,
Have you checked the output of tegrastats?
It can show you the physical and swap memory status.
$ sudo tegrastats
Thanks.
That is actually another source of confusion. Starting from a clean reboot, I can see the memory usage reported by tegrastats go from roughly 1 GB to about 7 GB when the process starts, which makes sense to me. However, after the system has been on for a while without the process running, tegrastats shows around 5 GB of memory usage, and upon starting my program it shows a much smaller increase, again to about 7 GB. Swap has been 0 the whole time. How do I make sense of that? And does resident set size (under /proc/<pid>/status and in top) even make sense in the context of the Jetson Xavier SoC?
Hi,
There is a mechanism by which the driver keeps the memory a little bit longer to improve the performance of the next allocation.
For a more accurate calculation, you can try cudaMemGetInfo.
A sample can be found in the below topic:
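As a rough illustration (not the exact sample from that topic), a minimal sketch of sampling cudaMemGetInfo around an allocation; the 1 GiB cudaMalloc here is just an arbitrary example:

#include <cstdio>
#include <cuda_runtime.h>

// Print the free/total memory visible to the CUDA runtime at a given point.
static void printMemInfo(const char* tag)
{
    size_t freeB = 0, totalB = 0;
    cudaError_t err = cudaMemGetInfo(&freeB, &totalB);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMemGetInfo failed: %s\n", cudaGetErrorString(err));
        return;
    }
    printf("[%s] free: %.1f MiB, total: %.1f MiB\n",
           tag, freeB / (1024.0 * 1024.0), totalB / (1024.0 * 1024.0));
}

int main()
{
    printMemInfo("before");

    void* devPtr = nullptr;
    // Allocation size and API chosen arbitrarily for illustration.
    if (cudaMalloc(&devPtr, 1ull << 30) == cudaSuccess) {
        printMemInfo("after cudaMalloc");
        cudaFree(devPtr);
    }

    printMemInfo("after cudaFree");
    return 0;
}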
Thanks.
Thank you for the link; this provides another, different data point. On a very lightly loaded system that has been on for a long time, I can see the free memory being very low (2-3 GB). As I start my process, this number actually increases before it decreases again. I guess the system is doing what you described (keeping the memory a bit longer and only releasing it when it is needed, which is when we see this increase in free memory). Our objective is to measure the total amount of physical memory used by our process, and this method does not seem able to provide that number.
Originally we thought that, since there is no separate CPU and GPU memory on Xavier, the RES reported by the top command would give us a combined number, but that does not seem to be the case. From what I can tell, /sys/kernel/debug/nvmap/iovmm/clients seems to give a somewhat accurate CUDA memory usage number, but I can't find a way to query the overall memory usage or the CPU memory usage.
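As a rough illustration, a minimal sketch of one way to read those two numbers from within a process. The nvmap path and output format are Jetson/L4T specific and may change between releases, debugfs must be mounted, and reading the nvmap node typically requires root:

#include <cstdio>
#include <fstream>
#include <iostream>
#include <string>

int main()
{
    // CPU-side resident set size of the calling process (same number top shows as RES).
    std::ifstream status("/proc/self/status");
    for (std::string line; std::getline(status, line); ) {
        if (line.rfind("VmRSS:", 0) == 0) {
            std::cout << line << "\n";
            break;
        }
    }

    // Per-client GPU (iovmm) usage as tracked by the nvmap driver on Jetson.
    std::ifstream clients("/sys/kernel/debug/nvmap/iovmm/clients");
    if (!clients) {
        std::cerr << "Could not open nvmap clients node (need root / debugfs mounted?)\n";
        return 1;
    }
    std::cout << clients.rdbuf();
    return 0;
}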
Breaking down the numbers, I think one part of the GPU memory usage comes from the work area for the FFT plans, and we don't know how that is allocated/implemented (all CUDA memory in our code is allocated through cudaHostAlloc and is supposedly pinned). Could that end up not being counted in RES?
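For the cuFFT part, a minimal sketch of how a plan's work area size can be queried with cufftGetSize; the 1M-point C2C transform here is just an arbitrary example:

#include <cstdio>
#include <cufft.h>

int main()
{
    const int n = 1 << 20;  // example transform size (assumption)

    cufftHandle plan;
    if (cufftPlan1d(&plan, n, CUFFT_C2C, /*batch=*/1) != CUFFT_SUCCESS) {
        fprintf(stderr, "cufftPlan1d failed\n");
        return 1;
    }

    // The work area is allocated by cuFFT itself (unless managed manually via
    // cufftSetAutoAllocation/cufftSetWorkArea), so it is separate from any
    // memory the application allocates through cudaHostAlloc.
    size_t workSize = 0;
    if (cufftGetSize(plan, &workSize) == CUFFT_SUCCESS) {
        printf("cuFFT work area: %.1f MiB\n", workSize / (1024.0 * 1024.0));
    }

    cufftDestroy(plan);
    return 0;
}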
Hi,
You can free some of the held (cached) memory with the following command:
$ sudo sysctl -w vm.drop_caches=3
As you know, Jetson’s physical memory is shared by the CPU and GPU.
So the tool just measures the remaining or occupied physical memory.
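As a rough illustration, a minimal sketch that prints the /proc/meminfo fields such system-wide tools are derived from (field names as in mainline Linux). MemFree shrinks as the page cache grows, while MemAvailable estimates what could actually be reclaimed, which is why free memory can look low on a long-running system even though processes are not using more:

#include <fstream>
#include <iostream>
#include <string>

int main()
{
    std::ifstream meminfo("/proc/meminfo");
    for (std::string line; std::getline(meminfo, line); ) {
        // Keep only the fields relevant to total/free/reclaimable memory.
        if (line.rfind("MemTotal:", 0) == 0 ||
            line.rfind("MemFree:", 0) == 0 ||
            line.rfind("MemAvailable:", 0) == 0 ||
            line.rfind("Cached:", 0) == 0) {
            std::cout << line << "\n";
        }
    }
    return 0;
}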
Thanks.