size of shrare memory between CPU and GPU

feng.baoying · April 18, 2018, 3:31am

Hi Nvidia:
In tegra_multimedia_api/sample/v4l2cuda, we use command ./capture-cuda -d /dev/video0 -u -z to get video data from usb camera and convert the video data with GPU,and the memory we use is the share memory between CPU and GPU, is it right? if it is, how much is the size of share memory between CPU and GPU?
Expected your reply! Thanks a lot!

AastaLLL · April 18, 2018, 5:54am

Hi,

The function for allocating memory is cudaMallocManaged().
The basic idea of unified memory can be found here:
https://devblogs.nvidia.com/unified-memory-cuda-beginners/

In short, there are two buffers located in CPU and GPU but sharing a same pointer.
And the coherence between processes is automatically handled by the CUDA driver.
https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#um-unified-memory-programming-hd

For the buffer size, please check file ‘samples/v4l2cuda/capture.cpp’:

size_t size = width * height * 3;
...
cudaMallocManaged (&cuda_out_buffer, size, cudaMemAttachGlobal);

It is width * height * 3.

Thanks.

feng.baoying · April 18, 2018, 6:15am

Hi AastaLLL:
Thank you for your reply!
sorry,I’m not clearly express my question, size_t size = width * height * 3 is the size malloced in this program, I wonder to know the total Unified Memory space size in TX2.
Expected your reply! Thank you!

snarky · April 18, 2018, 6:03pm

The amount of “shared memory” is software controlled. Substantially all of the 8 GB of RAM on the TX2 module “could” be shared memory (minus bits needed by the OS to function.)

feng.baoying · April 19, 2018, 12:53am

Hi snarky:
Thank you for your reply!
May I understand your answer as follow:
There are total 8GB RAM on TX2 module,and this 8GB RAM could be as “shared memory” by function cudaMallocManaged(),
Is it right? but how to understand minus bits needed by the OS to function?
Expected your reply! Thank you!

snarky · April 19, 2018, 6:23pm

It should be obvious that the amount of RAM used by your system processes depends a lot on what you are doing with the system.
You cannot get a good answer without developing an actual solution, disabling system services you don’t need, run system services and applications you do need, and measuring.
For estimation purposes, a lightweight system might need about 200 MB for overhead; a heavy system installation might need more like 1 GB for overhead, but the specifics of your application will vary.

AastaLLL · April 23, 2018, 8:22am

Hi,

For an application in user space, the memory size is similar to the amount of allocation.
But in some case, the page number may be duplicate depends to CPU/GPU interactively access.

Here is a tutorial for your reference.

Thanks.

Topic		Replies	Views
How do I increase the shared GPU memory allocation multiplicator? CUDA Programming and Performance	4	10387	October 12, 2021
Is the available global memory in TX2 less than TX1? Jetson TX2	6	1368	October 18, 2021
The Zero Copy Shared memory mode consumes more CPU resources (jetson Xavier NX) Jetson Xavier NX tensorrt , cuda , cudnn	6	68	January 6, 2025
The memory sharing between cpu and gpu in Jetson TX2 Jetson TX2	6	7160	October 18, 2021
Cuda Memory Usage TX1 Jetson TX1	8	4528	December 16, 2015
Jetson TX2 GPU memory? Jetson TX2	8	18484	October 18, 2021
Memory alloc limited to less than half available RAM Jetson TX1	4	1075	October 18, 2021
CPU operation is very slow on memory allocated by cudaMallocHost Jetson TX2	13	1731	October 18, 2021
memory confusion how big is local/shared/global memory? CUDA Programming and Performance	6	3433	January 20, 2009
Unified Memory on Jetson TK1 Jetson TK1	2	1533	February 15, 2016

size of shrare memory between CPU and GPU

Related topics