Using CUDA Unified Memory on an embedded board (physically unified memory)

grynet · July 12, 2016, 6:26pm

Hi All,

If this has been asked before, I apologize; I could not find the topic.

I have a Jetson TK1 board with physically unified memory that is shared by the CPU and GPU. I have two questions about using CUDA unified memory on this board.

1. If I want the GPU and the CPU to write different parts of the same large data array at the same time (parallel CPU and GPU execution), is that possible? Does it make sense at all? Or is CUDA unified memory only meant for a producer-consumer model? Of course, I am responsible for avoiding data races.
2. When I compare zero-copy and unified memory, is unified memory always better? If so, is the reason that CUDA unified memory enables the CPU cache on the Jetson?

Thank you very much in advance

Robert_Crovella · July 12, 2016, 6:34pm

Concurrent CPU and GPU access to the same allocation currently does not fit the UM execution model:

http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#um-gpu-exclusive

“In general, it is not permitted for the CPU to access any managed allocations or variables while the GPU is active.”

I think some folks have found zero-copy to be the faster approach on Jetson where the memory is physically unified. You might need to benchmark the differences for your code and test case. There are no coherency guarantees when using zero-copy. There is effectively a coherency (CPU/GPU) guarantee with UM, see above.

Jimmy_Pettersson · July 12, 2016, 10:45pm

In my experience, using zero-copy buffers and user-managed coherency (cudaStreamSynchronize(…), etc.) is the most efficient approach on TK1 when you need really “tight” CPU/GPU concurrency.

I don’t think there is a real performance benefit with UM; the benefit is rather ease of use.
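
The zero-copy approach with user-managed coherency described above might look roughly like the following. This is only a minimal sketch, not code from any poster: the kernel `fillGpuHalf`, the helper `runZeroCopyDemo`, and the array size are hypothetical names and values chosen for illustration. The GPU writes the first half of a mapped pinned buffer while the CPU writes the second half, and `cudaStreamSynchronize` is the point at which the CPU is allowed to read what the GPU wrote.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: the GPU fills its part of the array with data[i] = i.
__global__ void fillGpuHalf(int *data, int count) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < count) data[i] = i;
}

// Hypothetical helper: returns the number of wrong elements, or -1 on a CUDA error.
int runZeroCopyDemo(int n) {
    const int half = n / 2;
    int *hostPtr = nullptr;
    int *devPtr = nullptr;
    cudaStream_t stream;

    // Request support for mapped (zero-copy) host allocations. This is meant to be
    // called before the CUDA context exists; if it is already active, the call may
    // return an error, which is cleared here (assumption: mapping is then already on).
    (void)cudaSetDeviceFlags(cudaDeviceMapHost);
    (void)cudaGetLastError();

    if (cudaHostAlloc((void **)&hostPtr, n * sizeof(int), cudaHostAllocMapped) != cudaSuccess)
        return -1;
    if (cudaHostGetDevicePointer((void **)&devPtr, hostPtr, 0) != cudaSuccess ||
        cudaStreamCreate(&stream) != cudaSuccess) {
        cudaFreeHost(hostPtr);
        return -1;
    }

    // GPU writes elements [0, half) asynchronously in its own stream.
    fillGpuHalf<<<(half + 255) / 256, 256, 0, stream>>>(devPtr, half);

    // Meanwhile the CPU writes the disjoint range [half, n) of the same buffer.
    for (int i = half; i < n; ++i) hostPtr[i] = -i;

    // User-managed coherency: wait for the GPU before the CPU reads the GPU's half.
    cudaError_t err = cudaStreamSynchronize(stream);

    int wrong = 0;
    if (err == cudaSuccess) {
        for (int i = 0; i < n; ++i) {
            int expected = (i < half) ? i : -i;
            if (hostPtr[i] != expected) ++wrong;
        }
    }

    cudaStreamDestroy(stream);
    cudaFreeHost(hostPtr);
    return (err == cudaSuccess) ? wrong : -1;
}

int main() {
    int wrong = runZeroCopyDemo(1 << 20);
    printf("mismatches: %d\n", wrong);
    return wrong == 0 ? 0 : 1;
}
```

The two writers touch disjoint index ranges, so the only ordering the sketch relies on is the stream synchronization before the check. Whether this is actually faster than UM on a given board would still need to be benchmarked.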

grynet · July 14, 2016, 1:20pm

Hi All,

Thank you very much for your answers and for sharing your experience. They are quite useful.

So when you say “efficient coherency”, @txbob, what did you mean? Can unified memory do something better than user-managed coherency, such as less data usage or more cache usage?

@Jimmy Zero-copy is cool, but to manage coherency I need to duplicate the data and migrate it somehow. Unified memory does that automatically, which is why I am interested in it.

Robert_Crovella · July 14, 2016, 1:39pm

I don’t think I said “efficient coherency” anywhere in my posting.

grynet · July 14, 2016, 10:05pm

Uh, sorry :) I was writing too fast. “Effectively a coherency” is what I wanted to say.

Robert_Crovella · July 14, 2016, 10:24pm

I said: “There is effectively a coherency (CPU/GPU) guarantee with UM, see above.”

There is CPU/GPU “coherency” because of the exclusive access provision of UM that I already pointed out and gave the document link for. Did you read it?

What it says is that when the GPU has access to a UM memory space, the CPU does not. And when the CPU has access to a UM memory space, the GPU does not. This is the nature of the current (pre-CUDA 8/P100) implementation of UM, which is the implementation relevant for Jetson TK1/TX1, the subject of this thread.

Therefore, when the CPU is accessing the memory space, all previous GPU traffic to that space is guaranteed to be complete and coherent. Likewise, when the GPU is accessing the memory space, all previous CPU traffic to that space is guaranteed to be complete and coherent.
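
The exclusive-access pattern described above can be sketched as alternating CPU and GPU phases separated by a synchronization call. This is a minimal illustration under assumptions, not code from the thread: the kernel `incrementAll` and the helper `runManagedDemo` are hypothetical names. The CPU touches the managed allocation only before the kernel launch and after `cudaDeviceSynchronize` returns.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: adds 1 to every element.
__global__ void incrementAll(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

// Hypothetical helper: returns the number of wrong elements, or -1 on a CUDA error.
int runManagedDemo(int n) {
    int *data = nullptr;
    if (cudaMallocManaged(&data, n * sizeof(int)) != cudaSuccess) return -1;

    // CPU phase: the GPU is idle, so the CPU may access the managed allocation.
    for (int i = 0; i < n; ++i) data[i] = i;

    // GPU phase: from the launch until synchronization, the CPU must not touch
    // data under the pre-CUDA 8/P100 execution model discussed in this thread.
    incrementAll<<<(n + 255) / 256, 256>>>(data, n);
    cudaError_t err = cudaGetLastError();           // catches launch failures
    if (err == cudaSuccess) err = cudaDeviceSynchronize();

    // CPU phase again: the GPU work is complete, so its writes are visible here.
    int wrong = 0;
    if (err == cudaSuccess) {
        for (int i = 0; i < n; ++i) {
            if (data[i] != i + 1) ++wrong;
        }
    }

    cudaFree(data);
    return (err == cudaSuccess) ? wrong : -1;
}

int main() {
    int wrong = runManagedDemo(1 << 20);
    printf("mismatches: %d\n", wrong);
    return wrong == 0 ? 0 : 1;
}
```

Moving the verification loop above the `cudaDeviceSynchronize` call would be exactly the concurrent CPU access that the quoted programming guide passage says is not permitted.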
