CUDA UVM MEMORY USAGE - IMPLEMENTATION DETAILS

Hi there!

I have a question on UVM and its implementation details.

I think that UVM can make use of not only the GPU's GDDR5 (global) memory, but also other, faster memory types on the GPU (shared memory, L1/L2 cache, registers). On the other hand, some researchers claim that UVM uses only the GPU's GDDR5 memory (global memory, as I understand it). However, looking through the documentation provided by NVIDIA, I have not been able to confirm either claim.

Is there any document describing the inner mechanisms of the UVM implementation, or more generally, which memory types it uses and under which circumstances? And what about the optimizations made by the compiler/“UVM manager”?

I would really appreciate any direction on these subjects.

Thanks in advance,

I don’t have any implementation details. But UM (“Unified Memory”)

Programming Guide :: CUDA Toolkit Documentation

applies to global memory only; not local, shared, constant, texture, or any other type. If you study the above programming guide section, this will be fairly evident, and in some cases explicit (e.g. with respect to constant memory).
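A minimal sketch may make the point above concrete (my own illustrative example, not from the programming guide, assuming a UM-capable GPU and toolkit): `cudaMallocManaged` returns a single pointer that is valid on both host and device, but the allocation itself lives in global (device DRAM) memory:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// The kernel sees an ordinary global-memory pointer; nothing about the
// allocation being "managed" is visible here.
__global__ void addOne(int *x) { *x += 1; }

int main() {
    int *x;
    // Unified Memory: one pointer usable from host and device code,
    // backed by global memory and migrated on demand by the UM system.
    cudaMallocManaged(&x, sizeof(int));
    *x = 41;                      // written on the host
    addOne<<<1, 1>>>(x);          // read and written on the device
    cudaDeviceSynchronize();      // required before the host touches managed data again
    printf("%d\n", *x);          
    cudaFree(x);
    return 0;
}
```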

Thanks for the post!

It seems I overlooked the part referring to constant declarations. How embarrassing! :-0

There are, however, some thoughts/questions about that:

  • It would be great if NVIDIA stated this explicitly in their documentation.
  • If UVM is meant to lower the entry barrier to GPU programming, is the compiler making any further memory optimizations, or are shared memory and the other faster memory types simply neglected when managed variables are used? It does not seem very smart to leave those faster memory types only to experienced programmers using the older, explicit memory declarations.
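To illustrate the second bullet with a sketch of my own (an assumption about current toolkits, not an NVIDIA statement): even when a buffer is allocated with `cudaMallocManaged`, staging it into fast on-chip shared memory is still an explicit, programmer-written step inside the kernel; Unified Memory does not promote the data automatically:

```cuda
#include <cuda_runtime.h>

#define TILE 256

// Managed data arrives as a plain global-memory pointer; the copy into
// shared memory below is written by hand, exactly as with cudaMalloc'd data.
__global__ void reverseTile(float *data) {
    __shared__ float tile[TILE];        // explicit shared memory, unrelated to UM
    int i = threadIdx.x;
    tile[i] = data[i];                  // global (managed) -> shared
    __syncthreads();
    data[i] = tile[TILE - 1 - i];       // shared -> global (managed)
}

int main() {
    float *data;
    cudaMallocManaged(&data, TILE * sizeof(float));  // lives in global memory
    for (int i = 0; i < TILE; ++i) data[i] = (float)i;
    reverseTile<<<1, TILE>>>(data);
    cudaDeviceSynchronize();
    cudaFree(data);
    return 0;
}
```

Whether the compiler ever caches managed data in registers or L1/L2 is, as far as I can tell, an unspecified hardware/compiler detail rather than part of the UM programming model.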

Any hint on the previous questions or directions on where to ask for further information about UVM implementation details would be highly appreciated.

Thanks again!