Unified memory for CC 6.1

Dear all,

I am particularly interested in the unified memory extensions that came with CUDA 8.
Correct me if I am wrong, but before CUDA 8, using cudaMallocManaged for a 100 GB vector on a dual-GPU system yielded a 100 GB allocation in host memory as well as in GPU0 memory and GPU1 memory.

The previous example is particularly painful because no GPU has that much memory.
I understood that a proper virtual memory addressing system, with page migration, appeared in CUDA 8: https://devblogs.nvidia.com/parallelforall/inside-pascal/

My questions are quite simple:

- The article above only mentions the GP100 chip (compute capability 6.0), but is this feature also available on compute capability 6.1 devices, like the GTX 1050 Ti/1060/1070/1080/Titan X?

- Can I perform atomicAdd on multi-GPU systems with CUDA managed memory?

Thank you in advance for your help.

All Pascal GPUs are supposed to have this support from a HW perspective. I have personally run this experiment and validated support (oversubscription of GPU memory) using CUDA 8 on a Pascal Titan X on Linux.
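A minimal oversubscription test along the lines of that experiment might look like the following sketch (assumptions: CUDA 8+, a Pascal GPU on Linux, and enough system memory to back the allocation):

```cuda
// Allocate 1.5x the device's total memory with cudaMallocManaged and touch
// every byte from a kernel. On Pascal, pages are migrated to the GPU on
// demand instead of the allocation failing up front.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void touch(char *p, size_t n)
{
    size_t stride = (size_t)gridDim.x * blockDim.x;
    for (size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride)
        p[i] = 1;
}

int main()
{
    size_t free_b, total_b;
    cudaMemGetInfo(&free_b, &total_b);
    size_t n = total_b + total_b / 2;   // deliberately exceeds device memory

    char *p;
    if (cudaMallocManaged(&p, n) != cudaSuccess) {
        printf("allocation failed\n");
        return 1;
    }
    touch<<<256, 256>>>(p, n);
    cudaError_t err = cudaDeviceSynchronize();
    printf("touched %zu bytes: %s\n", n, cudaGetErrorString(err));
    cudaFree(p);
    return 0;
}
```

On a pre-Pascal GPU the same cudaMallocManaged call would be expected to fail, since the allocation cannot exceed device memory there.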

On currently available architectures, atomicity is only guaranteed for operations emanating from a single GPU. Atomic operations issued from separate GPUs, or from the system CPU, are not guaranteed to be atomic with respect to one another.
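Concretely, the supported pattern is a sketch like this (assuming a managed counter updated from one GPU only):

```cuda
// atomicAdd into a managed counter, launched on a single GPU. Atomicity
// holds among all threads of this kernel; concurrent atomics on the same
// address from a second GPU or from the CPU are NOT guaranteed atomic with
// respect to these on current (Pascal) hardware.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void count(unsigned int *ctr)
{
    atomicAdd(ctr, 1u);         // device-scope atomic: safe within this GPU
}

int main()
{
    unsigned int *ctr;
    cudaMallocManaged(&ctr, sizeof(*ctr));
    *ctr = 0;                   // CPU writes before launch, so no concurrent access
    count<<<100, 256>>>(ctr);
    cudaDeviceSynchronize();    // migrate the page back before the CPU reads it
    printf("%u\n", *ctr);       // 100 * 256 = 25600
    cudaFree(ctr);
    return 0;
}
```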

Thank you very much txbob for your kind response.

This is very good news; I actually prefer atomic operations to be performed with respect to a single GPU, to ensure optimal performance.

I am currently designing a system to run experiments related to managed memory, and the linked parallelforall blog post actually states that this feature is OS-dependent:

“Certain operating system modifications are required to enable Unified Memory with the system allocator. NVIDIA is collaborating with Red Hat and working within the Linux community to enable this powerful functionality.”

On which Linux OS do you run your tests, and where can I find a list of supported OSes?

Thank you in advance for your help.


The base feature is not OS-dependent and should work as advertised on the supported OSs/environments for unified memory.

The specific reference to “Unified Memory with the system allocator” refers to a specific capability: the ability to use managed features even if the underlying allocation was created with e.g. system malloc or new, instead of a managed allocator.
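For contrast, the distinction between the two allocation paths can be sketched as follows (assumption: the system-allocator variant is the unsupported capability discussed here, and is expected to fail on systems without the required kernel support):

```cuda
// Managed allocator vs. "system allocator" unified memory.
#include <cstdlib>
#include <cuda_runtime.h>

__global__ void inc(int *p) { ++*p; }

int main()
{
    // Supported path: allocation created with the managed allocator.
    int *m;
    cudaMallocManaged(&m, sizeof(*m));
    *m = 0;
    inc<<<1, 1>>>(m);
    cudaDeviceSynchronize();
    cudaFree(m);

    // "System allocator" path: the same pointer usage, but backed by plain
    // malloc. This is the capability requiring OS/kernel support; on a
    // system without it, the kernel launch accesses an invalid device
    // pointer and fails.
    int *s = (int *)malloc(sizeof(*s));
    *s = 0;
    inc<<<1, 1>>>(s);
    cudaDeviceSynchronize();
    free(s);
    return 0;
}
```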

This ability (AFAIK) requires a specific version of the Linux kernel. I haven't tested it and wouldn't be able to give you a recipe or further information. I personally consider it an unsupported feature at this time, i.e. something that is coming in the future.

But if you use a managed allocator such as cudaMallocManaged, the oversubscription capability I previously described should be available in the supported environments for UM as published in the programming guide.

OK, thank you for clarifying this point.