I am particularly interested in the unified memory extension that came with CUDA 8.
Correct me if I am wrong, but before CUDA 8, calling cudaMallocManaged for a 100GB vector on a dual-GPU system allocated 100GB in each of host memory, GPU0 memory, and GPU1 memory.
All Pascal GPUs are supposed to have this support from a HW perspective. I have personally run this experiment and validated support (oversubscription of GPU memory) using CUDA 8 on a Pascal Titan X on Linux.
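For anyone who wants to repeat that kind of experiment, here is a minimal sketch of an oversubscription test. It is not the exact code I ran. The 1.5x factor and the helper name oversubscribed_bytes are just illustrative choices, and it assumes a Pascal (or newer) GPU on Linux with CUDA 8 or later and enough host RAM to back the allocation.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t e = (call);                                       \
        if (e != cudaSuccess) {                                       \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,        \
                    cudaGetErrorString(e));                           \
            exit(1);                                                  \
        }                                                             \
    } while (0)

// Grid-stride loop: the threads together write every byte of the buffer,
// which forces each page to be migrated to the GPU at some point.
__global__ void touch(unsigned char *p, size_t n) {
    size_t stride = (size_t)gridDim.x * blockDim.x;
    for (size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride)
        p[i] = 1;
}

// Hypothetical helper: an allocation size 50% larger than device memory.
size_t oversubscribed_bytes(size_t total_device_bytes) {
    return total_device_bytes + total_device_bytes / 2;
}

int main() {
    size_t free_b = 0, total_b = 0;
    CHECK(cudaMemGetInfo(&free_b, &total_b));
    size_t n = oversubscribed_bytes(total_b);

    unsigned char *p = nullptr;
    // On pre-Pascal GPUs this call is expected to fail, since the
    // allocation cannot exceed the physical GPU memory there.
    CHECK(cudaMallocManaged(&p, n));

    touch<<<256, 256>>>(p, n);
    CHECK(cudaGetLastError());
    CHECK(cudaDeviceSynchronize());

    printf("touched %zu bytes on a device with %zu bytes of memory\n", n, total_b);
    CHECK(cudaFree(p));
    return 0;
}
```

Something like `nvcc -arch=sm_60 oversub.cu` should build it. If the allocation and the kernel both succeed, the GPU was able to work on a buffer larger than its own memory.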
For architectures available at the moment, atomicity is only guaranteed for operations emanating from a single GPU. It is not guaranteed between atomic operations issued from separate GPUs and/or the system CPU.
This is very good news. I actually prefer atomic operations to be performed with regard to a single GPU, to ensure optimal performance.
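To make the single-GPU case concrete, here is a small sketch of what I have in mind: one kernel on one GPU incrementing a counter that lives in managed memory, with the host reading it only after synchronization. The launch dimensions kBlocks and kThreads are arbitrary values I picked for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Illustrative launch configuration (arbitrary values).
const unsigned int kBlocks = 64;
const unsigned int kThreads = 128;

// Every thread increments the same counter. All updates come from a
// single GPU, which is the case where atomicity is guaranteed.
__global__ void count(unsigned int *counter) {
    atomicAdd(counter, 1u);
}

int main() {
    unsigned int *counter = nullptr;
    if (cudaMallocManaged(&counter, sizeof(unsigned int)) != cudaSuccess) return 1;

    *counter = 0;  // host writes before the launch, not concurrently with it
    count<<<kBlocks, kThreads>>>(counter);
    if (cudaDeviceSynchronize() != cudaSuccess) return 1;

    // The host reads only after the kernel has finished.
    printf("counter = %u (expected %u)\n", *counter, kBlocks * kThreads);
    cudaFree(counter);
    return 0;
}
```

The host never touches the counter while the kernel is running, so nothing here relies on atomicity between the CPU and the GPU.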
I am currently designing a system to run experiments related to managed memory, and the link from the Parallel Forall blog actually states that this feature is OS-dependent:
“Certain operating system modifications are required to enable Unified Memory with the system allocator. NVIDIA is collaborating with Red Hat and working within the Linux community to enable this powerful functionality.”
On which Linux distribution do you run your tests, or where can I find a list of supported OSs?
The base feature is not OS-dependent and should work as advertised on the supported OSs/environments for unified memory.
The specific reference to “Unified Memory with the system allocator” refers to a specific capability: the ability to use managed features even if the underlying allocation was created with e.g. system malloc or new, instead of a managed allocator.
This ability (AFAIK) requires a specific version of the Linux kernel. I haven’t tested this ability at this time and wouldn’t be able to give you a recipe or further information. I personally consider this to be an unsupported feature at this time, i.e. something that is coming in the future.
But if you use a managed allocator, e.g. cudaMallocManaged, the oversubscription capability I previously described should be available on the supported environments for UM, as published in the programming guide.
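If you want to check a particular machine rather than rely on a list, the runtime exposes device attributes for this. The sketch below queries three of them. My understanding, which I have not verified on every platform, is that cudaDevAttrConcurrentManagedAccess reflects the Pascal-style demand paging that oversubscription depends on, and that cudaDevAttrPageableMemoryAccess reflects the system-allocator capability discussed above. The helper allocator_for is a hypothetical name used only for this example.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: which allocator a program could rely on, given the
// value of the pageable-memory-access attribute.
const char *allocator_for(int pageable_access) {
    return pageable_access ? "malloc/new" : "cudaMallocManaged";
}

int main() {
    int dev = 0, managed = 0, concurrent = 0, pageable = 0;
    if (cudaGetDevice(&dev) != cudaSuccess) return 1;

    // Basic unified memory support (cudaMallocManaged is usable).
    cudaDeviceGetAttribute(&managed, cudaDevAttrManagedMemory, dev);
    // CPU and GPU can access managed memory concurrently (demand paging).
    cudaDeviceGetAttribute(&concurrent, cudaDevAttrConcurrentManagedAccess, dev);
    // GPU can access pageable memory from malloc/new without registration.
    cudaDeviceGetAttribute(&pageable, cudaDevAttrPageableMemoryAccess, dev);

    printf("managed memory:            %d\n", managed);
    printf("concurrent managed access: %d\n", concurrent);
    printf("pageable memory access:    %d\n", pageable);
    printf("allocator to use:          %s\n", allocator_for(pageable));
    return 0;
}
```

On a setup without the system-allocator capability, I would expect the last attribute to be 0, in which case cudaMallocManaged remains the way to get managed memory.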