Unified memory for CC 6.1

Tobbey · December 5, 2016, 1:25pm

Dear all,

I am particularly interested in the unified memory extension that came with CUDA 8.
Correct me if I am wrong, but before CUDA 8, calling cudaMallocManaged for a 100GB vector allocated 100GB in host memory, in GPU0 memory, and in GPU1 memory on a dual-GPU system.

The previous example is particularly painful because no GPU has that much memory.
As I understand it, a proper virtual memory addressing system with page management appeared in CUDA 8: https://devblogs.nvidia.com/parallelforall/inside-pascal/

My questions are quite simple:

- The article above only mentions the GP100 chip (CC 6.0), but is this feature also available on compute capability 6.1 GPUs, such as the GTX 1050 Ti/1060/1070/1080/Titan X?

- Can I perform atomicAdd on multi-GPU systems with CUDA managed memory?

Thank you in advance for your help.

Robert_Crovella · December 6, 2016, 4:52am

All Pascal GPUs are supposed to have this support from a HW perspective. I have personally run this experiment and validated support (oversubscription of GPU memory) using CUDA 8 on a Pascal Titan X on Linux.
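
For readers who want to repeat this kind of experiment, here is a minimal sketch of an oversubscription test. It is not the code used above. The kernel name `touch`, the 1.5x factor, the 4 KiB stride, and the launch configuration are illustrative assumptions, and the host is assumed to have enough free RAM to back the whole allocation. On a GPU without oversubscription support, the allocation is expected to fail.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: writes one byte every `stride` bytes so that pages
// across the whole managed allocation get touched from the GPU.
__global__ void touch(char *data, size_t bytes, size_t stride)
{
    size_t idx  = (blockIdx.x * (size_t)blockDim.x + threadIdx.x) * stride;
    size_t step = (size_t)gridDim.x * blockDim.x * stride;
    for (; idx < bytes; idx += step)
        data[idx] = 1;
}

int main()
{
    size_t free_bytes = 0, total_bytes = 0;
    cudaError_t err = cudaMemGetInfo(&free_bytes, &total_bytes);
    if (err != cudaSuccess) {
        printf("cudaMemGetInfo failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Assumption: 1.5x the physical GPU memory is enough to force oversubscription.
    size_t bytes = total_bytes + total_bytes / 2;

    char *data = nullptr;
    err = cudaMallocManaged(&data, bytes);
    if (err != cudaSuccess) {
        printf("cudaMallocManaged(%zu bytes) failed: %s\n", bytes, cudaGetErrorString(err));
        return 1;
    }

    // Touch the allocation from the GPU; pages migrate on demand if supported.
    touch<<<256, 256>>>(data, bytes, 4096);
    err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        printf("kernel failed: %s\n", cudaGetErrorString(err));
        cudaFree(data);
        return 1;
    }

    printf("touched %zu bytes of managed memory on a GPU with %zu bytes\n", bytes, total_bytes);
    cudaFree(data);
    return 0;
}
```

If both the allocation and the kernel succeed, the GPU has worked on a managed buffer larger than its physical memory.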

For architectures available at the moment, atomicity is only guaranteed for operations emanating from a single GPU. An atomic operation is not guaranteed to be atomic with respect to atomic operations issued on separate GPUs and/or the system CPU.
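
As an illustration of the case that is guaranteed (all atomics issued from one GPU), here is a minimal sketch that uses atomicAdd on a managed counter. The kernel name `count` and the launch configuration are illustrative assumptions. The host only reads the counter after cudaDeviceSynchronize, so no CPU access overlaps with the GPU atomics.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: every thread increments the same managed counter.
__global__ void count(unsigned int *counter)
{
    atomicAdd(counter, 1u);
}

int main()
{
    const unsigned int blocks = 64, threads = 128;

    unsigned int *counter = nullptr;
    cudaError_t err = cudaMallocManaged(&counter, sizeof(unsigned int));
    if (err != cudaSuccess) {
        printf("cudaMallocManaged failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    *counter = 0;  // host initializes the managed value before the launch

    // All atomic operations come from a single GPU.
    count<<<blocks, threads>>>(counter);
    err = cudaDeviceSynchronize();  // wait before the host touches the counter again
    if (err != cudaSuccess) {
        printf("kernel failed: %s\n", cudaGetErrorString(err));
        cudaFree(counter);
        return 1;
    }

    printf("counter = %u (expected %u)\n", *counter, blocks * threads);
    cudaFree(counter);
    return 0;
}
```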

Tobbey · December 6, 2016, 12:44pm

Thank you very much txbob for your kind response.

This is very good news. Actually, I prefer atomic operations to be performed within a single GPU to ensure optimal performance.

I am currently designing a system to run experiments related to managed memory, and the parallelforall blog post linked above states that this feature is OS-dependent:

“Certain operating system modifications are required to enable Unified Memory with the system allocator. NVIDIA is collaborating with Red Hat and working within the Linux community to enable this powerful functionality.”

Which Linux distribution do you run your tests on, or where can I find a list of supported operating systems?

Thank you in advance for your help.

Regards

Robert_Crovella · December 6, 2016, 2:01pm

The base feature is not OS-dependent and should work as advertised on the supported OSs/environments for unified memory.

The specific reference to “Unified Memory with the system allocator” refers to a specific capability: the ability to use managed features even if the underlying allocation was created with e.g. system malloc or new, instead of a managed allocator.

This ability (AFAIK) requires a specific version of the Linux kernel. I haven’t tested it at this time and wouldn’t be able to give you a recipe or further information. I personally consider this to be an unsupported feature at this time, i.e. something that is coming in the future.

But if you use a managed allocator, e.g. cudaMallocManaged, the oversubscription capability I previously described should be available in the supported environments for UM as published in the programming guide.
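
One way to check what a given GPU and environment report is to query the relevant device attributes. The sketch below assumes device 0 and CUDA 8 or later. My understanding, stated here as an assumption, is that a nonzero cudaDevAttrConcurrentManagedAccess goes with the Pascal-style on-demand page migration that oversubscription relies on, and that a nonzero cudaDevAttrPageableMemoryAccess corresponds to the system allocator capability discussed above.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Helper (hypothetical name): prints one integer device attribute.
static void show(const char *label, cudaDeviceAttr attr, int device)
{
    int value = 0;
    cudaError_t err = cudaDeviceGetAttribute(&value, attr, device);
    if (err != cudaSuccess)
        printf("%-26s query failed: %s\n", label, cudaGetErrorString(err));
    else
        printf("%-26s %d\n", label, value);
}

int main()
{
    const int device = 0;  // assumption: inspect the first GPU only

    show("compute capability major", cudaDevAttrComputeCapabilityMajor, device);
    show("compute capability minor", cudaDevAttrComputeCapabilityMinor, device);

    // Nonzero if cudaMallocManaged is usable on this device at all.
    show("managedMemory", cudaDevAttrManagedMemory, device);

    // Nonzero if the device can access managed memory concurrently with the CPU.
    show("concurrentManagedAccess", cudaDevAttrConcurrentManagedAccess, device);

    // Nonzero if the device can coherently access pageable host memory
    // (e.g. from malloc or new) without cudaHostRegister.
    show("pageableMemoryAccess", cudaDevAttrPageableMemoryAccess, device);

    return 0;
}
```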

Tobbey · December 6, 2016, 2:44pm

Ok thank you for clarifying this point.
