CUDA 6.5 Unified Memory (cudaMallocManaged)

Hello,
my question is related to CUDA programming and the use of Unified Memory.
I'm using a single NVIDIA Quadro K4000 GPU on a 64-bit Linux system (openSUSE 13.1). I installed the required graphics driver and the CUDA 6.5 Toolkit. Everything works well!
In a project I am working on a particle simulation in DualSPHysics (open source). The GPU code I am trying to modify is written in C++ and CUDA 4.0 (it uses “cudaMalloc” rather than “cudaMallocManaged”). Currently the simulations are at the limit of our 3 GB of GPU RAM. Now we want to use the full power of our GPU, so we tried to add the Unified Memory capability to the simulation tool in order to access the 64 GB of system RAM on our workstation.
I have already compiled and run the “Unified Memory Streams” sample successfully. There were no errors or complications.
To modify the simulation code I replaced every single “cudaMalloc” with a “cudaMallocManaged”. I know that there are some more things to consider (e.g. removing the cudaMemcpy calls), but as far as I can see that is not the main problem.
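For clarity, the change looks roughly like this everywhere (a simplified sketch with placeholder names, not the actual DualSPHysics code):

    // Before (CUDA 4.0 style): explicit device-only allocation
    float *pos;
    cudaMalloc((void **)&pos, np * sizeof(float));

    // After: managed allocation, meant to be accessible from host and device
    float *pos;
    cudaMallocManaged(&pos, np * sizeof(float));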
The code compiles successfully, without complications. When I simulate a case that requires less than the maximum available GPU RAM (3 GB), everything works well. When I simulate a case that requires more than that (i.e. the simulation needs to fall back on system memory via Unified Memory), it fails and the required RAM cannot be allocated. I assume there is a problem with accessing the Unified Memory via cudaMallocManaged...!?

Do you have any idea how I can solve this problem?
Is there a tool or something similar with which I can test my Unified Memory setup?
Could this be a hardware restriction?
How much effort would it take to modify the tool successfully?

Please excuse my poor description; programming is not my profession.

Regards from Germany

Unified Memory does not allow you to exceed the device memory (RAM) that is physically present on your GPU. The UM documentation states that the primary purpose of UM is to eliminate the need for explicit cudaMemcpy operations, but the data must still be migrated to the processor (host or device) that is using it. Because this migration must still occur, the device memory must still be large enough to hold any data the GPU operates on.
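You can confirm this on your machine with a minimal test along these lines (the 4 GB request is just an arbitrary size above your card's 3 GB; on your setup the call should fail with an out-of-memory error):

    #include <cstdio>
    #include <cuda_runtime.h>

    int main() {
        // Request more managed memory than the K4000 has device RAM (3 GB).
        size_t bytes = 4ULL * 1024 * 1024 * 1024;  // 4 GB
        float *data = NULL;
        cudaError_t err = cudaMallocManaged(&data, bytes);
        if (err != cudaSuccess) {
            printf("cudaMallocManaged failed: %s\n", cudaGetErrorString(err));
            return 1;
        }
        printf("allocation succeeded\n");
        cudaFree(data);
        return 0;
    }

If that prints the failure message, your Unified Memory setup is working exactly as designed; the limit is the device memory size, not a fault in your system.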

Based on your description, there is nothing wrong with your system or Unified Memory setup. The behavior you are experiencing is expected.

In general, GPU programs that need to access data exceeding the GPU's physical RAM size may use some other techniques:

  1. Pipelined access. Break the data into pieces and move each piece to the GPU when the GPU needs to operate on it. This is the typical approach when the GPU needs high-volume access to the data (see the first sketch after this list).

  2. Zero copy. Place the data in a zero-copy region (i.e. host memory allocated with cudaHostAlloc, or similar) to give the GPU direct access to it. This has significant bandwidth restrictions, so it is only recommended when the GPU needs occasional or limited access to the data, not for high-volume access (see the second sketch below).
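Here is a rough sketch of technique 1, processing a large host array in device-sized chunks (the kernel, chunk size, and names are placeholders, not anything from DualSPHysics):

    #include <algorithm>
    #include <cuda_runtime.h>

    __global__ void scaleKernel(float *d, size_t n) {
        size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
        if (i < n) d[i] *= 2.0f;  // placeholder computation
    }

    void processInChunks(float *h_data, size_t totalElems) {
        const size_t chunkElems = 64 * 1024 * 1024;  // 256 MB of floats per chunk
        float *d_buf;
        cudaMalloc((void **)&d_buf, chunkElems * sizeof(float));
        for (size_t off = 0; off < totalElems; off += chunkElems) {
            size_t n = std::min(chunkElems, totalElems - off);
            // Move one piece in, operate on it, move the result back out.
            cudaMemcpy(d_buf, h_data + off, n * sizeof(float), cudaMemcpyHostToDevice);
            scaleKernel<<<(unsigned)((n + 255) / 256), 256>>>(d_buf, n);
            cudaMemcpy(h_data + off, d_buf, n * sizeof(float), cudaMemcpyDeviceToHost);
        }
        cudaFree(d_buf);
    }

With pinned host buffers and two CUDA streams, the copies for one chunk can be overlapped with the kernel for the previous chunk, hiding much of the transfer cost.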
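And a rough sketch of technique 2, zero-copy access via mapped pinned host memory (again, the kernel and names are just illustrative):

    #include <cstdio>
    #include <cuda_runtime.h>

    __global__ void sumKernel(const float *data, size_t n, float *result) {
        // Deliberately naive single-thread sum: every read crosses PCIe,
        // which is why zero copy suits only occasional or limited access.
        if (blockIdx.x == 0 && threadIdx.x == 0) {
            float s = 0.0f;
            for (size_t i = 0; i < n; ++i) s += data[i];
            *result = s;
        }
    }

    int main() {
        cudaSetDeviceFlags(cudaDeviceMapHost);  // enable mapped host memory
        size_t n = 1 << 20;
        float *h_data = NULL, *d_data = NULL, *h_res = NULL, *d_res = NULL;
        cudaHostAlloc((void **)&h_data, n * sizeof(float), cudaHostAllocMapped);
        cudaHostAlloc((void **)&h_res, sizeof(float), cudaHostAllocMapped);
        for (size_t i = 0; i < n; ++i) h_data[i] = 1.0f;
        // Get device-side pointers that alias the same host allocations.
        cudaHostGetDevicePointer((void **)&d_data, h_data, 0);
        cudaHostGetDevicePointer((void **)&d_res, h_res, 0);
        sumKernel<<<1, 1>>>(d_data, n, d_res);
        cudaDeviceSynchronize();
        printf("sum = %f\n", *h_res);  // the kernel wrote directly to host memory
        cudaFreeHost(h_data);
        cudaFreeHost(h_res);
        return 0;
    }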