When is 'virtual memory' available in CUDA?

HannesF99 · October 12, 2009, 7:52am

Hi, I’m wondering whether a ‘virtual memory’ feature is available in CUDA (meaning that when GPU memory is fully occupied, currently unused areas of GPU memory are ‘swapped’ out to host (CPU RAM) memory). Maybe with the next GPU generation, ‘FERMI’?
I consider it a really important feature: it is annoying if a kernel quits execution because not enough GPU memory is available. The better alternative would be for it to just get slower (due to swapping).

eyalhir74 · October 12, 2009, 8:14am

Look at pinned memory… I myself just break the data into pieces and move them to the GPU as needed (my datasets are usually far larger than 6 GB).

eyal

Simon_Green · October 12, 2009, 8:41am

No, there is no hardware support for virtual memory on any current GPU or Fermi. As the previous poster points out, you can always implement your own paging in the application.

It is also possible to use so-called “zero-copy” which allows you to read CPU memory directly across the PCIe bus, although with a much higher latency. See the programming guide for details.

CUDA also doesn’t support dynamic memory allocation from kernels, so I’m not sure how a kernel would quit due to out of memory.

It’s worth noting that GPU memory sizes will continue to increase too.

HannesF99 · October 12, 2009, 12:36pm

You are right: allocating GPU memory inside a kernel is not possible. I meant allocating GPU memory in the host routine that calls the kernels.
It would just be nice if the user could be freed from the task of managing GPU memory. Virtual memory would be very convenient, and one is used to it since it has been available for CPU memory for a very long time.
And for a real application it is not acceptable if the application ‘breaks’ because it ran out of GPU memory.
Note that you cannot always predict how much GPU memory an application will use (in order to pre-allocate it), since that depends on the program logic etc. Of course GPU memory size is increasing, but e.g. the 1 GB of the current GTX 280 is in fact ‘nothing’ if you are doing image/video processing…
Best regards, Hannes
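
One way an application can avoid ‘breaking’ outright when an allocation fails is to back off to a smaller working buffer and process the data in more passes. The following is only a minimal host-side sketch of that idea, not anything CUDA provides: `largest_allocatable` and the `try_alloc` callback are hypothetical names, and `try_alloc` stands in for a real allocation attempt (for example a wrapper that calls `cudaMalloc` and reports whether it succeeded).

```cpp
#include <cstddef>
#include <functional>

// Hypothetical helper (not part of CUDA): repeatedly halve the requested
// size until try_alloc accepts it or the size drops below min_bytes.
// try_alloc is an assumed stand-in for a real allocation attempt.
// Returns the size that succeeded, or 0 if none did.
std::size_t largest_allocatable(std::size_t wanted,
                                std::size_t min_bytes,
                                const std::function<bool(std::size_t)>& try_alloc) {
    for (std::size_t n = wanted; n >= min_bytes && n > 0; n /= 2) {
        if (try_alloc(n)) {
            return n;  // caller now works in passes of n bytes
        }
    }
    return 0;  // even the minimum useful size could not be allocated
}
```

A return value of 0 means the caller still has to report an error, but only after the smaller sizes have been tried.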

eyalhir74 · October 12, 2009, 1:05pm

As Simon corrected me, this is what zero-copy and pinned memory are all about.

My CPU application also “breaks” if I allocate 17 GB of RAM on a 16 GB machine that is also diskless (a valid production environment).

There is nothing new or different here between GPU and CPU. If your application needs too much memory and the system (read GPU/CPU) can’t provide it, your application will fail, unless you employ some chunking mechanism as I’ve mentioned above.

As for not being able to predict how much GPU memory you will use: not so true. You know exactly how much memory you’re going to use. You code this in your (new/malloc/cudaMalloc…) calls; when you get to the line of code that needs to allocate memory, you know very well how much you need. Again, the solution would be to add some code that determines at runtime how much memory you need and then to break your algorithm up accordingly so it works in chunks.

Seismic datasets can reach tens of gigabytes. You can’t expect the GPU to have that much RAM on it (nor the CPU, for that matter); you just have to break your data into chunks that each fit into the amount of memory available on your system.
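
A minimal host-side sketch of such chunk planning, under stated assumptions: `Chunk` and `plan_chunks` are hypothetical names, and `budget_bytes` is whatever memory budget the application decides on (in a real CUDA program it might be derived from a free-memory query such as `cudaMemGetInfo`, minus some safety margin).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One piece of the dataset, expressed in elements (hypothetical type).
struct Chunk {
    std::size_t offset;  // index of the first element in this chunk
    std::size_t count;   // number of elements in this chunk
};

// Split total_elems elements of elem_size bytes each into consecutive
// chunks that individually fit into budget_bytes.
std::vector<Chunk> plan_chunks(std::size_t total_elems,
                               std::size_t elem_size,
                               std::size_t budget_bytes) {
    // The budget must hold at least one element, otherwise no plan exists.
    assert(elem_size > 0 && budget_bytes >= elem_size);
    const std::size_t per_chunk = budget_bytes / elem_size;

    std::vector<Chunk> chunks;
    for (std::size_t off = 0; off < total_elems; off += per_chunk) {
        // The last chunk may be shorter than the others.
        chunks.push_back({off, std::min(per_chunk, total_elems - off)});
    }
    return chunks;
}
```

Each planned chunk would then be copied to the device, processed by the kernel, and copied back before the next one is loaded. This only works directly when the chunks can be processed independently; algorithms that need neighbouring data would have to add overlap between chunks.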

eyal

HannesF99 · October 12, 2009, 2:43pm

Of course I know how much memory I will need at the moment a specific kernel executes. But when working with many different libraries, it is a lot of work to modify all of them appropriately so that this ‘paging’/‘chunk’ mechanism is applied everywhere memory is allocated. We are currently using e.g. the CUDPP library and some libraries from a university, and in the future definitely NVPP, CUFFT, some LAPACK libs and lots of other useful libraries.

Simon_Green · October 12, 2009, 3:23pm

Ah, I see. GPU memory virtualization at the kernel level might be possible, although with the current programming model it’s not obvious to me how the driver would know which memory pages each kernel would require.

It’s worth noting that Windows Vista already does some level of GPU memory virtualization. I believe multiple applications can each allocate close to the whole GPU memory, and it will manage swapping data back to main memory.

HannesF99 · October 13, 2009, 7:49am

Yes, in the new ‘WDDM’ for Vista/Windows 7 there seems to be some kind of GPU memory virtualization. I’m not sure whether it also applies to CUDA applications or only to DirectX.
See
http://download.microsoft.com/download/5/b/9/5b97017b-e28a-4bae-ba48-174cf47d23cd/PRI103_WH06.ppt

This could be exactly what I’m looking for.
