Memory fragmentation

This is probably a rhetorical question, but… Is there a way to control where cudaMalloc allocates data relative to offset zero in GPU RAM?
The problem is that I have very large data sets to allocate on the GPU. Consider the following scenario:

  1. Allocate 1.5GB pointerA.
  2. Allocate 700MB pointerB.
  3. Allocate 700MB pointerC.
  4. Allocate 700MB pointerD.
  5. Allocate some various small size pointers.

For a C1060 that should fit; however, depending on the positions of the arrays in the 4 GB address space, it might fail.
Is there a way to ensure this fits into memory, other than making the arrays smaller by dividing them into chunks?
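For reference, a quick check of the arithmetic in the scenario above. This is just a host-side sketch; `total_bytes` and `fits_when_packed` are illustrative names, not CUDA API calls:

```c
#include <stddef.h>

/* Sum the large allocation requests (sizes in bytes). */
static size_t total_bytes(const size_t *sizes, size_t n) {
    size_t total = 0;
    for (size_t i = 0; i < n; ++i)
        total += sizes[i];
    return total;
}

/* Returns 1 if the requests could fit in `capacity` bytes when packed
   back to back -- a necessary condition, not a sufficient one, since
   fragmentation can still make an individual cudaMalloc fail. */
static int fits_when_packed(const size_t *sizes, size_t n, size_t capacity) {
    return total_bytes(sizes, n) <= capacity;
}
```

With 1.5 GB + 3 × 700 MB = 3636 MB (about 3.55 GB), the requests fit in 4 GB on paper, but only if the driver can place every block without gaps.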


When we cannot control where malloc places data on the CPU (can we?), how justified is it to demand that control from the GPU?

I have come to the conclusion that the most reliable way to get this done is to allocate every last byte of free memory (or at least as much as your “big” storage needs require) on the device in an initialization stage at the beginning of the code, and then manage the division of that initial allocation into chunks yourself. The card/driver maintains a number of different page sizes, which can result in all sorts of odd fragmentation and “lost” memory, to the point where single-byte allocations in what appears to be “high” memory space actually consume complete 64 KB pages.

EDIT: I did have a play around with the issue of fragmentation a while ago and posted my results in this thread…st&p=586012
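The “allocate everything up front and sub-divide it yourself” approach can be sketched as a simple bump allocator over one big block. Plain malloc stands in for cudaMalloc here so the offset logic can be tested on the host; the `pool_` names are illustrative, not part of any CUDA API:

```c
#include <stddef.h>
#include <stdlib.h>

/* Minimal bump allocator over one large block. On the device you would
   obtain `base` once with cudaMalloc in an initialization stage; malloc
   stands in here so the offset arithmetic can be exercised on the host. */
typedef struct {
    char  *base;   /* start of the big block           */
    size_t size;   /* total bytes in the block         */
    size_t offset; /* next free byte, relative to base */
} pool_t;

static int pool_init(pool_t *p, size_t bytes) {
    p->base   = malloc(bytes);  /* device version: cudaMalloc(&p->base, bytes) */
    p->size   = bytes;
    p->offset = 0;
    return p->base != NULL;
}

/* Hand out `bytes`, rounded up to a 256-byte boundary so every
   sub-allocation keeps an alignment suitable for coalesced access. */
static void *pool_alloc(pool_t *p, size_t bytes) {
    size_t aligned = (bytes + 255) & ~(size_t)255;
    if (p->offset + aligned > p->size)
        return NULL;            /* pool exhausted */
    void *ptr = p->base + p->offset;
    p->offset += aligned;
    return ptr;
}

static void pool_destroy(pool_t *p) {
    free(p->base);              /* device version: cudaFree(p->base) */
}
```

A bump allocator like this never frees individual blocks, which is exactly why it cannot fragment; it suits the “allocate big arrays once at startup” pattern described above rather than general-purpose churn.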

Some of the ways to minimize memory fragmentation (which I generally follow on the CPU) are:

  1. Allocating memory in large chunks.
  2. Allocating memory (preferably) in sizes of the form 2^n.
  3. If none of the above solves the issue, try to come up with a custom memory allocation function!
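Point 2 above, allocating in sizes of the form 2^n, just needs a small rounding helper. A generic sketch, not CUDA-specific:

```c
#include <stddef.h>

/* Round n up to the next power of two (n > 0). Requesting only
   power-of-two sizes means a freed block can later be reused exactly
   by a request of the same class, which keeps a simple allocator
   from fragmenting its free space into unusable slivers. */
static size_t next_pow2(size_t n) {
    size_t p = 1;
    while (p < n)
        p <<= 1;
    return p;
}
```

The trade-off is internal fragmentation: a 700 MB request, for example, rounds up to 1 GB, so this works best when the wasted headroom is acceptable.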

Thank you both for the fast replies :)

I feared that would be the answer.

@avidday - allocating all memory is a bit problematic for me, I guess. First, because textures are limited in size and part of this “huge” array needs to be accessed through textures. Also, a big dataset input won’t fit into memory in any case (not even into 4 GB), so I’ll have to juggle around a lot and I don’t know if it’s doable… I’ll test it though…

@teju - a custom memory allocation function is what I was thinking of when I asked the original question. I guess on the CPU you can do that, but with CUDA you can’t. Is there something like this in CUDA?



I’ve seen loads of custom memory allocation functions for the CPU, but none for the GPU until now. The main issue with coming up with such a function is that it has to provide allocation routines for the different kinds of memory: 1D linear, 2D pitched, 3D, CUDA arrays, texture memory, constant memory…!