Hi,
This is probably a rhetorical question, but… is there a way to control where cudaMalloc allocates data relative to offset zero in GPU RAM?
The problem is that I have very large arrays to allocate on the GPU. Consider the following scenario:
Allocate 1.5 GB for pointerA.
Allocate 700 MB for pointerB.
Allocate 700 MB for pointerC.
Allocate 700 MB for pointerD.
Allocate assorted small pointers.
For a C1060 (4 GB of device memory) that should fit; however, depending on where the arrays land in the address space, it might fail.
Is there a way to ensure this fits into memory, other than making the arrays smaller by splitting them into chunks?
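For concreteness, here is a minimal sketch of the allocation pattern above, with error checking. The sizes are the ones from my list; whether the sequence succeeds depends entirely on how the driver lays the blocks out:

```cpp
// Sketch of the allocation sequence; any single cudaMalloc can fail
// if the remaining free space is fragmented, even when the total fits.
#include <cstdio>
#include <cuda_runtime.h>

static void* tryAlloc(const char* name, size_t bytes)
{
    void* p = nullptr;
    cudaError_t err = cudaMalloc(&p, bytes);
    if (err != cudaSuccess) {
        printf("%s (%zu MB) failed: %s\n", name, bytes >> 20,
               cudaGetErrorString(err));
        return nullptr;
    }
    printf("%s allocated at %p\n", name, p);
    return p;
}

int main()
{
    const size_t MB = 1 << 20;
    void* a = tryAlloc("pointerA", 1536 * MB); // 1.5 GB
    void* b = tryAlloc("pointerB",  700 * MB);
    void* c = tryAlloc("pointerC",  700 * MB);
    void* d = tryAlloc("pointerD",  700 * MB);
    // ... plus the assorted small allocations.
    return 0;
}
```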
I have come to the conclusion that the most reliable way to get this done is to allocate every last byte of free memory (or at least as much as your “big” storage needs require) on the device in an initialization stage at the beginning of the code, and then manage the division of that initial allocation into chunks yourself. The card/driver maintains a number of different page sizes, which can result in all sorts of odd fragmentation and “lost” memory, to the point where a single-byte allocation in what appears to be “high” memory space actually consumes a complete 64 KB page.
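As a rough illustration of that approach, here is a minimal sketch: one big cudaMalloc at startup, then a trivial bump allocator on top of it. GpuArena and the 64 MB reserve left for the driver are my own inventions; a real version would need proper free-list management rather than a bump pointer:

```cpp
// Grab (almost) all free device memory once, then suballocate it.
#include <cstddef>
#include <cuda_runtime.h>

struct GpuArena {
    char*  base   = nullptr;
    size_t size   = 0;
    size_t offset = 0;

    // Reserve everything the driver will give us, minus some headroom.
    bool init(size_t reserve = 64 << 20)
    {
        size_t freeB = 0, totalB = 0;
        cudaMemGetInfo(&freeB, &totalB);
        if (freeB <= reserve) return false;
        size = freeB - reserve;
        return cudaMalloc(reinterpret_cast<void**>(&base), size)
               == cudaSuccess;
    }

    // Hand out 256-byte-aligned chunks from the single big block.
    void* alloc(size_t bytes)
    {
        size_t aligned = (bytes + 255) & ~size_t(255);
        if (offset + aligned > size) return nullptr;
        void* p = base + offset;
        offset += aligned;
        return p;
    }

    void destroy() { cudaFree(base); base = nullptr; }
};
```

Every pointer handed out this way is just an offset into the one allocation, so fragmentation is entirely under your own control.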
I’ve seen plenty of custom memory allocators for the CPU, but none for the GPU so far. The main difficulty in writing one is that it has to provide allocation routines for the different kinds of memory: linear 1D, 2D pitched, 3D, CUDA arrays, texture, constant memory…!
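To illustrate the point, here is a hypothetical front-end (all function names are mine) showing how each kind of memory has its own CUDA entry point and return shape, which is what makes a single unified allocator awkward:

```cpp
// Each memory kind returns something different: a raw pointer, a
// pointer plus a driver-chosen pitch, or an opaque cudaArray*.
// Error checking omitted for brevity.
#include <cuda_runtime.h>

void* allocLinear(size_t bytes)                        // plain 1D
{
    void* p = nullptr;
    cudaMalloc(&p, bytes);
    return p;
}

void* allocPitched(size_t w, size_t h, size_t* pitch)  // 2D; driver picks pitch
{
    void* p = nullptr;
    cudaMallocPitch(&p, pitch, w, h);
    return p;
}

cudaArray* allocArray(size_t w, size_t h)              // for texture binding
{
    cudaChannelFormatDesc d = cudaCreateChannelDesc<float>();
    cudaArray* a = nullptr;
    cudaMallocArray(&a, &d, w, h);
    return a;
}

// Constant memory cannot be allocated at run time at all; it is
// declared statically with __constant__, so a custom allocator
// cannot manage it.
```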