Does anyone know if, in the near future, CUDA will support device memory allocation handled by the GPU kernel itself? As far as I know, the GPU is already able to create new triangles under DirectX 10 (is this correct?).
Sorry if this question has been asked before; I couldn’t find any thread answering it :-)
I am trying to implement arbitrary-precision integer arithmetic in CUDA. If the GPU kernel were able to allocate its own device memory, that would simplify many of the algorithms I’m using, such as division and multiplication of bigints. The algorithms are based on Knuth’s “The Art of Computer Programming” and sometimes need to allocate or reallocate memory.
At the moment, I’m precalculating all the needed data on the CPU and transferring it to the GPU along with the bigints, which creates a transfer overhead I would like to avoid.
I take your reply as an indirect answer to my question, i.e. that there is no such thing in development, right? :)
The only time one could ever require dynamic malloc on the GPU is in the case of an unpredictable kernel - one whose result size depends on the internal GPU clock or on some external ‘random’ property that the CPU can’t easily predict (to the point where it would be faster just to run the whole computation on the CPU).
I created my own on-device memory manager, mostly to handle partial work results and dynamically generated subtasks.
It’s not too hard; the basic trick is to use global atomics to hand out pieces of a pre-allocated large block of memory as needed.
The problem with this is that it uses atomics, which are SLOW, so you can try to have suballocators at the block level: a block “grabs” a large chunk and its threads ask their own block for pieces of that chunk.
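To make the trick concrete, here is a minimal sketch of such a bump allocator, assuming a single pre-allocated pool. The names (g_pool, g_pool_offset, pool_alloc) and the 1 MB pool size are placeholders for illustration, not my actual code:

```
// Minimal sketch of a "global atomics bump allocator": one big
// pre-allocated buffer plus a global offset that atomicAdd pushes forward.
#define POOL_BYTES (1u << 20)                          // 1 MB pool (placeholder size)

__device__ unsigned long long g_pool[POOL_BYTES / 8];  // 8-byte-aligned backing store
__device__ unsigned int       g_pool_offset = 0;       // next free byte in the pool

// Hand out nbytes from the pool; returns nullptr once the pool is exhausted.
// There is no free(): the host resets g_pool_offset to 0 between launches
// (e.g. via cudaMemcpyToSymbol) to recycle the whole pool at once.
__device__ void* pool_alloc(unsigned int nbytes)
{
    nbytes = (nbytes + 7u) & ~7u;                      // keep returned pointers 8-byte aligned
    unsigned int old = atomicAdd(&g_pool_offset, nbytes);
    if (old + nbytes > POOL_BYTES)
        return nullptr;                                // out of pool space
    return (void*)((unsigned char*)g_pool + old);
}

__global__ void demo_kernel(int* n_ok)
{
    // Every thread grabs 16 bytes of scratch space from the shared pool.
    unsigned int* scratch = (unsigned int*)pool_alloc(16);
    if (scratch) {
        scratch[0] = threadIdx.x;
        atomicAdd(n_ok, 1);
    }
}
```

The block-level suballocator is the same idea one level down: one thread per block calls pool_alloc() once for a big chunk, publishes the pointer in __shared__ memory, and the rest of the block carves pieces out of it with a shared-memory counter instead of hammering the global one.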
Another type of memory allocator could use a linked list of chunks, allowing free as well as alloc.
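As a sketch of what alloc plus free could look like: rather than a true linked list, the version below just keeps a fixed pool of equal-sized chunks with a per-chunk “in use” flag claimed via atomicCAS. It’s less flexible, but it sidesteps the ABA problems a lock-free list has to deal with. All the names and sizes here are invented for illustration:

```
// Fixed pool of equal-sized chunks with per-chunk "in use" flags.
#define NUM_CHUNKS  4096
#define CHUNK_BYTES 256

__device__ unsigned long long g_chunks[NUM_CHUNKS * CHUNK_BYTES / 8]; // chunk storage
__device__ int                g_chunk_used[NUM_CHUNKS] = {0};         // 0 = free, 1 = in use

// Claim the first free chunk; returns nullptr if every chunk is taken.
// Starting the scan at a hash of the thread index instead of 0 would
// reduce contention on the low-numbered chunks.
__device__ void* chunk_alloc()
{
    for (int i = 0; i < NUM_CHUNKS; ++i) {
        // atomicCAS returns the old value, so 0 means this thread won the chunk.
        if (atomicCAS(&g_chunk_used[i], 0, 1) == 0)
            return (void*)((unsigned char*)g_chunks + i * CHUNK_BYTES);
    }
    return nullptr;
}

// Return a chunk to the pool so another thread can claim it later.
__device__ void chunk_free(void* p)
{
    int i = (int)(((unsigned char*)p - (unsigned char*)g_chunks) / CHUNK_BYTES);
    atomicExch(&g_chunk_used[i], 0);
}
```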
So it is possible… but it’s likely not too useful. It’s a lot of annoying bookkeeping, and atomics are not fast. I am looking for better ways to handle my sub-task problem without using the system I created, even though it’s working.
You can try allocating an extra buffer up front, say the maximum you would reasonably use, and then just use pieces of it. I’m not sure if this method is suited to your specific application, and most likely you won’t get the luxury of coalescing, but at least you have a place to store any extra precision you need.
Let’s say you use 32-bit ints as your starting point. You could allocate four 32-bit ints per variable and use a few bits of the first int as a flag indicating how many of the four you actually used.
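If it helps, here is one way that layout could look. The names are just for illustration; it assumes four 32-bit limbs per number, with the top two bits of the first limb storing how many limbs are in use:

```
#include <cstdint>

// Fixed-size bigint: four 32-bit limbs per variable. The top two bits of
// limb[0] store (used limb count - 1), so limb[0] only carries 30 value bits.
struct FixedBigInt {
    uint32_t limb[4];                 // limb[0] = least-significant word
};

#define COUNT_SHIFT 30u
#define COUNT_MASK  (3u << COUNT_SHIFT)
#define VALUE_MASK  (~COUNT_MASK)

// How many of the four limbs are actually in use (1..4).
__host__ __device__ inline uint32_t fb_count(const FixedBigInt& x)
{
    return ((x.limb[0] & COUNT_MASK) >> COUNT_SHIFT) + 1u;
}

__host__ __device__ inline void fb_set_count(FixedBigInt& x, uint32_t n)
{
    x.limb[0] = (x.limb[0] & VALUE_MASK) | ((n - 1u) << COUNT_SHIFT);
}

// Pack a small (30-bit) value into the fixed layout, marking one limb used.
__host__ __device__ inline FixedBigInt fb_from_u32(uint32_t v)
{
    FixedBigInt x = { { v & VALUE_MASK, 0u, 0u, 0u } };
    fb_set_count(x, 1u);
    return x;
}
```

One thing to keep in mind: stealing bits from limb[0] means the first limb only holds 30 value bits, which complicates carry propagation; keeping the count in a separate word or byte costs a little more memory but keeps the arithmetic simpler.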
An implementation of this sort, while a memory hog, should be faster than dynamic on-GPU allocation.