malloc/realloc on __device__

I used the forum search and was surprised that nobody seems to have asked this question yet:

How can I allocate or reallocate memory within a device function?

And is there some kind of preprocessor flag that lets me distinguish between device and host mode inside a function that is compiled for both?

__device__ __host__ void doSomething()
{
#ifdef _HOST
  printf("foo");
#else
  int bar = 0;
  bar++;
#endif
  doSomeOtherStuff();
}

THX

What would be the point of them? Effectively, your kernel is the only program running on the GPU, so it can use the whole of global memory. Declaring variables statically would be faster.

The point is that if I use the whole of global memory, I have to write my own memory management to keep track of the fragments that I use and release inside the kernel.

In my particular case I have something like an arrayList in global memory that can fill up, after which I have to resize it (on the device).

And I don’t have just one: possibly every thread has its own arrayList, so if I simply keep writing out of bounds I may write into another arrayList and corrupt it.

Just allocating a huge amount of memory for each list is not a solution either, because it wastes resources, and even a huge array may turn out to be too small while there is still unused memory inside another arrayList.
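For what it's worth, the "own memory management" described above could start out as a simple bump-pointer pool carved out of one big preallocated buffer. This is only a rough sketch, not a tested solution: Pool, poolAlloc and poolRealloc are made-up names, the lists are assumed to hold ints, and the old chunk is simply abandoned on resize (nothing is ever freed). The device path assumes the Pool itself lives in global memory and that the GPU supports atomicAdd on global memory (compute capability 1.1 or later). The macro fallback at the top is only there so the same code also builds with a plain C++ compiler.

```cpp
#include <cstddef>

// Let the sketch compile with a plain C++ compiler as well as nvcc.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Hypothetical pool: one big buffer shared by all lists.
struct Pool {
    int*         base;      // start of the preallocated buffer
    unsigned int capacity;  // total number of ints in the buffer
    unsigned int used;      // number of ints handed out so far
};

// Reserve `count` ints from the pool; returns NULL when the pool is exhausted.
__host__ __device__ inline int* poolAlloc(Pool* pool, unsigned int count)
{
#ifdef __CUDA_ARCH__
    // Device pass: many threads may allocate at once, so reserve atomically.
    unsigned int start = atomicAdd(&pool->used, count);
#else
    // Host pass: single-threaded fallback, no atomics needed.
    unsigned int start = pool->used;
    pool->used += count;
#endif
    if (start > pool->capacity || count > pool->capacity - start)
        return NULL;
    return pool->base + start;
}

// "Realloc": grab a larger chunk and copy the old contents over.
// The old chunk is not reclaimed in this sketch.
__host__ __device__ inline int* poolRealloc(Pool* pool, const int* oldData,
                                            unsigned int oldCount,
                                            unsigned int newCount)
{
    int* newData = poolAlloc(pool, newCount);
    if (newData == NULL)
        return NULL;
    for (unsigned int i = 0; i < oldCount && i < newCount; ++i)
        newData[i] = oldData[i];
    return newData;
}
```

Each thread would call poolRealloc when its own list is full. Since every chunk comes from a distinct range of the buffer, a growing list can no longer overwrite its neighbour. The price is that abandoned chunks stay wasted until the whole pool is reset.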

Did you have any luck with this? Being able to realloc on the device would be quite useful!

This should work with nvcc 3.0:

__device__ __host__ void doSomething()
{
#ifndef __CUDA_ARCH__
  printf("foo");
#else
  int bar = 0;
  bar++;
#endif
  doSomeOtherStuff();
}

And no, there’s no support for malloc from device code.
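A small addition on the __CUDA_ARCH__ macro used above: it is only defined while nvcc compiles the device side of the code, and its value encodes the target compute capability (for example 200 for compute capability 2.0). So it can also be used to pick code per architecture. A minimal sketch follows. compiledForArch is a made-up helper name, and the fallback defines are only there so the snippet also builds with a plain C++ compiler.

```cpp
// Let the sketch compile with a plain C++ compiler as well as nvcc.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Hypothetical helper: returns 0 when compiled for the host,
// and the value of __CUDA_ARCH__ when compiled for the device.
__host__ __device__ inline int compiledForArch()
{
#ifdef __CUDA_ARCH__
    return __CUDA_ARCH__;
#else
    return 0;
#endif
}
```

Called from host code this returns 0. Called from inside a kernel it returns whatever architecture that device code was compiled for.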