Is it possible to dynamically allocate a piece of global memory during kernel execution? I know I could use shared memory, but the problem I'm facing needs more than shared memory can provide…
If you have a Fermi GPU (compute capability 2.0 or 2.1), then yes, it is possible. If not, then no, you can't.
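A quick host-side sketch (not from this thread) of how you could verify that at runtime, assuming device 0 is the one you'll launch on:

```cuda
#include <cstdio>

int main()
{
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);   // query device 0
    printf("Compute capability: %d.%d\n", prop.major, prop.minor);

    // In-kernel malloc()/free() requires compute capability >= 2.0 (Fermi).
    if (prop.major >= 2)
        printf("Device-side malloc()/free() is supported.\n");
    else
        printf("Device-side malloc()/free() is NOT supported.\n");
    return 0;
}
```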
Yes, I'm working on a Fermi GPU. How do I allocate this memory?
You can use malloc() and free(). CUDA 4.0 also supports the C++ new and delete keywords, if you prefer.
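A minimal sketch of what that looks like (compile for sm_20 or later, e.g. nvcc -arch=sm_20); the kernel name and sizes here are just illustrative:

```cuda
#include <cstdio>

__global__ void perThreadAlloc(int n)
{
    // Each thread that executes malloc() gets its own allocation
    // from the device heap, which lives in global memory.
    int *buf = (int *)malloc(n * sizeof(int));
    if (buf == NULL)       // malloc can fail if the device heap is too small
        return;
    for (int i = 0; i < n; ++i)
        buf[i] = threadIdx.x;
    free(buf);             // free what you allocate, or the heap leaks across launches
}

int main()
{
    // Optional: enlarge the device heap (the default is only 8 MB)
    // before the first launch that uses in-kernel malloc().
    cudaDeviceSetLimit(cudaLimitMallocHeapSize, 32 * 1024 * 1024);
    perThreadAlloc<<<4, 64>>>(256);
    cudaDeviceSynchronize();
    return 0;
}
```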
Is the memory allocated per block, or is it accessible from every block? I.e. if I free the memory at the end of my kernel, might other blocks lose access to the pointer?
It is per thread: every thread which executes the malloc or new gets its own allocation. The allocation itself lives in global memory, though, so any thread can read or write through the pointer if you pass it to them, and it stays valid until it is freed.
I need to allocate a certain amount per block, like:
if (threadIdx.x == 0 && threadIdx.y == 0 && threadIdx.z == 0)
    sData = (int *)malloc(sizeof(int) * searchVolumeDimZ * searchVolumeDimY * searchVolumeDimX);
__syncthreads();

..... processing ......

__syncthreads();
if (threadIdx.x == 0 && threadIdx.y == 0 && threadIdx.z == 0)
    free(sData);
But I get access violations that way.
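The usual cause of that crash: if sData is an ordinary local variable, it is per thread, so only thread (0,0,0) ever holds the address malloc returned; every other thread dereferences an uninitialized pointer. Declaring the pointer __shared__ gives the whole block one copy that thread (0,0,0) can fill in. Also check the return value, since the default device heap is only 8 MB and a failed malloc returns NULL. A sketch along those lines, assuming the dimensions are kernel parameters:

```cuda
__global__ void kernel(int searchVolumeDimX, int searchVolumeDimY, int searchVolumeDimZ)
{
    // One pointer per block, visible to every thread in the block.
    __shared__ int *sData;

    if (threadIdx.x == 0 && threadIdx.y == 0 && threadIdx.z == 0)
        sData = (int *)malloc(sizeof(int) * searchVolumeDimZ * searchVolumeDimY * searchVolumeDimX);
    __syncthreads();          // everyone waits until sData has been set

    if (sData == NULL)        // malloc can fail; bail out as a whole block
        return;

    // ..... processing: all threads in the block may use sData .....

    __syncthreads();          // make sure no thread is still using the buffer
    if (threadIdx.x == 0 && threadIdx.y == 0 && threadIdx.z == 0)
        free(sData);
}
```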