memory allocation question

Is it possible to dynamically allocate a piece of global memory during kernel execution? I know I could use shared memory, but the problem I'm facing needs more than shared memory can provide…

If you have a Fermi GPU (compute capability 2.0 or 2.1), then yes, it is possible. If not, then no, you can't.

Yes, I'm working on a Fermi GPU. How do I allocate this memory?

You can use malloc() and free(). CUDA 4.0 also supports the C++ new and delete keywords, if you prefer.
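A minimal sketch of what that looks like. The kernel name and sizes here are made up for illustration; the API calls (device-side malloc()/free(), and cudaDeviceSetLimit() with cudaLimitMallocHeapSize to enlarge the device heap, which defaults to 8 MB) are standard CUDA:

```cuda
#include <cstdio>

__global__ void perThreadAlloc()
{
    // Each thread that executes this malloc gets its own allocation
    // from the device heap, which lives in global memory.
    int *buf = (int *) malloc(16 * sizeof(int));
    if (buf != NULL)          // malloc returns NULL when the heap is exhausted
    {
        buf[0] = threadIdx.x;
        free(buf);            // free what you allocate, or the heap leaks
    }
}

int main()
{
    // Enlarge the device heap before launching if the kernel's total
    // allocations may exceed the 8 MB default.
    cudaDeviceSetLimit(cudaLimitMallocHeapSize, 64 * 1024 * 1024);
    perThreadAlloc<<<4, 128>>>();
    cudaDeviceSynchronize();
    return 0;
}
```

Note that device-side malloc can fail at runtime, so checking the returned pointer for NULL inside the kernel is worthwhile.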

Is the memory allocated per block, or is it accessible from every block? I.e., if I free the memory at the end of my kernel, might other blocks lose access to the pointer?

It is per thread: if your kernel includes a malloc call, every thread that executes the malloc or new gets its own allocation. The returned pointer points into global memory, though, so any thread can dereference it if you share the pointer value.

I need to allocate a certain amount per block, like this:

    if (threadIdx.x == 0 && threadIdx.y == 0 && threadIdx.z == 0)
        sData = (int *) malloc(sizeof(int) * searchVolumeDimZ * searchVolumeDimY * searchVolumeDimX);

    __syncthreads();

    // ... processing ...

    __syncthreads();

    if (threadIdx.x == 0 && threadIdx.y == 0 && threadIdx.z == 0)
        free(sData);

but I get access violations that way.
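The most likely cause: if sData is an ordinary local variable, only thread (0,0,0) gets the malloc'd pointer, and every other thread dereferences an uninitialized one. Declaring the pointer __shared__ broadcasts the value to the whole block. A sketch of that fix, with a hypothetical kernel name and the dimensions passed as parameters; the NULL check also guards against the default 8 MB device heap being too small for the search volume:

```cuda
__global__ void perBlockAlloc(int dimX, int dimY, int dimZ)
{
    // __shared__ makes the pointer value written by thread (0,0,0)
    // visible to every thread in the block.
    __shared__ int *sData;

    if (threadIdx.x == 0 && threadIdx.y == 0 && threadIdx.z == 0)
        sData = (int *) malloc(sizeof(int) * dimZ * dimY * dimX);

    __syncthreads();          // wait until sData has been set

    if (sData == NULL)        // heap exhausted: every thread bails out
        return;

    // ... processing: all threads in the block may use sData here ...

    __syncthreads();          // ensure no thread is still using sData

    if (threadIdx.x == 0 && threadIdx.y == 0 && threadIdx.z == 0)
        free(sData);
}
```

If the allocation is large, consider raising the heap limit on the host with cudaDeviceSetLimit(cudaLimitMallocHeapSize, ...) before the first kernel launch.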