Shared memory between several kernels

I want to know if it’s possible to create a “shared” memory between kernels.
More explanation: I have one kernel (grid: 32x32 blocks, 16x16 threads per block). When a thread finishes, I want to save two variables so that, if I launch another kernel, that kernel can load these two variables.
But I don’t know what kind of memory I’m supposed to use for that.

Any ideas?

I rephrased my problem because I think it was not very clear.
Thanks in advance for your answers.

This is what global (AKA “device”) memory is for. Allocate a block of it with cudaMalloc() and pass the pointer to your first kernel (to write to it) and then pass it again to the second kernel (to read it).
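A minimal sketch of that pattern, assuming a made-up `Saved` struct for the two per-thread variables and made-up kernel names `produce` and `consume` (none of these come from the original post). Both launches go to the default stream, so the second kernel only starts after the first has finished writing.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical pair of per-thread values that must survive between launches.
struct Saved {
    float a;
    float b;
};

// First kernel: every thread writes its two values to global memory.
__global__ void produce(Saved *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i].a = 2.0f * i;
        out[i].b = i + 1.0f;
    }
}

// Second kernel: every thread reads back what the first kernel stored.
__global__ void consume(const Saved *in, float *sum, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        sum[i] = in[i].a + in[i].b;
    }
}

// Returns 0 on success, non-zero on a CUDA error or an unexpected result.
int run_two_kernels()
{
    const int n = 1024;
    Saved *d_saved = nullptr;
    float *d_sum = nullptr;
    if (cudaMalloc(&d_saved, n * sizeof(Saved)) != cudaSuccess) return 1;
    if (cudaMalloc(&d_sum, n * sizeof(float)) != cudaSuccess) {
        cudaFree(d_saved);
        return 1;
    }

    // The same device pointer is passed to both kernels.
    produce<<<n / 256, 256>>>(d_saved, n);
    consume<<<n / 256, 256>>>(d_saved, d_sum, n);

    std::vector<float> h_sum(n);
    cudaError_t err = cudaMemcpy(h_sum.data(), d_sum, n * sizeof(float),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_saved);
    cudaFree(d_sum);
    if (err != cudaSuccess) return 1;

    // Each element should be 2*i + (i + 1).
    for (int i = 0; i < n; ++i) {
        if (h_sum[i] != 3.0f * i + 1.0f) return 2;
    }
    return 0;
}
```

The memory stays allocated until cudaFree() is called, so any number of later kernels can read or update it.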

Thanks for your reply.

I’ve declared the following in my .cu file:

typedef struct {
   float4 color;
   float3 position;
} context;

__device__ context* t_context;
size_t pitch;
size_t contextWidth = 512;
size_t contextHeight = 512;

After that I initialize t_context:

cutilSafeCall( cudaMallocPitch( ( void** ) &t_context, &pitch, contextWidth * sizeof(context), contextHeight ) );

But I don’t know how to read from or write to t_context.

In my kernel I have:

uint x = __umul24( blockIdx.x, blockDim.x ) + threadIdx.x;
uint y = __umul24( blockIdx.y, blockDim.y ) + threadIdx.y;

My goal is for each thread to save its position and color in the t_context array, so that when I launch the same kernel again, each thread loads the values (position and color) stored at its own (x, y) position.

To be precise: the grid is 32x32 blocks with 16x16 threads per block, so I think a 512x512 array is right, but I’m not sure.
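For reference, 32 blocks x 16 threads = 512 threads along each axis, so a 512x512 array gives one element per thread. Below is a rough sketch of how a pitched array of context could be written and read. It assumes the pointer is an ordinary host-side variable passed as a kernel argument (instead of the `__device__` declaration above), and the names `context_at`, `save_context`, `load_context` and `run_pitched_context` are made up for illustration.

```cuda
#include <vector>
#include <cuda_runtime.h>

typedef struct {
    float4 color;
    float3 position;
} context;

// Rows of a pitched allocation start every `pitch` bytes, so the row
// address is computed with byte arithmetic before indexing by x.
__device__ context *context_at(context *base, size_t pitch,
                               unsigned int x, unsigned int y)
{
    char *row = (char *)base + y * pitch;
    return (context *)row + x;
}

// Each thread stores its own color and position at (x, y).
__global__ void save_context(context *base, size_t pitch,
                             unsigned int width, unsigned int height)
{
    unsigned int x = blockIdx.x * blockDim.x + threadIdx.x;
    unsigned int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;
    context *c = context_at(base, pitch, x, y);
    c->color = make_float4(0.0f, (float)y, 0.0f, 1.0f);
    c->position = make_float3((float)x, 0.0f, 0.0f);
}

// Each thread loads what was stored at (x, y) by the earlier launch.
__global__ void load_context(context *base, size_t pitch,
                             unsigned int width, unsigned int height,
                             float *out)
{
    unsigned int x = blockIdx.x * blockDim.x + threadIdx.x;
    unsigned int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;
    context *c = context_at(base, pitch, x, y);
    out[y * width + x] = c->position.x + c->color.y;  // x + y
}

// Returns 0 on success, non-zero on a CUDA error or an unexpected result.
int run_pitched_context()
{
    const unsigned int width = 512, height = 512;
    context *d_ctx = nullptr;
    float *d_out = nullptr;
    size_t pitch = 0;
    if (cudaMallocPitch(&d_ctx, &pitch, width * sizeof(context), height)
            != cudaSuccess) return 1;
    if (cudaMalloc(&d_out, width * height * sizeof(float)) != cudaSuccess) {
        cudaFree(d_ctx);
        return 1;
    }

    dim3 block(16, 16);
    dim3 grid(32, 32);  // 32 * 16 = 512 threads per axis
    save_context<<<grid, block>>>(d_ctx, pitch, width, height);
    load_context<<<grid, block>>>(d_ctx, pitch, width, height, d_out);

    std::vector<float> h_out(width * height);
    cudaError_t err = cudaMemcpy(h_out.data(), d_out,
                                 h_out.size() * sizeof(float),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_ctx);
    cudaFree(d_out);
    if (err != cudaSuccess) return 1;

    for (unsigned int y = 0; y < height; ++y)
        for (unsigned int x = 0; x < width; ++x)
            if (h_out[y * width + x] != (float)(x + y)) return 2;
    return 0;
}
```

The pitch returned by cudaMallocPitch() is in bytes, which is why the row offset is applied to a char pointer and not to a context pointer.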

I think you might be able to do that with the extern __shared__ declaration.

http://http.developer.nvidia.com/GPUGems3/gpugems3_ch31.html

See examples 31-2 and 31-3.

Notice how the extern __shared__ float4 shPosition declaration is shared between two kernels.

Is this what you asked?

;)

That does not mean what you think it means. That’s dynamic shared memory allocation, not sharing memory across kernel invocations.
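To make the distinction concrete, here is a small sketch of what extern __shared__ actually does. The kernel name `reverse_in_block` and the array name `tile` are made up for the example. The declaration only says "the size of this per-block array is given at launch time" (third launch parameter); the array exists separately for each block and its contents are gone once the block finishes.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Reverses the elements handled by each block, using a shared-memory
// tile whose size is supplied at launch time.
__global__ void reverse_in_block(int *data)
{
    extern __shared__ int tile[];  // sized by the 3rd launch parameter
    int t = threadIdx.x;
    int base = blockIdx.x * blockDim.x;
    tile[t] = data[base + t];
    __syncthreads();               // whole tile is filled before reading
    data[base + t] = tile[blockDim.x - 1 - t];
}

// Returns 0 on success, non-zero on a CUDA error or an unexpected result.
int run_dynamic_shared()
{
    const int threads = 128, blocks = 2, n = threads * blocks;
    std::vector<int> h(n);
    for (int i = 0; i < n; ++i) h[i] = i;

    int *d = nullptr;
    if (cudaMalloc(&d, n * sizeof(int)) != cudaSuccess) return 1;
    if (cudaMemcpy(d, h.data(), n * sizeof(int), cudaMemcpyHostToDevice)
            != cudaSuccess) {
        cudaFree(d);
        return 1;
    }

    // Third launch parameter: bytes of dynamic shared memory per block.
    reverse_in_block<<<blocks, threads, threads * sizeof(int)>>>(d);

    cudaError_t err = cudaMemcpy(h.data(), d, n * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d);
    if (err != cudaSuccess) return 1;

    // Element t of block b should now hold the old element (threads-1-t).
    for (int b = 0; b < blocks; ++b)
        for (int t = 0; t < threads; ++t)
            if (h[b * threads + t] != b * threads + (threads - 1 - t))
                return 2;
    return 0;
}
```

Note that the results still have to be written back to global memory (`data`) to be visible after the kernel ends.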

How do I create a 2D array of context

typedef struct {
   float4 color;
   float3 position;
} context;

so that I can pass it as a kernel parameter, and how do I use it inside the kernel?
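One possible approach, sketched under a few assumptions: the 2D array is stored as a flat width x height block from plain cudaMalloc() and indexed as y * width + x, and the same kernel is launched twice with a flag saying whether it is the first pass. The names `step`, `firstPass` and `run_same_kernel_twice` are made up for illustration.

```cuda
#include <vector>
#include <cuda_runtime.h>

typedef struct {
    float4 color;
    float3 position;
} context;

// On the first pass each thread stores its state. On later passes it
// loads the stored state, advances it, and stores it again.
__global__ void step(context *ctx, unsigned int width, unsigned int height,
                     bool firstPass)
{
    unsigned int x = blockIdx.x * blockDim.x + threadIdx.x;
    unsigned int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    context *c = &ctx[y * width + x];  // row-major 2D indexing
    if (firstPass) {
        c->color = make_float4(1.0f, 1.0f, 1.0f, 1.0f);
        c->position = make_float3((float)x, (float)y, 0.0f);
    } else {
        c->position.z += 1.0f;
    }
}

// Returns 0 on success, non-zero on a CUDA error or an unexpected result.
int run_same_kernel_twice()
{
    const unsigned int width = 512, height = 512;
    context *d_ctx = nullptr;
    if (cudaMalloc(&d_ctx, width * height * sizeof(context)) != cudaSuccess)
        return 1;

    dim3 block(16, 16);
    dim3 grid(32, 32);
    step<<<grid, block>>>(d_ctx, width, height, true);   // save
    step<<<grid, block>>>(d_ctx, width, height, false);  // load and update

    std::vector<context> h(width * height);
    cudaError_t err = cudaMemcpy(h.data(), d_ctx,
                                 h.size() * sizeof(context),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_ctx);
    if (err != cudaSuccess) return 1;

    for (unsigned int y = 0; y < height; ++y) {
        for (unsigned int x = 0; x < width; ++x) {
            const context &c = h[y * width + x];
            if (c.position.x != (float)x || c.position.y != (float)y ||
                c.position.z != 1.0f) return 2;
        }
    }
    return 0;
}
```

A flat array avoids the pitch bookkeeping; whether a pitched allocation is worth it for this struct is something to measure, not assume.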