global to shared

I have kernel code that I want to run on several different sets of data. I transfer all the data to global memory at the beginning, then move each set to shared memory, process it, and send the results back.
Should there be a kernel launch for each set of data, or is there a way to move through the data without multiple kernel launches?

Thanks in advance,

You can use for loops inside your kernel, so that a single launch keeps moving on to more data…

For example, the following loop sets “n” elements to zero regardless of how many blocks and threads the kernel is launched with:

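for (i = blockIdx.x*blockDim.x + threadIdx.x; i < n; i += blockDim.x*gridDim.x)
{
   // each thread strides by the total number of threads in the grid;
   // including threadIdx.x in i means the i < n test guards every write
   array[i] = 0;
}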

The only assumption is one-dimensional threads and blocks, which is fine and good enough for a 1D array.
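
The same looping idea can be applied to the original question. Here is a rough sketch, not a definitive implementation, of one kernel launch in which each block walks over several data sets, stages each set in shared memory, processes it, and writes it back to global memory. The names processSets, runProcessSetsDemo and SET_SIZE are made up for illustration, the "processing" step (reverse the set and double each value) is only a placeholder, and it assumes all sets have the same size and that one set fits in shared memory.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical set size, assumed small enough to fit in shared memory.
constexpr int SET_SIZE = 256;

// Each block handles sets blockIdx.x, blockIdx.x + gridDim.x, ...
// so any number of sets is covered by a single launch.
__global__ void processSets(float *data, int numSets)
{
    __shared__ float tile[SET_SIZE];

    for (int set = blockIdx.x; set < numSets; set += gridDim.x) {
        float *g = data + (size_t)set * SET_SIZE;

        // Stage the current set from global into shared memory.
        for (int i = threadIdx.x; i < SET_SIZE; i += blockDim.x)
            tile[i] = g[i];
        __syncthreads();  // the whole set must be loaded before anyone reads it

        // Placeholder processing: reverse the set and double each value,
        // then write the result back to global memory.
        for (int i = threadIdx.x; i < SET_SIZE; i += blockDim.x)
            g[i] = 2.0f * tile[SET_SIZE - 1 - i];
        __syncthreads();  // finish reading tile before the next set overwrites it
    }
}

// Host-side helper: runs 10 sets through 4 blocks and checks the result.
bool runProcessSetsDemo()
{
    const int numSets = 10;
    std::vector<float> h(numSets * SET_SIZE);
    for (size_t i = 0; i < h.size(); ++i) h[i] = (float)i;

    float *d = nullptr;
    const size_t bytes = h.size() * sizeof(float);
    if (cudaMalloc(&d, bytes) != cudaSuccess) return false;
    cudaMemcpy(d, h.data(), bytes, cudaMemcpyHostToDevice);

    processSets<<<4, 128>>>(d, numSets);  // one launch, fewer blocks than sets
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaDeviceSynchronize();

    cudaMemcpy(h.data(), d, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d);
    if (err != cudaSuccess) return false;

    // Element i of set s should hold twice the original value at the mirrored index.
    for (int s = 0; s < numSets; ++s)
        for (int i = 0; i < SET_SIZE; ++i) {
            float expected = 2.0f * (float)(s * SET_SIZE + (SET_SIZE - 1 - i));
            if (h[s * SET_SIZE + i] != expected) return false;
        }
    return true;
}
```

To try it, call runProcessSetsDemo() from a main() and compile with nvcc. The second __syncthreads() matters: without it, a fast thread could start loading the next set into tile while slower threads are still reading the previous one.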