Using some threads only to load data into shared memory

Is it possible to use the first few threads (e.g. the first 32) only to load data into shared memory, with the rest of the threads using that data? I ask because, as far as I understand, a warp does not have to finish executing before the next warp starts, and in that case my current model will give incorrect answers.

Is there any way to initialize shared memory before the threads start? I will be using shared memory to store some sub-images, and I need to pad each sub-image with zeros. Can anyone suggest another technique to achieve this?

What you really need to do is use the __syncthreads() barrier to ensure that data has been loaded before continuing the calculation…

But the problem is that I want to call __syncthreads() in only the first 32 threads. Will the rest of the threads detect this dependency and wait?

So if I write the following code:

if (threadIdx.x < 32)
{
    // load data
    __syncthreads();
}
else
{
    // code for the rest of the threads, which will use the data loaded by the first 32 threads
}

Will this work?

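That will not work. __syncthreads() is a block-wide barrier: every thread in the block has to reach it, so calling it inside a branch that only some threads take is undefined behavior and will typically hang or give wrong results.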

What you want is:

if (threadIdx.x < 32)
{
    // load data
}

__syncthreads();

// use loaded data
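For the zero-padding question, here is a rough, self-contained sketch of the same pattern. Everything specific in it is an assumption made for illustration, not something from the posts above: the kernel name paddedSum, the 1-D sub-image of TILE = 64 elements, the one-element zero border (PAD), and the 3-point sum used as the "use loaded data" step. The first 32 threads fill the shared tile, writing zeros into the border themselves, so shared memory does not need to be initialized before the kernel starts. The barrier sits outside the branch, so all threads reach it.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define TILE   64                 // sub-image width (assumed); also the block size
#define PAD    1                  // zero border on each side (assumed)
#define PADDED (TILE + 2 * PAD)   // tile width including the border

// Hypothetical kernel: each block handles one 1-D sub-image of TILE elements.
// Must be launched with blockDim.x == TILE.
__global__ void paddedSum(const float* in, float* out, int n)
{
    __shared__ float tile[PADDED];

    // Only the first 32 threads load; they also write the zero padding.
    if (threadIdx.x < 32) {
        for (int i = threadIdx.x; i < PADDED; i += 32) {
            int src = blockIdx.x * TILE + (i - PAD);
            bool interior = (i >= PAD) && (i < PAD + TILE) && (src < n);
            tile[i] = interior ? in[src] : 0.0f;
        }
    }

    // Block-wide barrier, reached by every thread, outside the branch.
    __syncthreads();

    // All threads (not just the loaders) now read the padded tile.
    int dst = blockIdx.x * TILE + threadIdx.x;
    if (dst < n) {
        out[dst] = tile[threadIdx.x] + tile[threadIdx.x + 1] + tile[threadIdx.x + 2];
    }
}

int main()
{
    const int n = 2 * TILE;       // two sub-images
    float h_in[n], h_out[n];
    for (int i = 0; i < n; ++i) h_in[i] = 1.0f;

    float *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMemcpy(d_in, h_in, n * sizeof(float), cudaMemcpyHostToDevice);

    paddedSum<<<n / TILE, TILE>>>(d_in, d_out, n);
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        printf("launch failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    cudaMemcpy(h_out, d_out, n * sizeof(float), cudaMemcpyDeviceToHost);

    // With an all-ones input, elements next to the zero border sum to 2,
    // interior elements sum to 3.
    int bad = 0;
    for (int i = 0; i < n; ++i) {
        int t = i % TILE;
        float expected = (t == 0 || t == TILE - 1) ? 2.0f : 3.0f;
        if (h_out[i] != expected) ++bad;
    }
    printf(bad == 0 ? "ok\n" : "mismatch in %d elements\n", bad);

    cudaFree(d_in);
    cudaFree(d_out);
    return bad != 0;
}
```

Restricting the load to one warp is only there to mirror the question. Letting all threads in the block share the load loop (stride blockDim.x instead of 32) would work the same way with the same single __syncthreads().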