Shared memory memset, and allocation forcing array allocation, and memset

Edit: Just after I posted the original question, it occurred to me how I could solve part of it :) So this is the only part that remains:

Is there a variant of memset that works on shared memory arrays?

Is there any other fast way to get an array in shared memory initialized to all zeros, other than having multiple threads each set one or more elements to zero and then syncing?

Thanks in advance for the tips!

Anybody? Is there a fast memset to 0 for shared memory?

The fastest possible memset on shared memory is to have each thread initialize a single element of shared memory.
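
A minimal sketch of that one-element-per-thread pattern, assuming the shared array has exactly as many elements as the block has threads. The names `BLOCK`, `zero_one_per_thread` and `count_nonzero_one_per_thread` are made up for illustration, and the host helper is only there to check the result.

```cuda
#include <cuda_runtime.h>

#define BLOCK 256  // assumed block size; the shared array has one element per thread

// Each thread zeroes exactly one shared element, then the block synchronizes.
__global__ void zero_one_per_thread(int *out)
{
    __shared__ int sdata[BLOCK];

    sdata[threadIdx.x] = 0;   // one store per thread, no loop needed
    __syncthreads();          // every element is zero past this point

    // Read back an element written by a different thread so the host can
    // verify that the whole array was cleared before anyone used it.
    out[threadIdx.x] = sdata[(threadIdx.x + 1) % BLOCK];
}

// Hypothetical host helper: runs one block and returns the number of
// nonzero elements that came back, or -1 on a CUDA error.
int count_nonzero_one_per_thread()
{
    int host[BLOCK];
    int *dev = nullptr;
    if (cudaMalloc(&dev, sizeof(host)) != cudaSuccess) return -1;
    cudaMemset(dev, 0xFF, sizeof(host));  // poison the output buffer first
    zero_one_per_thread<<<1, BLOCK>>>(dev);
    cudaError_t err = cudaMemcpy(host, dev, sizeof(host), cudaMemcpyDeviceToHost);
    cudaFree(dev);
    if (err != cudaSuccess) return -1;

    int bad = 0;
    for (int i = 0; i < BLOCK; i++) bad += (host[i] != 0);
    return bad;
}
```

The kernel has to be launched with `BLOCK` threads per block for this to cover the whole array; with fewer threads, some elements would stay uninitialized.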

It seems there is no method other than initializing the array with many threads…

Well, in principle you could do:

if (threadIdx.x == 0)
    {
    for (int i = 0; i < sdata_num_elements; i++)
        sdata[i] = 0;
    }
__syncthreads();

But that would serialize the memory initialization to 1 thread and make it very slow.

This technique can still be used if you have more shared memory elements to initialize than you have threads, with the proper modification to keep all threads busy doing useful work:

for (int i = 0; i < sdata_num_elements; i += blockDim.x)
    {
    if (i + threadIdx.x < sdata_num_elements)
        sdata[i + threadIdx.x] = 0;
    }
__syncthreads();
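
For anyone who wants to try it, here is a rough sketch that wraps the strided loop above in a reusable device function and checks the result from the host. It uses an equivalent form of the loop that starts `i` at `threadIdx.x` instead of testing the bound inside the body. The names `zero_shared`, `clear_and_copy` and `count_nonzero_after_clear` are hypothetical, and the array is assumed to be dynamically sized shared memory of `int`.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Block-wide zero fill: thread t clears elements t, t + blockDim.x, ...
// Every thread of the block must call this, because it ends in a barrier.
__device__ void zero_shared(int *sdata, int n)
{
    for (int i = threadIdx.x; i < n; i += blockDim.x)
        sdata[i] = 0;
    __syncthreads();
}

__global__ void clear_and_copy(int *out, int n)
{
    extern __shared__ int sdata[];  // size is given at launch time

    // Fill with a nonzero pattern first so the test cannot pass by accident.
    for (int i = threadIdx.x; i < n; i += blockDim.x)
        sdata[i] = -1;
    __syncthreads();

    zero_shared(sdata, n);

    // Copy out in reverse order, so threads read elements others cleared.
    for (int i = threadIdx.x; i < n; i += blockDim.x)
        out[i] = sdata[n - 1 - i];
}

// Hypothetical host helper: returns the number of nonzero elements after
// the clear, or -1 on a CUDA error. n * sizeof(int) must fit in the
// shared memory available to one block.
int count_nonzero_after_clear(int n, int threads)
{
    std::vector<int> host(n);
    int *dev = nullptr;
    size_t bytes = n * sizeof(int);
    if (cudaMalloc(&dev, bytes) != cudaSuccess) return -1;
    clear_and_copy<<<1, threads, bytes>>>(dev, n);
    cudaError_t err = cudaMemcpy(host.data(), dev, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dev);
    if (err != cudaSuccess) return -1;

    int bad = 0;
    for (int i = 0; i < n; i++) bad += (host[i] != 0);
    return bad;
}
```

The same function covers both cases discussed in this thread: with more elements than threads each thread loops several times, and with fewer elements than threads the extra threads simply skip the loop and wait at the barrier.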