memset SM on device


I’ve got quite a big kernel that looks like this

<load data to SM (shared memory)>
<process data>
<set SM to 0>
<write results into SM>
<write results from SM to global memory>

Now, that is quite a lot of __syncthreads() calls, and I wonder if there is a faster way to set the SM back to 0, something like memset in the C standard library or cudaMemset on the host?

Kind Regards

__syncthreads() isn’t that expensive. Then again, I don’t see why you want to zero out the SM before writing to it. The usual usage pattern is just

__syncthreads();
__syncthreads();

or, if you potentially overwrite memory that may still be needed by other threads

__syncthreads();
__syncthreads();
__syncthreads();
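For illustration, here is a minimal sketch of how the three-barrier variant might look when the shared buffer is overwritten directly instead of being zeroed first. The kernel `neighbourSum`, the tile size `TILE`, the host helper `runNeighbourSum` and the computation itself are all made up for this example; they are not taken from the original kernel.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

constexpr int TILE = 256;  // hypothetical block size, one SM slot per thread

// Hypothetical kernel: out[i] = in[i] + in[right neighbour inside the tile].
__global__ void neighbourSum(const float* in, float* out)
{
    __shared__ float sm[TILE];
    const int t   = threadIdx.x;
    const int gid = blockIdx.x * TILE + t;

    sm[t] = in[gid];                       // load data to SM
    __syncthreads();                       // loads are visible to the whole block

    float r = sm[t] + sm[(t + 1) % TILE];  // process: reads another thread's slot
    __syncthreads();                       // all reads finished before any overwrite

    sm[t] = r;                             // write results into SM, no zeroing first
    __syncthreads();                       // needed in general when the copy-out reads
                                           // slots written by other threads; here each
                                           // thread reads its own slot, so it could go

    out[gid] = sm[t];                      // write results from SM to global memory
}

// Hypothetical host helper; the input length must be a multiple of TILE.
// Error checking of the CUDA calls is omitted to keep the sketch short.
std::vector<float> runNeighbourSum(const std::vector<float>& in)
{
    assert(!in.empty() && in.size() % TILE == 0);
    const size_t bytes = in.size() * sizeof(float);
    float *dIn = nullptr, *dOut = nullptr;
    cudaMalloc(&dIn, bytes);
    cudaMalloc(&dOut, bytes);
    cudaMemcpy(dIn, in.data(), bytes, cudaMemcpyHostToDevice);
    neighbourSum<<<static_cast<unsigned>(in.size() / TILE), TILE>>>(dIn, dOut);
    std::vector<float> out(in.size());
    cudaMemcpy(out.data(), dOut, bytes, cudaMemcpyDeviceToHost);  // also waits for the kernel
    cudaFree(dIn);
    cudaFree(dOut);
    return out;
}
```

The second barrier is the one that matters when results overwrite data that other threads may still be reading; if every thread only ever touches its own slot, both the second and third barriers can be dropped.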

Well, for performance reasons I use the SM to store the delta between the old solution and the new solution, and afterwards the contents of the SM are added to the global solution. Anyway, if there is no faster way to zero the memory, I don’t need to change anything. Thanks for the feedback.
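A rough sketch of the delta-accumulation scheme described here, assuming the deltas are scattered into a small shared array of `BINS` entries. The names `accumulateDelta`, `runAccumulate`, `idx`, `contrib` and `BINS` are hypothetical, and the real kernel may well be structured differently. The zeroing step is a plain block-stride loop in which every thread clears its share of the buffer, followed by one barrier.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

constexpr int BINS = 128;  // hypothetical number of solution entries kept in SM

// Hypothetical kernel: adds contrib[i] to solution[idx[i]] via a shared delta buffer.
// Assumes every idx[i] lies in [0, BINS).
__global__ void accumulateDelta(const int* idx, const float* contrib, int n, float* solution)
{
    __shared__ float delta[BINS];

    // Set SM to 0: each thread clears a strided slice of the buffer.
    for (int i = threadIdx.x; i < BINS; i += blockDim.x)
        delta[i] = 0.0f;
    __syncthreads();  // buffer fully cleared before anyone accumulates

    // Accumulate deltas in SM (grid-stride loop over the input).
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += gridDim.x * blockDim.x)
        atomicAdd(&delta[idx[i]], contrib[i]);
    __syncthreads();  // all deltas of this block are in place

    // Add the block's delta to the global solution.
    for (int i = threadIdx.x; i < BINS; i += blockDim.x)
        atomicAdd(&solution[i], delta[i]);
}

// Hypothetical host helper; returns the updated solution.
// Error checking of the CUDA calls is omitted to keep the sketch short.
std::vector<float> runAccumulate(const std::vector<int>& idx,
                                 const std::vector<float>& contrib,
                                 std::vector<float> solution)
{
    assert(idx.size() == contrib.size() && solution.size() == BINS);
    const int n = static_cast<int>(idx.size());
    int* dIdx = nullptr;
    float *dContrib = nullptr, *dSol = nullptr;
    cudaMalloc(&dIdx, n * sizeof(int));
    cudaMalloc(&dContrib, n * sizeof(float));
    cudaMalloc(&dSol, BINS * sizeof(float));
    cudaMemcpy(dIdx, idx.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(dContrib, contrib.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dSol, solution.data(), BINS * sizeof(float), cudaMemcpyHostToDevice);
    accumulateDelta<<<4, 128>>>(dIdx, dContrib, n, dSol);  // launch shape is arbitrary
    cudaMemcpy(solution.data(), dSol, BINS * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dIdx);
    cudaFree(dContrib);
    cudaFree(dSol);
    return solution;
}
```

With one strided loop and one barrier, clearing the buffer costs a handful of shared-memory stores per thread, which is usually small next to the processing step.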