Initializing shared memory

Is there any way to initialize shared memory?
Mine is an image processing application in which I need to pad the image data for each block with zeros. Is there any way to do that? I do not want to dedicate extra threads just to padding.
Is the shared memory variable initialized to zero when it is declared?

I don’t believe so, any more than registers are initialised to zero. Even if it happens that way now, I doubt that NVIDIA guarantees that behaviour for future generations of hardware.

Good question.

In my program, a shared memory variable does not contain zero when it is declared, so I have to set it to zero myself.

No need to use extra threads. Just put the initialization at the start of your kernel. Shared memory is FAST.

for (int i=threadIdx.x; i<SharedMemSize; i+=blockDim.x)  myShared[i]=0;
__syncthreads();  // make sure every element is zeroed before any thread uses the array
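
For the padding case in the original question, here is a rough sketch of how the clear-then-load pattern might look with a 2D block. Everything named here (TILE, PAD, PADDED, paddedTileKernel, runPaddedTile) is made up for illustration, and it assumes a single block of TILE x TILE threads, so treat it as a starting point rather than a finished kernel.

```cuda
#include <cuda_runtime.h>
#include <vector>

constexpr int TILE   = 16;              // interior tile width (hypothetical value)
constexpr int PAD    = 1;               // width of the zero border (hypothetical value)
constexpr int PADDED = TILE + 2 * PAD;  // padded tile width

// Assumes a launch with one block of TILE x TILE threads.
__global__ void paddedTileKernel(const float* in, float* out)
{
    __shared__ float tile[PADDED * PADDED];

    // Flatten the 2D thread index so all threads share the clearing work.
    const int tid      = threadIdx.y * blockDim.x + threadIdx.x;
    const int nThreads = blockDim.x * blockDim.y;

    // Block-stride loop: thread tid clears elements tid, tid + nThreads, ...
    for (int i = tid; i < PADDED * PADDED; i += nThreads)
        tile[i] = 0.0f;
    __syncthreads();  // the clear must finish before the interior is written

    // Each thread loads one interior pixel; the border keeps its zeros.
    tile[(threadIdx.y + PAD) * PADDED + (threadIdx.x + PAD)] =
        in[threadIdx.y * TILE + threadIdx.x];
    __syncthreads();  // the tile is now complete and safe to read

    // Copy the padded tile to global memory so the host can inspect it.
    for (int i = tid; i < PADDED * PADDED; i += nThreads)
        out[i] = tile[i];
}

// Hypothetical host helper: runs the kernel on an all-ones image tile
// and returns the padded result.
std::vector<float> runPaddedTile()
{
    std::vector<float> hIn(TILE * TILE, 1.0f);
    std::vector<float> hOut(PADDED * PADDED, -1.0f);
    float* dIn  = nullptr;
    float* dOut = nullptr;
    cudaMalloc(&dIn,  hIn.size()  * sizeof(float));
    cudaMalloc(&dOut, hOut.size() * sizeof(float));
    cudaMemcpy(dIn, hIn.data(), hIn.size() * sizeof(float), cudaMemcpyHostToDevice);

    paddedTileKernel<<<1, dim3(TILE, TILE)>>>(dIn, dOut);

    cudaMemcpy(hOut.data(), dOut, hOut.size() * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
    return hOut;
}
```

No thread is reserved for padding here: the same threads that later load pixels do the clearing first, and the two __syncthreads() calls keep the clear, the load, and the reads from overlapping.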

The same kind of initialization can be used for global memory too, and it’s faster than a host-side memcpy.
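
For global memory, a minimal sketch of what such a kernel could look like is below. The names zeroKernel and zeroWithKernel and the 64 x 256 launch configuration are my own assumptions, not anything measured or recommended in this thread; cudaMemset() is the runtime call that does the same job, so it is worth timing both on your own hardware.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Grid-stride loop: the threads of the whole grid walk the buffer together,
// so any launch configuration covers all n elements.
__global__ void zeroKernel(float* data, size_t n)
{
    const size_t stride = (size_t)gridDim.x * blockDim.x;
    for (size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride)
        data[i] = 0.0f;
}

// Hypothetical host helper: fills a device buffer with non-zero values,
// zeroes it with the kernel, and returns the result for checking.
std::vector<float> zeroWithKernel(size_t n)
{
    std::vector<float> host(n, 7.0f);
    float* dev = nullptr;
    cudaMalloc(&dev, n * sizeof(float));
    cudaMemcpy(dev, host.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    zeroKernel<<<64, 256>>>(dev, n);  // launch configuration is an arbitrary choice
    // Runtime alternative: cudaMemset(dev, 0, n * sizeof(float));

    cudaMemcpy(host.data(), dev, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dev);
    return host;
}
```

Note that cudaMemset() sets bytes, which works for zero but not for an arbitrary float value, whereas the kernel can write any value you like.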

Hi SPWorley

To initialize global memory, I usually use the cudaMemset() function.

Do you mean that zeroing global memory with your method above is faster than using cudaMemset()?

And if we use that method, we have to write a new kernel just to zero the global memory, right? Correct me if my understanding is wrong.

I have never tried that method on global memory before.

Yes. It’s surprising but true. Alex Dubinsky did some great analysis.

http://forums.nvidia.com/index.php?showtopic=85562&hl=