How to initialize shared memory


Suppose I have a kernel function:

__global__ void f()
{
    __shared__ int my_list[100];
}

If I want to initialize my_list to zero, for example, the problem
is that every thread will do it. The only approach I can see is for each thread
to work on its own partition of the array my_list (both for initialization
and for later modification).

1) Is there another possibility?
2) What is the consequence of, or the reason for, declaring the array as extern __shared__ in some cases?
3) I am confused because it looks as if each thread were declaring the array my_list.

Could you explain this a little? Thanks.

The array is meant to be shared amongst all the threads in a block, and each element needs to be initialized only once, so you want to write something like:

__global__ void f()
{
    __shared__ int my_list[BLOCKSIZE];

    my_list[threadIdx.x] = 0;
    __syncthreads();  /* all elements are written before any thread reads them */

    /* Do stuff with my_list */
}


It is true that each thread declares the array, but the declaration refers to the same array across the entire block. However, you will get a separate array for each block in the grid.


The size of

__shared__ int my_list[100];

is statically set to be 100, regardless of the size of the block. Remember that you can call the same kernel with different grid/block sizes.

The problem with

my_list[threadIdx.x] = 0;

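is that it works properly only if ‘100 == blockDim.x’: with fewer threads the tail of the array is never initialized, and with more threads some of them write past the end of the array. It also wastes cycles if ‘1 != blockDim.y’, because threads that share the same threadIdx.x repeat the same write.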
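One way around the size mismatch is a strided loop over the array, so that the initialization no longer depends on the block shape. The following is only a sketch under my own assumptions: the names LIST_SIZE, zero_shared, and run_zero_shared are made up for illustration, and the copy to global memory exists only so the host can inspect the result.

```cuda
#include <vector>
#include <cuda_runtime.h>

#define LIST_SIZE 100  // assumed to match my_list[100] from the question

// Zeroes a statically sized shared array for any 1D/2D/3D block shape,
// then copies it to global memory so the host can look at it.
__global__ void zero_shared(int *out)
{
    __shared__ int my_list[LIST_SIZE];

    // Linear index of this thread within its block, and threads per block.
    int tid = threadIdx.x + blockDim.x * (threadIdx.y + blockDim.y * threadIdx.z);
    int nthreads = blockDim.x * blockDim.y * blockDim.z;

    // Thread tid initializes elements tid, tid + nthreads, tid + 2*nthreads, ...
    for (int i = tid; i < LIST_SIZE; i += nthreads)
        my_list[i] = 0;

    __syncthreads();  // every element is written before any thread reads it

    for (int i = tid; i < LIST_SIZE; i += nthreads)
        out[blockIdx.x * LIST_SIZE + i] = my_list[i];
}

// Hypothetical host helper: launches a single block and returns the array.
std::vector<int> run_zero_shared(dim3 block)
{
    int *d_out = nullptr;
    cudaMalloc(&d_out, LIST_SIZE * sizeof(int));
    cudaMemset(d_out, 0xFF, LIST_SIZE * sizeof(int));  // poison, so zeros are meaningful

    zero_shared<<<1, block>>>(d_out);

    std::vector<int> h_out(LIST_SIZE);
    cudaMemcpy(h_out.data(), d_out, LIST_SIZE * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return h_out;
}
```

With this pattern, a block of 32 threads and a block of 16 x 16 threads should both leave all 100 elements set to zero, with no out-of-bounds writes and no duplicated work across threadIdx.y.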

The size of

extern __shared__ int my_array[];

is determined dynamically by the third kernel invocation parameter (optional, defaults to 0), which gives the size in bytes.
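As a rough illustration of the dynamic case, the sketch below passes the array size in bytes as the third launch parameter and the element count as an ordinary kernel argument. The names fill_dynamic, run_fill_dynamic, and n are my own assumptions, not anything from the CUDA headers.

```cuda
#include <vector>
#include <cuda_runtime.h>

// The array is declared without a size; the launch supplies it in bytes.
__global__ void fill_dynamic(int *out, int n)
{
    extern __shared__ int my_array[];

    // Strided loop, so n does not have to equal blockDim.x.
    for (int i = threadIdx.x; i < n; i += blockDim.x)
        my_array[i] = 0;

    __syncthreads();  // initialization is complete before the array is read

    for (int i = threadIdx.x; i < n; i += blockDim.x)
        out[i] = my_array[i];
}

// Hypothetical host helper: one block of `threads` threads, n shared ints.
std::vector<int> run_fill_dynamic(int threads, int n)
{
    int *d_out = nullptr;
    cudaMalloc(&d_out, n * sizeof(int));
    cudaMemset(d_out, 0xFF, n * sizeof(int));  // poison, so zeros are meaningful

    // Third launch parameter: dynamic shared memory size in bytes.
    fill_dynamic<<<1, threads, n * sizeof(int)>>>(d_out, n);

    std::vector<int> h_out(n);
    cudaMemcpy(h_out.data(), d_out, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return h_out;
}
```

The same kernel can then be launched with different block sizes and array lengths, for example run_fill_dynamic(64, 100) or run_fill_dynamic(128, 250), without recompiling.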

With ‘extern’ you specify the blockDim and the size of ‘my_array’ on the same line of code.

This is true; if you are going to call a kernel with a size that is not fixed or not known at compile time, then you have to use a dynamic allocation. There are, however, a lot of problems for which you can use a fixed block size, which can simplify the layout of your shared memory.

Naturally, you would declare the size of the array to be compatible with the block size you were using, so this would not be a problem.

Well, sort of. The block dimensions and array size are both set at the kernel invocation, but you still have to reconstruct that information in the kernel, either from the block dimensions or from an explicit parameter to the kernel. Also, if you want more than a single array, you have to take care of laying out the data within the block yourself. Section of the Programming Guide has examples.
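To make the layout point concrete, here is a minimal sketch of carving two arrays out of one dynamic shared memory block, with the sizes passed as explicit kernel parameters. The names two_arrays, run_two_arrays, counts, and values are hypothetical; I chose int and float because they have the same size and alignment, which sidesteps alignment padding between the two arrays.

```cuda
#include <utility>
#include <vector>
#include <cuda_runtime.h>

// One dynamic shared block holds an int array followed by a float array.
__global__ void two_arrays(int *out_counts, float *out_values,
                           int n_counts, int n_values)
{
    extern __shared__ int smem[];

    int   *counts = smem;                                        // first n_counts ints
    float *values = reinterpret_cast<float *>(smem + n_counts);  // then n_values floats

    for (int i = threadIdx.x; i < n_counts; i += blockDim.x)
        counts[i] = 0;
    for (int i = threadIdx.x; i < n_values; i += blockDim.x)
        values[i] = 0.0f;

    __syncthreads();  // both arrays are initialized before anyone reads them

    for (int i = threadIdx.x; i < n_counts; i += blockDim.x)
        out_counts[i] = counts[i];
    for (int i = threadIdx.x; i < n_values; i += blockDim.x)
        out_values[i] = values[i];
}

// Hypothetical host helper: launches one block and returns both arrays.
std::pair<std::vector<int>, std::vector<float>>
run_two_arrays(int threads, int n_counts, int n_values)
{
    int   *d_counts = nullptr;
    float *d_values = nullptr;
    cudaMalloc(&d_counts, n_counts * sizeof(int));
    cudaMalloc(&d_values, n_values * sizeof(float));
    cudaMemset(d_counts, 0xFF, n_counts * sizeof(int));    // poison
    cudaMemset(d_values, 0xFF, n_values * sizeof(float));  // poison

    // The launch must request enough bytes for both arrays together.
    size_t smem_bytes = n_counts * sizeof(int) + n_values * sizeof(float);
    two_arrays<<<1, threads, smem_bytes>>>(d_counts, d_values, n_counts, n_values);

    std::vector<int>   h_counts(n_counts);
    std::vector<float> h_values(n_values);
    cudaMemcpy(h_counts.data(), d_counts, n_counts * sizeof(int), cudaMemcpyDeviceToHost);
    cudaMemcpy(h_values.data(), d_values, n_values * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_counts);
    cudaFree(d_values);
    return {h_counts, h_values};
}
```

If the arrays had element types of different sizes, the offsets would also need to respect each type's alignment, typically by placing the type with the strictest alignment first.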

In the end, however, the strategy for initializing the block will still be the same. Each thread will compute an offset based on threadIdx and initialize that part of the array.