shared memory on 9600GT

Hi,

My 9600GT has 8 multiprocessors.
If each one has 16 KB of shared memory (128 KB of shared memory in total),
how are bank conflicts defined in that situation?

Thanks
Miki

Bank conflicts occur within an MP (within a warp, even; on this generation of hardware, within a half-warp of 16 threads). So it is just as written in the programming guide; the number of multiprocessors does not matter.
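
To make the definition concrete, here is a rough sketch, assuming the compute capability 1.x layout the 9600GT uses (16 banks of 32-bit words, conflicts evaluated per half-warp of 16 threads). The names conflictDegree and stridedRead are made up for illustration; the host helper only models the access pattern, it does not measure anything on the GPU.

```cuda
#include <algorithm>

// Assumed hardware model (compute capability 1.x, e.g. 9600GT):
// 16 banks, each 32 bits wide, conflicts evaluated per half-warp.
constexpr int kBanks = 16;
constexpr int kHalfWarp = 16;
constexpr int kWords = kBanks * kHalfWarp;  // 256 floats = 1 KB per block

// Hypothetical host-side helper: the largest number of threads in one
// half-warp that hit the same bank when thread t reads word t * stride.
// Assumes stride >= 1, so every thread touches a different word.
int conflictDegree(int stride) {
    int hits[kBanks] = {0};
    for (int t = 0; t < kHalfWarp; ++t)
        ++hits[(t * stride) % kBanks];
    return *std::max_element(hits, hits + kBanks);
}

// Kernel with the access pattern the helper models: each thread reads
// the shared word at index threadIdx.x * stride (wrapped to stay in bounds).
__global__ void stridedRead(float *out, int stride) {
    __shared__ float tile[kWords];
    for (int i = threadIdx.x; i < kWords; i += blockDim.x)
        tile[i] = (float)i;
    __syncthreads();
    out[threadIdx.x] = tile[(threadIdx.x * stride) % kWords];
}
```

Under that model a stride of 1 gives degree 1 (no conflict), a stride of 2 gives a 2-way conflict, and a stride of 16 puts all 16 threads on one bank. None of it depends on how many MPs the card has.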

Tell me if I'm wrong: each MP has at most 16 KB, and since a block runs on only one MP, a block has at most 16 KB.

But if my kernel allocates 17 KB of static shared memory, will that cause a problem? Should I allocate at most 16 KB? (I ask because I'm using a 9600GT, which has a total of 128 KB across 8 MPs.)

What is the difference between static and dynamic shared memory? When is it better to use each of them?

Thanks

Miki

All shared memory you declare and use in kernels (be it statically or dynamically) is per-block. In this code:

__global__ void kernel(...) {
    int a = 5;                    // one copy per thread
    __shared__ float array[256];  // one copy per block, shared by its threads
}

int a is local to each thread (resides in a register) and array is local to the block. All threads in a block use the same array.

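You declare shared memory per block, not per multiprocessor, so you don’t need to be concerned with the number of MPs on a particular GPU, and your application will be portable across different GPUs. The limit that matters is the per-block one: a block runs on a single MP, so it cannot use more than that MP’s 16 KB, and a 17 KB allocation will not work.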
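
If you want to check the per-block limit on your own card, the runtime can report it. A minimal sketch, assuming the standard cudaGetDeviceProperties call; the wrapper name sharedMemLimitPerBlock is made up for this post:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: returns the per-block shared memory limit in bytes
// for the given device, or 0 if the query fails (e.g. no CUDA device).
size_t sharedMemLimitPerBlock(int device) {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess)
        return 0;
    std::printf("%s: %d MPs, %zu bytes of shared memory per block\n",
                prop.name, prop.multiProcessorCount, prop.sharedMemPerBlock);
    return prop.sharedMemPerBlock;
}
```

The value to compare your allocation against is sharedMemPerBlock; multiProcessorCount is printed only to show that it plays no part in the limit.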

Shared memory can be allocated statically or dynamically. There’s no difference in performance.

Statically allocated shared memory is what you see in the code sample above - explicitly 256 elements in the array. Obviously you can’t write

__global__ void kernel(int n) {
    __shared__ float array[n];  // error: the size must be a compile-time constant
}

this doesn’t work even in Java :)

You also cannot have this in device code

__global__ void kernel(int n) {
    __shared__ float *array = (float *)malloc(n * sizeof(float));  // not allowed
}

So, to achieve dynamic allocation (meaning variable size), you have to follow the programming guide: declare the shared array with the extern keyword and pass its size in bytes as the third launch parameter (the third argument inside <<< >>>). Every block in the grid will be launched with this extra shared memory.
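
Here is roughly what the extern approach looks like in practice. This is only a sketch: the kernel reverseBlock and the host wrapper reverseOnDevice are invented for illustration, and it assumes a single block with 1 <= n <= the maximum block size of your GPU.

```cuda
#include <cuda_runtime.h>
#include <vector>

// The array is declared without a size; the size comes from the
// third launch parameter, in bytes.
__global__ void reverseBlock(const float *in, float *out, int n) {
    extern __shared__ float array[];
    int t = threadIdx.x;
    if (t < n) array[t] = in[t];
    __syncthreads();  // all writes to array must finish before any read
    if (t < n) out[t] = array[n - 1 - t];
}

// Hypothetical host wrapper: one block of n threads, n floats of
// dynamic shared memory. Returns false if any CUDA call fails.
bool reverseOnDevice(const std::vector<float> &in, std::vector<float> &out) {
    int n = (int)in.size();
    size_t bytes = n * sizeof(float);
    float *dIn = 0, *dOut = 0;
    out.resize(n);
    if (cudaMalloc(&dIn, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&dOut, bytes) != cudaSuccess) { cudaFree(dIn); return false; }
    cudaMemcpy(dIn, in.data(), bytes, cudaMemcpyHostToDevice);
    reverseBlock<<<1, n, bytes>>>(dIn, dOut, n);  // bytes = shared memory per block
    cudaError_t err = cudaGetLastError();         // catches a bad launch configuration
    if (err == cudaSuccess)
        err = cudaMemcpy(out.data(), dOut, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
    return err == cudaSuccess;
}
```

The same per-block limit applies here as for static arrays: if bytes exceeds what one block may use, the launch fails and the wrapper returns false.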

Thanks
Miki