Is there any reason not to use a shared memory array whose length is greater than the block size? I have a working example where it would be convenient to have a shared memory array twice as long as the block size.
In that case you can definitely use shared memory. If the array is twice the block size, your kernel just needs the logic to copy two array values per thread (instead of one) from global memory to shared memory. That's all. Also keep in mind that for the global memory accesses to be coalesced, thread 0 should copy element 0 and element n/2, thread 1 should copy element 1 and element (n/2)+1, and so on, where n is the length of the shared array.
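For anyone who wants to see the loading pattern spelled out, here is a minimal sketch, not the original poster's filter. The names `BLOCK`, `windowSum` and `runWindowSum` are made up for illustration, and the sliding-window sum is only a stand-in for whatever the kernel really computes. Each thread loads element t and element t + BLOCK, so consecutive threads read consecutive addresses in both loads.

```cuda
#include <cuda_runtime.h>
#include <vector>

constexpr int BLOCK = 128;  // hypothetical block size, chosen for illustration

// Each block stages 2 * BLOCK consecutive inputs in shared memory.
__global__ void windowSum(const float* in, float* out, int n) {
    __shared__ float tile[2 * BLOCK];
    int t = threadIdx.x;
    int base = blockIdx.x * BLOCK;  // first input element of this block's window

    // Two loads per thread; consecutive threads touch consecutive addresses.
    int i0 = base + t;
    int i1 = base + t + BLOCK;
    tile[t]         = (i0 < n) ? in[i0] : 0.0f;  // zero-pad past the end
    tile[t + BLOCK] = (i1 < n) ? in[i1] : 0.0f;
    __syncthreads();  // reached by every thread in the block

    // Placeholder computation: sum of BLOCK samples starting at this thread's element.
    if (i0 < n) {
        float acc = 0.0f;
        for (int k = 0; k < BLOCK; ++k) acc += tile[t + k];
        out[i0] = acc;
    }
}

// Hypothetical host wrapper: copies data over, launches the kernel, copies back.
bool runWindowSum(const std::vector<float>& in, std::vector<float>& out) {
    int n = static_cast<int>(in.size());
    out.assign(n, 0.0f);
    if (n == 0) return true;
    size_t bytes = n * sizeof(float);
    float *dIn = nullptr, *dOut = nullptr;
    if (cudaMalloc(&dIn, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&dOut, bytes) != cudaSuccess) { cudaFree(dIn); return false; }
    cudaMemcpy(dIn, in.data(), bytes, cudaMemcpyHostToDevice);
    int blocks = (n + BLOCK - 1) / BLOCK;
    windowSum<<<blocks, BLOCK>>>(dIn, dOut, n);
    cudaError_t err = cudaMemcpy(out.data(), dOut, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
    return err == cudaSuccess;
}
```

Calling `runWindowSum(in, out)` from a host program fills `out` with one sum per input element. The window index `t + k` never exceeds 2 * BLOCK - 2, which is why the double-length array is enough here.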
The only things to consider for shared memory usage are whether the kernel accesses the same global memory location more than once, and whether the buffer fits into the shared memory available per multiprocessor (16 KB).
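If you would rather check the size limit than assume it, something like the following should work. It is a rough sketch: `doubleBlockBytes` and `fitsInSharedMemory` are hypothetical helper names, the array is assumed to hold `float` values, and the limit is read from the `sharedMemPerBlock` field that `cudaGetDeviceProperties` reports.

```cuda
#include <cstddef>
#include <cuda_runtime.h>

// Hypothetical helper: bytes needed for a shared array of 2 * blockSize floats.
inline size_t doubleBlockBytes(int blockSize) {
    return 2 * static_cast<size_t>(blockSize) * sizeof(float);
}

// Hypothetical helper: true if that array fits within the per-block
// shared memory limit reported by the given device.
inline bool fitsInSharedMemory(int blockSize, int device = 0) {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess) return false;
    return doubleBlockBytes(blockSize) <= prop.sharedMemPerBlock;
}
```

With a block size of 256, the double-length float array needs 2048 bytes, well under a 16 KB limit.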
Yes, the kernel logic takes care of it. Regarding coalesced writes, I will have to take a look at that!
Thanks to all who responded. Using a shared memory array of “double-blocksize” length makes my life much easier. For those who are interested: the application is a digital noise filter (Block LMS algorithm), and the array length helps a lot in windowing the input data.