Consider the following code, reduced to the minimum that still reproduces the bug (see the attachment; the forum seems to have a problem with code boxes).
$ make clean; make; release/wigner
grid 4 block 128
grid 2 block 256
grid 1 block 512
So for block sizes of 256 and lower the result is correct, but for block size 512 it is not. If I comment out initialize(), or remove either the sine or the cosine from calculate(), the bug disappears and the results are the same for all three block sizes.
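One thing that may be worth ruling out is a launch that silently fails at the largest block size, since a kernel that never runs leaves the output buffer unchanged and can look like a wrong result. The sketch below checks the error status after each launch. It is only an illustration: the kernel body is a hypothetical stand-in for calculate() (the real one is in the attachment), and run() and the buffer size are made-up names and values.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for calculate() in wigner.cu; the real kernel differs.
__global__ void calculate(float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float x = 0.01f * i;
        out[i] = sinf(x) * cosf(x);
    }
}

// Hypothetical helper: launch with the given block size and return the
// first error reported, either at launch time or during execution.
static cudaError_t run(float *d_out, int n, int block)
{
    int grid = (n + block - 1) / block;
    calculate<<<grid, block>>>(d_out, n);

    cudaError_t err = cudaGetLastError();   // errors from the launch itself
    if (err != cudaSuccess)
        return err;

    // cudaThreadSynchronize() is the toolkit 2.3 call; newer toolkits
    // deprecate it in favour of cudaDeviceSynchronize().
    return cudaThreadSynchronize();         // errors raised while running
}

int main()
{
    const int n = 512;                      // assumed problem size
    float *d_out = 0;

    cudaError_t err = cudaMalloc((void **)&d_out, n * sizeof(float));
    if (err != cudaSuccess) {
        printf("cudaMalloc: %s\n", cudaGetErrorString(err));
        return 1;
    }

    int blocks[] = {128, 256, 512};
    for (int k = 0; k < 3; ++k) {
        err = run(d_out, n, blocks[k]);
        printf("grid %d block %d: %s\n",
               n / blocks[k], blocks[k], cudaGetErrorString(err));
    }

    cudaFree(d_out);
    return 0;
}
```

If the 512-thread launch reports something like "too many resources requested for launch", the kernel probably needs more registers per thread than the multiprocessor can supply at that block size. Building with `nvcc --ptxas-options=-v` prints the per-thread register count, which would help confirm or rule this out.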
Has anyone encountered such a problem? Could anyone please try to build and run this code?
I am using CUDA driver 2.3.1a, CUDA toolkit 2.3a, and SDK 2.3a
on a MacBook Pro with OS X 10.6.1; the bug reproduces on both the 9400 and 9600 video cards.
wigner.cu (1.14 KB)