Histogram 256 SDK Bug? error when increasing blocks


Can anyone who has used the Histogram 256 example verify whether increasing the number of blocks in the grid causes the program to crash in histogram256Kernel?

I’ve increased BLOCK_N from 64 to 1024 and seen it die with a ULF (unspecified launch failure). I just want to make sure it’s not just me. I’m using CUDA 2.0b on the 8800GT in Windows.

Is there a reason why BLOCK_N = 64 was chosen, and is there a reason why BLOCK_N = 1024 would die?



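Because the second (finalizing) kernel is launched with blockDim.x equal to BLOCK_N. On compute capability 1.x devices such as the 8800GT a block can have at most 512 threads, so BLOCK_N = 1024 asks for more threads per block than the hardware allows.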
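
A quick way to check the per-block thread limit on your own card is to query the device properties before launching. This is only a rough sketch: blockSizeFits and checkBlockSize are names I made up for illustration, they are not part of the SDK sample, and it only looks at a 1-D block size.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not in the SDK sample): true if a 1-D block of
// `threads` threads is within the given per-block limit.
static bool blockSizeFits(int threads, int maxThreadsPerBlock)
{
    return threads > 0 && threads <= maxThreadsPerBlock;
}

// Hypothetical helper: query the device limit and report whether a block
// of `threads` threads could be launched on `device`.
static bool checkBlockSize(int device, int threads)
{
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, device);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceProperties failed: %s\n",
                cudaGetErrorString(err));
        return false;
    }

    bool ok = blockSizeFits(threads, prop.maxThreadsPerBlock);
    printf("%s: max %d threads per block, requested %d -> %s\n",
           prop.name, prop.maxThreadsPerBlock, threads,
           ok ? "ok" : "too many");
    return ok;
}
```

Calling checkBlockSize(0, BLOCK_N) before the second kernel launch should tell you right away whether the chosen BLOCK_N can work as a block size on device 0.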


It has been some time since I checked the histogram_256 code (and I am not at my CUDA machine currently), but if I remember right, you could turn on atomics (if not already). [You will also have to add the -arch sm_11 option to the nvcc compile command]
With atomics on, you don’t hit the second “merge” kernel.
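
To show the general idea of why atomics make a merge pass unnecessary, here is a minimal sketch. It is not the SDK's kernel: histogram256Atomic and histogram256AtomicHost are names I invented, and the code simply has every thread add straight into one global 256-bin histogram with atomicAdd, which needs compute capability 1.1 or higher.

```cuda
#include <cuda_runtime.h>

#define BIN_COUNT 256

// Assumed sketch, not the SDK kernel: each thread strides over the input
// bytes and atomically increments the matching bin in global memory.
__global__ void histogram256Atomic(unsigned int *d_hist,
                                   const unsigned char *d_data, int n)
{
    int stride = blockDim.x * gridDim.x;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride)
        atomicAdd(&d_hist[d_data[i]], 1u);
}

// Hypothetical host wrapper: copies the data over, zeroes the bins,
// launches the kernel and copies the 256 counts back into h_hist.
cudaError_t histogram256AtomicHost(const unsigned char *h_data, int n,
                                   unsigned int *h_hist,
                                   int blocks, int threads)
{
    unsigned char *d_data = NULL;
    unsigned int *d_hist = NULL;
    const size_t histBytes = BIN_COUNT * sizeof(unsigned int);

    cudaError_t err = cudaMalloc((void **)&d_data, n);
    if (err != cudaSuccess) return err;
    err = cudaMalloc((void **)&d_hist, histBytes);
    if (err != cudaSuccess) { cudaFree(d_data); return err; }

    err = cudaMemcpy(d_data, h_data, n, cudaMemcpyHostToDevice);
    if (err == cudaSuccess)
        err = cudaMemset(d_hist, 0, histBytes);  // bins must start at zero

    if (err == cudaSuccess) {
        histogram256Atomic<<<blocks, threads>>>(d_hist, d_data, n);
        err = cudaGetLastError();                // catches bad launch configs
    }
    if (err == cudaSuccess)                      // blocking copy, so kernel
        err = cudaMemcpy(h_hist, d_hist, histBytes,  // errors surface here
                         cudaMemcpyDeviceToHost);

    cudaFree(d_data);
    cudaFree(d_hist);
    return err;
}
```

Here the number of blocks only sets the grid size, so raising it to 1024 does not change any block dimension; the threads per block still have to stay within the device limit. All threads contend on the same 256 global counters, so this is meant to illustrate the approach, not to be fast.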