While the CUDA SDK histogram256 SM10 and SM12 tests run fine on a GTX 260 with 192 cores, they fail on a GTX 260 with 216 cores (SM11 passes). "Fail" here means that the GPU histogram results do not match the CPU ones. How can I make SM10 and SM12 work with 216 cores? As a side comment, it would be nice if NVIDIA kept the SDK samples more generic.
BTW, for some reason, on the GTX 260 with 192 cores (where all three histogram256 tests, SM10, SM11, and SM12, pass), SM11 (which uses global atomic adds) seems to be about 1 ms faster than SM12 (which uses shared-memory atomic adds). Any ideas why?
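To compare the two strategies described above on the same input, outside the SDK, here is a minimal sketch assuming a reasonably recent CUDA toolkit. It is not the SDK's histogram256 code: the kernel names, the `run_and_compare` helper, and the "4 blocks per multiprocessor" grid heuristic are all hypothetical. One kernel does `atomicAdd` directly on the global histogram (the SM11-style approach), and the other builds a per-block sub-histogram with shared-memory atomics and merges it at the end (the SM12-style approach). Both use grid-stride loops, so the result does not depend on the number of multiprocessors (24 on the 192-core GTX 260, 27 on the 216-core one). Shared-memory `atomicAdd` requires compute capability 1.2 or later.

```cuda
#include <cstdlib>
#include <cstring>
#include <cuda_runtime.h>

#define BIN_COUNT 256
#define THREADS 256

// CPU reference ("gold") histogram that the GPU result is checked against.
static void histogram_cpu(unsigned int *hist, const unsigned char *data, size_t n) {
    memset(hist, 0, BIN_COUNT * sizeof(unsigned int));
    for (size_t i = 0; i < n; ++i) hist[data[i]]++;
}

// Hypothetical global-atomic variant: every thread increments the global histogram directly.
__global__ void histogram_global_atomic(unsigned int *hist, const unsigned char *data, size_t n) {
    size_t stride = (size_t)blockDim.x * gridDim.x;
    for (size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride)
        atomicAdd(&hist[data[i]], 1u);
}

// Hypothetical shared-atomic variant: one sub-histogram per block in shared memory,
// merged into the global histogram after the block has processed its share of the input.
__global__ void histogram_shared_atomic(unsigned int *hist, const unsigned char *data, size_t n) {
    __shared__ unsigned int s_hist[BIN_COUNT];
    for (int b = threadIdx.x; b < BIN_COUNT; b += blockDim.x) s_hist[b] = 0;
    __syncthreads();

    size_t stride = (size_t)blockDim.x * gridDim.x;
    for (size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride)
        atomicAdd(&s_hist[data[i]], 1u);
    __syncthreads();

    // Merge: one global atomic per non-empty bin per block.
    for (int b = threadIdx.x; b < BIN_COUNT; b += blockDim.x)
        if (s_hist[b] != 0) atomicAdd(&hist[b], s_hist[b]);
}

// Runs one variant on n pseudo-random bytes, optionally reports the kernel time in ms,
// and returns true if the GPU histogram matches the CPU one bin for bin.
bool run_and_compare(bool use_shared, size_t n, float *ms_out) {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) return false;
    // Grid size follows the SM count (assumed heuristic); correctness does not depend on it.
    int blocks = prop.multiProcessorCount * 4;

    unsigned char *h_data = (unsigned char *)malloc(n);
    if (!h_data) return false;
    srand(2009);
    for (size_t i = 0; i < n; ++i) h_data[i] = (unsigned char)(rand() & 0xFF);

    unsigned int h_ref[BIN_COUNT], h_gpu[BIN_COUNT];
    histogram_cpu(h_ref, h_data, n);

    unsigned char *d_data = NULL;
    unsigned int *d_hist = NULL;
    cudaMalloc((void **)&d_data, n);
    cudaMalloc((void **)&d_hist, BIN_COUNT * sizeof(unsigned int));
    cudaMemcpy(d_data, h_data, n, cudaMemcpyHostToDevice);
    cudaMemset(d_hist, 0, BIN_COUNT * sizeof(unsigned int));

    // Time only the kernel with CUDA events.
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start);
    if (use_shared) histogram_shared_atomic<<<blocks, THREADS>>>(d_hist, d_data, n);
    else            histogram_global_atomic<<<blocks, THREADS>>>(d_hist, d_data, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    if (ms_out) cudaEventElapsedTime(ms_out, start, stop);

    // Any allocation or launch error surfaces here and makes the comparison fail.
    cudaError_t err = cudaMemcpy(h_gpu, d_hist, sizeof h_gpu, cudaMemcpyDeviceToHost);
    bool ok = (err == cudaSuccess) && memcmp(h_ref, h_gpu, sizeof h_ref) == 0;

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_data);
    cudaFree(d_hist);
    free(h_data);
    return ok;
}
```

Calling `run_and_compare(false, n, &ms)` and `run_and_compare(true, n, &ms)` with the same `n` gives a like-for-like timing of the two approaches. Which one wins is likely to depend on the GPU and on how often threads collide on the same bin (uniform random bytes behave very differently from an image dominated by a few values), so timing with representative input data is the safer way to judge.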