histogram256 unspecified launch failure

The histogram256 sample in the SDK tends to fail with an “unspecified launch failure” in histogram256Kernel when most of the input data falls into the same few bins. This is easy to reproduce: replace the random initialization of the data with a constant value. I’m seeing this on a GPU with compute capability 1.0 (no shared memory atomics). This error often points to an out-of-bounds memory access, but I’ve run the code through valgrind using device emulation and it reports no errors. Has anyone else observed this, or does anyone have ideas about what might be wrong?
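
For anyone who wants to try the reproduction, here is a minimal host-side sketch of the change described above. It assumes the input is a plain byte buffer with 256 bins. The names `fill_constant`, `histogram256_cpu` and `BIN_COUNT` are made up for illustration and are not the SDK's own identifiers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BIN_COUNT 256 /* assumed: one bin per possible byte value */

/* Hypothetical helper: replaces the random initialization so that
   every input byte falls into the same bin. */
static void fill_constant(unsigned char *data, size_t byte_count,
                          unsigned char value)
{
    assert(data != NULL);
    memset(data, value, byte_count);
}

/* Hypothetical CPU reference: the histogram the GPU kernel would be
   expected to produce for the same input. */
static void histogram256_cpu(unsigned int *hist, const unsigned char *data,
                             size_t byte_count)
{
    size_t i;

    assert(hist != NULL && data != NULL);
    for (i = 0; i < BIN_COUNT; i++)
        hist[i] = 0;
    for (i = 0; i < byte_count; i++)
        hist[data[i]]++;
}
```

With a constant fill, the reference histogram has the whole byte count in one bin and zero everywhere else, which is the worst case for bin collisions.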