Your kernel performs out-of-bounds array accesses because you launch it with too many threads (you only need 10 threads, not 10*sizeof(int)). The expression for the grid size also looks quite fragile to me. The common way to express this without any floating-point arithmetic is an integer round-up division, (n + blocksize - 1) / blocksize, where n is the number of threads you need. Furthermore, your code will fail if the number of threads you need is not an integer multiple of the block size, because the extra threads that come from rounding up the block count would also perform out-of-bounds array accesses. This can be prevented by explicitly disabling the unneeded threads inside the kernel, typically with a bounds check on the thread index.
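
Here is a minimal sketch of both points: the integer round-up grid size and the bounds check inside the kernel. Your original kernel isn't shown here, so the kernel addOne, the helper blocksFor, and the block size of 4 are made up purely for illustration. Error checking on the CUDA calls is left out to keep it short.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel (stand-in for the one in the question).
__global__ void addOne(int *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)              // surplus threads of the last block do nothing
        data[i] += 1;
}

// Hypothetical helper: integer round-up division, i.e. the smallest
// block count such that blocks * blockSize >= n. No floating point involved.
static int blocksFor(int n, int blockSize)
{
    return (n + blockSize - 1) / blockSize;
}

int main()
{
    const int n = 10;         // number of elements = number of threads needed
    const int blockSize = 4;  // assumed value, deliberately not a divisor of n

    int host[n];
    for (int i = 0; i < n; ++i)
        host[i] = i;

    int *dev = nullptr;
    // sizeof(int) belongs in the byte count, not in the thread count.
    cudaMalloc(&dev, n * sizeof(int));
    cudaMemcpy(dev, host, n * sizeof(int), cudaMemcpyHostToDevice);

    // 3 blocks of 4 threads = 12 threads; the last 2 fail the bounds check.
    addOne<<<blocksFor(n, blockSize), blockSize>>>(dev, n);

    cudaMemcpy(host, dev, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dev);

    for (int i = 0; i < n; ++i)
        printf("%d ", host[i]);
    printf("\n");
    return 0;
}
```

Passing n into the kernel is what makes the bounds check possible; without it the kernel has no way to tell which threads are surplus.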