I’ve written a simple “game of life” program in CUDA just to do some benchmarks and gain some experience. It computes 10 generations of the game on a 4Kx4K board. The program runs fine on my laptop and desktop (8400M GS and 8600 GTS). However, when I tried it on a friend’s 8800 GTX I got incorrect results. It looked as if the kernel function was not executed at all. I noticed that the 8800 GTX is a compute capability 1.0 device. However, I have not used any of the atomic arithmetic functions, which require 1.1 or higher.
Do you have any idea what’s wrong? I have attached my program so you can test it if you like.
Normal output should be like this:

Game of life

4096x4096 matrix
5853405 total live points of 16777216
Total blocks 256
3.97 seconds total time
5416 total live points of 16777216

Note: correct execution is confirmed by the last line, which should report “5416” live points.

Sorry, that was my mistake. I had left the “-arch=sm_11” option in my nvcc command line, which prevented the executable from running on compute capability 1.0 devices.
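For anyone who hits the same symptom, here is a minimal sketch of how the failure could have been made visible instead of silent. It is not my original program: `dummyKernel` and the `check` helper are hypothetical names made up for illustration. The idea is to print the device's compute capability and to check the error state right after the kernel launch. When the binary contains no code the device can run, I would expect the launch check to report an error (something like "invalid device function") rather than silently leaving the data untouched.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical stand-in for the real game-of-life kernel.
__global__ void dummyKernel(int *out)
{
    out[threadIdx.x] = threadIdx.x;
}

// Hypothetical helper: abort with a readable message if a runtime call failed.
static void check(cudaError_t err, const char *what)
{
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }
}

int main()
{
    // Report what the device actually supports (assumes device 0).
    cudaDeviceProp prop;
    check(cudaGetDeviceProperties(&prop, 0), "cudaGetDeviceProperties");
    printf("Device 0: %s, compute capability %d.%d\n",
           prop.name, prop.major, prop.minor);

    int *d_out = NULL;
    check(cudaMalloc((void **)&d_out, 32 * sizeof(int)), "cudaMalloc");

    dummyKernel<<<1, 32>>>(d_out);

    // Launch problems (e.g. no matching device code) show up here.
    check(cudaGetLastError(), "kernel launch");
    // Errors raised while the kernel runs show up here.
    check(cudaDeviceSynchronize(), "kernel execution");

    check(cudaFree(d_out), "cudaFree");
    return 0;
}
```

On older toolkits `cudaThreadSynchronize()` plays the role of `cudaDeviceSynchronize()`. To target a 1.0 device, building with “-arch=sm_10” (or without the “-arch” option, if the toolkit defaults to 1.0) should produce code the 8800 GTX can run.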