I have a small piece of code that gives correct results on a Tesla C870 device in a 32-bit Linux machine. When I compile and run exactly the same code on a Tesla S1070 device in a 64-bit Linux machine, the results are wrong.
Has anyone already faced this kind of problem?
Thanks in advance for helping me solve this issue.
I experienced the same issue when moving from a Tesla C870 to a GTX 280, so the same architecture change as yours. In my case the wrong results also vary randomly from run to run, so, like Mr Nuke, I suspect that this is a synchronization issue.
However, I don’t understand why a sync issue would show up on a compute capability 1.3 device and not on a 1.0 device. As far as I know they have the same warp size, so where does it come from?
I’m having exactly the same problem, except that my program works fine on a GeForce 8600 GT and gives wrong results on a GeForce GTX 280 (also with random behaviour). I tried placing __syncthreads() after every instruction in the kernel, but it still didn’t work. Any ideas?
The issue may be deeper than I thought. I can’t guarantee I’ll find a solution, as I don’t have a GT200 to test on, but I’ll gladly have a look at your kernel code if you can post it.
I found the problem in my kernel. Mr Nuke was right: it was a synchronization problem. But it could not be solved with __syncthreads(), which only synchronizes the threads within a single block.
My problem was that different blocks were modifying the same memory region. I hope this helps you find your problems.
As I said, I don’t have a GT200 to test on, but I did glance over the code. If the matrix dimensions are multiples of 32, then the following might not be the problem:
The memory does not seem to be padded to match the warp size. Let’s say that you have a 31x31 matrix (31 rows, 31 columns). Thread [31] of the first warp should read and process row[0], column[31], but because of the arrangement, it will process row[1], column[0]. My suggestion is to make sure the memory is properly padded, or, for a quick test, to see whether matrices whose dimensions are multiples of 32 produce correct results.
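To make the 31x31 case concrete, here is a minimal host-side sketch of the index arithmetic in plain C++, so it can be checked without a GPU. The names Cell, cellForThread, paddedPitch and inBounds are hypothetical helpers made up for this illustration and are not taken from the posted kernel. It also assumes the kernel derives row and column from a flat thread index, which may not match the real code.

```cpp
#include <cassert>

// Hypothetical type: the matrix cell a thread ends up working on.
struct Cell {
    int row;
    int col;
};

// Maps a flat thread index to a cell, given the number of elements
// stored per row (the pitch). This is an assumed indexing scheme.
Cell cellForThread(int tid, int pitch) {
    assert(pitch > 0);
    return Cell{tid / pitch, tid % pitch};
}

// Rounds the row width up to the next multiple of the warp size.
int paddedPitch(int width, int warpSize = 32) {
    assert(width > 0 && warpSize > 0);
    return ((width + warpSize - 1) / warpSize) * warpSize;
}

// True if the cell is real matrix data rather than padding.
bool inBounds(Cell c, int width, int height) {
    return c.row >= 0 && c.row < height && c.col >= 0 && c.col < width;
}
```

With a pitch of 31 (no padding), cellForThread(31, 31) lands on row 1, column 0, which is the spill into the next row described above. With paddedPitch(31), which is 32, the same thread lands on row 0, column 31. That is a padding element, so the kernel would need a check like inBounds to skip it.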
I’ll be looking to see if I can find anything else that’s wrong.