I have a problem with a CUDA program.
When I compile it with -arch=sm_13 and run it on a compute capability 2.0 GPU, it works fine and gives the correct result (I know it is correct).
But when I compile it with -arch=sm_20 and run it on the same GPU, it runs but gives wrong results. I did not change the program between the two compilations.
Where could the problem come from? What differences between these two builds could cause wrong results?