Does code run differently on a GTX 580 than on a 9800 GT? GPU upgrade problem

Hi all,

I recently upgraded my system from a GeForce 9800 GT to a GeForce GTX 580. Everything seemed fine: I installed version 4.1 of the toolkit, and the SDK samples run without problems.

However, when I run my own code, the behavior is anomalous. Some of the operations in my kernels give inconsistent results, and in the end I get a completely unexpected result.

Does code for these GPUs (GTX) have to be written differently? Do I have to rewrite mine from scratch?

If anyone has had a similar problem or knows how to solve this one, please help me.


Run your code under cuda-memcheck to make sure you have no stray memory accesses. Fermi GPUs detect many stray accesses that previously went unnoticed, and then fail the kernel execution. Do you check return codes everywhere?

Check that all of your shared memory variables are either declared volatile or have their accesses properly guarded by __syncthreads(). Because Fermi is a proper load-store architecture even with regard to shared memory accesses, the compiler may keep shared memory values in registers between accesses, so correctness depends much more on this. Improper code that still worked on compute capability 1.x devices can fail on Fermi. Even Nvidia’s original SDK examples were sloppy in this regard.
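To illustrate the shared memory point, here is a minimal sketch of a block-wide sum in which every cross-thread read of shared memory is separated from the corresponding write by __syncthreads(). The kernel name blockSum, the block size BLOCK, and the all-ones test input are made up for the example and have nothing to do with the original poster's code.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define BLOCK 128  // hypothetical block size; must be a power of two

// Each block sums BLOCK consecutive input values into out[blockIdx.x].
__global__ void blockSum(const float *in, float *out, int n)
{
    __shared__ float s[BLOCK];
    int tid = threadIdx.x;
    int i = blockIdx.x * BLOCK + tid;

    s[tid] = (i < n) ? in[i] : 0.0f;
    __syncthreads();  // all loads into shared memory are done

    for (int stride = BLOCK / 2; stride > 0; stride >>= 1) {
        if (tid < stride)
            s[tid] += s[tid + stride];
        // Kept outside the if so every thread in the block reaches it.
        __syncthreads();
    }

    if (tid == 0)
        out[blockIdx.x] = s[0];
}

int main()
{
    const int n = 1024;
    const int blocks = n / BLOCK;
    float h_in[n], h_out[blocks];
    for (int i = 0; i < n; ++i)
        h_in[i] = 1.0f;  // with all ones, each block sum should equal BLOCK

    float *d_in = 0, *d_out = 0;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, blocks * sizeof(float));
    cudaMemcpy(d_in, h_in, n * sizeof(float), cudaMemcpyHostToDevice);

    blockSum<<<blocks, BLOCK>>>(d_in, d_out, n);

    // This blocking copy also reports errors from the kernel launch.
    cudaError_t err = cudaMemcpy(h_out, d_out, blocks * sizeof(float),
                                 cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    for (int b = 0; b < blocks; ++b)
        printf("block %d: %f\n", b, h_out[b]);

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```

A common pattern that breaks on Fermi is dropping the __syncthreads() for the last few steps (stride of 32 and below) on the assumption that a warp runs in lockstep. If you keep that shortcut, the shared array has to be accessed through a volatile pointer so the compiler does not cache the values in registers.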

There’s also the Fermi Compatibility Guide but I don’t think it contains much more info than this.

Thanks Tera, your advice has been very useful for me.

You should also have an error-checking mode in which you check the result of every cuda* call and call cudaDeviceSynchronize() after every kernel launch. Obviously, the sync will slow things down, so enable it only as a debug check.
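One possible way to set up such a debug mode is sketched below. The macro names CUDA_CHECK and CHECK_KERNEL, the DEBUG_SYNC flag, and the scale kernel are all invented for this example; only the cuda* functions are real runtime API calls.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with file and line if a runtime API call fails.
#define CUDA_CHECK(call)                                          \
    do {                                                          \
        cudaError_t err_ = (call);                                \
        if (err_ != cudaSuccess) {                                \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,    \
                    cudaGetErrorString(err_));                    \
            exit(EXIT_FAILURE);                                   \
        }                                                         \
    } while (0)

// Build with -DDEBUG_SYNC (hypothetical flag) to also wait for the kernel,
// so execution errors show up at the launch that caused them.
#ifdef DEBUG_SYNC
#define CHECK_KERNEL()                          \
    do {                                        \
        CUDA_CHECK(cudaGetLastError());         \
        CUDA_CHECK(cudaDeviceSynchronize());    \
    } while (0)
#else
#define CHECK_KERNEL() CUDA_CHECK(cudaGetLastError())
#endif

// Hypothetical kernel used only to have something to launch.
__global__ void scale(float *data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= factor;
}

int main()
{
    const int n = 256;
    float h[n];
    for (int i = 0; i < n; ++i)
        h[i] = 1.0f;

    float *d = 0;
    CUDA_CHECK(cudaMalloc(&d, n * sizeof(float)));
    CUDA_CHECK(cudaMemcpy(d, h, n * sizeof(float), cudaMemcpyHostToDevice));

    scale<<<1, n>>>(d, 3.0f, n);
    CHECK_KERNEL();  // launch errors always; execution errors with DEBUG_SYNC

    CUDA_CHECK(cudaMemcpy(h, d, n * sizeof(float), cudaMemcpyDeviceToHost));
    printf("h[0] = %f\n", h[0]);

    CUDA_CHECK(cudaFree(d));
    return 0;
}
```

Without the synchronize, an error that happens while the kernel is running is only reported by some later cuda* call, which makes it hard to tell which kernel actually failed.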

It is possible that one of your kernels is failing to run at all due to too big a block size.
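A rough sketch of how to check this at run time: cudaFuncGetAttributes() reports the largest block size a given kernel can be launched with on the current device, taking its register usage into account, and cudaGetLastError() right after the launch tells you whether the launch was rejected. The kernel myKernel and the block size of 512 are placeholders, not taken from the original poster's code.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder kernel; substitute the kernel you suspect.
__global__ void myKernel(float *data)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    data[i] *= 2.0f;
}

int main()
{
    cudaDeviceProp prop;
    cudaFuncAttributes attr;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess ||
        cudaFuncGetAttributes(&attr, myKernel) != cudaSuccess) {
        fprintf(stderr, "query failed: %s\n",
                cudaGetErrorString(cudaGetLastError()));
        return 1;
    }

    printf("device limit:  %d threads per block\n", prop.maxThreadsPerBlock);
    printf("kernel limit:  %d threads per block\n", attr.maxThreadsPerBlock);
    printf("registers per thread: %d\n", attr.numRegs);

    const int threads = 512;  // hypothetical block size used by the code
    const int blocks = 4;
    if (threads > attr.maxThreadsPerBlock)
        printf("a block of %d threads is too big for this kernel\n", threads);

    float *d = 0;
    if (cudaMalloc(&d, blocks * threads * sizeof(float)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }
    cudaMemset(d, 0, blocks * threads * sizeof(float));

    myKernel<<<blocks, threads>>>(d);

    // A rejected launch is reported here; the kernel then never ran.
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess)
        fprintf(stderr, "launch failed: %s\n", cudaGetErrorString(err));

    cudaFree(d);
    return 0;
}
```

The same source can use a different number of registers per thread when compiled for a different architecture, so a block size that worked on the 9800 GT is not guaranteed to launch on the GTX 580.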