CUDA programs not working on Fermi!

So I just installed a Fermi card and ran some old CUDA programs I’ve been working on to see how much faster they are.

Much to my dismay, they don’t even work! I’ve gone through the code and am scratching my head about why things that used to work don’t anymore!

Is there some setting I can change to force the Fermi into a compatibility mode so that my old code works again?

Is it just not loading the kernels, or is it crashing with an “unspecified launch failure”? If it’s the former, recompile with -arch sm_20; if it’s the latter, you probably have an out-of-bounds shared memory access.

It’s running the kernels, but they behave very strangely. For example, this

if (tid < 2) {
    e[tid].one_over_delta_y = 1.0f/e[tid].delta[Y];
}
__syncthreads();

doesn’t work on Fermi: e[tid].one_over_delta_y is always 0, whereas on a GT200 it works fine. (When I replace “x * e[0].one_over_delta_y” with “x / e[0].delta[Y]”, that particular part magically starts working again.)

Edit: I just learned that 1.0f/some_float == 0, whereas 1.0/some_float == the correct value … I’m going to try updating my toolkit …
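For comparison, here is a host-side C++ sketch of what the `if (tid < 2)` branch above should compute, which could be used to check the values the kernel writes back. The `Edge` struct and the `X`/`Y` enum are hypothetical stand-ins for the poster’s types (which weren’t posted), and the function names are made up for the example. It only models CPU floating point, so it cannot reproduce the Fermi behaviour itself; it just shows that 1.0f/d should be nonzero for delta values of ordinary magnitude, and that x * (1.0f/d) should land within a couple of ulps of x / d.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-ins for the poster's types (not posted): Y is assumed
// to index the y component of delta.
enum Axis { X = 0, Y = 1 };

struct Edge {
    float delta[2];          // per-axis deltas
    float one_over_delta_y;  // cached reciprocal of delta[Y]
};

// Host reference for the `if (tid < 2)` branch: each of the first two
// "threads" stores the single-precision reciprocal 1.0f / delta[Y].
void reference_reciprocals(Edge* e) {
    for (int tid = 0; tid < 2; ++tid) {
        e[tid].one_over_delta_y = 1.0f / e[tid].delta[Y];
    }
}

// The two forms the poster swapped between. For finite, nonzero d of
// ordinary magnitude they should agree to within a few ulps; the 1e-6
// relative tolerance used here is an arbitrary but generous bound.
bool recip_and_div_agree(float x, float d) {
    float via_recip = x * (1.0f / d);  // like x * one_over_delta_y
    float via_div = x / d;             // like x / delta[Y]
    return std::fabs(via_recip - via_div) <= 1e-6f * std::fabs(via_div);
}
```

Filling a small `Edge` array, calling `reference_reciprocals`, and diffing the result against what the kernel copies back would show immediately whether the GPU is storing 0 where the CPU stores 0.25 or 0.3333333.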

I suppose it could be a shared memory issue, but other things have stopped working as well, like this test:

if ((val & 0xff0000) != (127<<16))

quit working until I replaced it with

(((val >> 16) & 0xff) != 127)
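Both comparisons look only at bits 16-23 of val, so they should be interchangeable. The sketch below checks that on the host for all 256 values of that byte, combined with a few arbitrary patterns in the other bits. Because the type of val wasn’t posted, it assumes a 32-bit int or unsigned int and checks both; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// The two tests from the post. The type of `val` was not given, so these are
// templates; the checks below assume a 32-bit int or unsigned int.
template <typename T>
bool differs_via_mask(T val) {
    return (val & 0xff0000) != (127 << 16);
}

template <typename T>
bool differs_via_shift(T val) {
    return ((val >> 16) & 0xff) != 127;
}

// Both expressions depend only on bits 16-23 of val, so sweep all 256 values
// of that byte, combined with a few arbitrary patterns in the other bits.
template <typename T>
bool forms_agree() {
    const std::uint32_t others[] = {0x00000000u, 0xff00ffffu, 0x12003456u, 0x80000001u};
    for (std::uint32_t pattern : others) {
        for (std::uint32_t b = 0; b < 256; ++b) {
            T val = static_cast<T>(pattern | (b << 16));
            if (differs_via_mask(val) != differs_via_shift(val)) {
                return false;
            }
        }
    }
    return true;
}
```

For a signed val the right shift may sign-extend, but the `& 0xff` discards those bits, so the two forms still agree. If a kernel evaluating the same two expressions over the same inputs disagrees with this, that would suggest the compiler or hardware rather than the source code.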

Maybe I just have bad compile settings?

Well, I just updated to the latest toolkit (3.1) and the problems persist…

1.0f/some_float == ((float)1.0)/some_float == 0,
… BUT 1.0/some_float == the_correct_value

When val == 0x??7f???? (that is, bits 16-23 of val are 0x7f),

(val & 0xff0000) != (127<<16) is TRUE
… BUT ((val >> 16) & 0xff) != 127 is FALSE

These problems only occur on the Fermi card I just installed.

Here’s my nvcc command:

"C:\CUDA\bin\nvcc.exe" -ccbin "c:\Program Files\Microsoft Visual Studio 9.0\VC\bin" -I"C:\CUDA\include" -I"./" -I"…/…/common/inc" -I"…/…/…/shared/inc" -Xcompiler "/EHsc /W3 /nologo /O2 /Zi /MT " -maxrregcount=32 -gencode=arch=compute_10,code="sm_10,compute_10" -gencode=arch=compute_20,code="sm_20,compute_20" --compile -o "my_file.cu.obj" "my_file.cu"

Is it possible that I have something wrong here, or are these some very serious bugs?

Can you post a minimal test case? (For example, even the type of val isn’t shown.)