Same program gives different results on GT 240 and GTX 465 (GTX 465 vs. GT 240 / G210)

My CUDA program gives the same results on computers with a GeForce G210 and a GT 240, but on computers with a GeForce GTX 465 and a Quadro 380 the results are totally different from those of the G210 and GT 240.

The CUDA program (a *.cu file and a **kernel.cu file) is exactly the same on all of these machines.

Has anyone run into this problem before? Any help would be highly appreciated!

I used CUDA ToolKit 3.0 and SDK 3.0, Visual Studio 2008 SP 1.

By the way, on the GTX 465 this statement in my main program:
cutilSafeCall( cudaMemcpy( Error1, Tapehits, mem_size,cudaMemcpyDeviceToHost) );
causes all the breakpoints in the main program to be ignored; the program exits directly and shows this:
First-chance exception at 0x000007fefd4daa7d in CUDATapeHit.exe: Microsoft C++ exception: cudaError_enum at memory location 0x001efa30…
First-chance exception at 0x000007fefd4daa7d in CUDATapeHit.exe: Microsoft C++ exception: cudaError_enum at memory location 0x001ef9e0…
The program '[3904] CUDATapeHit.exe: Native' has exited with code -1 (0xffffffff).

But if I delete cutilSafeCall it will pause at breakpoints.

I tried CUDA 3.1 with CUDA Build Rule v3.0.14, then ran my program on the computer with the GeForce GTX 465. The results made sense, and I thought the problem was solved. Two days ago, I could only get all 0s in the result.

However, when I compared the results from GTX 465 and from GT 240, I found that they were quite different.

Weird!!

Again, I ran into the cudaMemcpy problem on the GTX 465.
This statement (in the main program):
cutilSafeCall( cudaMemcpy( Result2, DeInterlace, mem_size,cudaMemcpyDeviceToHost) );
causes all the breakpoints to be ignored, and the program exits directly from here.
If I take away "cutilSafeCall" it won't exit directly here, but the result is ridiculous (all 0).

I think there must be something wrong with the cudaMemcpy. Any help would be highly appreciated!

I don't know what cutilSafeCall is supposed to do on Windows when there is an error, but passing the return code from cudaMemcpy to cudaGetErrorString() should give you some idea of what the actual problem is. Most likely Result2 is not a host pointer, or DeInterlace is not a device pointer.
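A minimal sketch of that check, in case it helps. The helper name `checkedCopyToHost` is made up for illustration, and since the post does not show the element type the pointers are plain `void*`. Keep in mind that cudaMemcpy can also return an error left over from an earlier kernel launch, so the message printed here does not necessarily mean the copy itself is at fault.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not part of the CUDA SDK): copy device -> host and
// print the error text instead of letting a wrapper macro end the process.
// Returns true on success, false on any CUDA error.
bool checkedCopyToHost(void* hostDst, const void* deviceSrc, size_t bytes)
{
    cudaError_t err = cudaMemcpy(hostDst, deviceSrc, bytes,
                                 cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        // The error may come from the copy itself (bad pointer, bad size)
        // or from an earlier kernel that failed asynchronously.
        fprintf(stderr, "cudaMemcpy failed: %s\n", cudaGetErrorString(err));
        return false;
    }
    return true;
}
```

With the names from the question, the call would be `checkedCopyToHost(Result2, DeInterlace, mem_size)` in place of the `cutilSafeCall(...)` line.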

I would think you are not synchronizing threads properly. It is the only thing that makes sense without looking at the code.
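To show the kind of bug I mean, here is a made-up kernel (not your code; `reverseTile`, `launchReverse` and `BLOCK` are hypothetical names). Every thread writes one shared-memory slot and then reads a slot written by a different thread. Without the barrier the result depends on how warps happen to be scheduled, which can differ from one GPU generation to another. The launch helper uses `cudaDeviceSynchronize()`, which exists from CUDA 4.0 on; with toolkit 3.x the equivalent call is `cudaThreadSynchronize()`.

```cuda
#include <cuda_runtime.h>

#define BLOCK 256  // hypothetical block size for this sketch

// Each block reverses its own BLOCK-element tile of `data` in place.
// Launch with BLOCK threads per block; the element count must be a
// multiple of BLOCK.
__global__ void reverseTile(int* data)
{
    __shared__ int tile[BLOCK];
    int base = blockIdx.x * BLOCK;

    tile[threadIdx.x] = data[base + threadIdx.x];

    // Without this barrier a thread may read a tile[] slot that a thread
    // in another warp has not written yet.
    __syncthreads();

    data[base + threadIdx.x] = tile[BLOCK - 1 - threadIdx.x];
}

// Launch helper: returns the first error from the launch or the execution.
cudaError_t launchReverse(int* d_data, int n)
{
    reverseTile<<<n / BLOCK, BLOCK>>>(d_data);
    cudaError_t err = cudaGetLastError();   // launch-configuration errors
    if (err != cudaSuccess) return err;
    return cudaDeviceSynchronize();         // errors raised while the kernel ran
}
```

Checking the launch this way also separates kernel failures from errors in the later cudaMemcpy.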
