[4.0] Compiling for cuda-gdb (-G) gives the correct result, while omitting -G does not

laxsu19 · June 13, 2011, 3:43pm

Hey all,
So I got my code (theoretically) up and running, but I just noticed some odd behavior:
- The code gives me the correct result when I compile with -G (-g is usually there too, but it makes no difference), but when I omit -G the answer is roughly twice as large as it should be. Has anyone experienced this? What is it about adding -G that changes the answer?
I tried adding some __syncthreads() and cudaThreadSynchronize() calls, thinking -G might have been doing that for me, but that didn't change anything.
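One generic way to narrow this down is to copy each intermediate device buffer back to the host in both builds and compare it against a serial host reference. The first buffer that disagrees in the build without -G points at the kernel to inspect. The C++ below is a minimal host-side sketch of that comparison. It assumes the results have already been copied into std::vector<double> (e.g. with cudaMemcpy). max_relative_error and first_mismatch are hypothetical helpers, not CUDA runtime functions.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: relative difference of one element, falling back to an
// absolute difference when the reference value is exactly zero.
double relative_difference(double device_value, double reference_value) {
    const double scale = reference_value != 0.0 ? std::fabs(reference_value) : 1.0;
    return std::fabs(device_value - reference_value) / scale;
}

// Hypothetical helper: largest relative difference between a result copied back
// from the device and a serial host reference. Only the overlapping prefix of the
// two vectors is compared.
double max_relative_error(const std::vector<double>& device,
                          const std::vector<double>& reference) {
    double worst = 0.0;
    const std::size_t n = std::min(device.size(), reference.size());
    for (std::size_t i = 0; i < n; ++i) {
        worst = std::max(worst, relative_difference(device[i], reference[i]));
    }
    return worst;
}

// Hypothetical helper: index of the first element whose relative difference
// exceeds tol, or -1 if every compared element agrees.
long first_mismatch(const std::vector<double>& device,
                    const std::vector<double>& reference, double tol) {
    const std::size_t n = std::min(device.size(), reference.size());
    for (std::size_t i = 0; i < n; ++i) {
        if (relative_difference(device[i], reference[i]) > tol) {
            return static_cast<long>(i);
        }
    }
    return -1;
}
```

For example, with placeholder values (not data from this thread), comparing {1.0, 4.0, 6.0} against a reference of {1.0, 2.0, 3.0} gives a maximum relative error of 1.0, i.e. a factor-of-two discrepancy, with the first mismatch at index 1.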

And while I have your attention, any thoughts on this: the code is written entirely in double rather than float. To get it to run on sm_12 or lower architectures, I figured nvcc's automatic demotion of double to float would be just fine. It turns out it is not: the code only runs correctly on an sm_13 card compiled with -arch=sm_13 (I didn't try a 2.0 card).
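Demotion is not free. A float keeps a 24-bit significand (roughly 7 decimal digits) against 53 bits (roughly 16) for a double, so sums that are exact in double can stall or drift in float. The host-side C++ sketch below illustrates the effect; it is not the poster's code, and the start value 1e8 and step 1.0 are arbitrary illustration values. Near 1e8, adjacent floats are 8 apart, so adding 1.0 rounds straight back to 1e8.

```cpp
#include <limits>

// Adds `step` to `start` `count` times with the running sum kept in float,
// which is what double code effectively does after being demoted to float.
float accumulate_as_float(float start, float step, int count) {
    float sum = start;
    for (int i = 0; i < count; ++i) {
        sum += step;  // each partial sum is rounded to a 24-bit significand
    }
    return sum;
}

// The same loop with the running sum kept in double (53-bit significand).
double accumulate_as_double(double start, double step, int count) {
    double sum = start;
    for (int i = 0; i < count; ++i) {
        sum += step;
    }
    return sum;
}

// Significand bits of each type: 24 for IEEE float, 53 for IEEE double.
constexpr int float_bits = std::numeric_limits<float>::digits;
constexpr int double_bits = std::numeric_limits<double>::digits;
```

With these values, accumulate_as_float(1.0e8f, 1.0f, 10) stays at 100000000, while accumulate_as_double(1.0e8, 1.0, 10) reaches 100000010.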

Thanks

laxsu19 · June 14, 2011, 5:02pm

Disregard the second issue. I still don't know why I get different results with and without -G, though…