Different results with and without emulation mode

I wrote a program using CUDA, based on some existing code.
Unfortunately, the results produced with CUDA are wrong, and now I have to understand why.
There are no CUDA errors.
It could be a programming error, but the strange thing is that if I compile the CUDA code with the "-deviceemu" flag, the result is correct.
Now I'd like to know whether this difference in results caused by the emulation flag is possible or not.

The OS is Ubuntu 9.10 64-bit with NVIDIA driver 190.53 and CUDA Toolkit 2.3.

It's likely you have a race condition. In most cases I can remember where people have seen correct results in emulation and incorrect results on the GPU, the error was a race somewhere. This usually doesn't manifest in emulation mode because the code is executed sequentially.
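To illustrate what that kind of race looks like (a minimal sketch, not your code): every thread below updates the same counter without atomics. In emulation the threads run one after another and the sum comes out right; on the GPU the concurrent read-modify-write cycles collide and the result is wrong.

```cuda
#include <cstdio>

// Classic race: every thread increments the same counter without atomics.
// Sequential emulation gives the expected sum; the real GPU does not.
__global__ void racy_sum(int *counter)
{
    *counter = *counter + 1;      // racy read-modify-write
    // atomicAdd(counter, 1);     // the race-free alternative
}

int main()
{
    int *d_counter, h_counter = 0;
    cudaMalloc((void **)&d_counter, sizeof(int));
    cudaMemcpy(d_counter, &h_counter, sizeof(int), cudaMemcpyHostToDevice);

    racy_sum<<<64, 256>>>(d_counter);

    cudaMemcpy(&h_counter, d_counter, sizeof(int), cudaMemcpyDeviceToHost);
    printf("expected %d, got %d\n", 64 * 256, h_counter);
    cudaFree(d_counter);
    return 0;
}
```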

I think you're right: within the kernel I read from the same memory allocated for some parameters.

How can I read device memory from within kernels without problems?

Reading from a single address is not a race condition. Writing to a single address is.

Hmm, I write the result using the thread ID as the index, so I don't think the writes are a problem.

Any other suggestions?
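For reference, the write pattern described above looks roughly like this (a minimal sketch with made-up names, not the actual kernel): each thread writes only to out[tid], so no two threads touch the same output element and the writes themselves cannot race.

```cuda
// Each thread computes its own global index and writes only to out[tid].
__global__ void scale(const float *in, float *out, float factor, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < n)
        out[tid] = in[tid] * factor;
}
```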

How about posting some code, or at least a complete description of what is going wrong? Playing "20 Questions" isn't likely to get you a lot of constructive help with your problem.

I know, I know.
I'm trying to rewrite in CUDA (at home) some code that I wrote at work, because I want to see whether CUDA can be useful for our kind of processing program.
Unfortunately, I don't think I can post that code (which is also why, in other posts, I only gave you an example).

Anyway, I may have found the problem: some parameters are doubles, and I wasn't compiling with the "-arch sm_13" flag, so nvcc demoted them to single precision and there were a lot of wrong comparisons within the kernel.
Now it seems to work properly. I hope there won't be other problems.
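For anyone who hits the same thing: without "-arch sm_13" (or newer), the 2.x-era nvcc targets compute capability 1.0, which has no double-precision support, and silently demotes double to float, so comparisons that depend on the extra precision can flip. A hypothetical sketch of the effect (the values and names are made up for illustration):

```cuda
// Build with:    nvcc -arch sm_13 compare.cu -o compare
// Without -arch sm_13, doubles are demoted to float and the two values
// below become indistinguishable, so the comparison gives the wrong answer.
#include <cstdio>

__global__ void compare(const double *a, const double *b, int *result)
{
    *result = (*a > *b);   // needs real double support to be reliable
}

int main()
{
    double h_a = 1.0 + 1e-12, h_b = 1.0;   // differ only beyond float precision
    double *d_a, *d_b;
    int *d_result, h_result;

    cudaMalloc((void **)&d_a, sizeof(double));
    cudaMalloc((void **)&d_b, sizeof(double));
    cudaMalloc((void **)&d_result, sizeof(int));
    cudaMemcpy(d_a, &h_a, sizeof(double), cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, &h_b, sizeof(double), cudaMemcpyHostToDevice);

    compare<<<1, 1>>>(d_a, d_b, d_result);

    cudaMemcpy(&h_result, d_result, sizeof(int), cudaMemcpyDeviceToHost);
    printf("a > b ? %d\n", h_result);   // 1 with -arch sm_13, 0 when demoted
    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_result);
    return 0;
}
```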