Emulation issue: emulated kernel outputs successfully, non-emulated does not

I’m running a kernel in emulation mode, and the output data matches the results I obtain in Matlab. However, when I execute it on a Tesla C1060 device, I get wrong results. The arrays I’m working on are limited to the range [14768, 50767], but when run on the Tesla the output falls outside that range. My guess is that when it runs on the device, nothing gets written to the array memory space: neither the initialization copy nor the computation results the kernel is supposed to produce.
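One way to test that guess is to check the status of every CUDA runtime call and of the kernel launch itself, since a failed copy or launch otherwise goes unnoticed and leaves the device buffer uninitialized. Below is a minimal sketch of that check, not my actual code: the `CUDA_CHECK` macro and `clampKernel` are hypothetical names made up for illustration, and the array size and launch configuration are assumptions. It uses `cudaDeviceSynchronize()`, which exists from CUDA 4.0 onward; on CUDA 2.3 the equivalent call is `cudaThreadSynchronize()`.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: print the error and abort if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,        \
                    cudaGetErrorString(err_));                        \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Placeholder kernel (an assumption, not the real one): clamps each
// element into [lo, hi] so the result is easy to verify on the host.
__global__ void clampKernel(int *data, int n, int lo, int hi)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        int v = data[i];
        data[i] = v < lo ? lo : (v > hi ? hi : v);
    }
}

int main()
{
    const int n = 1024;                       // assumed array size
    const size_t bytes = n * sizeof(int);

    int *h = (int *)malloc(bytes);
    if (!h) return EXIT_FAILURE;
    for (int i = 0; i < n; ++i) h[i] = i * 100;

    int *d = NULL;
    CUDA_CHECK(cudaMalloc((void **)&d, bytes));
    CUDA_CHECK(cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice));

    clampKernel<<<(n + 255) / 256, 256>>>(d, n, 14768, 50767);
    CUDA_CHECK(cudaGetLastError());       // reports launch failures (bad configuration, etc.)
    CUDA_CHECK(cudaDeviceSynchronize());  // reports errors raised while the kernel ran;
                                          // use cudaThreadSynchronize() on CUDA 2.3

    CUDA_CHECK(cudaMemcpy(h, d, bytes, cudaMemcpyDeviceToHost));
    printf("h[0] = %d, h[n-1] = %d\n", h[0], h[n - 1]);

    CUDA_CHECK(cudaFree(d));
    free(h);
    return 0;
}
```

If any of these checks reports an error on the device build but not in emulation, that would point to a failed copy or launch rather than to the computation itself.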

Has anyone obtained such odd results?

I’m running CUDA 2.3 on a 64-bit Ubuntu 9.10 workstation with NVIDIA driver 190.32.