Discrepancy between results from emulation mode and actual device mode

I got different results in emulation mode and on the actual device. What are the possible reasons for the discrepancy? Thanks.

If you compile without -arch sm_13, your doubles will be demoted to floats, which could lead to a discrepancy. Also, behaviour in emu mode can be very different from device mode. Since emu mode runs the threads entirely in serial fashion, it tells you little about what the program does once it starts running in parallel. A race condition between threads, for example, can stay hidden in emu mode and only show up on the device.
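To get a feel for how much the double-to-float demotion alone can change a result, here is a minimal host-side sketch in plain C (not CUDA, and not the poster's code). The function names `sum_as_float` and `sum_as_double` are made up for illustration. Both add the same value `n` times, one in single precision (roughly what a demoted kernel would do) and one in double precision.

```c
#include <stdio.h>

/* Hypothetical helper: accumulate in single precision, roughly what
 * happens when doubles are demoted to floats. */
float sum_as_float(int n, double step) {
    float acc = 0.0f;
    for (int i = 0; i < n; ++i) {
        acc += (float)step;   /* both the value and the sum are rounded to float */
    }
    return acc;
}

/* Hypothetical helper: the same accumulation kept in double precision. */
double sum_as_double(int n, double step) {
    double acc = 0.0;
    for (int i = 0; i < n; ++i) {
        acc += step;
    }
    return acc;
}

/* Print both sums side by side so the gap is easy to see. */
void print_comparison(int n, double step) {
    printf("float : %.6f\n", (double)sum_as_float(n, step));
    printf("double: %.6f\n", sum_as_double(n, step));
}
```

Calling `print_comparison(1000000, 0.1)` should show the two sums drifting apart, with the size of the gap depending on the compiler and platform. If your emu and device results differ by a similar kind of rounding drift, the missing -arch sm_13 flag is the first thing to check. If they differ wildly, a parallelism issue is the more likely cause.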