Fortran-to-CUDA example gives wrong output on the GPU but correct output in device emulation

Hello,

I am trying to run the Fortran interface to a CUDA kernel example program (Fortran_Cuda.tgz), which is available from the NVIDIA website.

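The output is wrong. The Fortran results are correct (each value in the second column is the square of the complex input), but the CUDA results are all zeros: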

Results from Fortran
1 (1.000000,2.000000) (-3.000000,4.000000)
2 (2.000000,4.000000) (-12.00000,16.00000)
3 (3.000000,6.000000) (-27.00000,36.00000)
4 (4.000000,8.000000) (-48.00000,64.00000)
5 (5.000000,10.00000) (-75.00000,100.0000)
6 (6.000000,12.00000) (-108.0000,144.0000)
7 (7.000000,14.00000) (-147.0000,196.0000)
8 (8.000000,16.00000) (-192.0000,256.0000)
Results from CUDA
1 (1.000000,2.000000) (0.0000000E+00,0.0000000E+00)
2 (2.000000,4.000000) (0.0000000E+00,0.0000000E+00)
3 (3.000000,6.000000) (0.0000000E+00,0.0000000E+00)
4 (4.000000,8.000000) (0.0000000E+00,0.0000000E+00)
5 (5.000000,10.00000) (0.0000000E+00,0.0000000E+00)
6 (6.000000,12.00000) (0.0000000E+00,0.0000000E+00)
7 (7.000000,14.00000) (0.0000000E+00,0.0000000E+00)
8 (8.000000,16.00000) (0.0000000E+00,0.0000000E+00)

However, when I compile with the --device-emulation option, the output is correct:

Results from Fortran
1 (1.000000,2.000000) (-3.000000,4.000000)
2 (2.000000,4.000000) (-12.00000,16.00000)
3 (3.000000,6.000000) (-27.00000,36.00000)
4 (4.000000,8.000000) (-48.00000,64.00000)
5 (5.000000,10.00000) (-75.00000,100.0000)
6 (6.000000,12.00000) (-108.0000,144.0000)
7 (7.000000,14.00000) (-147.0000,196.0000)
8 (8.000000,16.00000) (-192.0000,256.0000)
Results from CUDA
1 (1.000000,2.000000) (-3.000000,4.000000)
2 (2.000000,4.000000) (-12.00000,16.00000)
3 (3.000000,6.000000) (-27.00000,36.00000)
4 (4.000000,8.000000) (-48.00000,64.00000)
5 (5.000000,10.00000) (-75.00000,100.0000)
6 (6.000000,12.00000) (-108.0000,144.0000)
7 (7.000000,14.00000) (-147.0000,196.0000)
8 (8.000000,16.00000) (-192.0000,256.0000)

Could someone please explain why this might be happening, and how I can fix it?

Thanks a lot.