Problem with CUDA release 4.1, using default LLVM compiler

Hi all,

When I build and run my program with CUDA release 4.1 and its default compiler (the LLVM-based one), the output differs from what I get with the Open64 compiler, either the one in release 4.1 or an earlier one.

I did my best to strip down my large program and arrived at the simple program attached. It may not seem to make much sense, but its result differs between the Open64 and LLVM nvcc compilers.

If I simplify it any further, the difference in output disappears.

Basically, this small example does some calculation and fills an array in global memory. It uses only 1 block and 1 thread.
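For readers without the attachment, here is a minimal sketch of the general shape described above: a single thread in a single block fills an array in global memory, and the host prints it so the output of two builds can be compared. This is not the attached program. The kernel name `fill`, the length `N`, and the arithmetic are all hypothetical placeholders.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define N 16  // hypothetical array length, chosen only for illustration

// Hypothetical kernel: a single thread computes every element
// and writes it to an array in global memory.
__global__ void fill(float *out, int n)
{
    for (int i = 0; i < n; ++i) {
        out[i] = 0.5f * i + 1.0f;  // placeholder arithmetic, not the real calculation
    }
}

int main()
{
    float host[N];
    float *dev = NULL;

    if (cudaMalloc((void **)&dev, N * sizeof(float)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    fill<<<1, 1>>>(dev, N);  // 1 block, 1 thread

    // The blocking copy also surfaces any error from the kernel launch.
    cudaError_t err = cudaMemcpy(host, dev, N * sizeof(float), cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        cudaFree(dev);
        return 1;
    }

    // Print enough digits that small numerical differences show up in a diff.
    for (int i = 0; i < N; ++i) {
        printf("%d %.9g\n", i, host[i]);
    }

    cudaFree(dev);
    return 0;
}
```

Building a program like this once with the default compiler and once with Open64 (in release 4.1, I believe through nvcc's `-open64` option), then diffing the two outputs, is the kind of comparison I am doing.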

I hope someone here, or a compiler expert, can kindly help me understand the exact root cause of this problem; that would help me fix the problem in my project.

The platform info is as follows:

[b]OS: CentOS release 5.5

CPU: Intel Xeon X5660

GPU: M2070

CUDA toolkit: release 4.1[/b]

I deeply appreciate your attention.

Best Regards,
Susan