When I build and run my program with CUDA release 4.1 using the default (LLVM-based) compiler, the output differs from what I get with the Open64-based compiler, either in 4.1 or in earlier releases.
I did my best to strip down my large program and reduced it to the simple program attached. It may look nonsensical, but its output differs depending on whether nvcc uses the Open64 or the LLVM compiler.
If I simplify it any further, the difference disappears.
Basically, this small example performs some calculations and fills an array in global memory. It uses only 1 block with 1 thread.
I hope you or a compiler expert could kindly help me identify the exact root cause of this problem; that would help me fix the issue in my project.
The platform information is as follows:
[b]OS: CentOS release 5.5
CPU: Intel Xeon X5660
CUDA toolkit: release 4.1[/b]
Thank you very much for your attention.