I think I’ve found a bug in the NVIDIA OpenCL compiler. The symptoms are as follows: I have a nested (double) for loop and, after some operations (ifs, math, etc.) inside that loop, I write both loop control variables to a buffer. After reading the buffer back in the host application, the second loop control variable is wrong (always the same value). However, if I use that variable in another expression (for example: ++y; --y;), it has its correct value. It looks as if the optimizer had removed that variable when it shouldn’t have. Another symptom: if I call clBuildProgram with optimizations disabled, the results are correct, as expected.
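To make the pattern concrete, here is a minimal sketch in plain C (not the actual kernel) of a double loop that stores both control variables for each element. The Coord type, the function name and the loop bounds are assumptions made for illustration only. It can also serve as a host-side reference for what the buffer should contain.

```c
#include <stddef.h>

/* Hypothetical element type; the real structure in the kernel may differ. */
typedef struct {
    int x;
    int y;
} Coord;

/* Plain C reference of the pattern: a double loop that writes both
   loop control variables to an output buffer.
   Returns the number of elements written (width * height). */
static size_t fill_reference(Coord *out, int width, int height)
{
    size_t n = 0;
    for (int x = 0; x < width; ++x) {
        for (int y = 0; y < height; ++y) {
            /* The real kernel performs its ifs and math here. */
            out[n].x = x;
            out[n].y = y; /* the value reported as constant in optimized builds */
            ++n;
        }
    }
    return n;
}
```

In a correct run, the y field changes from one element to the next within each x; the symptom described above is that it does not.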
I’m preparing a test application to reproduce this bug, but it would be nice if someone could tell me whether this is the right place to file a bug report.
We’ve finished a test application that shows the problem. It is attached to this post.
The application performs the following steps:
Initializes OpenCL: here we use a macro to select CPU or GPU as the platform. On our systems, the CPU has index 1 and the GPU has index 0. If this is different on your system, you only have to go to the function InitPlatform() and choose any NVIDIA platform there (of course, NVIDIA is required to see the error).
Reads a raw image from disk and sets up three buffers (one for reading and two for writing).
Sets up a kernel for execution. There are two clBuildProgram lines here (one is commented out). The one that has optimizations disabled makes the kernel run without problems; the one that doesn’t specify anything about optimizations leaves them enabled, and the problem appears.
Executes the kernel. To see the error, you will have to inspect the buffer g_outputBuffer with the Visual Studio debugger once it has been read back. If clBuildProgram was used with optimizations enabled, you will see that the Y component of every structure element in the array always has the same value. If you disable the optimizations, the value becomes correct.
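As an alternative to inspecting g_outputBuffer in the debugger, the check can be automated on the host once the buffer has been read back. This is only a sketch: the OutElem layout and the function name are assumptions, and the real element structure in the test application may differ.

```c
#include <stddef.h>

/* Hypothetical layout of one element of the output buffer. */
typedef struct {
    int x;
    int y;
} OutElem;

/* Returns 1 if every element carries the same y value (the symptom
   described above), 0 if at least two elements differ in y.
   Buffers with fewer than two elements cannot show the symptom. */
static int all_y_identical(const OutElem *buf, size_t count)
{
    if (count < 2) {
        return 0;
    }
    for (size_t i = 1; i < count; ++i) {
        if (buf[i].y != buf[0].y) {
            return 0;
        }
    }
    return 1;
}
```

Calling this on the read-back data after each of the two builds would show directly whether the Y components collapse to a single value.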
Besides that, if you run the kernel with optimizations but leave lines 97, 98, 99 and 100 of the kernel enabled (not commented out), the results are correct. Those lines are dummies that should not affect the final result at all, but in fact they do.
So, in summary: disabling optimizations always fixes the results, but with optimizations enabled the result is correct only when lines 97-100 are compiled. Otherwise, the Y components of the structures in the output buffer are incorrect (always the same value).
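For anyone who wants to switch between the two builds without editing the source, the choice can be reduced to the options string passed to clBuildProgram. The sketch below assumes the standard OpenCL option -cl-opt-disable is what turns optimizations off; the exact string used in the attached application may differ, and the helper name is hypothetical.

```c
#include <stddef.h>

/* Returns the options string to pass as the fourth argument of
   clBuildProgram. "-cl-opt-disable" is the standard OpenCL build option
   that disables optimizations; NULL leaves the compiler defaults,
   which means optimizations stay enabled. */
static const char *build_options(int disable_optimizations)
{
    return disable_optimizations ? "-cl-opt-disable" : NULL;
}

/* Usage sketch (program and device are assumed to exist already):
   clBuildProgram(program, 1, &device, build_options(1), NULL, NULL); */
```

With build_options(1) the results should be correct; with build_options(0) the constant Y values should appear unless lines 97-100 are compiled.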
It would be nice to have some feedback about this issue.
TestReport.zip (542 KB)