I’m currently trying to emulate the OpenCL oclCopyComputeOverlap example.
I was able to execute the example and observed a 50% speedup.
My rendition is mainly for me to learn how to run a dual-queue system.
The hardware that I’m running this on is an i7-4610M and a K5100M.
To start off, I have a relatively simple kernel that is meant to eat up time, similar to the oclCopyComputeOverlap example. It does a simple float division and repeats it in a for loop many times.
I transfer 2 input buffers and extract 1 output buffer.
My issue is with my first clEnqueueReadBuffer. It does not return the correct result.
The output buffer I read back sometimes holds the correct output, sometimes the first input data for the second kernel, and sometimes the output buffer from the second kernel. It can also start with the correct output and then switch partway through to one of the other two listed above. The point at which it switches isn’t constant.
This seems to be a memory access issue but I do not know where it is originating from. I checked inside the kernel and it is placing the correct answer into the output pointer.
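One way to narrow down symptoms like these is to compare the read-back buffer on the host against each of the three candidates element by element. Below is a minimal sketch in plain C. It assumes the expected result of the first kernel, the first input of the second kernel, and the output of the second kernel are all available as host arrays. Every name in it (`Source`, `nearly_equal`, `classify`, `first_mismatch`) is hypothetical and not taken from my actual code or from the oclCopyComputeOverlap example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical labels for where a read-back value appears to come from. */
typedef enum {
    SRC_EXPECTED,    /* matches the first kernel's expected output   */
    SRC_NEXT_INPUT,  /* matches the second kernel's first input      */
    SRC_NEXT_OUTPUT, /* matches the second kernel's output           */
    SRC_UNKNOWN      /* matches none of the candidates               */
} Source;

/* Float comparison with a small relative tolerance (assumed value 1e-5). */
static int nearly_equal(float a, float b) {
    float diff = a > b ? a - b : b - a;
    float mag_a = a < 0.0f ? -a : a;
    float mag_b = b < 0.0f ? -b : b;
    float scale = mag_a > mag_b ? mag_a : mag_b;
    if (scale < 1.0f) scale = 1.0f;
    return diff <= 1e-5f * scale;
}

/* Decide which candidate a single read-back value matches, in priority order. */
static Source classify(float got, float expected,
                       float next_input, float next_output) {
    if (nearly_equal(got, expected))    return SRC_EXPECTED;
    if (nearly_equal(got, next_input))  return SRC_NEXT_INPUT;
    if (nearly_equal(got, next_output)) return SRC_NEXT_OUTPUT;
    return SRC_UNKNOWN;
}

/* Return the index of the first element that is not the expected value
 * and store its apparent source in *src. Returns n (and SRC_EXPECTED)
 * when the whole buffer is correct. */
static size_t first_mismatch(const float *got, const float *expected,
                             const float *next_input, const float *next_output,
                             size_t n, Source *src) {
    assert(got && expected && next_input && next_output && src);
    for (size_t i = 0; i < n; ++i) {
        Source s = classify(got[i], expected[i], next_input[i], next_output[i]);
        if (s != SRC_EXPECTED) {
            *src = s;
            return i;
        }
    }
    *src = SRC_EXPECTED;
    return n;
}
```

Calling `first_mismatch` on the host array filled by the first clEnqueueReadBuffer gives the index where the data stops being correct and which buffer it appears to come from. The result is only a hint: if the candidate buffers happen to contain equal values at some index, the classification there is ambiguous.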
If you can help me out I would greatly appreciate it. If needed, I can provide some/all of my code.
I have solved this issue.