I’m working on a cryptocurrency mining implementation in OpenCL and having trouble getting it to play nicely with the NVIDIA OpenCL driver. The problem is that the NVIDIA driver busy-waits (spins) on the CPU while waiting for kernels to finish. When you have 6-20 GPUs hanging off a single dual-core CPU, this causes the entire system to grind to a halt.
I’ve tried several workarounds, but they all break on Windows while working fine on Linux:
a) Time the kernels and sleep the thread for ~95% of the measured execution time, so that the spin-wait only covers the last ~5% of the kernel's run time. This fails because when the thread goes to sleep immediately after enqueuing the kernel, the kernel does not start executing until after the sleep call returns. I then tried adding a clFlush() call to make sure the kernel is submitted first, but clFlush() seems to block until the kernel completes, which defeats the entire purpose.
b) Use a non-blocking clEnqueueReadBuffer() call, then poll the event status and sleep in a loop. This fails because clEnqueueReadBuffer() seems to completely ignore the blocking_read flag and always waits for the kernel to finish before returning.
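For concreteness, the wait pattern that (a) and (b) are aiming for looks roughly like the sketch below. It is not the actual miner code: `fake_event`, `event_complete()` and `wait_sleep_then_poll()` are hypothetical stand-ins so the logic can run without a GPU. In real host code the event would be a cl_event, the completion check would be a clGetEventInfo() query for CL_EVENT_COMMAND_EXECUTION_STATUS, and the sleep would come right after the enqueue and clFlush(). The timing helpers assume POSIX (clock_gettime, nanosleep); on Windows they would have to be replaced with something like Sleep().

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for a cl_event: it "completes" once the
   monotonic clock reaches done_at_ns. */
typedef struct {
    uint64_t done_at_ns;
} fake_event;

/* Current monotonic time in nanoseconds. */
static uint64_t now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Sleep for roughly ns nanoseconds (may return early if interrupted). */
static void sleep_ns(uint64_t ns) {
    struct timespec ts;
    ts.tv_sec = (time_t)(ns / 1000000000ull);
    ts.tv_nsec = (long)(ns % 1000000000ull);
    nanosleep(&ts, NULL);
}

/* Stand-in for querying CL_EVENT_COMMAND_EXECUTION_STATUS and
   comparing the result against CL_COMPLETE. */
static bool event_complete(const fake_event *ev) {
    return now_ns() >= ev->done_at_ns;
}

/* Idea (a) + (b): sleep through most of the estimated kernel time,
   then poll the event with short sleeps instead of spinning.
   Returns the number of poll iterations that had to sleep. */
static unsigned wait_sleep_then_poll(const fake_event *ev,
                                     uint64_t est_kernel_ns,
                                     double fraction,
                                     uint64_t poll_interval_ns) {
    assert(fraction >= 0.0 && fraction <= 1.0);

    /* Coarse sleep, e.g. fraction = 0.95 of the measured kernel time. */
    sleep_ns((uint64_t)((double)est_kernel_ns * fraction));

    /* Fine-grained polling for the remainder. */
    unsigned polls = 0;
    while (!event_complete(ev)) {
        sleep_ns(poll_interval_ns);
        ++polls;
    }
    return polls;
}
```

With a measured kernel time of 20 ms, a call like `wait_sleep_then_poll(&ev, 20000000ull, 0.95, 500000ull)` would sleep about 19 ms and then check the event every 0.5 ms. The whole scheme only helps if the enqueue, flush and status query return without blocking, which is exactly what does not seem to hold on Windows.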
Ideas?