I use a kernel to update, at each time step, a 3D array (allocated as a 1D array) using a three-dimensional NDRange. This works fine over many iterations.
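To make the layout concrete, here is a minimal sketch of how a 3D array stored in a 1D buffer can be indexed on the host. The helper name flat_index and the choice of x as the fastest-varying dimension are assumptions for illustration only; they are not taken from the code in this post.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not from the original code): maps a 3D coordinate
// to an offset in a flat buffer of nx * ny * nz elements.
// Assumption: x varies fastest, then y, then z.
inline std::size_t flat_index(std::size_t x, std::size_t y, std::size_t z,
                              std::size_t nx, std::size_t ny, std::size_t nz)
{
    // Guard against out-of-range coordinates in debug builds.
    assert(x < nx && y < ny && z < nz);
    return x + nx * (y + ny * z);
}
```

With this layout, a kernel launched over a three-dimensional range would compute the same offset from its three global IDs.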
Now, to avoid copying the whole array back to main memory, I started implementing a reduction kernel. For now the reduction is a simple sum, but it might become more complicated later. After the reduction is done, the result is read back using clEnqueueReadBuffer() and printed.
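For checking the value that comes back from the device, a CPU-side reference sum along the following lines could be used. This is only a sketch: reference_sum, roughly_equal and the default tolerance are hypothetical names and values, and it assumes the array holds float values, as the sizeof(float) in the read-back suggests.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical reference: sums the host copy of the array on the CPU.
// Accumulating in double limits the rounding error of the reference itself.
inline double reference_sum(const std::vector<float>& data)
{
    double acc = 0.0;
    for (float v : data) {
        acc += v;
    }
    return acc;
}

// Compares the single float read back from the device with the reference,
// using a relative tolerance because a parallel reduction adds the values
// in a different order than the sequential loop above.
inline bool roughly_equal(float device_result, double reference,
                          double rel_tol = 1e-4)
{
    assert(rel_tol > 0.0);
    const double scale = std::fmax(1.0, std::fabs(reference));
    return std::fabs(device_result - reference) <= rel_tol * scale;
}
```

The comparison is deliberately tolerant: floating-point addition is not associative, so an exact match between the device and the CPU sum should not be expected.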
The problem I face is that I often have hard lock-ups when running the program, which is a pain. The screen, including the cursor, is frozen. Even magic sysrq keys don’t work: I have to reboot using the power button.
I’m still in the testing phase, so I put a clFinish() after most of the commands to make sure each one has completed.
Here is a snippet of the C++ code:
    err = clEnqueueNDRangeKernel(command_queue, kernel, 3, NULL, workGroupSize, NULL, 0, NULL, &event);
    OpenCL_Test_Success(err, "clEnqueueNDRangeKernel");
    err = clFinish(command_queue);
    OpenCL_Test_Success(err, "clFinish");
    err = clEnqueueNDRangeKernel(command_queue, innerproduct_kernel, workGroupSize_1D, NULL, _Nz, NULL, 0, NULL, &event);
    OpenCL_Test_Success(err, "clEnqueueNDRangeKernel");
    // err = clFinish(command_queue);
    // OpenCL_Test_Success(err, "clFinish");
    err = clEnqueueReadBuffer(command_queue, cl_inner_product_result, CL_TRUE, 0, 1*sizeof(float), inner_product_result, 0, NULL, NULL);
    OpenCL_Test_Success(err, "clEnqueueReadBuffer");
Note that OpenCL_Test_Success() is a macro that compares err with CL_SUCCESS and aborts with a message if they differ.
The kernel does not do anything for now: it’s empty.
If I uncomment the clFinish() line, the lock-up appears. I have also seen clReleaseEvent(event) used (in Adventures in OpenCL: Part 1, Getting Started), but I don’t understand what difference it makes. Should I release the events? If so, when should it be done? If not, why does a clFinish() crash the machine?
I’m on ArchLinux using the CUDA SDK v3.2 RC.
Thanks for your help.