Hard lockup when calling clFinish(): am I doing the right thing?

I use a 3-dimensional kernel to update, at each time step, a 3D array (allocated as a 1D array). This works fine over many iterations.

Now, to avoid having to copy data back to main memory, I started implementing a reduction kernel. For now, this reduction operation is a simple sum, but it might become more complicated. After the reduction is done, the result is read back using clEnqueueReadBuffer() and printed.

The problem I face is that I often get hard lockups when running the program, which is a pain. The screen, including the cursor, is frozen. Even the magic sysrq keys don’t work: I have to reboot using the power button.

I’m still in the testing phase, so I put a clFinish() after many of the commands I use to make sure everything is fine.

Here is a snippet of the C++ code:

err = clEnqueueNDRangeKernel(command_queue, kernel, 3, NULL, workGroupSize, NULL, 0, NULL, &event);

OpenCL_Test_Success(err, "clEnqueueNDRangeKernel");

err = clFinish(command_queue);

OpenCL_Test_Success(err, "clFinish");

err = clEnqueueNDRangeKernel(command_queue, innerproduct_kernel, workGroupSize_1D, NULL, _Nz, NULL, 0, NULL, &event);

OpenCL_Test_Success(err, "clEnqueueNDRangeKernel");

// err = clFinish(command_queue);

// OpenCL_Test_Success(err, "clFinish");

err = clEnqueueReadBuffer(command_queue, cl_inner_product_result, CL_TRUE, 0, 1*sizeof(float), inner_product_result, 0, NULL, NULL);

OpenCL_Test_Success(err, "clEnqueueReadBuffer");

Note that OpenCL_Test_Success() is a macro which compares err with CL_SUCCESS and aborts with a message if they differ.

The kernel does not do anything for now: it’s empty.

If I uncomment the clFinish() line, the lockup occurs. I have also seen clReleaseEvent(event) being used (in Adventures in OpenCL: Part 1, Getting Started), but I don’t understand the difference. Should I release any event? If so, when should it be done? If not, why does a clFinish() crash the machine?

I’m on ArchLinux using the CUDA SDK v3.2 RC.

Thanks for your help.

Hum… it’s fixed.

The problem was that “workGroupSize_1D” was badly set, which resulted in the crash…
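
For anyone hitting the same thing: the post doesn't say what the wrong value was, but note that in the snippet as posted, workGroupSize_1D sits in the work_dim position of the second clEnqueueNDRangeKernel() call (the argument order is queue, kernel, work_dim, global_work_offset, global_work_size, local_work_size, ...). Below is a minimal sketch of a sanity check that could be run on the host before enqueueing. check_ndrange() is a hypothetical helper, not part of OpenCL; it assumes a limit of 3 dimensions (the real limit is the device's CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS) and the OpenCL 1.x rule that the global size must be a multiple of the local size when a local size is given.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not an OpenCL function): returns an empty string when
// the NDRange arguments look sane, otherwise a description of the first
// problem found. Assumes at most 3 work dimensions.
std::string check_ndrange(unsigned work_dim,
                          const std::size_t* global_work_size,
                          const std::size_t* local_work_size)
{
    if (work_dim < 1 || work_dim > 3)
        return "work_dim must be 1, 2 or 3";
    if (global_work_size == nullptr)
        return "global_work_size must not be NULL";

    for (unsigned d = 0; d < work_dim; ++d) {
        const std::string dim = "[" + std::to_string(d) + "]";
        if (global_work_size[d] == 0)
            return "global_work_size" + dim + " is zero";
        // local_work_size may be NULL: the implementation then picks one.
        if (local_work_size != nullptr) {
            if (local_work_size[d] == 0)
                return "local_work_size" + dim + " is zero";
            if (global_work_size[d] % local_work_size[d] != 0)
                return "global_work_size" + dim +
                       " is not a multiple of local_work_size" + dim;
        }
    }
    return "";
}
```

Calling this with the same work_dim, global size and local size that are about to be passed to clEnqueueNDRangeKernel(), and aborting when the returned string is not empty, turns a bad launch configuration into a readable message. It only checks the arguments themselves; it says nothing about whether the kernel indexes its buffers correctly.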
