CL_INVALID_COMMAND_QUEUE on clFinish on second run in profiler

I’m seeing some strange behavior and was wondering if anyone has any ideas.

I’m trying to profile an OpenCL application with the NVIDIA profiler under Windows 7 (driver 280.13, CUDA toolkit 4.0.17) on an Optimus laptop with a Quadro 2000M.

After enqueueing the kernel (clEnqueueNDRange…), I call clFinish (originally to collect profiling information). Run on its own, the application works fine. Under the profiler, the first run works, but on the second run clFinish returns CL_INVALID_COMMAND_QUEUE.
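For anyone trying to narrow this down, it can help to check and name the status of every OpenCL call, not just clFinish, since an error from an asynchronous command may only surface at a later synchronization point. The following is a minimal sketch in plain C. The helper names cl_error_name and check_cl are hypothetical (they are not part of OpenCL), and the numeric codes are the values from the Khronos CL/cl.h header, written as literals so the snippet compiles without the OpenCL SDK.

```c
#include <stdio.h>

/* Hypothetical helper: map a few OpenCL status codes to their names.
 * The numeric values are those defined in the Khronos CL/cl.h header. */
const char *cl_error_name(int err)
{
    switch (err) {
    case 0:   return "CL_SUCCESS";
    case -4:  return "CL_MEM_OBJECT_ALLOCATION_FAILURE";
    case -5:  return "CL_OUT_OF_RESOURCES";
    case -6:  return "CL_OUT_OF_HOST_MEMORY";
    case -30: return "CL_INVALID_VALUE";
    case -34: return "CL_INVALID_CONTEXT";
    case -36: return "CL_INVALID_COMMAND_QUEUE";
    case -38: return "CL_INVALID_MEM_OBJECT";
    case -48: return "CL_INVALID_KERNEL";
    case -54: return "CL_INVALID_WORK_GROUP_SIZE";
    default:  return "unlisted OpenCL status code";
    }
}

/* Hypothetical helper: report which call failed.
 * Returns 1 if err is CL_SUCCESS (0), otherwise prints a message and returns 0. */
int check_cl(int err, const char *call)
{
    if (err == 0)
        return 1;
    fprintf(stderr, "%s failed: %s (%d)\n", call, cl_error_name(err), err);
    return 0;
}
```

In a real host program the return value of each call would be passed through the helper, for example check_cl(clFinish(queue), "clFinish"), where queue stands for your own command queue. That makes it visible whether the enqueue itself already failed on the second profiler run or only the later clFinish.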

Any ideas?

Thanks

I have the same problem, but in my case the error comes from my only call to enqueueReadBuffer, during the second profiling run.

I have a reproducer. I’m running the offsetCopy kernel from the NVIDIA OpenCL Best Practices Guide, shown below. If offset = 0, I get the error described above. If offset = 1, the profiler does all of the runs and completes fine!?

__kernel void test_kernel(__global real_t * odata,
                          __global real_t * idata) {
    // offset copy, as found in nvidia opencl best practices guide
    int offset = 1;
    int xid = get_global_id(0) + offset;
    odata[xid] = idata[xid];
    //odata[xid] = (real_t)xid;
}

I think this should probably be reported to NVIDIA as a bug. I’ll be interested to see if anyone can reproduce it.

After several hours of hunting around, I couldn’t figure out why my simple kernel would not run under the profiler. It finally occurred to me to check whether the selected profile counters had any impact on whether the kernel failed on the 2nd run. Lo and behold, I identified two profile counters that cause the profiler to terminate on the second run. If I have all counters and options selected EXCEPT the following two, the profiling runs just fine.

gld instructions 32bit

gst instructions 32bit

Again, this is OpenCL with Visual Profiler 4.0.10 running the kernel below. If I select either of the profile counters above, the profiler fails on the 2nd run of the kernel. HOWEVER, if I change the declaration to size_t offset = 1, I can run the profiler with those profile counters selected.

__kernel void test_kernel(__global real_t * odata,
                          __global real_t * idata) {
    // offset copy, as found in nvidia opencl best practices guide
    size_t offset = 0;
    size_t xid = get_global_id(0) + offset;
    odata[xid] = idata[xid];
}
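For anyone trying to reproduce this, here is a rough host-side stand-in for the kernel in plain C, which makes the indexing explicit. It is only a sketch: float is assumed for real_t (the typedef is not shown in the thread), and the names offset_copy_host and min_buffer_elems are made up for illustration. The point it shows is that work-item gid touches element gid + offset, so with a non-zero offset both buffers need global_size + offset elements.

```c
#include <stddef.h>

/* Plain-C stand-in for the kernel body: one loop iteration per work-item.
 * float is an assumption for real_t. */
void offset_copy_host(float *odata, const float *idata,
                      size_t global_size, size_t offset)
{
    for (size_t gid = 0; gid < global_size; ++gid) {
        size_t xid = gid + offset;   /* same index computation as the kernel */
        odata[xid] = idata[xid];
    }
}

/* The highest index touched is (global_size - 1) + offset, so for
 * global_size > 0 each buffer must hold at least this many elements. */
size_t min_buffer_elems(size_t global_size, size_t offset)
{
    return global_size + offset;
}
```

With a global size of 8 and offset = 1, elements 1 through 8 are copied and element 0 is left untouched, so each buffer needs 9 elements. This does not explain the counter-dependent failure above; it is just a quick way to rule out buffer sizing when comparing the offset = 0 and offset = 1 cases.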