OpenCL program freezes when a large number of kernels are launched within a loop

Hi,

I have a loop (about 1 billion iterations) that launches OpenCL kernels. Each kernel is executed by a single thread and performs a trivial operation. The problem is that after a few million iterations the program freezes and never terminates. It hangs in the call to clFinish(), and not always at the same iteration.

The problem disappears if clFinish() is called once every 1000 iterations instead of in every iteration. My impression is that clFinish() is waiting for the kernel to end, but the kernel is killed (somehow) before clFinish() is called. Note also that the problem disappears when I insert many printf() calls inside the loop!
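To make the "once every 1000 iterations" variant concrete, here is a minimal sketch of the batching logic only, not the actual program. The names run_batched, enqueue_fn, finish_fn, struct counters, count_enqueue and count_finish are hypothetical; in the real host code the two callbacks would wrap clEnqueueNDRangeKernel() and clFinish(), and they are assumed to return 0 on success, as CL_SUCCESS does.

```c
#include <stddef.h>

/* Hypothetical stand-ins: in the real program, enqueue would wrap
   clEnqueueNDRangeKernel() and finish would wrap clFinish().
   Both return 0 on success, which is the value of CL_SUCCESS. */
typedef int (*enqueue_fn)(void *ctx);
typedef int (*finish_fn)(void *ctx);

/* Enqueue `total` launches, synchronizing once every `batch` launches
   and once more at the end if some launches are still unsynchronized.
   Returns 0 on success or the first non-zero status encountered. */
int run_batched(size_t total, size_t batch,
                enqueue_fn enqueue, finish_fn finish, void *ctx)
{
    size_t pending = 0;

    if (batch == 0)
        batch = 1;              /* degenerate case: sync after every launch */

    for (size_t i = 0; i < total; ++i) {
        int status = enqueue(ctx);
        if (status != 0)
            return status;

        if (++pending == batch) {
            status = finish(ctx);
            if (status != 0)
                return status;
            pending = 0;
        }
    }

    /* Flush the last, possibly partial, batch. */
    return pending ? finish(ctx) : 0;
}

/* Counting callbacks, used only to illustrate the call pattern. */
struct counters { size_t enqueued; size_t finished; };

int count_enqueue(void *ctx)
{
    ((struct counters *)ctx)->enqueued++;
    return 0;
}

int count_finish(void *ctx)
{
    ((struct counters *)ctx)->finished++;
    return 0;
}
```

With batch set to 1 this reproduces the freezing pattern (one synchronization per launch); with batch set to 1000 it reproduces the variant that does not freeze for me.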

I get the problem when I run the program on a CPU device (on my laptop, using the AMD SDK), and also on a machine with an Nvidia Fermi GPU (Nvidia SDK and drivers; the AMD SDK is also installed on that machine).

I’m checking for errors after each OpenCL API call but no error is detected.
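The host code excerpt further down does not show these checks, so here is a minimal sketch of the kind of check meant, not the exact code from the attached program. The helper cl_ok and the macro CL_OK are hypothetical names; the only OpenCL fact relied on is that CL_SUCCESS is 0 and that status codes are plain signed integers.

```c
#include <stdio.h>

/* Returns 1 when status is 0 (the value of CL_SUCCESS); otherwise logs
   the failing call with its location to stderr and returns 0. */
int cl_ok(int status, const char *call, const char *file, int line)
{
    if (status == 0)
        return 1;

    fprintf(stderr, "%s:%d: %s failed with OpenCL status %d\n",
            file, line, call, status);
    return 0;
}

/* Wrap any call that returns an OpenCL status code, for example
   CL_OK(clFinish(queue)). The call text is stringified for the log. */
#define CL_OK(call) cl_ok((int)(call), #call, __FILE__, __LINE__)
```

Calls that report their status through an output parameter, such as clCreateKernel(..., &err), can be checked the same way by passing err to CL_OK after the call.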

My questions:

  • Is there any incorrect use of the OpenCL API below?
  • Is there a problem with launching a huge number of OpenCL kernels simultaneously?

Host code:

    /* OpenCL initialization.  */
    /* ... */
    cl_mem dev_acc = clCreateBuffer(context, CL_MEM_READ_WRITE, sizeof(double), NULL, &err);

    for (int h0 = 1; h0 <= ni; h0 += 1)
      for (int h2 = 0; h2 < nj; h2 += 1)
        for (int h5 = 0; h5 < h2 - 1; h5 += 1)
          {
            size_t global_work_size[1] = {1};
            size_t block_size[1] = {1};
            cl_kernel kernel2 = clCreateKernel(program, "kernel2", &err);
            clSetKernelArg(kernel2, 0, sizeof(cl_mem), (void *) &dev_acc);
            clEnqueueNDRangeKernel(queue, kernel2, 1, NULL, global_work_size, block_size, 0, NULL, NULL);
            clFinish(queue);
            clReleaseKernel(kernel2);
          }

Kernel code:

__kernel void kernel2(__global double *acc)
{
      *acc = 1;
}

The full program (including the initialization code) is attached.

Technical information:

Compilation:
gcc -O3 -lm -std=gnu99 polybench.c ocl_utilities.c symm_host.c -lOpenCL -lm -I/opt/AMDAPP/include -L/opt/AMDAPP/lib/x86_64

Ubuntu 12.04, Kernel 3.2.0-29-generic, X86_64,
RAM: 2 GB

Any comments on this problem?