clCreateKernelsInProgram strangely returns CL_INVALID_KERNEL_DEFINITION


I encountered the same problem with

It is still not fixed in the OpenCL implementation that ships with CUDA 8.0. I’ve created a minimal working example here.

Basically, it happens when you

  1. create a multi-GPU context
  2. create and build a program for any ONE GPU other than GPU 0
  3. call clCreateKernelsInProgram, which then returns error code -47 (CL_INVALID_KERNEL_DEFINITION).
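
For anyone reading the raw number in a log, here is a small sketch that decodes the error codes most relevant to this report. The function name cl_error_name is made up for illustration; the numeric values are the ones defined in the Khronos CL/cl.h header, copied here so the snippet compiles without an OpenCL SDK.

```c
#include <stdio.h>

/* Hypothetical helper (not part of OpenCL): maps a few cl_int error codes
   to their names. Values are copied from the Khronos CL/cl.h header. */
const char *cl_error_name(int err)
{
    switch (err) {
    case 0:   return "CL_SUCCESS";
    case -5:  return "CL_OUT_OF_RESOURCES";
    case -6:  return "CL_OUT_OF_HOST_MEMORY";
    case -11: return "CL_BUILD_PROGRAM_FAILURE";
    case -30: return "CL_INVALID_VALUE";
    case -44: return "CL_INVALID_PROGRAM";
    case -45: return "CL_INVALID_PROGRAM_EXECUTABLE";
    case -46: return "CL_INVALID_KERNEL_NAME";
    case -47: return "CL_INVALID_KERNEL_DEFINITION";
    case -48: return "CL_INVALID_KERNEL";
    default:  return "UNKNOWN_CL_ERROR";
    }
}
```

Calling cl_error_name(-47) gives the name quoted in step 3. The table is deliberately incomplete; extend it from cl.h as needed.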

It even happens when I only query the number of kernels in the program, e.g. clCreateKernelsInProgram(program, 0, NULL, &num_kernels_ret).
Strangely, clCreateKernel works fine.

I think it’s a bug in NVIDIA’s OpenCL implementation. Any thoughts?