clCreateKernelsInProgram strangely returns CL_INVALID_KERNEL_DEFINITION

Hello,

I ran into the same problem that is described in these threads:
https://devtalk.nvidia.com/default/topic/468762/clcreatekernelsinprogram-returns-cl_invalid_kernel_definition/
https://forums.khronos.org/showthread.php/7347-Nvidia-bug-in-clCreateKernelsInProgram

It is still not fixed in the OpenCL implementation that comes with CUDA 8.0. I’ve created a minimal working example here: https://github.com/csehydrogen/clCreateKernelsInProgram_BUG

Basically, it happens when you

  1. create a multi-GPU context
  2. create and build a program for any ONE GPU in that context other than GPU 0
  3. call clCreateKernelsInProgram, which then returns error code -47 (CL_INVALID_KERNEL_DEFINITION).
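
For anyone checking the numeric code, here is a small lookup sketch in plain C. The function name cl_error_name is hypothetical (it is not part of OpenCL), and only a handful of codes are covered; the values are the ones defined in CL/cl.h, written as literals so the snippet compiles without the OpenCL headers.

```c
#include <stddef.h>

/* Hypothetical helper, not part of the OpenCL API: maps a few cl_int
 * error codes to their symbolic names. The numeric values match the
 * definitions in CL/cl.h; the list is deliberately incomplete. */
const char *cl_error_name(int err)
{
    switch (err) {
    case 0:   return "CL_SUCCESS";
    case -5:  return "CL_OUT_OF_RESOURCES";
    case -6:  return "CL_OUT_OF_HOST_MEMORY";
    case -30: return "CL_INVALID_VALUE";
    case -44: return "CL_INVALID_PROGRAM";
    case -45: return "CL_INVALID_PROGRAM_EXECUTABLE";
    case -46: return "CL_INVALID_KERNEL_NAME";
    case -47: return "CL_INVALID_KERNEL_DEFINITION";
    case -48: return "CL_INVALID_KERNEL";
    default:  return "unknown (not in this partial table)";
    }
}
```

Passing the return value of clCreateKernelsInProgram to this helper would give the name reported above for -47.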

It even happens when I only try to get the number of kernels in the program, e.g. clCreateKernelsInProgram(program, 0, NULL, &num_kernels_ret).
Strangely, clCreateKernel works fine.
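
Since clCreateKernel works, one possible workaround is to create the kernels one by one by name. The sketch below assumes the names come from clGetProgramInfo with CL_PROGRAM_KERNEL_NAMES (OpenCL 1.2), which returns a semicolon-separated list. I have not verified that this query is unaffected by the same bug, so treat it as an assumption. The helper split_kernel_names is hypothetical, and the OpenCL calls are shown only in comments so the snippet compiles without the OpenCL headers.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: splits a semicolon-separated kernel name list
 * (the format of CL_PROGRAM_KERNEL_NAMES) in place. Each ';' is
 * overwritten with '\0' and a pointer to every non-empty name is stored
 * in out[]. Returns the number of names stored (at most max_out). */
size_t split_kernel_names(char *names, char **out, size_t max_out)
{
    size_t count = 0;
    char *p = names;

    while (p != NULL && *p != '\0' && count < max_out) {
        char *sep = strchr(p, ';');
        if (sep != NULL)
            *sep = '\0';              /* terminate the current name */
        if (*p != '\0')
            out[count++] = p;         /* skip empty segments such as ";;" */
        p = (sep != NULL) ? sep + 1 : NULL;
    }
    return count;
}

/* Intended use with OpenCL (assumed, untested against the bug):
 *
 *   size_t len;
 *   clGetProgramInfo(program, CL_PROGRAM_KERNEL_NAMES, 0, NULL, &len);
 *   char *buf = malloc(len);
 *   clGetProgramInfo(program, CL_PROGRAM_KERNEL_NAMES, len, buf, NULL);
 *   char *names[64];
 *   size_t n = split_kernel_names(buf, names, 64);
 *   for (size_t i = 0; i < n; i++)
 *       kernels[i] = clCreateKernel(program, names[i], &err);
 */
```

The splitting is done in place so that every pointer in out[] can be passed directly to clCreateKernel as a NUL-terminated kernel name; the buffer must therefore stay alive for as long as the names are used.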

I think it’s a bug in NVIDIA’s OpenCL implementation. Any thoughts?