OpenCL bug: kernel miscompilation

Hi,

We have identified and reduced an OpenCL issue that we can reproduce on
NVIDIA K20 and K40 with Linux driver version 375.39. We cannot test with
the latest drivers, as these machines are part of a cluster on which we
do not have administrator rights.

We have reduced the issue to a minimal example with a simple OpenCL
kernel operating on a float array of a single element. The kernel should
set this float element to zero, but we observe that it is not modified
at all. When we run this kernel on other OpenCL platforms
(e.g. Oclgrind), it behaves as expected, which suggests that this is
likely an NVIDIA driver issue.

The OpenCL kernel is, in its entirety:

__kernel void kernel1(__global float *a) {
  int g = get_group_id(1),
      i = get_local_id(1);
  for (int j = 0; j <= g; j++)
    a[2 * j] = 0.0f;
}
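
For reference, the behaviour we expect can be sketched in plain C on the
host. This is only an illustration, not part of the reproduction
material: the names kernel1_reference and run_reference are hypothetical,
and the sketch assumes a launch with a single work-group, so that
get_group_id(1) returns 0 and the loop body runs exactly once.

```c
/* Plain C stand-in for the kernel body. The group_id parameter plays the
   role of get_group_id(1); it is a hypothetical name for this sketch. */
static void kernel1_reference(float *a, int group_id) {
    for (int j = 0; j <= group_id; j++)
        a[2 * j] = 0.0f;
}

/* Applies the reference to a single-element array, as in the reduced
   test case, and returns the resulting element. */
static float run_reference(float initial) {
    float a[1] = { initial };
    kernel1_reference(a, 0); /* assumed: one work-group, so group id 0 */
    return a[0];
}
```

Under that assumption, run_reference returns 0.0f whatever the initial
value, which is the result we expect from the device as well.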

The detailed bug report, including reproduction material and platform details, is available at this URL:

Please let us know if you can reproduce the issue with the latest drivers.

Thanks,
Hugues Evrard, Guangyu Hu, Alastair Donaldson
Imperial College London, Multicore Programming Group