Scattered read causes kernel freeze: kernel freezes when reading a buffer in a scattered way

Hi,

I’m encountering a strange problem with a very simple kernel on NVIDIA’s OpenCL SDK (OpenCL 1.0 CUDA 4.0.1 on GTX 580). The same program runs fine with AMD’s SDK, on both CPU and GPU.

This is the kernel:

__kernel void test(__global float* u, __global float* v) {
  int i = get_global_id(0);
  u[i*2] = v[i*2];
}

Basically the kernel copies only the even elements from buffer v to buffer u. Both buffers are created with size SIZE * 2 * sizeof(float) and the global work size for the kernel is SIZE, where SIZE is a constant, e.g. 2^20. When I run this kernel repeatedly it freezes after a random number of executions (generally after a few hundred executions).
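As a sanity check on the indexing described above, here is a minimal plain-C sketch (no OpenCL involved). The helper names max_strided_index and strided_access_in_bounds are made up for illustration and are not part of the original program. It only shows that, with a global work size of SIZE and buffers of SIZE * 2 floats, the largest index touched by u[i*2] = v[i*2] stays inside the buffer, so an out-of-bounds access is unlikely to explain the freeze.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not in the original program): largest buffer
   index touched by u[i*2] = v[i*2] when the global work size is
   `size`, i.e. work-item ids run from 0 to size - 1. */
size_t max_strided_index(size_t size)
{
    assert(size > 0);
    return (size - 1) * 2;
}

/* Returns 1 if every access lands inside a buffer of size * 2 floats,
   0 otherwise. */
int strided_access_in_bounds(size_t size)
{
    size_t num_elements = size * 2;   /* buffer holds SIZE * 2 floats */
    return max_strided_index(size) < num_elements;
}
```

For SIZE = 2^20, the last work-item touches index 2 * SIZE - 2, which is below the 2 * SIZE elements in each buffer.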

Creating 2 * SIZE work-items and then using only the even ones works fine, i.e.:

__kernel void test(__global float* u, __global float* v) {
  int i = get_global_id(0);
  if (i % 2) return;
  u[i] = v[i];
}
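To make the comparison between the two kernels concrete, here is a small CPU-side sketch in plain C that emulates both variants with ordinary loops, one iteration per work-item. The function names copy_even_strided, copy_even_guarded and variants_agree are assumptions introduced for illustration. It says nothing about the GPU behaviour, only that both variants are meant to write exactly the same elements.

```c
#include <stddef.h>
#include <stdlib.h>

/* CPU stand-in for the first kernel: global work size = size,
   each work-item i copies element i*2. */
void copy_even_strided(float *u, const float *v, size_t size)
{
    for (size_t i = 0; i < size; ++i)
        u[i * 2] = v[i * 2];
}

/* CPU stand-in for the second kernel: global work size = 2 * size,
   odd work-items return early. */
void copy_even_guarded(float *u, const float *v, size_t size)
{
    for (size_t i = 0; i < 2 * size; ++i) {
        if (i % 2) continue;
        u[i] = v[i];
    }
}

/* Runs both variants on identical input and returns 1 if the outputs
   match element for element, 0 otherwise (also 0 if allocation fails). */
int variants_agree(size_t size)
{
    size_t n = 2 * size;              /* buffers hold SIZE * 2 floats */
    float *v  = malloc(n * sizeof *v);
    float *u1 = malloc(n * sizeof *u1);
    float *u2 = malloc(n * sizeof *u2);
    int same = (v && u1 && u2);

    if (same) {
        for (size_t i = 0; i < n; ++i) {
            v[i] = (float)i;
            u1[i] = u2[i] = -1.0f;    /* marker for "never written" */
        }
        copy_even_strided(u1, v, size);
        copy_even_guarded(u2, v, size);
        for (size_t i = 0; i < n; ++i)
            if (u1[i] != u2[i]) same = 0;
    }
    free(v);
    free(u1);
    free(u2);
    return same;
}
```

Since both variants produce the same output here, the difference on the GPU is only in how many work-items are launched and which work-item touches which address.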

This problem occurred in a bigger kernel, but I’m using this simple example for demonstration purposes. Any ideas what’s going wrong here?

A reboot seems to have solved the problem for now… Hope it doesn’t occur again.