Hi,
I’m encountering a strange problem with a very simple kernel on NVIDIA’s OpenCL SDK (OpenCL 1.0 CUDA 4.0.1 on a GTX 580). The same program runs fine with AMD’s SDK, on both the CPU and the GPU.
This is the kernel:
__kernel void test (__global float* u, __global float* v) {
    int i = get_global_id(0);
    u[i*2] = v[i*2];  // work-item i copies only the even-indexed element 2*i
}
Basically, the kernel copies only the even-indexed elements of buffer v into buffer u. Both buffers are created with size SIZE * 2 * sizeof(float), and the global work size for the kernel is SIZE, where SIZE is a constant, e.g. 2^20. When I run this kernel repeatedly, it freezes after a random number of executions (generally after a few hundred).
Launching 2 * SIZE work-items and letting only the even-numbered ones do the copy works fine, i.e.
__kernel void test (__global float* u, __global float* v) {
    int i = get_global_id(0);
    if (i % 2) return;  // odd work-items do nothing
    u[i] = v[i];        // even work-items copy their own element
}
I originally ran into this problem in a bigger kernel, but I’m using this simple example for demonstration purposes. Any ideas what’s going wrong here?