I am using NVIDIA's OpenCL SDK on a GTX 550 Ti graphics card and have run into a strange problem. (I am new to OpenCL.)
My kernel code looks like this:
__kernel void kernel_name(…)
{
size_t d = get_local_id(0);
char abc[8];
…
}
The “char abc[8]” array is never used (dead code) in my case. But if I leave “char abc[8]” in the kernel, the results are completely wrong and the kernel takes much longer to run (2095712 ns). If I comment out “char abc[8]”, the results are correct and the kernel runs about three times faster (697856 ns). Doesn't the kernel compiler eliminate dead code?
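To put the two timings side by side, here is a small C sketch of the arithmetic. The helper names slowdown_ratio and ns_to_ms are made up for illustration; the code only uses the two nanosecond figures quoted above and does not call OpenCL at all.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: how many times slower the slow run is than the
   fast run. Both arguments are kernel times in nanoseconds. */
double slowdown_ratio(uint64_t slow_ns, uint64_t fast_ns)
{
    assert(fast_ns != 0);  /* guard against dividing by zero */
    return (double)slow_ns / (double)fast_ns;
}

/* Hypothetical helper: convert nanoseconds to milliseconds for reading. */
double ns_to_ms(uint64_t ns)
{
    return (double)ns / 1.0e6;
}
```

With my numbers, slowdown_ratio(2095712, 697856) comes out just above 3, so the kernel with the unused array takes roughly three times as long (about 2.1 ms versus about 0.7 ms).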
The “char abc[8]” case is just one example that I can reproduce reliably. I have also seen stranger cases where the same program gives different results on different runs in exactly the same environment.
Is this related to memory allocation, or…? Can anyone give me some advice on how to track down the problem?
By the way, oclDeviceQuery reports the following: Platform Version = OpenCL 1.1 CUDA 4.2.1, SDK Revision = 7027912
My OS is Windows XP.
Thank you.