I am seeing very strange behaviour in my kernel that seems to be triggered by calling a built-in function (in this case ‘sin’) from my code. I have narrowed it down to a very simple test case (just a hacked-up version of the VectorAdd sample). The kernel simply creates a local structure, writes a couple of values to it, then reads them back into the output buffer:
typedef struct _MyStruct {
    float a;
    float b;
    float c;
    float d;
    float4 vecVal;          // float4 needs 16-byte alignment
    float4 anotherVecVal;
} MyStruct;

__kernel void VectorAdd(__global const float* a, __global const float* b, __global float* c, int iNumElements)
{
    MyStruct structData;                 // per-work-item struct
    structData.vecVal = (float4)(0.0f);  // zero all four components
    structData.d = 0.5f;

    // Copy the vector back out so it can be inspected on the host.
    c[0] = structData.vecVal.x;
    c[1] = structData.vecVal.y;
    c[2] = structData.vecVal.z;
    c[3] = structData.vecVal.w;
}
This code works fine: the values written into the buffer are all zero. I can also print the addresses of the various local variables (by returning them in the output buffer and inspecting them on the CPU), and they look fine (the float4 is aligned on a 16-byte boundary, which I'm assuming is important):
&structData.d=12
&structData.vecVal=16
&structData.anotherVecVal=32
&structData=0
However, if I add a simple sin statement to my kernel, bad things happen:
    MyStruct structData;
    structData.vecVal = (float4)(0.0f);
    structData.d = 0.5f;
    float sinResult = sin(3.141592653589f);  // the only change
Random garbage ends up in structData.vecVal, and if I look at the addresses now, everything has been shifted by 56 bytes and the float4s are no longer 16-byte aligned:
&structData.d=68
&structData.vecVal=72
&structData.anotherVecVal=88
&structData=56
I’m guessing that using the built-ins adds some local constants and such to local memory, and since these are not 16-byte aligned they cause trouble with other things in local memory? Or maybe something more complicated is happening that I’m not grasping. I am using the latest (non-beta) SDK and drivers on 32-bit Windows XP, with a Quadro Plex 5800.
Any ideas? Is there a bug database I can submit this kind of thing to? I’ve attached my test case (it should unzip into OpenCL\src\oclVectorAdd and run).
Thanks
Gareth Morgan
AlignmentError.zip (4.13 KB)