This kernel works on Linux and Windows GPU and Mac Snow Leopard CPU, but fails on Mac Snow Leopard and Lion GPU:
#define InPixelType unsigned char;
#define OutPixelType unsigned char;
__kernel void BinaryThresholdFilter(InPixelType lowerThreshold, InPixelType upperThreshold,
OutPixelType insideValue, OutPixelType outsideValue,
__global const InPixelType* in, __global OutPixelType* out,
int width, int height, int depth)
{
int gix = get_global_id(0);
int giy = get_global_id(1);
int giz = get_global_id(2);
/* NOTE: More than three-level nested conditional statements (e.g.,
if A && B && C..) invalidates command queue during kernel
execution on Apple OpenCL 1.0 (such Macbook Pro with NVIDIA 9600M
GT). Therefore, we flattened conditional statements. */
bool isValid = true;
if(gix < 0 || gix >= width) isValid = false;
if(giy < 0 || giy >= height) isValid = false;
if(giz < 0 || giz >= depth) isValid = false;
if( isValid )
{
unsigned int gidx = width*(giz*height + giy) + gix;
if ( lowerThreshold <= in[gidx] && in[gidx] <= upperThreshold )
{
out[gidx] = insideValue;
}
else
{
out[gidx] = outsideValue;
}
}
}
When it fails, the output is all zero (the outsideValue) instead of an image consisting of a mix of insideValue and outsideValue.
I have verified that if I change the type of lowerThreshold, upperThreshold, insideValue, and outsideValue to unsigned short instead of unsigned char, the kernel works.
Is this a bug in the Mac drivers, or is this undefined behavior according to the standard?
I find the opencl 1.0 standard to be a little confusing on this point:
6.1.5 Alignment of Types
A data item declared to be a data type in memory is always aligned to the size of the data type in bytes. For example, a float4 variable will be aligned to a 16-byte boundary, a char2 variable will be aligned to a 2-byte boundary.
A built-in data type that is not a power of two bytes in size must be aligned to the next larger power of two. This rule applies to built-in types only, not structs or unions.
The OpenCL compiler is responsible for aligning data items to the appropriate alignment as required by the data type. The behavior of a direct unaligned load/store is considered to be undefined, except for the vector data load and store functions defined in section 6.11.7. These vector load and store functions allow you to read and write vectors types from addresses aligned to the size of the vector type or the size of a scalar element of the vector type.
So I guess my question is: should I avoid all 8-bit built-in data types as arguments to opencl kernels?