OpenCL uchar16 thresholding kernel does not complete

Hi to everyone,

I wrote several kernels that implement thresholding, using uchar, uchar2, and so on up to uchar16.

The last one does not work correctly:

__kernel void ThresholdingOnGlobalMemory16D(__global __read_only uchar16 * restrict input,
											__global __write_only uchar16 * restrict output,
											uint size, uchar t)
{
	uint i = get_global_id(0);

	// Valid indices are 0..size-1, so i == size must also be rejected.
	if (i >= size) return;

	uchar16 pixel = input[i];

	output[i] = select(min16d, max16d, pixel > t);
}

and this is the benchmark I get:

The huge value in the highlighted row comes from startTime and endTime (the ulongs I pass to clGetEventProfilingInfo), which are initialized to a large number and 0, respectively.
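To make the arithmetic behind that huge number concrete, here is a minimal sketch in plain C. It assumes the elapsed time is computed as endTime - startTime on unsigned 64-bit values (cl_ulong is an unsigned 64-bit type); the helper names elapsed_ns and elapsed_ns_unchecked are hypothetical and not part of OpenCL. If the two variables still hold their initial values after the call, the profiling query presumably did not write them, so the return code of clGetEventProfilingInfo is worth checking as well.

```c
#include <stdint.h>

/* Hypothetical helper: unchecked difference of two timestamps in ns.
   With end_ns < start_ns the unsigned subtraction wraps modulo 2^64,
   which yields a value close to the 64-bit maximum. */
uint64_t elapsed_ns_unchecked(uint64_t start_ns, uint64_t end_ns)
{
    return end_ns - start_ns;
}

/* Hypothetical helper: checked difference.
   Returns 1 and stores the elapsed time when end_ns >= start_ns;
   returns 0 and leaves *elapsed untouched otherwise. */
int elapsed_ns(uint64_t start_ns, uint64_t end_ns, uint64_t *elapsed)
{
    if (end_ns < start_ns)
        return 0;   /* timestamps were probably never written */
    *elapsed = end_ns - start_ns;
    return 1;
}
```

With a start value of 1000 and an end value of 0, the unchecked version returns 2^64 - 1000, while the checked version reports the pair as invalid instead of printing a bogus duration.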

The output image is empty. I don't know what is wrong, since the other vector kernels (uchar8, for example) work.

The input image is a grayscale array, and I use uchar16 to read 16 bytes at a time.
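For comparing results, a CPU reference of the same operation can be sketched in plain C. This is only an illustration under assumptions: min16d and max16d are taken to be vectors filled with a single low and a single high value (here the hypothetical parameters lo and hi), and the kernel's size argument is taken to count 16-byte groups rather than bytes. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of whole 16-byte groups in a buffer of n_bytes bytes.
   Any remaining tail bytes (n_bytes % 16) are not covered. */
size_t vec16_count(size_t n_bytes)
{
    return n_bytes / 16;
}

/* CPU reference: out[k] = hi where in[k] > t, lo otherwise,
   walking the buffer in 16-byte groups like the uchar16 kernel.
   lo and hi stand in for the contents of min16d and max16d (assumed). */
void threshold16_ref(const uint8_t *in, uint8_t *out, size_t n_vec16,
                     uint8_t t, uint8_t lo, uint8_t hi)
{
    for (size_t i = 0; i < n_vec16; ++i) {          /* i: role of get_global_id(0) */
        for (size_t lane = 0; lane < 16; ++lane) {  /* lane: one byte of the uchar16 */
            size_t k = i * 16 + lane;
            out[k] = (in[k] > t) ? hi : lo;
        }
    }
}
```

Running this on the same input and comparing byte by byte with the buffer read back from the device shows whether the kernel wrote nothing at all or wrote wrong values. Note that a buffer whose length is not a multiple of 16 leaves a tail that this loop, like the kernel, does not process.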

I also have an AMD CPU, and the kernel works fine on it.

How can I track down the problem? I have no idea how to profile or debug on an NVIDIA card. Any suggestion is appreciated.

Thank you in advance.