Hi everyone,
I wrote a set of kernels that implement thresholding, using uchar, uchar2, and so on up to uchar16. The last one does not work correctly:
__kernel void ThresholdingOnGlobalMemory16D(__global __read_only uchar16 * restrict input,
                                            __global __write_only uchar16 * restrict output,
                                            uint size, uchar t)
{
    uint i = get_global_id(0);
    if (i >= size) return;   /* was i > size, which let work-item i == size write out of bounds */
    uchar16 pixel = input[i];
    /* min16d/max16d are uchar16 constants defined elsewhere (the low/high output values) */
    output[i] = select(min16d, max16d, pixel > t);
}
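For reference, here is the behavior I expect from the kernel, sketched as plain scalar C on the host (a minimal sketch; the output values 0 and 255 are assumptions standing in for min16d/max16d):

```c
#include <stddef.h>

/* Reference thresholding: pixels strictly above t become 255, the rest 0.
   This mirrors select(min16d, max16d, pixel > t), assuming
   min16d == (uchar16)(0) and max16d == (uchar16)(255). */
void threshold_ref(const unsigned char *in, unsigned char *out,
                   size_t n, unsigned char t)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = (in[i] > t) ? 255 : 0;
}
```

I compare the GPU output against this on the host to decide whether a kernel variant is correct.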
and this is the benchmark I get:
The huge value in the highlighted row is due to startTime and endTime (the ulongs I pass to clGetEventProfilingInfo) being initialized to a large number and 0, respectively.
The output image is empty. I don't know what's wrong, since the other vector kernels (e.g. uchar8) work fine.
The input image is a grayscale array, and I use uchar16 to process 16 bytes at a time.
On my AMD CPU the kernel works fine; it only fails on the NVIDIA card.
How can I track down the problem? I have no idea how to profile/debug an NVIDIA card. Any suggestion is appreciated.
Thank you in advance,
vg