I wrote a small medianfilter kernel with variable filtersize as an exercise for OpenCV and noticed some strange behaviour.
The driver used is the one that is delivered with Leopard (iMac, nVidia).
Here is the code:
#define MaxMedianRange 7 // → max num of pixels == (2*7+1)^2 = 225
__kernel void Median( __global const uchar4 * Source, __global uchar4 * Dest, __global uchar4 * Err,
const int ImageWidth, const int ImageHeight, const int MedianRange
)
{
// get position to be filtered
int PosX = get_global_id(0);
int PosY = get_global_id(1);
int Offset = PosY*ImageWidth+PosX;
// check filtersize limitation
if ( MedianRange > MaxMedianRange )
{
Dest[Offset] = Source[Offset];
return;
}
// this buffer should be big enough for sorting the max. 225 values,
// 256 is chosen for alignment (I don't know if this makes a difference...)
uchar Buffer[256];
// filter every channel seperately
for ( int Channel = 0; Channel<3; Channel++ )
{
uint o=0; // running offset
// copy surrounding pixels into a buffer
for ( int y=PosY-MedianRange; y<=PosY+MedianRange; y++ )
for ( int x=PosX-MedianRange; x<=PosX+MedianRange; x++, o++ )
{
uint dx,dy; // "unsigned" should be no problem here!
// check image boundaries
if ( x<0 ) dx = 0;
else if ( x>= ImageWidth ) dx=ImageWidth -1;
else ( dx = x );
if ( y<0 ) dy = 0;
else if ( y>= ImageHeight ) dy=ImageHeight-1;
else ( dy = y );
Buffer[o] = Source[dy*ImageWidth+dx][Channel];
}
// bubblesort the buffer, offset "o" should have resulted in the number of pixels in it.
bsort( Buffer, o );
uchar median, original, error;
// select middle pixel in the buffer as median result
median = Buffer[MedianRange];
Dest[Offset][Channel] = median;
// calculate some extra error value for this pixel
original = Source[Offset][Channel];
error = 127+((int)median-(int)original)/2;
Err[Offset][Channel] = error;
}
// set alpha to max. value
Dest[Offset][3] = 255;
Err[Offset][3] = 255;
}
The “bsort(…)” is a trivial inplace bubblesort function and shouldn’t be responsible for the strange behaviour (hopefully!)
The result is correct if CPU is selected as the device.
But if GPU is selected the folowing problems occur:
The filtering looks “ok”, but the blue channel seems to be set to max. (255), so the image is very blueish. Alignment problem?
The driver crashes if the median range is set to 4…7, only 1…3 run without crash, but result in the blueish mentioned above…
So the problem seems to be the compiler.
Are there suggestions for a workaround in the kernelcode or is there a compiler argument which may solve the problem?
Is there a new driver on its’ way or already available??
Thanks in advance!
Konstantin Dols
Ps: Oops, for some reason the spacing in the code doesn’t work and makes it harder to read, sorry for this!
I see you are using array subscriptions for uchar4, which doesn’t seem to be right. I’d recommend using Dest[Offset].w (as in xyzw) instead of Dest[Offset][3].
This is just my guess though.
In your main loop you probably has to manually unroll that if subscriptions can’t be used.
It’s possible that your kernel simply runs too long for larger median range. How long does it take to run for range = 3? If it already takes several seconds, then it’s probably the cause.
Note that, as in Windows, Snow Leopard will reset the GPU if it takes too long to run a kernel.
Also your code may need some work to make it run faster. Remember that current GPU do not have cache, so repeatedly read the same memory location can be very slow. Also it’s better to make read aligned (coalesced). You may want to try reading into a local memory before further processing.