I have a kernel that performs an optimization function. I calculate a value F with code copied from the CUDA documentation for parallel prefix sums with bank offsets. This works fine, and I have used it all over my work.
My optimization kernel produced no results without throwing an error, so I tried to debug it as well as is possible with this atrocious GPU programming…
I forced the output to be black and white stripes right after starting the kernel with:
out[index] = 65000;
out[index2] = 0.5f;
return;
which also worked fine. Then I moved these instructions further down in my code and found out that some conditions caused the kernel to just stop executing. Some of these I was able to fix by transforming the conditions into 0s and 1s and using them as arithmetic weights.
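To make the arithmetic-weight idea concrete, here is a minimal sketch. The helper names `pick_by_weight` and `both_hold` are hypothetical and are not taken from the actual kernel; the macro guard at the top is only an assumption so that the same snippet also builds with a plain C++ compiler.

```cpp
// Assumption: let the snippet build as plain C++ when nvcc is not used.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Hypothetical helper: yields a when cond is true and b otherwise,
// using the condition as a 0/1 weight instead of an if/else.
__host__ __device__ inline float pick_by_weight(bool cond, float a, float b)
{
    float w = (float)cond;          // 1.0f if cond is true, 0.0f otherwise
    return w * a + (1.0f - w) * b;  // the weight selects one operand
}

// Hypothetical combined check in the spirit of the post:
// 1 only if F exceeds epsilon and gamma is nonzero, else 0.
__host__ __device__ inline unsigned short both_hold(float F, float epsilon, float gamma)
{
    return (unsigned short)((F > epsilon) && (gamma != 0.0f));
}
```

Note that the weighted form evaluates both operands, so a NaN or infinity in the operand that is not selected still propagates into the result, which a real branch would avoid.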
Here is my problem:
gamma = top / (bottom + 1);
ushort check = (ushort)(F > epsilon);
out[index] = 65000;
out[index2] = 0.5f;
return;
This produces black and white stripes as output, but when I modify it to:
gamma = top / (bottom + 1);
ushort check = (ushort)(F > epsilon) * (ushort)(gamma != 0);
out[index] = 65000;
out[index2] = 0.5f;
return;
It doesn't do anything. No stripes are visible. So my conclusion is that “(ushort)(gamma != 0)” breaks this kernel, and I have no idea why. The same kernel was working perfectly a week ago.
And how in the world can a simple condition stop the entire kernel without even throwing an error?