Bug in a simple loop: curious bug in loop...


I’m having some trouble with a loop in a kernel… I don’t know how to explain it…

 for(i=0; i<12; i++)
 {
     unsigned int count = 0;

     // count the elements of tab that are below the current pivot
     for(int j=0; j < 27; j++)
     {
        if( j < 27 && tab[j] < pivot) count++;
     }

     if(count < 14) max = floorf(pivot);
     else min = ceilf(pivot);

     pivot = (min + max)/2.0f;
 }


I know that the j < 27 test is redundant because the loop condition already guarantees it, but when I get rid of it, my kernel crashes (too many resources used), whereas it works perfectly when I add j < 27.

If anyone has any idea about this problem, I would be happy to hear it :)

Thank you :)

What’s the size of your blocks? Removing that condition probably changes the generated code just enough that the kernel uses an extra register or two. Since your block is so big that it barely fits, that extra register pushes it over the limit and the launch fails.

Use --ptxas-options=-v to print the register usage of each kernel and -maxrregcount=N to cap it.
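
The same numbers can also be read at run time. Here is a minimal sketch, assuming a stand-in kernel named count_kernel (hypothetical, not the original poster’s kernel) and device 0. The helpers fits_in_register_file and report_register_fit are also made up for illustration, and the product check is only a rough estimate because real hardware allocates registers in chunks.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the kernel in the question; only its
// register footprint matters here.
__global__ void count_kernel(const float *tab, float pivot, unsigned int *out)
{
    unsigned int count = 0;
    for (int j = 0; j < 27; j++)
        if (tab[j] < pivot) count++;
    out[blockIdx.x * blockDim.x + threadIdx.x] = count;
}

// Rough estimate only: hardware allocates registers with some granularity,
// so a launch can still fail when this returns true.
static bool fits_in_register_file(int regsPerThread, int threadsPerBlock,
                                  int regsPerBlock)
{
    return regsPerThread * threadsPerBlock <= regsPerBlock;
}

// Prints the register usage of count_kernel and whether a block of the
// given size is expected to launch on device 0.
// Returns 1 if it should fit, 0 if not, -1 if a CUDA query failed.
static int report_register_fit(int threadsPerBlock)
{
    cudaFuncAttributes attr;
    cudaDeviceProp prop;

    cudaError_t err = cudaFuncGetAttributes(&attr, count_kernel);
    if (err == cudaSuccess) err = cudaGetDeviceProperties(&prop, 0);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA query failed: %s\n", cudaGetErrorString(err));
        return -1;
    }

    // attr.maxThreadsPerBlock is the runtime's own limit for this kernel.
    bool ok = threadsPerBlock <= attr.maxThreadsPerBlock &&
              fits_in_register_file(attr.numRegs, threadsPerBlock,
                                    prop.regsPerBlock);

    printf("registers per thread: %d\n", attr.numRegs);
    printf("registers per block available: %d\n", prop.regsPerBlock);
    printf("max threads per block for this kernel: %d\n",
           attr.maxThreadsPerBlock);
    printf("%d threads per block: %s\n", threadsPerBlock,
           ok ? "should fit" : "too many resources");
    return ok ? 1 : 0;
}
```

Calling report_register_fit(16 * 16) and report_register_fit(8 * 8) from main would show whether the larger block is over the limit. One extra register per thread costs 256 registers for a 16x16 block but only 64 for an 8x8 block.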

Since the loop has a known number of iterations you can unroll it. Have you tried that?
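
A minimal sketch of what unrolling the inner loop could look like, assuming the 27-element tab from the question. The names TAB_SIZE, count_below and count_below_kernel are hypothetical, and whether unrolling actually lowers register usage for this kernel has to be checked with the compiler output.

```cuda
#include <cuda_runtime.h>

#define TAB_SIZE 27  // number of elements scanned per pivot, from the question

// Counts how many of the TAB_SIZE values are strictly below pivot.
// The trip count is a compile-time constant, so the compiler can
// replace the loop with straight-line code.
__host__ __device__ inline unsigned int count_below(const float *tab, float pivot)
{
    unsigned int count = 0;
#pragma unroll
    for (int j = 0; j < TAB_SIZE; j++)
        count += (tab[j] < pivot) ? 1u : 0u;
    return count;
}

// Hypothetical kernel: each thread scans its own window of TAB_SIZE values.
// Assumes tabs holds n * TAB_SIZE floats and out holds n entries.
__global__ void count_below_kernel(const float *tabs, float pivot,
                                   unsigned int *out, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n)
        out[idx] = count_below(tabs + idx * TAB_SIZE, pivot);
}
```

Writing the increment as an expression instead of an if is a style choice here; the compiler is free to generate the same code for both forms.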

I’ve tried with a 16x16 block and it crashes; it runs with an 8x8 block.
To answer big_Mac: I’ve found another way to solve my problem without a loop.

I was just curious to know why it works with
if( j < 27 && tab[j] < pivot) count++;

and doesn’t with
if( tab[j] < pivot) count++;


Thanks for your help :)