Typical accumulator loop: is it worth rewriting it to expose more ILP explicitly?

Dredok · July 3, 2013, 11:25am

I don’t know whether NVCC is smart enough to find ILP in a loop like this:

for (int i = 0; i < 8; i++) {
    if (somethingHappens) {
        someVar = someVar | (1 << i);  // set bit i in the accumulator
    }
}

Or should I rewrite it to expose the ILP explicitly?

char someVar[8] = {0};  // one independent slot per iteration
for (int i = 0; i < 8; i++) {
    if (somethingHappens) {
        someVar[i] = 1 << i;
    }
}
// reduce someVar using vaddus4 and 3 logical ANDs
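
To make the comparison concrete, here is a minimal CUDA sketch of the two variants side by side. The names `flags`, `buildMaskSerial`, `buildMaskIndependent` and `compareVariants` are placeholders I made up, the test `flags[i] != 0` stands in for `somethingHappens`, and I am assuming the accumulation is meant to be a bitwise OR. The final reduction is written as a plain OR tree rather than with `vaddus4`, just to keep the sketch simple.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Variant 1: every iteration updates the same accumulator, so the
// updates form one dependency chain (unless the compiler breaks it up).
__device__ unsigned int buildMaskSerial(const int *flags)
{
    unsigned int mask = 0;
    #pragma unroll
    for (int i = 0; i < 8; i++) {
        if (flags[i] != 0) {          // placeholder for somethingHappens
            mask |= 1u << i;
        }
    }
    return mask;
}

// Variant 2: each iteration writes its own slot, so the eight values
// are independent; they are then combined with a 3-level OR tree.
__device__ unsigned int buildMaskIndependent(const int *flags)
{
    unsigned int part[8];
    #pragma unroll
    for (int i = 0; i < 8; i++) {
        part[i] = (flags[i] != 0) ? (1u << i) : 0u;
    }
    unsigned int a = part[0] | part[1];
    unsigned int b = part[2] | part[3];
    unsigned int c = part[4] | part[5];
    unsigned int d = part[6] | part[7];
    return (a | b) | (c | d);
}

__global__ void compareVariants(const int *flags, unsigned int *out)
{
    out[0] = buildMaskSerial(flags);
    out[1] = buildMaskIndependent(flags);
}

int main()
{
    const int hostFlags[8] = {1, 0, 1, 1, 0, 0, 0, 1};
    unsigned int hostOut[2] = {0, 0};
    int *devFlags = nullptr;
    unsigned int *devOut = nullptr;

    cudaMalloc(&devFlags, sizeof(hostFlags));
    cudaMalloc(&devOut, sizeof(hostOut));
    cudaMemcpy(devFlags, hostFlags, sizeof(hostFlags), cudaMemcpyHostToDevice);

    compareVariants<<<1, 1>>>(devFlags, devOut);
    cudaMemcpy(hostOut, devOut, sizeof(hostOut), cudaMemcpyDeviceToHost);

    // Both variants should report the same mask.
    printf("serial: 0x%02x  independent: 0x%02x\n", hostOut[0], hostOut[1]);

    cudaFree(devFlags);
    cudaFree(devOut);
    return 0;
}
```

With a fixed trip count of 8 the compiler can fully unroll both loops, so the generated code for the two variants may well end up the same; the sketch only shows the two source-level shapes I am comparing.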

Other questions:

How deep is the arithmetic pipeline on Kepler?

How can I measure effectively whether such optimizations are worth it? Would reading the clock register before and after the block be enough?
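
What I have in mind for the measurement is roughly the following sketch. The kernel name `timeBlock` and its parameters are placeholders, `flags[i] != 0` again stands in for `somethingHappens`, and `clock64()` is the device function that reads the per-multiprocessor cycle counter.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: times the accumulator loop with clock64().
__global__ void timeBlock(const int *flags, unsigned int *maskOut,
                          long long *cyclesOut)
{
    long long start = clock64();   // cycle counter before the block

    unsigned int mask = 0;
    #pragma unroll
    for (int i = 0; i < 8; i++) {
        if (flags[i] != 0) {       // placeholder for somethingHappens
            mask |= 1u << i;
        }
    }

    long long stop = clock64();    // cycle counter after the block

    // Store the result so the loop is not removed as dead code.
    *maskOut = mask;
    *cyclesOut = stop - start;
}

int main()
{
    const int hostFlags[8] = {1, 0, 1, 1, 0, 0, 0, 1};
    int *devFlags = nullptr;
    unsigned int *devMask = nullptr;
    long long *devCycles = nullptr;
    unsigned int hostMask = 0;
    long long hostCycles = 0;

    cudaMalloc(&devFlags, sizeof(hostFlags));
    cudaMalloc(&devMask, sizeof(hostMask));
    cudaMalloc(&devCycles, sizeof(hostCycles));
    cudaMemcpy(devFlags, hostFlags, sizeof(hostFlags), cudaMemcpyHostToDevice);

    timeBlock<<<1, 1>>>(devFlags, devMask, devCycles);

    cudaMemcpy(&hostMask, devMask, sizeof(hostMask), cudaMemcpyDeviceToHost);
    cudaMemcpy(&hostCycles, devCycles, sizeof(hostCycles), cudaMemcpyDeviceToHost);

    printf("mask = 0x%02x, elapsed cycles = %lld\n", hostMask, hostCycles);

    cudaFree(devFlags);
    cudaFree(devMask);
    cudaFree(devCycles);
    return 0;
}
```

My doubt is whether a single reading like this can be trusted: as far as I understand, the compiler may schedule instructions across the two `clock64()` calls, and the global loads from `flags` probably dominate such a short block, so I would at least check the generated code with `cuobjdump -sass` and repeat the measurement many times.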

PS: I have asked the same question on Stack Overflow; I don't know which site is better for these questions.
http://stackoverflow.com/questions/17446448/typical-acumulator-bucle-is-worth-it-rewritting-it-for-exposing-more-ilp-explic