I tried to unroll some loops to see if I could improve the performance of two nested for-loops. I got these compiler messages:
warning, loop was not unrolled, inline assembly
warning, loop was not unrolled, not innermost loop
What does the first warning mean?
Is it not possible to unroll nested for-loops?
How can I unroll a loop if I don’t know the number of iterations at compile time? Can I make some kind of template, such that a kernel is selected at run time? Will I get a very big executable if I instantiate something like 100 templates?
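One way to approach the run-time question is a function template on the trip count plus a switch over a few instantiations. This is only a minimal sketch in plain C++ so that it compiles with any host compiler; the names sum_fixed, sum_generic and sum_dispatch are hypothetical, and in CUDA the same pattern would presumably be applied to a templated __global__ kernel.

```cpp
// Trip count N is a template parameter, so the compiler knows it at
// compile time and is free to unroll the loop completely.
template <int N>
float sum_fixed(const float* data) {
    float acc = 0.0f;
    for (int i = 0; i < N; ++i) {
        acc += data[i];
    }
    return acc;
}

// Generic fallback for trip counts without a dedicated instantiation.
float sum_generic(const float* data, int n) {
    float acc = 0.0f;
    for (int i = 0; i < n; ++i) {
        acc += data[i];
    }
    return acc;
}

// Run-time dispatch: use a specialized instantiation when one exists,
// otherwise fall back to the generic loop.
float sum_dispatch(const float* data, int n) {
    switch (n) {
        case 2:  return sum_fixed<2>(data);
        case 4:  return sum_fixed<4>(data);
        case 8:  return sum_fixed<8>(data);
        default: return sum_generic(data, n);
    }
}
```

Every instantiation adds its own copy of the code to the binary, so it is probably better to specialize only the few counts that actually occur often and let everything else take the fallback, rather than instantiating 100 variants.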
Hm… I’ve never seen ‘#pragma unroll’ applied to an outer loop of a nested pair. My guess is that unrolling an outer loop is harder for the compiler to do safely, so unroll seems to be supported only for the innermost loop, which matches your second warning.
You could test this by putting the following statement just before the start of the innermost for loop: #pragma unroll 3
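As a sketch of that placement, the pragma goes directly before the inner for statement and nowhere else. The function row_sums and its parameters are hypothetical, and the code is plain C++; the preprocessor guard is an assumption added only because not every host compiler recognizes ‘#pragma unroll’ (nvcc and clang do).

```cpp
// Sums each row of a rows x cols matrix stored row-major in `in`
// and writes one result per row to `out`.
void row_sums(const float* in, float* out, int rows, int cols) {
    // Outer loop: no pragma here, it is left to the compiler.
    for (int r = 0; r < rows; ++r) {
        float acc = 0.0f;
        // Request an unroll factor of 3 for the innermost loop only.
#if defined(__CUDACC__) || defined(__clang__)
#pragma unroll 3
#endif
        for (int c = 0; c < cols; ++c) {
            acc += in[r * cols + c];
        }
        out[r] = acc;
    }
}
```

The pragma is a request, not a guarantee, so the compiler output is still the place to check whether the warning has gone away.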