If I have time, I will write some test code and read the generated assembly later to check this. But I was wondering whether someone here already knows the answer and could save me the trouble.
Let's say you have a for loop whose trip count can easily be determined at compile time. Now let's say there is an if statement in that loop whose condition is true only on the last iteration. When you put a "#pragma unroll" in front of the loop, will the compiler optimize away the if statement? It would look something like this:
#pragma unroll
for (int a = 0; a < 15; a++)
{
    // Do some real but uninteresting work here.
    if (a == 14)
    {
        // Do some more work here.
    }
}
Or, more interestingly, will the if statement get optimized away if the loop bound is a template parameter?
template <int count> __device__ void someFunction(dataType someParameter)
{
    #pragma unroll
    for (int a = 0; a < count; a++)
    {
        // Do some real but uninteresting work here.
        if (a == count - 1)
        {
            // Do some more work here.
        }
    }
}
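For comparison, here is a rough sketch of the manual fallback I have in mind in case the compiler does not fold the branch: move the last-iteration work out of the loop so there is nothing left to optimize away. Everything named here is a placeholder I made up for illustration: regularWork and lastIterationWork stand in for the real work, and the HOST_DEVICE and UNROLL macros exist only so the same sketch builds as plain C++ on the host (I am assuming nvcc accepts _Pragma("unroll") as an equivalent of #pragma unroll).

```cpp
#include <cassert>

// Under nvcc the functions become device-callable and the loops carry the
// unroll hint; under a plain C++ compiler both macros expand to nothing.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#define UNROLL _Pragma("unroll")
#else
#define HOST_DEVICE
#define UNROLL
#endif

// Hypothetical stand-ins for the "uninteresting work" and the "more work".
// Unsigned arithmetic is used so that wraparound is well defined.
HOST_DEVICE inline unsigned regularWork(unsigned acc, int a)
{
    return acc * 3u + static_cast<unsigned>(a);
}

HOST_DEVICE inline unsigned lastIterationWork(unsigned acc)
{
    return acc ^ 0x5au;
}

// Original shape: the branch sits inside the loop.
template <int count>
HOST_DEVICE unsigned withBranch(unsigned acc)
{
    UNROLL
    for (int a = 0; a < count; a++)
    {
        acc = regularWork(acc, a);
        if (a == count - 1)
        {
            acc = lastIterationWork(acc);
        }
    }
    return acc;
}

// Peeled shape: the extra work runs once, after the loop.
template <int count>
HOST_DEVICE unsigned peeled(unsigned acc)
{
    UNROLL
    for (int a = 0; a < count; a++)
    {
        acc = regularWork(acc, a);
    }
    if (count > 0) // compile-time constant; keeps count == 0 a no-op
    {
        acc = lastIterationWork(acc);
    }
    return acc;
}

// Host-side sanity check that both shapes compute the same result.
inline void checkVariantsAgree()
{
    for (unsigned seed = 0; seed < 100; seed++)
    {
        assert(withBranch<15>(seed) == peeled<15>(seed));
        assert(withBranch<1>(seed) == peeled<1>(seed));
        assert(withBranch<0>(seed) == peeled<0>(seed));
    }
}
```

This only shows that the two shapes are equivalent when the extra work is the last thing done in the final iteration. It says nothing about what the compiler actually generates for the first version, which is still my question.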
I have a for loop like this in a program that I am working on. The loop is executed many millions of times, so it is important that I do whatever I can to optimize it. It would make sense to me that the compiler removes the branch once the loop is unrolled. However, I worry because I am stacking three levels of compiler optimization on top of each other, and I don't know if the CUDA compiler is that smart. Does anyone have experience with this sort of thing and know what the compiler will do? I would appreciate any tips you may have on exploiting compiler optimization. Thanks!