There was a kernel with an int template parameter M. Unrolling was necessary because the kernel accessed a small per-thread private array. Without unrolling, the compiler places that array in local memory, which lives off-chip in device memory, and that was not desired.
Unfortunately, nvcc did not accept a template parameter as the count in #pragma unroll. This:
[codebox]template<int M>
__global__ void kernel(…) {
    float priv[M];  // "private" is a C++ keyword, so the array is named priv
    #pragma unroll M
    for (int i = 0; i < M; ++i) {
        do_something(priv[i]);
    }
}[/codebox]
does not compile.
But you can unroll a loop with a little template metaprogramming:
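[codebox]
// Calls action(Begin), action(Begin+Step), ... until Begin reaches End.
// (End - Begin) must be a multiple of Step, or the recursion never terminates.
template<int Begin, int End, int Step = 1>
struct Unroller {
    template<typename Action>
    static __device__ void step(Action& action) {
        action(Begin);
        Unroller<Begin+Step, End, Step>::step(action);
    }
};

// Terminating specialization: Begin == End, nothing left to do.
template<int End, int Step>
struct Unroller<End, End, Step> {
    template<typename Action>
    static __device__ void step(Action&) {
    }
};
[/codebox]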
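Here is a sketch of the same Unroller that also compiles as plain C++, so the expansion can be checked on the host, and that exercises the Step parameter. HD is a hypothetical helper macro (__host__ __device__ __forceinline__ under nvcc, empty otherwise), and RecordIndices / record_even_indices are made-up names for illustration only.

```cpp
// Hypothetical helper: device-callable and force-inlined under nvcc,
// plain functions in an ordinary C++ compiler.
#ifdef __CUDACC__
#define HD __host__ __device__ __forceinline__
#else
#define HD
#endif

// Calls action(Begin), action(Begin + Step), ... stopping before End.
// (End - Begin) must be a multiple of Step; otherwise Begin never equals End,
// the terminating specialization is never chosen, and compilation fails
// once the template instantiation depth limit is exceeded.
template <int Begin, int End, int Step = 1>
struct Unroller {
    template <typename Action>
    static HD void step(Action& action) {
        action(Begin);
        Unroller<Begin + Step, End, Step>::step(action);
    }
};

// Terminating case: Begin has reached End, so there is nothing left to call.
template <int End, int Step>
struct Unroller<End, End, Step> {
    template <typename Action>
    static HD void step(Action&) {}
};

// Hypothetical demo functor: appends each index it receives to a small log.
struct RecordIndices {
    int seen[8];
    int count;
    HD void operator()(int i) { seen[count++] = i; }
};

// Unroller<0, 8, 2> expands to r(0); r(2); r(4); r(6);
HD RecordIndices record_even_indices() {
    RecordIndices r = {};
    Unroller<0, 8, 2>::step(r);
    return r;
}
```

With Unroller<0, 8, 2> the functor is called with 0, 2, 4 and 6, in that order, and there is no loop left in the generated code.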
You then create a functor that does the work and pass an instance of it to the Unroller:
[codebox]
struct DoFunctor {
    float* data;
    __device__ void operator()(int i) {
        do_something(data[i]);
    }
};

// in the kernel:
DoFunctor func;
func.data = priv;
Unroller<0, M>::step(func);
[/codebox]
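A host-testable sketch of the whole pattern, assuming do_something() is replaced by something concrete (here: summing squares). LoadFunctor, SquareSumFunctor and sum_of_squares are hypothetical names, and HD is an assumed helper macro (__host__ __device__ __forceinline__ under nvcc, empty otherwise). The private array is only ever indexed through the Unroller, so once the calls are inlined every access uses a compile-time index. Whether the array really ends up in registers is still the compiler's decision; the resource usage printed by nvcc -Xptxas -v is one way to check.

```cpp
// Hypothetical helper macro, see the lead-in.
#ifdef __CUDACC__
#define HD __host__ __device__ __forceinline__
#else
#define HD
#endif

// Unroller: calls action(Begin), ..., action(End - Step).
template <int Begin, int End, int Step = 1>
struct Unroller {
    template <typename Action>
    static HD void step(Action& action) {
        action(Begin);
        Unroller<Begin + Step, End, Step>::step(action);
    }
};

template <int End, int Step>
struct Unroller<End, End, Step> {
    template <typename Action>
    static HD void step(Action&) {}
};

// Hypothetical functor: copies input values into the private array.
struct LoadFunctor {
    float* data;       // the per-thread private array
    const float* src;  // input values for this thread
    HD void operator()(int i) { data[i] = src[i]; }
};

// Hypothetical stand-in for do_something(): accumulates squares.
struct SquareSumFunctor {
    const float* data;
    float sum;
    HD void operator()(int i) { sum += data[i] * data[i]; }
};

// Kernel-like body: priv is indexed only through the Unroller.
template <int M>
HD float sum_of_squares(const float* in) {
    float priv[M];
    LoadFunctor load = {priv, in};
    Unroller<0, M>::step(load);
    SquareSumFunctor acc = {priv, 0.0f};
    Unroller<0, M>::step(acc);
    return acc.sum;
}
```

For example, sum_of_squares<4> on the input 1, 2, 3, 4 returns 30.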
It is not pretty to read, but it works.
If nvcc supported C++0x, a lambda could replace the functor. But I hope that #pragma unroll will accept template parameters soon.
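For a toolkit that does accept C++11 (as far as I know nvcc does since CUDA 7), a sketch of the lambda version could look like this. It is written under that assumption and is not tested on any particular nvcc release. step takes the action by value so that a temporary lambda can bind to it; the lambdas capture by reference, so copying them does not copy the state they update. HD and sum_of_squares are hypothetical names.

```cpp
// Hypothetical helper: device-callable and force-inlined under nvcc.
#ifdef __CUDACC__
#define HD __host__ __device__ __forceinline__
#else
#define HD
#endif

// Unroller variant taking the action by value, so lambdas can be passed in.
template <int Begin, int End, int Step = 1>
struct Unroller {
    template <typename Action>
    static HD void step(Action action) {
        action(Begin);
        Unroller<Begin + Step, End, Step>::step(action);
    }
};

template <int End, int Step>
struct Unroller<End, End, Step> {
    template <typename Action>
    static HD void step(Action) {}
};

// Same kernel-like body as with the functors, with lambdas instead.
template <int M>
HD float sum_of_squares(const float* in) {
    float priv[M];
    float sum = 0.0f;
    Unroller<0, M>::step([&](int i) { priv[i] = in[i]; });
    Unroller<0, M>::step([&](int i) { sum += priv[i] * priv[i]; });
    return sum;
}
```

The functor structs disappear; everything else, including the requirement that End - Begin be a multiple of Step, stays the same.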