#pragma unroll doesn't work with template parameters, but there is a workaround

I had a kernel with an "int" template parameter. Unrolling was necessary because the kernel accessed a (small) private array. Without unrolling, the array is indexed dynamically and spills to off-chip local memory, which was not desired.

Unfortunately, nvcc didn't accept the template parameter as the argument to #pragma unroll:

[codebox]template<int M>
__global__ void kernel(…) {
    float priv[M];  // small per-thread array ("private" itself is a C++ keyword)
    #pragma unroll M
    for (int i = 0; i < M; ++i) {
        do_something(priv[i]);
    }
}[/codebox]

This does not compile.

But you can unroll a loop with a little template metaprogramming:

[codebox]
template<int Begin, int End, int Step = 1>
struct Unroller {
    // Apply the action to index Begin, then recurse on the next index.
    template<typename Action>
    __device__ static void step(Action& action) {
        action(Begin);
        Unroller<Begin+Step, End, Step>::step(action);
    }
};

// Terminating specialization: Begin has reached End, nothing left to do.
template<int End, int Step>
struct Unroller<End, End, Step> {
    template<typename Action>
    __device__ static void step(Action& action) {
    }
};
[/codebox]

You have to create a functor that does the work and pass an instance of it to the Unroller:

[codebox]
struct DoFunctor {
    float* data;
    __device__ void operator()(int i) {
        do_something(data[i]);
    }
};

// in kernel:
DoFunctor func;
func.data = priv;  // priv is the kernel's small per-thread array
Unroller<0,M>::step(func);
[/codebox]
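For anyone who wants to check the recursion before putting it into a kernel, here is a minimal sketch that also compiles with a plain host C++ compiler. The HD macro, ScaleFunctor, unroller_doubles_all and unroller_self_test are names I made up for this illustration; ScaleFunctor just stands in for do_something.

```cpp
#include <cassert>

// Expands to CUDA qualifiers under nvcc and to nothing for a host-only compiler.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Recursive case: call action(Begin), then continue with Begin + Step.
template<int Begin, int End, int Step = 1>
struct Unroller {
    template<typename Action>
    HD static void step(Action& action) {
        action(Begin);
        Unroller<Begin + Step, End, Step>::step(action);
    }
};

// Terminating case: Begin has reached End, so nothing is left to do.
template<int End, int Step>
struct Unroller<End, End, Step> {
    template<typename Action>
    HD static void step(Action&) {}
};

// Hypothetical functor standing in for do_something: doubles one element.
struct ScaleFunctor {
    float* data;
    HD void operator()(int i) { data[i] *= 2.0f; }
};

// Host-side check mirroring the kernel body; M must be greater than zero.
template<int M>
bool unroller_doubles_all() {
    float values[M];
    for (int i = 0; i < M; ++i) values[i] = static_cast<float>(i);

    ScaleFunctor func;
    func.data = values;
    Unroller<0, M>::step(func);  // expands to func(0); func(1); ... func(M-1);

    for (int i = 0; i < M; ++i)
        if (values[i] != 2.0f * static_cast<float>(i)) return false;
    return true;
}

// Convenience wrapper so the check can be called from main().
inline void unroller_self_test() {
    assert(unroller_doubles_all<4>());
}
```

One thing to watch: with a Step other than 1, the recursion only terminates if End can be reached from Begin in whole steps. Otherwise the terminating specialization is never matched and the compiler hits its template depth limit.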

It is not pretty to read, but it works.

If there were C++0x support, a lambda could replace the functor. But I hope that #pragma unroll will support template parameters soon.
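To show what the lambda version could look like, here is a host-only sketch in plain C++11. It assumes a compiler with lambda support and says nothing about what nvcc accepts in device code. The function name sum_of_squares is hypothetical and only there for the example.

```cpp
#include <cassert>

// Same Unroller as above, without CUDA qualifiers (host-only sketch).
template<int Begin, int End, int Step = 1>
struct Unroller {
    template<typename Action>
    static void step(Action& action) {
        action(Begin);
        Unroller<Begin + Step, End, Step>::step(action);
    }
};

template<int End, int Step>
struct Unroller<End, End, Step> {
    template<typename Action>
    static void step(Action&) {}
};

// Hypothetical example: fill a small array with i*i and sum it up.
template<int M>
float sum_of_squares() {
    float values[M];
    float total = 0.0f;

    // The lambda captures the local array by reference, so no functor struct
    // with a data pointer is needed. It has to be a named variable because
    // step() takes its argument by non-const reference.
    auto body = [&](int i) {
        values[i] = static_cast<float>(i * i);
        total += values[i];
    };
    Unroller<0, M>::step(body);
    return total;
}
```

The loop body then sits next to the code that uses it, which is the readability gain over the functor.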