Hello, when I compile the code below I get the error “expression must have a constant value”. Is there a way this can be done?
[codebox]
template <int a>
__device__ int myFunc(int b, int c)
{
    return a + 256 + b + c;
}

__global__ void test(int *output)
{
    int total = 0;
    const int tid = threadIdx.x;
    const int bid = blockIdx.x;

    #pragma unroll
    for (int d = 0; d < 30; d++)
        total += myFunc<d>(tid, bid); // error: d is not a constant expression

    output[tid] = total;
}
[/codebox]
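The error occurs because d is an ordinary runtime variable: #pragma unroll is only an optimization hint and does not turn d into a constant expression, so it cannot be used as a template argument. One common workaround, sketched here under the assumption of a C++11-capable nvcc, is to do the unrolling with a recursive class template so that every step instantiates myFunc with a literal constant. The names Unroller, sumAll and HD are illustrative and not from the original post; HD expands to __host__ __device__ under nvcc and to nothing otherwise, so the templates can also be checked with a plain C++ compiler.

```cpp
#include <cassert>  // assert, for host-side checks

// HD marks functions as __host__ __device__ under nvcc and expands to nothing
// in a plain C++ compiler, so the same templates can be checked on the host.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

template <int a>
HD int myFunc(int b, int c)
{
    return a + 256 + b + c;
}

// Unroller<D, N>::sum(b, c) computes myFunc<D>(b, c) + ... + myFunc<N - 1>(b, c).
// D is a template parameter, so each call uses a compile-time constant.
template <int D, int N>
struct Unroller
{
    static HD int sum(int b, int c)
    {
        return myFunc<D>(b, c) + Unroller<D + 1, N>::sum(b, c);
    }
};

// Partial specialization that stops the recursion once D reaches N.
template <int N>
struct Unroller<N, N>
{
    static HD int sum(int, int) { return 0; }
};

// Same result as the 30-iteration loop in the original kernel.
HD int sumAll(int tid, int bid)
{
    return Unroller<0, 30>::sum(tid, bid);
}

#ifdef __CUDACC__
__global__ void test(int *output)
{
    const int tid = threadIdx.x;
    const int bid = blockIdx.x;
    output[tid] = sumAll(tid, bid);
}
#endif
```

Compiled as plain C++, sumAll(0, 0) returns 8115, the sum of d + 256 for d = 0..29. Each call in the expansion has a constant template argument, so nvcc sees the same 30 calls a hand-unrolled loop would produce; a recursion depth of 30 is far below typical template instantiation limits.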
Is this just an example demonstrating the problem or the actual case you have? If it is the actual case, why not just make ‘a’ a parameter of the function?
Hi eelsen, thank you for your reply. This is just an example I created; this is closer to the real code…
[codebox]
#pragma unroll
for (int i = 1; i <= 32; i++) // this is unrolled
{
    // pre stuff here
    total += Test(NextRandomVal(), NextRandomVal());
    // post stuff here
}

template <unsigned char inst>
__device__ float Test(float a, float b)
{
    switch (inst)
    {
    case 0:  return a + b;
    case 1:  return a * b;
    case 2:  return a - b;
    // ...
    case 31: return a / b;
    }
    return 0.0f; // only reached if inst has no matching case
}
[/codebox]
I would like to use a template because the 32-way switch statement above is slow when it is not templated.
Template arguments must be known at compile time (even in ordinary C++). A switch statement is probably the fastest way to do what you want. In general, branching like that is bad for GPU performance: when threads in the same warp take different cases, the warp executes those paths one after another. Finding a way to reduce the number of branches should probably be a high priority.
Fermi (compute capability 2.0 and later) supports function pointers in device code, and you could use them to do what you want, but I’m not sure that would be significantly (if at all) faster than a switch statement.
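A minimal sketch of the function-pointer approach, assuming a GPU of compute capability 2.0 (Fermi) or later, which device function pointers require. A small table of pointers is indexed by an opcode chosen at runtime. The four operations and their opcodes 0 to 3 are hypothetical stand-ins for the 32 cases in the post, and dispatch, OpFn, applyOps and HD are illustrative names.

```cpp
#include <cassert>  // assert, for host-side checks

#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Common signature for every operation.
typedef float (*OpFn)(float, float);

HD float opAdd(float a, float b) { return a + b; }
HD float opMul(float a, float b) { return a * b; }
HD float opSub(float a, float b) { return a - b; }
HD float opDiv(float a, float b) { return a / b; }

// Calls the operation selected at runtime by inst (hypothetical opcodes 0..3).
// The caller must ensure inst < 4. In device code the table holds device
// function addresses, which needs compute capability 2.0 or later.
HD float dispatch(unsigned int inst, float a, float b)
{
    const OpFn table[] = { opAdd, opMul, opSub, opDiv };
    return table[inst](a, b);
}

#ifdef __CUDACC__
// One opcode per element; threads in a warp may pick different operations.
__global__ void applyOps(const unsigned int *inst, const float *a,
                         const float *b, float *out, int n)
{
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = dispatch(inst[i], a[i], b[i]);
}
#endif
```

Like a switch, the indirect call still diverges when threads in one warp hold different opcodes, and it usually prevents the callee from being inlined, so it is not automatically faster.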
That’s pretty cool; I never thought about doing that with templates. But if you look at his next post, I think what he really wants is to choose which function to run at runtime, and I don’t think this will help with that.
From the code he posted, I believe the function is known at compile time because it depends on the template parameter “inst”, which is known at compile time:
[codebox]
template <unsigned char inst>
__device__ float Test(float a, float b)
{
    switch (inst)
    {
    case 0:  return a + b;
    case 1:  return a * b;
    case 2:  return a - b;
    // ...
    case 31: return a / b;
    }
    return 0.0f; // only reached if inst has no matching case
}
[/codebox]
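So nvcc should compile “Test<0>(a, b)” down to just “return a + b;”: inst is a compile-time constant, so the switch is resolved during compilation and the other cases are discarded. It’s true that the values of a and b aren’t known at compile time, but that doesn’t affect the branching, which depends only on inst.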
If for some reason nvcc gets confused and leaves the switch in, there are other template tricks to get the desired effect which wouldn’t require a switch.
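One such trick, sketched here as an assumption rather than anything posted in the thread: give each opcode its own explicit specialization of a class template, so the operation is selected purely by template resolution and there is no switch left to optimize away. Only the opcodes that appear in the post (0, 1, 2 and 31) are filled in; Op and HD are illustrative names.

```cpp
#include <cassert>  // assert, for host-side checks

#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Declared but never defined: Op<inst> only exists for opcodes that have an
// explicit specialization below.
template <unsigned char inst>
struct Op;

template <> struct Op<0>  { static HD float apply(float a, float b) { return a + b; } };
template <> struct Op<1>  { static HD float apply(float a, float b) { return a * b; } };
template <> struct Op<2>  { static HD float apply(float a, float b) { return a - b; } };
template <> struct Op<31> { static HD float apply(float a, float b) { return a / b; } };

// Same interface as Test<inst> in the post, but with no switch to eliminate.
template <unsigned char inst>
HD float Test(float a, float b)
{
    return Op<inst>::apply(a, b);
}
```

Because the primary template is declared but never defined, calling Test with an opcode that has no specialization fails at compile time instead of silently falling off the end of a switch.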
Sorry, you’re totally right. When I glanced at the function I didn’t see the template above it, and I thought inst was determined by some omitted logic that depended on the values of a and b, which is why I thought it wouldn’t be known until runtime.