I have the following loop (see below), which is part of a B-spline interpolation algorithm. I'd like it to be completely unrolled, but nvcc complains that it cannot deduce the loop trip count. The trip count is fixed at 64, no special indexing is used, and all of the offsets could be calculated beforehand. Why is the compiler not doing what I want?
int3 offsets = make_int3(0);
#pragma unroll 64
for (int i = 0; i != 64; ++i) {
    float weight = weights[offsets.x].x * weights[offsets.y].y * weights[offsets.z].z;
    /* put the transform jacobians into the correct position */
    transformJacobians[i] = weight;
    /* compute the indices */
    nonZeroJacobianIndices[i] = sum((make_float3(offsets) + index) * gridOffsets);
    /* offsetToIndexTable calculation for correct position */
    if (++offsets.x == 4) { offsets.x = 0; ++offsets.y; }
    if (offsets.y == 4) { offsets.y = 0; ++offsets.z; }
}
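
To make concrete what I mean by "everything could be calculated beforehand": the three offsets are just the base-4 digits of i, so they could be derived from the loop counter instead of being carried from one iteration to the next. Below is a minimal sketch of that idea. The names Offsets, offsetsFromIndex and matchesIncrementalCounter are hypothetical helpers I made up for illustration, not part of my kernel, and I have not confirmed that writing the loop this way changes what nvcc reports.

```cpp
// Sketch only: compiles as plain C++ and, with the qualifiers below, under nvcc.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Hypothetical stand-in for int3, so the sketch needs no CUDA headers.
struct Offsets { int x, y, z; };

// Decompose the flat index i (0..63) into its 4x4x4 offsets.
// x varies fastest, then y, then z, as in the original loop.
HD inline Offsets offsetsFromIndex(int i) {
    return Offsets{ i & 3, (i >> 2) & 3, i >> 4 };
}

// Host-side check: the closed form above reproduces the incremental
// offset update from the original loop for every i in 0..63.
inline bool matchesIncrementalCounter() {
    int x = 0, y = 0, z = 0;
    for (int i = 0; i < 64; ++i) {
        const Offsets o = offsetsFromIndex(i);
        if (o.x != x || o.y != y || o.z != z) return false;
        // Same update as in the original loop body.
        if (++x == 4) { x = 0; ++y; }
        if (y == 4) { y = 0; ++z; }
    }
    return true;
}
```

With this, the loop body would start with something like `const Offsets o = offsetsFromIndex(i);` and the two `if` statements at the end would disappear, so each iteration depends only on i.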